At the Society of Actuaries (SOA) inaugural Predictive Analytics Symposium last month, RGA experts took a leading role in exploring the latest trends, tactics, and technology fueling the predictive analytics revolution. “Featured in 11 different sessions, the breadth and depth of the topics that RGA covered were remarkable and showcased our team’s thought leadership in many areas,” said Peter Banthorpe, Senior Vice President and Head of Global Research and Data Analytics. “I am proud of the analytics team we have built at RGA and excited to see how we can help move our clients and the industry forward in this rapidly evolving space.”
RGA presentations at the event included:
1. Keynote Presentation
Banthorpe joined a panel of industry peers to discuss predictive analytics from their unique perspectives as company leaders in this area.
2. Building a Data Science Team
Banthorpe also presented on what it takes to build a data science team, including hiring and retaining talent and how to get a job in this new area. The session explored the need to bring different skill sets to the table, such as those of data engineers, data scientists, and data storytellers. It also examined how actuaries fit into the picture and how developing mutually beneficial relationships between actuaries and the data science team can significantly elevate an organization.
3. Using Machine Learning for Accelerated Underwriting
Dihui Lai, Senior Data Scientist, discussed how insurers are applying predictive analytics to accelerate underwriting through machine learning. The presentation defined the need for accelerated underwriting, outlined the machine learning process, and compared several options for model structures and their related performance. Lai emphasized the need to validate the results for any model by understanding the complex variable impact, applying diagnostic analysis, monitoring shifts in distribution in the applicant population, and comparing model decisions with human underwriting.
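One way to picture the "monitoring shifts in distribution" step is a Population Stability Index (PSI) check on a single input variable. The summary above does not say which metric the presentation used, so PSI is an assumption here, and the applicant ages are synthetic. A minimal sketch using only the Python standard library:

```python
import math
import random

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0   # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the baseline range are clamped into the end bins.
            idx = int((v - lo) / width)
            counts[max(0, min(n_bins - 1, idx))] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(0)
# Synthetic, made-up applicant ages: one sample at model build, one later and older.
baseline = [rng.gauss(45, 10) for _ in range(5000)]
shifted = [rng.gauss(52, 10) for _ in range(5000)]

print("PSI, baseline vs itself: ", psi(baseline, baseline))
print("PSI, baseline vs shifted:", psi(baseline, shifted))
```

A PSI of zero means the binned distributions match, and larger values mean the recent population has drifted further from the one the model was built on. Any alert threshold would be a judgment call for the team running the model.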
4. Languages of Predictive Analytics: A Tower of Babel?
In this session, Jeff Heaton, Lead Data Scientist, considered the pros and cons of the many programming languages available, from domain-specific languages (DSLs) such as R and MATLAB/Octave to general-purpose languages such as Julia and Python. His conclusion: certain computer languages are better suited for certain tasks within predictive analytics. Data scientists should develop a deep understanding of as many languages as possible in order to apply the right language to the right situation.
5. Dangers of Over-fitting in Predictive Analytics
Sometimes, simple mathematical equations outperform very complex ones in predictive analytics. Data Scientist Rosmery Cruz described how overfitting, the process of capitalizing on idiosyncratic characteristics of a given sample, can yield model results that reflect patterns absent from the actual population. She then walked through steps for validating predictive models and minimizing overfitting:
- Make research design decisions before analyzing the data
- Where applicable, use subject matter knowledge to inform data aggregation (e.g., age groups)
- Limit the exclusion of data
- Consider the limitations of your data (e.g., sample size) and build simpler models
- Apply validation techniques such as the test set method to measure out-of-sample performance directly
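The test set method in the last step can be sketched as follows. This is an illustration only, not material from the session: the data are synthetic, and the two models (a straight-line fit and a hypothetical "memorizer" that looks up the nearest training point) are stand-ins for a simple model and an overly flexible one.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (the simple model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: reproduces the training sample exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]   # synthetic: linear signal plus noise

# Fix the split before looking at any results, then never fit on the test rows.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

results = {}
for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

The memorizer scores a training error of zero because it has captured the noise in the sample, which is exactly why training error alone is misleading. The held-out rows give a direct estimate of out-of-sample performance, and on data like this the simpler line will typically hold up better there.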
6. Building Blocks of Predictive Analytics
Richard Xu, Vice President and Actuary, Head of Data Science, led attendees through the basic foundations of predictive analytics, distributions and models to use, and pitfalls to avoid. He stressed how important it is for insurance companies to bridge the gap between actuaries and data analysts. After all, actuaries already have a solid foundation in industry knowledge, data processing, and modeling, so the opportunity to extend that into new data analytics skills is one they need to take advantage of. Data analytics is here to stay and promises to fundamentally change the insurance industry; actuaries could and should lead this change.
7. TensorFlow Workshop
Heaton returned to present a technical, hands-on workshop on using TensorFlow, Google’s free deep learning neural network tool, in conjunction with Keras, an open-source neural network library. Attendees were instructed to bring their laptops to learn by doing. The session also explored the anatomy of a neural network, the meaning of deep learning, and types of machine learning algorithms, among other related topics.
8. Decision Trees, CARTs, Random Forests
In his second presentation, Lai guided attendees through tree algorithms, from the single-tree CART model to ensemble models, including bagging, random forest, and gradient boosting machine (GBM). The session concluded with an accelerated underwriting model use case and the following guidelines for using tree algorithms:
- Random forest and GBM are both important players in the machine learning world
- It is critical to pick an algorithm that fits the inherent structure of a dataset
- Use a simple model if possible
- Use ensemble (complex) models only if necessary
- Control model complexity to avoid overfitting (the validation process is your best friend)
- Build intuition by analyzing the model: variable importance, marginal analysis, etc.
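To make the step from a single tree to an ensemble concrete, the sketch below compares one single-split regression tree (a "stump") with a bagged average of stumps, scoring both on a validation set as the guidelines suggest. It is an illustration under stated assumptions, not the session's use case: the data are synthetic, the function names are hypothetical, and it shows plain bagging only, not random forest or GBM.

```python
import math
import random

def fit_stump(xs, ys):
    """Single-split regression tree: pick the threshold that minimises squared error."""
    order = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(order)):
        left = [y for _, y in order[:i]]
        right = [y for _, y in order[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (order[i - 1][0] + order[i][0]) / 2, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(160)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]   # synthetic: smooth signal plus noise
train_x, valid_x = xs[:120], xs[120:]
train_y, valid_y = ys[:120], ys[120:]

stump = fit_stump(train_x, train_y)
bagged = fit_bagged(train_x, train_y)
for name, model in [("single stump", stump), ("bagged stumps", bagged)]:
    print(f"{name:14s} train MSE={mse(model, train_x, train_y):.3f}  "
          f"valid MSE={mse(model, valid_x, valid_y):.3f}")
```

Comparing the two validation errors is the kind of evidence the guidelines call for before choosing the more complex model: the ensemble earns its extra complexity only if it improves on the simple model out of sample.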
9. Tidy Data: The Offensive Line of Predictive Analytics
Like the offensive line of a football team, creating tidy data is not the most glamorous component of predictive analytics. However, it is the crucial starting point of any successful modeling project that protects you from being blindsided by a bad model later in your process. In this session, Kyle Nobbe, Assistant Actuary, Global Research and Data Analytics, co-presented on the foundations and best practices of tidy data through two case studies: 1) a traditional morbidity experience study using third-party Asian health data, and 2) an industry survey on contract holder surrenders from U.S. annuities.
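As a small illustration of what "tidy" means in practice, the sketch below reshapes a wide table (one column per year) into a long one where each row is a single observation and each column is a single variable. The product names and counts are made up for the example and are not drawn from either case study; the `melt` helper is hypothetical.

```python
# Wide layout: the "year" variable is hidden in the column headers.
wide = [
    {"product": "A", "2015": 120, "2016": 135},
    {"product": "B", "2015": 80, "2016": 95},
]

def melt(rows, id_col, var_name, value_name, var_cast=int):
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            tidy_rows.append({id_col: row[id_col], var_name: var_cast(col), value_name: value})
    return tidy_rows

tidy = melt(wide, id_col="product", var_name="year", value_name="surrenders")
for row in tidy:
    print(row)
```

In the long layout, filtering by year, grouping by product, or joining to another table all work on ordinary columns, which is what makes the later modeling steps less error-prone.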
10. Visualization: A Picture Speaks a Thousand Words
Brad Lipic, Vice President of Data Strategy, joined fellow SOA presenters in a visualization exercise in which a fictional life insurance company sought to make it quicker and less labor-intensive for new and existing customers to get a quote while maintaining privacy boundaries. Together, the room applied data visualization software in an effort to show how a predictive model could accurately classify risk using a more automated approach. The point: visualization can help companies better understand the predictive power of data in existing assessments, enabling them to significantly streamline the application process.
11. Data Privacy Issues
Big data has attracted the attention and ire of regulators and privacy advocates globally – from the European Commission to the U.S. Federal Trade Commission (FTC) and state lawmakers. However, with only some laws in place, many gaps in the legislative landscape remain. In this presentation, Keith Carlson, Executive Director and Deputy Data Privacy Officer, Operational Risk, emphasized that this is a time to be proactive and strategic about compliance. Approaches such as privacy by design, privacy-enhancing technologies, self-regulation, and ethical standards will help the industry to maximize its analytics endeavors while ensuring their longevity even under future legal scrutiny.
If you would like to explore any of these topics further with RGA’s data analytics team, please contact us.