May 2023

Assessing Comorbidities Requires a Data-Driven, Multidisciplinary Approach: Part II

In Brief

Part I of this two-part article series presents a simple but effective approach to understanding the relationship between two conditions. In Part II, RGA's Mike Cusumano explores more complex data analyses and explains why cross-functional collaboration remains the key to success. 

The more detailed the data analysis, the more important clear communication and collaboration become. Actuaries and data analysts should consult domain experts, including underwriters and medical directors, to shape the analysis and help find alternative ways to assess the problem. Age, for example, is often a significant factor when exploring the relative severity of conditions and comorbidities. In RGA’s analysis of diabetes and coronary artery disease (CAD), the adverse synergy outlined in Part I persists across all ages, but its intensity varies: younger ages exhibit greater excess mortality for those with the comorbidity relative to those with just one of the conditions.

Disease severity is another key factor. Determining the degree of severity requires risk assessment expertise to guide the analysis and interpret results. With diabetes and CAD, RGA underwriters and medical directors advised that the subset of CAD with a more severe manifestation, myocardial infarction, may have a different relationship with diabetes. Indeed, those with a myocardial infarction diagnosis proved to have a much stronger adverse synergy with diabetes than those with all other CAD diagnoses.

Bringing together multiple pieces of data-based evidence can help clarify the risk profile. With diabetes, cross-referencing medical claims data with information from prescription histories can help indicate the severity or type of diabetes. Even something as simple as separating insulin users from non-insulin users can paint a clearer picture and influence the overall conclusion. RGA’s analysis found that those with a diabetic diagnosis and insulin use exhibit a more intense adverse synergy with CAD. Conversely, the synergistic relationship with CAD is almost non-existent when looking at those taking medications associated with diabetes other than insulin.

Figure 1: CAD and Diabetes by Diabetic Medication


Beyond Comorbidities

Leveraging multiple pieces of data-based evidence can help address a variety of complex cases. For example, RGA underwriters wanted to know: Does a chronic pain diagnosis alongside the presence of opioid prescriptions result in an adverse synergy that requires additional debits when combined (i.e., is 2 + 2 > 4)? At first glance, the answer was yes, with the combination exhibiting significant additional mortality beyond each condition’s independent contribution (see Figure 2).

Figure 2: Chronic Pain and Opioids
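
The "2 + 2 > 4" question can be made concrete with a small calculation. The sketch below assumes mortality is expressed as a ratio relative to expected (1.0 = expected) and that the two excesses would simply add if there were no synergy. The function name and every number are hypothetical and are not RGA results.

```python
def additive_synergy(ratio_a_only, ratio_b_only, ratio_both):
    """Excess mortality of the combination beyond the sum of the two
    single-condition excesses, on a relative-mortality scale (1.0 = expected)."""
    excess_a = ratio_a_only - 1.0
    excess_b = ratio_b_only - 1.0
    expected_both = 1.0 + excess_a + excess_b  # the "2 + 2 = 4" benchmark
    return ratio_both - expected_both


# Hypothetical relative mortality ratios, for illustration only.
synergy = additive_synergy(ratio_a_only=1.5, ratio_b_only=1.25, ratio_both=2.25)
print(f"Synergy: {synergy:+.2f}")  # positive means extra mortality beyond the sum
```

A positive result would point toward additional debits for the combination; a result near zero would suggest the two conditions can be debited independently.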

The team was curious whether the extent of the exposure to opioids made a difference. When the results are examined by level of opioid exposure, the adverse synergy essentially disappears. Whether exposure to opioids is low, high, or somewhere in between, the extra mortality from the combination no longer appears (see Figure 3).


Figure 3: Chronic Pain and Opioids by Opioid Exposure


So what was driving the initial observation? It turns out, as one might expect, that chronic pain and opioid use are highly correlated. Those with chronic pain are much more likely than the average person to be prescribed opioids, and to be prescribed greater amounts over a longer period of time. Conversely, the opioids-only group is shifted to low opioid use (e.g., one or two short-term prescriptions are common). Such an imbalance at the aggregate level provides a misleading comparison, one which is not quite “apples to apples.” The interaction between chronic pain and opioids is less about a complex synergistic relationship than it is about a significant overlap between the two. This is a great example of Simpson’s Paradox (see article addendum below), and how the way the data is structured and aggregated can influence conclusions.
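
The effect of that imbalance can be reproduced with a toy calculation. The sketch below uses entirely hypothetical expected and actual death counts, constructed so that chronic pain adds the same excess within every opioid exposure band while the two groups have very different exposure mixes. The band names, counts, and helper functions are illustrative assumptions, not figures from RGA's study.

```python
# Hypothetical (expected_deaths, actual_deaths) by opioid exposure band.
# None of these figures come from the article; they are constructed so that
# chronic pain adds the same excess (+0.20) within every band.
OPIOIDS_ONLY = {"low": (900, 990), "medium": (80, 120), "high": (20, 44)}
PAIN_AND_OPIOIDS = {"low": (200, 260), "medium": (300, 510), "high": (500, 1200)}
PAIN_ONLY_RATIO = 1.20  # hypothetical ratio for chronic pain without opioids


def ratio(cells):
    """Actual-to-expected ratio pooled over the given (expected, actual) cells."""
    cells = list(cells)
    expected = sum(e for e, _ in cells)
    actual = sum(a for _, a in cells)
    return actual / expected


def synergy(opioid_ratio, both_ratio, pain_ratio=PAIN_ONLY_RATIO):
    """Excess of the combination beyond the two additive single-condition excesses."""
    return both_ratio - (1.0 + (opioid_ratio - 1.0) + (pain_ratio - 1.0))


# Aggregate view: pool all exposure bands, as in a first-glance comparison.
aggregate = synergy(ratio(OPIOIDS_ONLY.values()), ratio(PAIN_AND_OPIOIDS.values()))

# Stratified view: compare like with like inside each exposure band.
by_band = {
    band: synergy(ratio([OPIOIDS_ONLY[band]]), ratio([PAIN_AND_OPIOIDS[band]]))
    for band in OPIOIDS_ONLY
}

print(f"aggregate synergy: {aggregate:.3f}")
for band, value in by_band.items():
    print(f"{band} exposure synergy: {abs(value):.3f}")
```

In this constructed example the pooled comparison shows a large apparent synergy, while each exposure band shows essentially none, because the opioids-only group is concentrated in the low band and the combined group in the high band.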

Predictive Modeling to the Rescue?

The examples provided above use a relatively simple analytical approach. Predictive modeling and other more advanced multivariate analysis can provide additional benefits when exploring complex data and relationships. These approaches can more systematically control complexity, better isolate the studied condition(s), and remove other unwanted noise to help identify the true driver of mortality risk.

However, predictive models are not magic wands and are susceptible to the same challenges and pitfalls noted above. Predictive models are very eager to please, and they will usually find a way to fit to the data they are fed. Give them data and they will give an answer. While this may sound ideal, it also highlights that the way the data is presented to the model has a significant influence on the outcome. Again, domain experts such as underwriters and medical directors are vital in providing insights into the proper way to tackle the problem and define risk profiles. This is especially true for complex datasets such as prescription and medical claims histories. For example, a predictive model may provide a different conclusion if it is told to examine the relationship between diabetes and CAD, rather than the relationship between insulin-dependent diabetes and CAD.


Today’s data landscape provides an exciting opportunity to better understand complex mortality risks and relationships. The scale and flexibility of available data sources allow for tailored and customized analysis with the potential to vastly improve the accuracy of underwriting comorbidities and other complex situations.

However, this new data paradigm will not simply generate the “right” answers on its own. Discipline, collaboration, communication, and a diversity of expertise – along with a healthy dose of curiosity – are all necessary for success. One must be aware of common analytical pitfalls and be sure to include domain experts, such as underwriters and medical directors, early on and throughout the process. It is imperative to keep in mind that an analysis and its findings are only useful if they are interpreted by others, and ultimately applied in the real world, in a way that is consistent with what was actually found.

Read Part I of this article series for an introduction to the value of collaboration in comorbidity data analysis. 

Something Else to Consider: Simpson's Paradox

When dealing with complex datasets and looking for meaningful relationships, it is important to keep an eye out for Simpson’s Paradox. The paradox is a statistical phenomenon that is often uncovered when a population is divided into subpopulations or an additional variable is considered. A previously observed association between two variables may disappear or reverse when digging a little further. The only way to address the paradox is to identify and account for confounding variables.

The paradox can surface in many areas, including epidemiology, analysis of school test scores, and studies of discrimination, where understanding the paradox is essential for drawing correct conclusions. My favorite examples come from the world of baseball, such as the example below.


In terms of batting average, Mike Lowell outperformed Jacoby Ellsbury over the aggregated two-year period (.304 to .293). But somehow, Ellsbury had a better average in each individual season. The difference-maker in this case is that Ellsbury only had 116 at bats in 2007, and therefore his combined average is more weighted to 2008, his much weaker season in terms of batting average.
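
The arithmetic behind this reversal is easy to check. In the sketch below, only the 116 at bats and the combined .304 and .293 averages come from the text above; the remaining hit and at-bat counts are commonly published season totals supplied here as an assumption, and they should be verified against an official source before being relied on.

```python
# (hits, at_bats) per season. Apart from Ellsbury's 116 at bats in 2007, these
# counts are NOT given in the article; they are assumed from commonly published
# season totals and should be verified independently.
SEASONS = {
    "Lowell": {2007: (191, 589), 2008: (115, 419)},
    "Ellsbury": {2007: (41, 116), 2008: (155, 554)},
}


def season_average(player, year):
    """Batting average for one player in one season."""
    hits, at_bats = SEASONS[player][year]
    return hits / at_bats


def combined_average(player):
    """Batting average over all seasons: total hits divided by total at bats."""
    hits = sum(h for h, _ in SEASONS[player].values())
    at_bats = sum(ab for _, ab in SEASONS[player].values())
    return hits / at_bats


for year in (2007, 2008):
    for player in SEASONS:
        print(f"{year} {player}: {season_average(player, year):.3f}")

for player in SEASONS:
    print(f"Combined {player}: {combined_average(player):.3f}")
```

Because the combined figure is weighted by at bats rather than being a simple mean of the two seasonal averages, the player with the better average in each season can still trail over the full period.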

Meet the Authors & Experts

Mike Cusumano
Vice President and Actuary, U.S. Mortality Markets

Additional Resources

Note on the Data: The examples in this article series leveraged a well-established RGA database containing prescription medication and medical claims histories, tied to death information, for millions of individuals. The data was split into two segments: the first consisting of prescription histories for up to seven years and medical claims histories for up to four years, and the second consisting of the mortality experience of the group for the four years following the evaluation date. Taken together, the data covered 34 million person-years of exposure and more than 197,000 deaths. The expected mortality basis used the empirical (actual) mortality experience of the dataset, varying by gender, attained age, and calendar year.
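
As a rough illustration of how an empirical expected basis of this kind can be turned into an actual-to-expected (A/E) ratio, the sketch below derives death rates per cell of gender, attained age, and calendar year from the full dataset and then applies them to a subgroup. The record layout, function names, and all numbers are hypothetical; this is not RGA's implementation.

```python
from collections import defaultdict

# Each record: (gender, attained_age, calendar_year, has_condition, person_years, deaths).
# All values are hypothetical and only illustrate the mechanics.
RECORDS = [
    ("F", 60, 2019, False, 9000.0, 45),
    ("F", 60, 2019, True, 1000.0, 15),
    ("M", 60, 2019, False, 8000.0, 64),
    ("M", 60, 2019, True, 2000.0, 36),
]


def empirical_rates(records):
    """Deaths per person-year for each (gender, attained age, calendar year) cell."""
    exposure = defaultdict(float)
    deaths = defaultdict(int)
    for gender, age, year, _, person_years, n_deaths in records:
        exposure[(gender, age, year)] += person_years
        deaths[(gender, age, year)] += n_deaths
    return {cell: deaths[cell] / exposure[cell] for cell in exposure}


def actual_to_expected(records, rates, condition):
    """A/E ratio for the records whose condition flag equals `condition`."""
    actual = 0
    expected = 0.0
    for gender, age, year, flag, person_years, n_deaths in records:
        if flag == condition:
            actual += n_deaths
            expected += person_years * rates[(gender, age, year)]
    return actual / expected


rates = empirical_rates(RECORDS)
ae_with = actual_to_expected(RECORDS, rates, condition=True)
ae_without = actual_to_expected(RECORDS, rates, condition=False)
print(f"A/E with condition: {ae_with:.2f}")
print(f"A/E without condition: {ae_without:.2f}")
```

Because the expected basis is built from the dataset's own experience, the A/E ratio for the whole population is 1.0 by construction, and subgroup ratios are read relative to that baseline.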