However, training predictive models to accurately infer disease risks typically requires longitudinal information, which is scarce or absent in many clinical and molecular datasets.
In a recent study published in Genome Research, PioneerCampus PI Paolo Casale at Helmholtz Munich, in collaboration with the Technical University of Munich and the Friedrich-Alexander-Universität, has now developed a novel framework called PRiMeR (Predictive Risk modelling using Mendelian Randomization), which can improve disease risk prediction without the need for longitudinal data. The authors show that the model can instead be trained by aligning the genetic effects on its predictions with the genetic effects on the disease.
"Our method leverages genetic data from healthy individuals to make accurate predictions about disease risk," says Paolo Casale. "By integrating genetic causal inference with machine learning, we offer a new perspective for disease prediction."
Unlike conventional methods, PRiMeR uses Mendelian randomization to train predictive models that do not rely on longitudinal data. It is the first approach to combine machine learning with methods typically used to assess whether a risk factor causally affects an outcome, enabling genetics-driven risk prediction. By basing disease-predicting models on a patient’s genetic background rather than longitudinal follow-up data, PRiMeR offers a unique perspective. To train the model, PRiMeR leverages risk factors and genetic data from a healthy cohort, along with results from genome-wide association studies of the diseases of interest.
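The training idea described above can be illustrated with a small toy example. The following Python sketch is not the authors' implementation: it assumes a purely linear risk score, a plain least-squares alignment objective, and noise-free synthetic data, and all names in it (`snp_effects`, `fit_risk_weights`, `disease_betas`) are hypothetical. It only shows how weights for risk factors might be chosen so that the genetic effects on the resulting score match the genetic effects reported for a disease, using no follow-up data.

```python
import numpy as np

def snp_effects(genotypes, values):
    """Marginal regression slope of each column of `values` on each standardized variant."""
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    v = values - values.mean(axis=0)
    return g.T @ v / len(g)

def fit_risk_weights(genotypes, risk_factors, disease_betas):
    """Pick linear weights w so that the variant effects on the score
    risk_factors @ w match `disease_betas` in the least-squares sense."""
    effects_on_factors = snp_effects(genotypes, risk_factors)  # (n_variants, n_factors)
    w, *_ = np.linalg.lstsq(effects_on_factors, disease_betas, rcond=None)
    return w

# Synthetic "healthy cohort" (all values are made up for illustration).
rng = np.random.default_rng(0)
n, m, k = 5000, 200, 3  # individuals, variants, risk factors
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
variant_to_factor = rng.normal(0.0, 0.1, size=(m, k))
risk_factors = genotypes @ variant_to_factor + rng.normal(size=(n, k))

# Stand-in for GWAS summary statistics of a disease. In this toy setup they
# are generated noise-free from known weights so recovery can be checked.
true_w = np.array([0.8, 0.0, -0.5])
disease_betas = snp_effects(genotypes, risk_factors) @ true_w

w = fit_risk_weights(genotypes, risk_factors, disease_betas)
risk_scores = risk_factors @ w  # one genetics-aligned score per individual
```

In this idealized setting the fitted weights `w` match `true_w`, because the synthetic disease effects were constructed from them. Real summary statistics come from a separate study and are noisy, so a practical model would need regularization and a more flexible predictor than this linear one.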
The researchers validated PRiMeR by predicting de novo the risk of developing diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. The results show that PRiMeR is accurate and robust, outperforms existing baseline models, and correlates strongly with established clinical risks.
"PRiMeR has the potential to significantly advance biomarker discovery for personalized medicine," says Daniel Sens, PhD student in Casale’s Lab at Helmholtz Munich. "Our method is particularly useful in situations where longitudinal data is scarce, making it possible to identify biomarkers even in challenging settings."
Looking ahead, PRiMeR shows great potential, particularly for profiling diseases with relatively well-characterized genome-wide association datasets, such as Alzheimer's disease, Parkinson's disease, and bipolar disorder. Importantly, its application could be extended to underdiagnosed conditions such as ADHD, depression, or fatty liver disease, paving the way for better and earlier diagnosis and intervention. As such, PRiMeR enables biomarker discovery even in cases where follow-up data are limited.