Principal Investigator Francesco Paolo Casale
Systems Genetics and Machine Learning

Our research interests lie in the development and application of machine learning and statistical tools to analyze large genetic cohorts with deep molecular and phenotypic data, with the ultimate goal of furthering our understanding of complex trait biology. We aim to address fundamental biomedical questions such as: Which molecular, cellular and organ-level traits are associated with disease severity and progression? Which of these are likely to drive disease pathogenesis? How does the interplay of genetic and environmental factors affect these traits?

Our approach combines principles from machine learning, statistical inference and systems genetics, with a strong focus on model scalability, robustness and interpretability. Current major research areas include the development of scalable tools for genetic association studies, deep learning models for imaging genetics, and computational methods to study gene-environment interactions and disease subtypes.


Our Aim

Leverage scalable machine learning and statistical tools together with large systems genetics datasets to further our understanding of human disease biology.


Dr. Francesco Paolo Casale

Principal Investigator, Systems Genetics and Machine Learning

Francesco Paolo Casale studied physics at the University of Naples Federico II, Italy. He received his PhD in statistical genetics from the University of Cambridge and the European Bioinformatics Institute (EMBL-EBI) in 2016. There he developed new computational methods for genetic association studies and contributed to landmark international projects such as the final phase of the 1000 Genomes Project and the Blueprint initiative. He conducted his postdoctoral studies at the Microsoft Research New England lab in Cambridge, MA, working on deep generative models for imaging genetics and automated machine learning. In 2019, he joined insitro, a drug discovery and development company in the San Francisco Bay Area, where he led the statistical genetics team, working at the intersection of human genetics, machine learning and functional genomics to enable target identification and characterization. Since January 2022, he has been a Principal Investigator in Machine Learning in Biomedicine at the Institute of AI for Health at Helmholtz Munich.




Senior Staff Data Scientist
insitro, South San Francisco, CA, USA

Senior Lead Data Scientist
insitro, South San Francisco, CA, USA

Postdoctoral Fellow
Microsoft Research New England, Cambridge, MA, USA

EMBL-European Bioinformatics Institute, Cambridge, UK 

PhD, University of Cambridge & EMBL-EBI
Statistical Genetics, Oliver Stegle

M.Sc. Università di Napoli Federico II
Physics of Complex Systems, Mario Nicodemi

B.Sc. Università di Napoli Federico II

Highly Recognized Article, PLOS Genetics Research Prize

Postdoctoral fellowship at Microsoft Research New England

EMBL studentship

M.Sc. and B.Sc. degrees with honors from Università di Napoli Federico II

EASL, the international liver congress, London, UK – Selected talk

Target Validation Conference, Heidelberg, Germany – Invited speaker
AI for Good Summit, San Francisco, CA, USA – Invited speaker

ML4H workshop, NeurIPS, Montreal, Canada – Poster presentation
NeurIPS, Montreal, Canada – Poster presentation
ASHG, San Diego, CA, USA – Selected poster talk
Big Data in Biotechnology and Biomedicine, Vejle, Denmark – Invited speaker

CMO-BIRS, Oaxaca, Mexico – Invited speaker
Microsoft Research New England, Cambridge, MA, USA – Oral presentation

Selected Publications

The germline genetic component of drug sensitivity in cancer cell lines

Menden MP*, Casale FP*, Stephan J, Bignell GR, Iorio F, McDermott U, Garnett MJ, Saez-Rodriguez J, Stegle O, Nat Commun. 2018 Aug 23;9(1):3385. doi: 10.1038/s41467-018-05811-3.


A linear mixed model approach to study multivariate gene-environment interactions

Moore R*, Casale FP*, Bonder MJ, Horta D, BIOS Consortium, Franke L, Barroso I, Stegle O, Nat Genet. 2019 Jan;51(1):180-186. doi: 10.1038/s41588-018-0271-0. Epub 2018 Nov 26.


Joint genetic analysis using variant sets reveals polygenic gene-context interactions

Casale FP, Horta D, Rakitsch B, Stegle O, PLoS Genet. 2017 Apr 20;13(4):e1006693. doi: 10.1371/journal.pgen.1006693. eCollection 2017 Apr.


Genetic drivers of epigenetic and transcriptional variation in human immune cells

Chen L*, Ge B*, Casale FP*, Vasquez L*, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S, Datta A et al., Cell. 2016 Nov 17;167(5):1398-1414.e24. doi: 10.1016/j.cell.2016.10.026.


Efficient set tests for the genetic analysis of correlated traits

Casale FP*, Rakitsch B*, Lippert C, Stegle O, Nat Methods. 2015 Aug;12(8):755-8. doi: 10.1038/nmeth.3439. Epub 2015 Jun 15.


Gaussian Process Prior Variational Autoencoders

Casale FP, Dalca A, Saglietti L, Listgarten J, Fusi N, Advances in Neural Information Processing Systems 2018 (pp. 10390-10401)


Contact us

HPC contact Jasnin


Helmholtz Pioneer Campus
Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)

Current address:

Biomedical Center (BMC) 
Ludwig-Maximilians-Universität München 
Room NC 03.010 
Großhaderner Strasse 9 
82152 Planegg-Martinsried