All Publications

2026

Zhao, C. ; Hatzikotoulas, K. ; Balasubramanian, R. ; Bertone-Johnson, E. ; Cai, N. ; Huang, L. ; Huerta-Chagoya, A. ; Janiczek, M. ; Ma, C. ; Mandla, R. ; Paluch, A. ; Rayner, N.W. ; Southam, L. ; Sturgeon, S.R. ; Suzuki, K. ; Taylor, H.J. ; Vankim, N. ; Yin, X. ; Lee, C.H. ; Collins, F. ; Spracklen, C.N.

Associations of combined genetic and lifestyle risks with incident type 2 diabetes in the UK Biobank.

Diabetes 75, 860-866 (2026)

Type 2 diabetes (T2D) results from the interplay of genetic susceptibility and an unhealthy lifestyle, but their combined effects are not well studied. We examined whether unhealthy modifiable behaviors were associated with similar increases in the risk of incident T2D in individuals with different levels of genetic risk. Among 332,251 UK Biobank participants without diabetes, we constructed a multiancestry genetic risk score (GRS) based on 783 T2D-associated variants, categorized into tertiles. Lifestyle was classified as healthy, intermediate, or unhealthy based on baseline self-reported smoking status, BMI, physical activity level, and diet quality. Cox proportional hazards regression models were used to generate adjusted hazard ratios (HRs) for T2D and associated 95% CIs. During follow-up (median 13.6 years), 13,128 (4.0%) participants developed T2D. GRS (P < 0.001) and lifestyle classification (P < 0.001) were independently associated with increased risk of T2D. Compared with a healthy lifestyle, an unhealthy lifestyle was associated with increased risk in all genetic risk strata, with adjusted HRs ranging from 7.11 to 16.33. High genetic risk and an unhealthy lifestyle were the most significant contributors to T2D development. Individuals at all levels of genetic risk can substantially mitigate their T2D risk through lifestyle modifications. ARTICLE HIGHLIGHTS: Both genetic susceptibility and an unhealthy lifestyle are known to be associated with elevated type 2 diabetes (T2D) risk. However, their combined effects on T2D risk are not well studied. In this large prospective cohort study of more than 332,000 individuals, unhealthy lifestyle factors were associated with risk of incident T2D within and across different levels of genetic risk. These findings suggest individuals at all levels of genetic risk can greatly mitigate their risk of T2D by adhering to a healthy lifestyle.

Diabetes

Scientific Article

Dybdahl Krebs, M. ; Appadurai, V. ; Georgii Hellberg, K.L. ; Ohlsson, H. ; Steinbach, J. ; Pedersen, E. ; Werge, T. ; Sundquist, J. ; Sundquist, K. ; Border, R. ; Cai, N. ; Zaitlen, N. ; Dahl, A. ; Vilhjalmsson, B. ; Flint, J. ; Bacanu, S.A. ; Kendler, K.S. ; Schork, A.J.

The relationship between genotype- and phenotype-based estimates of genetic liability to psychiatric disorders, in practice and in theory.

Am. J. Hum. Genet. 113, 184-201 (2026)

Genetics as a science has roots in studying phenotypes of relatives, but molecular approaches facilitate direct measurements of genomic variation between individuals. Agricultural and human biomedical research are both emphasizing genotype-based instruments, such as polygenic scores, but unlike in agriculture, there is an emerging consensus that family variables act nearly independently of genotypes in models of human disease. However, there is insufficient theoretical treatment of these scores, especially guiding our understanding of how and why scores derived from different sources of data may combine. To advance our understanding of this phenomenon, we use 2,066,057 family records of 99,645 genotyped probands from the Integrative Psychiatric Research (iPSYCH)2015 case-cohort study to show that state-of-the-field genotype- and phenotype-based genetic instruments explain largely independent components of liability to psychiatric disorders. We support these empirical results with theoretical analysis and simulations to describe, in a human biomedical context, parameters affecting current and future performance of the two approaches, their expected interrelationships, and consistency of observed results with expectations under simple additive, polygenic liability models of disease. We conclude, at least for psychiatric disorders, that the low correlation between current phenotype- and genotype-based genetic instruments is caused by both being noisy measures of additive genetic liability. We expect they should remain complementary over the near future and therefore expect approaches integrating both sources of information to achieve more power for genetic inference.

American Journal of Human Genetics, The

Scientific Article

2025

Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium (Cai, N.)

Trans-ancestry genome-wide study of depression identifies 697 associations implicating cell types and pharmacotherapies.

Cell 188, 640-652 (2025)

In a genome-wide association study (GWAS) meta-analysis of 688,808 individuals with major depression (MD) and 4,364,225 controls from 29 countries across diverse and admixed ancestries, we identify 697 associations at 635 loci, 293 of which are novel. Using fine-mapping and functional tools, we find 308 high-confidence gene associations and enrichment of postsynaptic density and receptor clustering. A neural cell-type enrichment analysis utilizing single-cell data implicates excitatory, inhibitory, and medium spiny neurons and the involvement of amygdala neurons in both mouse and human single-cell analyses. The associations are enriched for antidepressant targets and provide potential repurposing opportunities. Polygenic scores trained using European or multi-ancestry data predicted MD status across all ancestries, explaining up to 5.8% of MD liability variance in Europeans. These findings advance our global understanding of MD and reveal biological targets that may be used to target and develop pharmacotherapies addressing the unmet need for effective treatment.

Cell

Scientific Article

Grotzinger, A.D. ; Werme, J. ; Peyrot, W.J. ; Frei, O. ; de Leeuw, C. ; Bicks, L.K. ; Guo, Q. ; Margolis, M.P. ; Coombes, B.J. ; Batzler, A. ; Pazdernik, V. ; Biernacka, J.M. ; Andreassen, O.A. ; Anttila, V. ; Børglum, A.D. ; Breen, G. ; Cai, N. ; Demontis, D. ; Edenberg, H.J. ; Faraone, S.V. ; Franke, B. ; Gandal, M.J. ; Gelernter, J. ; Hatoum, A.S. ; Hettema, J.M. ; Johnson, E.C. ; Jonas, K.G. ; Knowles, J.A. ; Koenen, K.C. ; Maihofer, A.X. ; Mallard, T.T. ; Mattheisen, M. ; Mitchell, K.S. ; Neale, B.M. ; Nievergelt, C.M. ; Nurnberger, J.I. ; O'Connell, K.S. ; Peterson, R.E. ; Robinson, E.B. ; Sanchez-Roige, S.S. ; Santangelo, S.L. ; Scharf, J.M. ; Stefansson, H. ; Stefansson, K. ; Stein, M.B. ; Strom, N.I. ; Thornton, L.M. ; Tucker-Drob, E.M. ; Verhulst, B. ; Waldman, I.D. ; Walters, G.B. ; Wray, N.R. ; Yu, D. ; Lee, P.H. ; Kendler, K.S. ; Smoller, J.W.

Mapping the genetic landscape across 14 psychiatric disorders.

Nature 649, 406-415 (2025)

Psychiatric disorders display high levels of comorbidity and genetic overlap [1,2], challenging current diagnostic boundaries. For disorders for which diagnostic separation has been most debated, such as schizophrenia and bipolar disorder [3], genomic methods have revealed that the majority of genetic signal is shared [4]. While over a hundred pleiotropic loci have been identified by recent cross-disorder analyses [5], the full scope of shared and disorder-specific genetic influences remains poorly defined. Here we addressed this gap by triangulating across a suite of cutting-edge statistical and functional genomic analyses applied to 14 childhood- and adult-onset psychiatric disorders (1,056,201 cases). Using genetic association data from common variants, we identified and characterized five underlying genomic factors that explained the majority of the genetic variance of the individual disorders (around 66% on average) and were associated with 238 pleiotropic loci. The two factors defined by (1) schizophrenia and bipolar disorders (SB factor); and (2) major depression, PTSD and anxiety (Internalizing factor) showed high levels of polygenic overlap [6] and local genetic correlation and very few disorder-specific loci. The genetic signal shared across all 14 disorders was enriched for broad biological processes (for example, transcriptional regulation), while more specific pathways were shared at the level of the individual factors. The shared genetic signal across the SB factor was substantially enriched in genes expressed in excitatory neurons, whereas the Internalizing factor was associated with oligodendrocyte biology. These observations may inform a more neurobiologically valid psychiatric nosology and implicate targets for therapeutic development designed to treat commonly occurring comorbid presentations.

Nature

Scientific Article

Nappi, A. ; Shilova, L. ; Karaletsos, T. ; Cai, N. ; Casale, F.P.

BayesRVAT enhances rare-variant association testing through Bayesian aggregation of functional annotations.

Genome Res. 35, 2682-2690 (2025)

Gene-level rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying therapeutic targets. Advances in sequence-based machine learning have generated diverse variant pathogenicity scores, creating opportunities to improve RVATs. However, existing methods often rely on rigid models or single annotations, limiting their ability to leverage these advances. We introduce BayesRVAT, a Bayesian rare variant association test that jointly models multiple annotations. By specifying priors on annotation effects and estimating gene-trait-specific posterior burden scores, BayesRVAT flexibly captures diverse rare-variant architectures. In simulations, BayesRVAT improves power while maintaining calibration. In UK Biobank analyses, it detects 10.2% more blood-trait associations and reveals novel gene-disease links, including PRPH2 with retinal disease. Integrating BayesRVAT within omnibus frameworks further increases discoveries, demonstrating that flexible annotation modeling captures complementary signals beyond existing burden and variance-component tests.

Genome Research

Scientific Article

Davis, K.A.S. ; Coleman, J.R.I. ; Adams, M. ; Breen, G. ; Cai, N. ; Davies, H.L. ; Davies, K.J.A. ; Dregan, A. ; Eley, T.C. ; Fox, E. ; Holliday, J. ; Hübel, C. ; John, A. ; Kassam, A.S. ; Kempton, M.J. ; Lee, W. ; Li, D. ; Maina, J.G. ; McCabe, R. ; McIntosh, A.M. ; Oram, S. ; Richards, M. ; Skelton, M. ; Starkey, F. ; Ter Kuile, A.R. ; Thornton, L.M. ; Wang, R. ; Yu, Z. ; Zvrskovec, J. ; Hotopf, M.

The UK Biobank mental health enhancement 2022: Methods and results.

PLoS ONE 20:e0324189 (2025)

BACKGROUND: This paper introduces the UK Biobank (UKB) second mental health questionnaire (MHQ2), describes its design, the respondents and some notable findings. UKB is a large cohort study with over 500,000 volunteer participants aged 40-69 years when recruited in 2006-2010. It is an important resource of extensive health, genetic and biomarker data. Enhancements to UKB enrich the data available. MHQ2 is an enhancement designed to enable and facilitate research with psychosocial and mental health aspects. METHODS: UKB sent participants a link to MHQ2 by email in October-November 2022. The MHQ2 was designed by a multi-institutional consortium to build on MHQ1. It characterises lifetime depression further, adds data on panic disorder and eating disorders, repeats 'current' mental health measures and updates information about social circumstances. It includes established measures, such as the PHQ-9 for current depression and CIDI-SF for lifetime panic, as well as bespoke questions. Algorithms and R code were developed to facilitate analysis. RESULTS: At the time of analysis, MHQ2 results were available for 169,253 UKB participants, of whom 111,275 had also completed the earlier MHQ1. Characteristics of respondents and the whole UKB cohort are compared. The major phenotypes are lifetime: depression (18%); panic disorder (4.0%); a specific eating disorder (2.8%); and bipolar affective disorder I (0.4%). All mental disorders are found less with older age and also seem to be related to selected social factors. In those participants who answered both MHQ1 (2016) and MHQ2 (2022), current mental health measures showed that fewer respondents have harmful alcohol use than in 2016 (relative risk [RR] 0.84), but current depression (RR 1.07) and anxiety (RR 0.98) have not fallen, as might have been expected given the relationship with age. We also compare lifetime concepts for test-retest reliability. CONCLUSIONS: There are some drawbacks to UKB due to its lack of population representativeness, but where the research question does not depend on this, it offers exceptional resources that any researcher can apply to access. This paper has just scratched the surface of the results from MHQ2 and how this can be combined with other tranches of UKB data, but we predict it will enable many future discoveries about mental health and health in general.

PLoS ONE

Scientific Article

Nappi, A. ; Cai, N. ; Casale, F.P.

Bayesian aggregation of multiple annotations enhances rare variant association testing.

In: (Research in Computational Molecular Biology). 2025. 428-431 (Lect. Notes Comput. Sc. ; 15647 LNBI)

Gene-based rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying candidate drug targets, yet existing frameworks lack flexibility in integrating multiple variant annotations. Here, we introduce BayesRVAT, a Bayesian framework for RVAT which models variant effects using priors informed by multiple annotations. We show that BayesRVAT outperforms state-of-the-art burden test strategies in both simulations and an analysis of 12 blood traits from the UK Biobank.

Lecture Notes in Computer Science

Zuber, V. ; Cronjé, T. ; Cai, N. ; Gill, D. ; Bottolo, L.

Bayesian causal graphical model for joint Mendelian randomization analysis of multiple exposures and outcomes.

Am. J. Hum. Genet. 112, 1173-1198 (2025)

Current Mendelian randomization (MR) methods do not reflect complex relationships among multiple exposures and outcomes as is typical for real-life applications. We introduce MrDAG, a Bayesian causal graphical model for summary-level MR analysis to detect dependency relations within the exposures, the outcomes, and between them to improve causal effects estimation. MrDAG combines three causal inference strategies. It uses genetic variation as instrumental variables to account for unobserved confounders. It performs structure learning to detect and orientate the direction of the dependencies within the exposures and the outcomes. Finally, interventional calculus is employed to derive principled causal effect estimates. In MrDAG the directionality of the causal effects between the exposures and the outcomes is assumed known, i.e., the exposures can only be potential causes of the outcomes, and no reverse causation is allowed. In the simulation study, MrDAG outperforms recently proposed one-outcome-at-a-time and multi-response multi-variable Bayesian MR methods as well as causal graphical models under the constraint on edges' orientation from the exposures to the outcomes. MrDAG was motivated to unravel how lifestyle and behavioral exposures impact mental health. It highlights first, education and second, smoking as effective points of intervention given their important downstream effects on mental health. It also enables the identification of a novel path between smoking and the genetic liability to schizophrenia and cognition, demonstrating the complex pathways toward mental health. These insights would have been impossible to delineate without modeling the paths between multiple exposures and outcomes at once.

American Journal of Human Genetics, The

Scientific Article

Han, S. ; Yu, S. ; Shi, M. ; Harada, M. ; Ge, J. ; Lin, J. ; Prehn, C. ; Petrera, A. ; Li, Y. ; Sam, F. ; Matullo, G. ; Adamski, J. ; Suhre, K. ; Gieger, C. ; Hauck, S.M. ; Herder, C. ; Roden, M. ; Casale, F.P. ; Cai, N. ; Peters, A. ; Wang-Sattler, R.

LEOPARD: Missing view completion for multi-timepoint omics data via representation disentanglement and temporal knowledge transfer.

Nat. Commun. 16:3278 (2025)

Longitudinal multi-view omics data offer unique insights into the temporal dynamics of individual-level physiology, which provides opportunities to advance personalized healthcare. However, the common occurrence of incomplete views makes extrapolation tasks difficult, and there is a lack of tailored methods for this critical issue. Here, we introduce LEOPARD, an innovative approach specifically designed to complete missing views in multi-timepoint omics data. By disentangling longitudinal omics data into content and temporal representations, LEOPARD transfers the temporal knowledge to the omics-specific content, thereby completing missing views. The effectiveness of LEOPARD is validated on four real-world omics datasets constructed with data from the MGH COVID study and the KORA cohort, spanning periods from 3 days to 14 years. Compared to conventional imputation methods, such as missForest, PMM, GLMM, and cGAN, LEOPARD yields the most robust results across the benchmark datasets. LEOPARD-imputed data also achieve the highest agreement with observed data in our analyses for age-associated metabolites detection, estimated glomerular filtration rate-associated proteins identification, and chronic kidney disease prediction. Our work takes the first step toward a generalized treatment of missing views in longitudinal omics data, enabling comprehensive exploration of temporal dynamics and providing valuable insights into personalized healthcare.

Nature Communications

Scientific Article

Cai, N. ; Verhulst, B. ; Andreassen, O.A. ; Buitelaar, J.K. ; Edenberg, H.J. ; Hettema, J.M. ; Gandal, M. ; Grotzinger, A. ; Jonas, K. ; Lee, P. ; Mallard, T.T. ; Mattheisen, M. ; Neale, M.C. ; Nurnberger, J.I. ; Peyrot, W.J. ; Tucker-Drob, E.M. ; Smoller, J.W. ; Kendler, K.S.

Correction: Assessment and ascertainment in psychiatric molecular genetics: challenges and opportunities for cross-disorder research.

Mol. Psychiatry 30:1715 (2025)

Correction to: Molecular Psychiatry, https://doi.org/10.1038/s41380-024-02878-x, published online 27 December 2024. In this article, the author’s name Wouter J. Peyrot was incorrectly written as Wouter Peyrout. The original article has been corrected.

Molecular Psychiatry

2024

Cai, N. ; Verhulst, B. ; Andreassen, O.A. ; Buitelaar, J.K. ; Edenberg, H.J. ; Hettema, J.M. ; Gandal, M. ; Grotzinger, A. ; Jonas, K. ; Lee, P. ; Mallard, T.T. ; Mattheisen, M. ; Neale, M.C. ; Nurnberger, J.I. ; Peyrot, W.J. ; Tucker-Drob, E.M. ; Smoller, J.W. ; Kendler, K.S.

Assessment and ascertainment in psychiatric molecular genetics: challenges and opportunities for cross-disorder research.

Mol. Psychiatry, DOI: 10.1038/s41380-024-02878-x (2024)

Psychiatric disorders are highly comorbid, heritable, and genetically correlated [1-4]. The primary objective of cross-disorder psychiatric genetics research is to identify and characterize both the shared genetic factors that contribute to convergent disease etiologies and the unique genetic factors that distinguish between disorders [4, 5]. This information can illuminate the biological mechanisms underlying comorbid presentations of psychopathology, improve nosology and prediction of illness risk and trajectories, and aid the development of more effective and targeted interventions. In this review we discuss how estimates of comorbidity and identification of shared genetic loci between disorders can be influenced by how disorders are measured (phenotypic assessment) and the inclusion or exclusion criteria in individual genetic studies (sample ascertainment). Specifically, the depth of measurement, source of diagnosis, and time frame of disease trajectory have major implications for the clinical validity of the assessed phenotypes. Further, biases introduced in the ascertainment of both cases and controls can inflate or reduce estimates of genetic correlations. The impact of these design choices may have important implications for large meta-analyses of cohorts from diverse populations that use different forms of assessment and inclusion criteria, and subsequent cross-disorder analyses thereof. We review how assessment and ascertainment affect genetic findings in both univariate and multivariate analyses and conclude with recommendations for addressing them in future research.

Molecular Psychiatry

Review

Sadowski, M. ; Thompson, M. ; Mefford, J. ; Haldar, T. ; Oni-Orisan, A. ; Border, R. ; Pazokitoroudi, A. ; Cai, N. ; Ayroles, J.F. ; Sankararaman, S. ; Dahl, A.W. ; Zaitlen, N.

Characterizing the genetic architecture of drug response using gene-context interaction methods.

Cell Genom. 4:100722 (2024)

Identifying factors that affect treatment response is a central objective of clinical research, yet the role of common genetic variation remains largely unknown. Here, we develop a framework to study the genetic architecture of response to commonly prescribed drugs in large biobanks. We quantify treatment response heritability for statins, metformin, warfarin, and methotrexate in the UK Biobank. We find that genetic variation modifies the primary effect of statins on LDL cholesterol (9% heritable) as well as their side effects on hemoglobin A1c and blood glucose (10% and 11% heritable, respectively). We identify dozens of genes that modify drug response, which we replicate in a retrospective pharmacogenomic study. Finally, we find that polygenic score (PGS) accuracy varies up to 2-fold depending on treatment status, showing that standard PGSs are likely to underperform in clinical contexts.

Cell Genomics

Scientific Article

Dybdahl Krebs, M. ; Georgii Hellberg, K.L. ; Lundberg, M. ; Appadurai, V. ; Ohlsson, H. ; Pedersen, E.S.L. ; Steinbach, J. ; Matthews, J. ; Border, R. ; LaBianca, S. ; Calle, X. ; Meijsen, J.J. ; Ingason, A. ; Buil, A. ; Vilhjálmsson, B.J. ; Flint, J. ; Bacanu, S.A. ; Cai, N. ; Dahl, A. ; Zaitlen, N. ; Werge, T. ; Kendler, K.S. ; Schork, A.J.

Genetic liability estimated from large-scale family data improves genetic prediction, risk score profiling, and gene mapping for major depression.

Am. J. Hum. Genet., DOI: 10.1016/j.ajhg.2024.09.009 (2024)

Large biobank samples provide an opportunity to integrate broad phenotyping, familial records, and molecular genetics data to study complex traits and diseases. We introduce Pearson-Aitken Family Genetic Risk Scores (PA-FGRS), a method for estimating disease liability from patterns of diagnoses in extended, age-censored genealogical records. We then apply the method to study a paradigmatic complex disorder, major depressive disorder (MDD), using the iPSYCH2015 case-cohort study of 30,949 MDD cases, 39,655 random population controls, and more than 2 million relatives. We show that combining PA-FGRS liabilities estimated from family records with molecular genotypes of probands improves three lines of inquiry. Incorporating PA-FGRS liabilities improves classification of MDD over and above polygenic scores, identifies robust genetic contributions to clinical heterogeneity in MDD associated with comorbidity, recurrence, and severity and can improve the power of genome-wide association studies. Our method is flexible and easy to use, and our study approaches are generalizable to other datasets and other complex traits and diseases.

American Journal of Human Genetics, The

Scientific Article

Meng, X. ; Navoly, G. ; Giannakopoulou, O. ; Levey, D.F. ; Koller, D. ; Pathak, G.A. ; Koen, N. ; Lin, K. ; Adams, M.J. ; Rentería, M.E. ; Feng, Y. ; Gaziano, J.M. ; Stein, D.J. ; Zar, H.J. ; Campbell, M.L. ; van Heel, D.A. ; Trivedi, B. ; Finer, S. ; McQuillin, A. ; Bass, N.J. ; Chundru, V.K. ; Martin, H.C. ; Huang, Q.Q. ; Valkovskaya, M. ; Chu, C.Y. ; Kanjira, S. ; Kuo, P.H. ; Chen, H.C. ; Tsai, S.J. ; Liu, Y.L. ; Kendler, K.S. ; Peterson, R.E. ; Cai, N. ; Fang, Y.J. ; Sen, S. ; Scott, L.J. ; Burmeister, M. ; Loos, R.J.F. ; Preuss, M.H. ; Actkins, K.V. ; Davis, L.K. ; Uddin, M.N. ; Wani, A.H. ; Wildman, D.E. ; Aiello, A.E. ; Ursano, R.J. ; Kessler, R.C. ; Kanai, M. ; Okada, Y. ; Sakaue, S. ; Rabinowitz, J.A. ; Maher, B.S. ; Uhl, G. ; Eaton, W. ; Cruz-Fuentes, C.S. ; Martinez-Levy, G.A. ; Campos, A.I. ; Millwood, I.Y. ; Chen, Z. ; Li, L. ; Wassertheil-Smoller, S. ; Jiang, Y. ; Tian, C. ; Martin, N.G. ; Mitchell, B.L. ; Byrne, E.M. ; Awasthi, S. ; Coleman, J.R.I. ; Ripke, S. ; Sofer, T. ; Walters, R.G. ; McIntosh, A.M. ; Polimanti, R. ; Dunn, E.C. ; Stein, M.B. ; Gelernter, J. ; Lewis, C.M. ; Kuchenbaecker, K.

Multi-ancestry genome-wide association study of major depression aids locus discovery, fine mapping, gene prioritization and causal inference.

Nat. Genet. 56, 222-233 (2024)

Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.

Nature Genetics

Scientific Article

2023

Huang, L. ; Tang, S. ; Rietkerk, J. ; Appadurai, V. ; Krebs, M.D. ; Schork, A.J. ; Werge, T. ; Zuber, V. ; Kendler, K. ; Cai, N.

Polygenic analyses show important differences between MDD symptoms collected using PHQ9 and CIDI-SF.

Biol. Psychiatry 95, 1110-1121 (2023)

BACKGROUND: Symptoms of Major Depressive Disorder (MDD) are commonly assessed using self-rating instruments like the Patient Health Questionnaire 9 (PHQ9, current symptoms), and the Composite International Diagnostic Interview Short-Form (CIDI-SF, worst-episode symptoms). We perform a systematic comparison between them for their genetics and utility in investigating MDD heterogeneity. METHODS: Using data from the UK Biobank (N = 41,948-109,417), we assess the SNP heritability (h²) and genetic correlation (rG) of both sets of MDD symptoms. We further compare their rG with non-MDD traits, and use Mendelian Randomization (MR) to assess if either set of symptoms has more genetic sharing with non-MDD traits. We further assess how specific each set of symptoms is to MDD, using the metric PRS Pleiotropy. Finally, we use Genomic SEM to identify factors explaining the genetic covariance between each set of symptoms. RESULTS: Corresponding symptoms endorsed through PHQ9 and CIDI-SF have low to moderate genetic correlations (rG = 0.43-0.87), and this cannot be fully attributed to different severity thresholds or the use of a skip-structure in CIDI-SF. Both MR and PRS Pleiotropy analyses show that PHQ9 symptoms are more associated with traits which reflect general dysphoria, while the skip-structure in CIDI-SF allows for the identification of heterogeneity among likely MDD cases. Finally, the two sets of symptoms show different factor structures in Genomic SEM, reflective of their genetic differences. CONCLUSIONS: MDD symptoms assessed via the PHQ9 and CIDI-SF are not interchangeable: the former better indexes general dysphoria, while the latter is more informative of within-MDD heterogeneity.

Biological Psychiatry

Scientific Article

An, U. ; Pazokitoroudi, A. ; Alvarez, M. ; Huang, L. ; Bacanu, S.A. ; Schork, A.J. ; Kendler, K. ; Pajukanta, P. ; Flint, J. ; Zaitlen, N. ; Cai, N. ; Dahl, A. ; Sankararaman, S.

Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries.

Nat. Genet. 55, 2269-2276 (2023)

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.

Nature Genetics

Scientific Article

Dahl, A. ; Thompson, M.L. ; An, U. ; Krebs, M.D. ; Appadurai, V. ; Border, R. ; Bacanu, S.A. ; Werge, T. ; Flint, J. ; Schork, A.J. ; Sankararaman, S. ; Kendler, K.S. ; Cai, N.

Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder.

Nat. Genet. 55, 2082-2093 (2023)

Biobanks often contain several phenotypes relevant to diseases such as major depressive disorder (MDD), with partly distinct genetic architectures. Researchers face complex tradeoffs between shallow (large sample size, low specificity/sensitivity) and deep (small sample size, high specificity/sensitivity) phenotypes, and the optimal choices are often unclear. Here we propose to integrate these phenotypes to combine the benefits of each. We use phenotype imputation to integrate information across hundreds of MDD-relevant phenotypes, which significantly increases genome-wide association study (GWAS) power and polygenic risk score (PRS) prediction accuracy of the deepest available MDD phenotype in UK Biobank, LifetimeMDD. We demonstrate that imputation preserves specificity in its genetic architecture using a novel PRS-based pleiotropy metric. We further find that integration via summary statistics also enhances GWAS power and PRS predictions, but can introduce nonspecific genetic effects depending on input. Our work provides a simple and scalable approach to improve genetic studies in large biobanks by integrating shallow and deep phenotypes.

Nature Genetics

Scientific Article

2022

Border, R. ; Athanasiadis, G.I. ; Buil, A. ; Schork, A.J. ; Cai, N. ; Young, A.I. ; Werge, T. ; Flint, J. ; Kendler, K.S. ; Sankararaman, S. ; Dahl, A.W. ; Zaitlen, N.A.

Cross-trait assortative mating is widespread and inflates genetic correlation estimates.

Science 378, 754-761 (2022)

The observation of genetic correlations between disparate human traits has been interpreted as evidence of widespread pleiotropy. Here, we introduce cross-trait assortative mating (xAM) as an alternative explanation. We observe that xAM affects many phenotypes and that phenotypic cross-mate correlation estimates are strongly associated with genetic correlation estimates (R² = 74%). We demonstrate that existing xAM plausibly accounts for substantial fractions of genetic correlation estimates and that previously reported genetic correlation estimates between some pairs of psychiatric disorders are congruent with xAM alone. Finally, we provide evidence for a history of xAM at the genetic level using cross-trait even/odd chromosome polygenic score correlations. Together, our results demonstrate that previous reports have likely overestimated the true genetic similarity between many phenotypes.

Science

Scientific Article

Chang, S. ; Fermani, F. ; Lao, C.L. ; Huang, L. ; Jakovcevski, M. ; Di Giaimo, R. ; Gagliardi, M. ; Menegaz, D. ; Hennrich, A.A. ; Ziller, M.J. ; Eder, M. ; Klein, R. ; Cai, N. ; Deussing, J.M.

Tripartite extended amygdala-basal ganglia CRH circuit drives locomotor activation and avoidance behavior.

Sci. Adv. 8:eabo1023 (2022)

An adaptive stress response involves various mediators and circuits orchestrating a complex interplay of physiological, emotional, and behavioral adjustments. We identified a population of corticotropin-releasing hormone (CRH) neurons in the lateral part of the interstitial nucleus of the anterior commissure (IPACL), a subdivision of the extended amygdala, which exclusively innervate the substantia nigra (SN). Specific stimulation of this circuit elicits hyperactivation of the hypothalamic-pituitary-adrenal axis, locomotor activation, and avoidance behavior contingent on CRH receptor type 1 (CRHR1) located at axon terminals in the SN, which originate from external globus pallidus (GPe) neurons. The neuronal activity prompting the observed behavior is shaped by IPACL^CRH and GPe^CRHR1 neurons coalescing in the SN. These results delineate a previously unidentified tripartite CRH circuit functionally connecting extended amygdala and basal ganglia nuclei to drive locomotor activation and avoidance behavior.

Science Advances

Scientific Article

An, U. ; Cai, N. ; Dahl, A. ; Sankararaman, S.

AutoComplete: Deep learning-based phenotype imputation for large-scale biomedical data.

Lect. Notes Comput. Sc. 13278 LNBI, 385-386 (2022)

Biomedical datasets that aim to collect diverse phenotypic and genomic data across large numbers of individuals are plagued by the large fraction of missing data. The ability to accurately impute or “fill in” missing entries in these datasets is critical to a number of downstream applications.

Lecture Notes in Computer Science

Meeting abstract

Schork, A.J. ; Peterson, R.E. ; Dahl, A. ; Cai, N. ; Kendler, K.S.

Author Correction: Indirect paths from genetics to education (Nature Genetics, (2022), 54, 4, (372-373), 10.1038/s41588-021-00999-5).

Nat. Genet., DOI: 10.1038/s41588-022-01092-1 (2022)

In the version of this article initially published, the affiliations shown for Roseann E. Peterson were incorrect. The correct affiliation, “Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA,” has been updated in the HTML and PDF versions of the article.

Nature Genetics

Falkai, P. ; Koutsouleris, N. ; Bertsch, K. ; Bialas, M. ; Binder, E. ; Bühner, M. ; Buyx, A. ; Cai, N. ; Cappello, S. ; Ehring, T. ; Gensichen, J. ; Hamann, J. ; Hasan, A. ; Henningsen, P. ; Leucht, S. ; Möhrmann, K.H. ; Nagelstutz, E. ; Padberg, F. ; Peters, A. ; Pfäffel, L. ; Reich-Erkelenz, D. ; Riedl, V. ; Rueckert, D. ; Schmitt, A. ; Schulte-Körne, G. ; Scheuring, E. ; Schulze, T.G. ; Starzengruber, R. ; Stier, S. ; Theis, F.J. ; Winkelmann, J. ; Wurst, W. ; Priller, J.

Concept of the Munich/Augsburg Consortium Precision in Mental Health for the German Center of Mental Health.

Front. Psychiatr. 13:815718 (2022)

The Federal Ministry of Education and Research (BMBF) issued a call for a new nationwide research network on mental disorders, the German Center of Mental Health (DZPG). The Munich/Augsburg consortium was selected to participate as one of six partner sites with its concept "Precision in Mental Health (PriMe): Understanding, predicting, and preventing chronicity." PriMe bundles interdisciplinary research from the Ludwig-Maximilians-University (LMU), Technical University of Munich (TUM), University of Augsburg (UniA), Helmholtz Center Munich (HMGU), and Max Planck Institute of Psychiatry (MPIP) and has a focus on schizophrenia (SZ), bipolar disorder (BPD), and major depressive disorder (MDD). PriMe takes a longitudinal perspective on these three disorders from the at-risk stage to the first-episode, relapsing, and chronic stages. These disorders pose a major health burden because in up to 50% of patients they cause untreatable residual symptoms, which lead to early social and vocational disability, comorbidities, and excess mortality. PriMe aims at reducing mortality on different levels, e.g., reducing death by psychiatric and somatic comorbidities, and will approach this goal by addressing interdisciplinary and cross-sector approaches across the lifespan. PriMe aims to add a precision medicine framework to the DZPG that will propel deeper understanding, more accurate prediction, and personalized prevention to prevent disease chronicity and mortality across mental illnesses. This framework is structured along the translational chain and will be used by PriMe to innovate the preventive and therapeutic management of SZ, BPD, and MDD from rural to urban areas and from patients in early disease stages to patients with long-term disease courses. 
Research will build on platforms that include one on model systems, one on the identification and validation of predictive markers, one on the development of novel multimodal treatments, one on the regulation and strengthening of the uptake and dissemination of personalized treatments, and finally one on testing of the clinical effectiveness, utility, and scalability of such personalized treatments. In accordance with the translational chain, PriMe's expertise includes the ability to integrate understanding of bio-behavioral processes based on innovative models, to translate this knowledge into clinical practice and to promote user participation in mental health research and care.

Frontiers in psychiatry

Review

Schork, A.J. ; Peterson, R.E. ; Dahl, A. ; Cai, N. ; Kendler, K.S.

Indirect paths from genetics to education.

Nat. Genet. 54, 372-373 (2022)

Nature Genetics

Editorial

Nguyen, T.D. ; Harder, A. ; Xiong, Y. ; Kowalec, K. ; Hägg, S. ; Cai, N. ; Kuja-Halkola, R. ; Dalman, C. ; Sullivan, P.F. ; Lu, Y.

Genetic heterogeneity and subtypes of major depression.

Mol. Psychiatry 27, 1667-1675 (2022)

Major depression (MD) is a heterogeneous disorder; however, the extent to which genetic factors distinguish MD patient subgroups (genetic heterogeneity) remains uncertain. This study sought evidence for genetic heterogeneity in MD. Using UK Biobank cohort, the authors defined 16 MD subtypes within eight comparison groups (vegetative symptoms, symptom severity, comorbid anxiety disorder, age at onset, recurrence, suicidality, impairment, and postpartum depression; N ~ 3000-47000). To compare genetic component of these subtypes, subtype-specific genome-wide association studies were performed to estimate SNP-heritability, and genetic correlations within subtype comparison and with other related disorders/traits. The findings indicated that MD subtypes were divergent in their SNP-heritability, and genetic correlations both within subtype comparisons and with other related disorders/traits. Three subtype comparisons (vegetative symptoms, age at onset, and impairment) showed significant differences in SNP-heritability; while genetic correlations within subtype comparisons ranged from 0.55 to 0.86, suggesting genetic profiles are only partially shared among MD subtypes. Furthermore, subtypes that are more clinically challenging, e.g., early-onset, recurrent, suicidal, more severely impaired, had stronger genetic correlations with other psychiatric disorders. MD with atypical-like features showed a positive genetic correlation (+0.40) with BMI while a negative correlation (-0.09) was found in those without atypical-like features. Novel genomic loci with subtype-specific effects were identified. These results provide the most comprehensive evidence to date for genetic heterogeneity within MD, and suggest that the phenotypic complexity of MD can be effectively reduced by studying the subtypes which share partially distinct etiologies.

Molecular Psychiatry

Scientific Article

Zou, J. ; Gopalakrishnan, S. ; Parker, C.C. ; Nicod, J. ; Mott, R. ; Cai, N. ; Lionikas, A. ; Davies, R.W. ; Palmer, A.A. ; Flint, J.

Analysis of independent cohorts of outbred CFW mice reveals novel loci for behavioral and physiological traits and identifies factors determining reproducibility.

Genes Genomes Genetics G3 12:jkab394 (2022)

Combining samples for genetic association is standard practice in human genetic analysis of complex traits, but is rarely undertaken in rodent genetics. Here, using 23 phenotypes and genotypes from two independent laboratories, we obtained a sample size of 3,076 commercially available outbred mice and identified 70 loci, more than double the number of loci identified in the component studies. Fine-mapping in the combined sample reduced the number of likely causal variants, with a median reduction in set size of 51%, and indicated novel gene associations, including Pnpo, Ttll6 and GM11545 with bone mineral density, and Psmb9 with weight. However replication at a nominal threshold of 0.05 between the two component studies was low, with less than a third of loci identified in one study replicated in the second. In addition to overestimates in the effect size in the discovery sample (Winner's Curse), we also found that heterogeneity between studies explained the poor replication, but the contribution of these two factors varied among traits. Leveraging these observations we integrated information about replication rates, study-specific heterogeneity, and Winner's Curse corrected estimates of power to assign variants to one of four confidence levels. Our approach addresses concerns about reproducibility, and demonstrates how to obtain robust results from mapping complex traits in any genome-wide association study.

Genes Genomes Genetics G3

Scientific Article

2021

Cai, N. ; Gomez-Duran, A. ; Yonova-Doing, E. ; Kundu, K. ; Burgess, A.I. ; Golder, Z.J. ; Calabrese, C. ; Bonder, M.J. ; Camacho, M. ; Lawson, R.A. ; Li, L. ; Williams-Gray, C.H. ; ICICLE-PD Study Group ; di Angelantonio, E. ; Roberts, D.J. ; Watkins, N.A. ; Ouwehand, W.H. ; Butterworth, A.S. ; Stewart, I.D. ; Pietzner, M. ; Wareham, N.J. ; Langenberg, C. ; Walter, K. ; Rothwell, P.M. ; Howson, J.M.M. ; Stegle, O. ; Chinnery, P.F. ; Soranzo, N.

Mitochondrial DNA variants modulate N-formylmethionine, proteostasis and risk of late-onset human diseases.

Nat. Med. 27, 1564-1575 (2021)

Mitochondrial DNA (mtDNA) variants influence the risk of late-onset human diseases, but the reasons for this are poorly understood. Undertaking a hypothesis-free analysis of 5,689 blood-derived biomarkers with mtDNA variants in 16,220 healthy donors, here we show that variants defining mtDNA haplogroups Uk and H4 modulate the level of circulating N-formylmethionine (fMet), which initiates mitochondrial protein translation. In human cytoplasmic hybrid (cybrid) lines, fMet modulated both mitochondrial and cytosolic proteins on multiple levels, through transcription, post-translational modification and proteolysis by an N-degron pathway, abolishing known differences between mtDNA haplogroups. In a further 11,966 individuals, fMet levels contributed to all-cause mortality and the disease risk of several common cardiovascular disorders. Together, these findings indicate that fMet plays a key role in common age-related disease through pleiotropic effects on cell proteostasis.

Nature medicine

Scientific Article

Bonder, M.J. ; Smail, C. ; Gloudemans, M.J. ; Frésard, L. ; Jakubosky, D. ; D'Antonio, M. ; Li, X. ; Ferraro, N.M. ; Carcamo-Orive, I. ; Mirauta, B. ; Seaton, D.D. ; Cai, N. ; Vakili, D. ; Horta, D. ; Zhao, C. ; Zastrow, D.B. ; Bonner, D.E. ; Wheeler, M.T. ; Kilpinen, H. ; Knowles, J.W. ; Smith, E.N. ; Frazer, K.A. ; Montgomery, S.B. ; Stegle, O.

Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics.

Nat. Genet. 53, 313-321 (2021)

Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.

Nature Genetics

Scientific Article

Chatzinakos, C. ; Lee, D. ; Cai, N. ; Vladimirov, V.I. ; Webb, B.T. ; Riley, B.P. ; Flint, J. ; Kendler, K.S. ; Ressler, K.J. ; Daskalakis, N.P. ; Bacanu, S.A.

Increasing the resolution and precision of psychiatric genome-wide association studies by re-imputing summary statistics using a large, diverse reference panel.

Am. J. Med. Genet. B 196, 16-27 (2021)

Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring orders of magnitude lower computational burden. Given that direct imputation needs a precise estimation of linkage disequilibrium (LD) and that most methods use a small reference panel (for example, the ~2,500 subjects of the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2: (a) uses a much larger and more diverse reference panel compared to traditional reference panels, and (b) can estimate weights of ethnic mixture based solely on Z-scores, when allele frequencies are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from the rarer variants. Rarer variants provide much sharper location for the signals compared with common variants, as the LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder GWAS found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.

American Journal of Medical Genetics, Part B: Neuropsychiatric Genetics

Scientific Article

2020

Chatzinakos, C. ; Georgiadis, F. ; Lee, D. ; Cai, N. ; Vladimirov, V.I. ; Docherty, A. ; Webb, B.T. ; Riley, B.P. ; Flint, J. ; Kendler, K.S. ; Daskalakis, N.P. ; Bacanu, S.A.

TWAS pathway method greatly enhances the number of leads for uncovering the molecular underpinnings of psychiatric disorders.

Am. J. Med. Genet. B 183, 454-463 (2020)

Genetic signal detection in genome-wide association studies (GWAS) is enhanced by pooling small signals from multiple single nucleotide polymorphisms (SNPs), for example, across genes and pathways. Because genes are believed to influence traits via gene expression, it is of interest to combine information from expression quantitative trait loci (eQTLs) in a gene or genes in the same pathway. Such methods, widely referred to as transcriptome-wide association studies (TWAS), already exist for gene analysis. Due to the possibility of eliminating most of the confounding effects of linkage disequilibrium (LD) from TWAS gene statistics, pathway TWAS methods would be very useful in uncovering the true molecular basis of psychiatric disorders. However, such methods are not yet available for arbitrarily large pathways/gene sets. This is possibly due to the quadratic (as a function of the number of SNPs) computational burden for computing LD across large chromosomal regions. To overcome this obstacle, we propose JEPEGMIX2-P, a novel TWAS pathway method that (a) has a linear computational burden, (b) uses a large and diverse reference panel (33 K subjects), (c) is competitive (adjusts for background enrichment in gene TWAS statistics), and (d) is applicable as-is to ethnically mixed cohorts. To underline its potential for increasing the power to uncover genetic signals over the commonly used nontranscriptomics methods, for example, MAGMA, we applied JEPEGMIX2-P to summary statistics of most large meta-analyses from the Psychiatric Genomics Consortium (PGC). While our work is just the very first step toward clinical translation of psychiatric disorders, PGC anorexia results suggest a possible avenue for treatment.

American Journal of Medical Genetics, Part B: Neuropsychiatric Genetics

Scientific Article

Cai, N. ; Choi, K.W. ; Fried, E.I.

Reviewing the genetics of heterogeneity in depression: Operationalizations, manifestations, and etiologies.

Hum. Mol. Genet. 29, R10-R18 (2020)

With progress in genome-wide association studies (GWAS) of depression, from identifying zero hits in ~ 16 000 individuals in 2013 to 223 hits in more than a million individuals in 2020, understanding the genetic architecture of this debilitating condition no longer appears to be an impossible task. The pressing question now is whether recently discovered variants describe the etiology of a single disease entity. There are a myriad of ways to measure and operationalize depression severity, and major depressive disorder (MDD) as defined in the DSM-5 can manifest in more than ten thousand ways based on symptom profiles alone. Variations in developmental timing, comorbidity, and environmental contexts across individuals and samples further add to the heterogeneity. With big data increasingly enabling genomic discovery in psychiatry, it is more timely than ever to explicitly disentangle genetic contributions to what is likely 'depressions' rather than depression. Here, we introduce three sources of heterogeneity: operationalization, manifestation, and etiology. We review recent efforts to identify depression subtypes using clinical and data-driven approaches, examine differences in genetic architecture of depression across contexts, and argue that heterogeneity in operationalizations of depression is likely a considerable source of inconsistency. Finally, we offer recommendations and considerations for the field going forward.

Human Molecular Genetics

Review

Cai, N. ; Revez, J.A. ; Adams, M.J. ; Andlauer, T.F.M. ; Breen, G. ; Byrne, E.M. ; Clarke, T.K. ; Forstner, A.J. ; Grabe, H.J. ; Hamilton, S.P. ; Levinson, D.F. ; Lewis, C.M. ; Lewis, G. ; Martin, N.G. ; Milaneschi, Y. ; Mors, O. ; Müller-Myhsok, B. ; Penninx, B.W.J.H. ; Perlis, R.H. ; Pistis, G. ; Potash, J.B. ; Preisig, M. ; Shi, J. ; Smoller, J.W. ; Streit, F. ; Tiemeier, H. ; Uher, R. ; Van der Auwera, S. ; Viktorin, A. ; Weissman, M.M. ; Kendler, K.S. ; Flint, J.

Minimal phenotyping yields genome-wide association signals of low specificity for major depression.

Nat. Genet. 52, 437-447 (2020)

Minimal phenotyping refers to the reliance on the use of a small number of self-reported items for disease case identification, increasingly used in genome-wide association studies (GWAS). Here we report differences in genetic architecture between depression defined by minimal phenotyping and strictly defined major depressive disorder (MDD): the former has a lower genotype-derived heritability that cannot be explained by inclusion of milder cases and a higher proportion of the genome contributing to this shared genetic liability with other conditions than for strictly defined MDD. GWAS based on minimal phenotyping definitions preferentially identifies loci that are not specific to MDD, and, although it generates highly predictive polygenic risk scores, the predictive power can be explained entirely by large sample sizes rather than by specificity for MDD. Our results show that reliance on results from minimal phenotyping may bias views of the genetic architecture of MDD and impede the ability to identify pathways specific to MDD. Genetic analyses of depression based on minimal phenotyping identify nonspecific genetic risk factors shared between major depressive disorder (MDD) and other psychiatric conditions, suggesting that this approach may have limited ability to identify pathways specific to MDD.

Nature Genetics

Scientific Article

2019

Dahl, A. ; Cai, N. ; Ko, A. ; Laakso, M. ; Pajukanta, P. ; Flint, J. ; Zaitlen, N.

Reverse GWAS: Using genetics to identify and model phenotypic subtypes.

PLoS Genet. 15:e1008009 (2019)

Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.

PLoS Genetics

Scientific Article

2018

Cai, N. ; Revez, J.A. ; Adams, M.J. ; Andlauer, T.F.M. ; Breen, G. ; Byrne, E.M. ; Clarke, T.K. ; Forstner, A.J. ; Grabe, H.J. ; Hamilton, S.P. ; Levinson, D.F. ; Lewis, C.M. ; Lewis, G. ; Martin, N.G. ; Milaneschi, Y. ; Mors, O. ; Müller-Myhsok, B. ; Penninx, B.W.J.H. ; Perlis, R.H. ; Pistis, G. ; Potash, J.B. ; Preisig, M. ; Shi, J. ; Smoller, J.W. ; Streit, F. ; Tiemeier, H. ; Uher, R. ; Van der Auwera, S. ; Viktorin, A. ; Weissman, M.M.

Minimal phenotyping yields GWAS hits of low specificity for major depression.

bioRxiv, accepted (2018)

Minimal phenotyping refers to the reliance on the use of a small number of self-report items for disease case identification. This strategy has been applied to genome-wide association studies (GWAS) of major depressive disorder (MDD). Here we report that the genotype derived heritability (h2SNP) of depression defined by minimal phenotyping (14%, SE = 0.8%) is lower than strictly defined MDD (26%, SE = 2.2%). This cannot be explained by differences in prevalence between definitions or including cases of lower liability to MDD in minimal phenotyping definitions of depression, but can be explained by misdiagnosis of those without depression or with related conditions as cases of depression. Depression defined by minimal phenotyping is as genetically correlated with strictly defined MDD (rG = 0.81, SE = 0.03) as it is with the personality trait neuroticism (rG = 0.84, SE = 0.05), a trait not defined by the cardinal symptoms of depression. While they both show similar shared genetic liability with neuroticism, a greater proportion of the genome contributes to the minimal phenotyping definitions of depression (80.2%, SE = 0.6%) than to strictly defined MDD (65.8%, SE = 0.6%). We find that GWAS loci identified in minimal phenotyping definitions of depression are not specific to MDD: they also predispose to other psychiatric conditions. Finally, while highly predictive polygenic risk scores can be generated from minimal phenotyping definitions of MDD, the predictive power can be explained entirely by the sample size used to generate the polygenic risk score, rather than specificity for MDD. Our results reveal that genetic analysis of minimal phenotyping definitions of depression identifies non-specific genetic factors shared between MDD and other psychiatric conditions. Reliance on results from minimal phenotyping for MDD may thus bias views of the genetic architecture of MDD and may impede our ability to identify pathways specific to MDD.

bioRxiv

Scientific Article

Peterson, R.E. ; Cai, N. ; Dahl, A.W. ; Bigdeli, T.B. ; Edwards, A.C. ; Webb, B.T. ; Bacanu, S.A. ; Zaitlen, N. ; Flint, J. ; Kendler, K.S.

Molecular genetic analysis subdivided by adversity exposure suggests etiologic heterogeneity in major depression.

Am. J. Psychiatry 175, 545-554 (2018)

OBJECTIVE: The extent to which major depression is the outcome of a single biological mechanism or represents a final common pathway of multiple disease processes remains uncertain. Genetic approaches can potentially identify etiologic heterogeneity in major depression by classifying patients on the basis of their experience of major adverse events. METHOD: Data are from the China, Oxford, and VCU Experimental Research on Genetic Epidemiology (CONVERGE) project, a study of Han Chinese women with recurrent major depression aimed at identifying genetic risk factors for major depression in a rigorously ascertained cohort carefully assessed for key environmental risk factors (N=9,599). To detect etiologic heterogeneity, genome-wide association studies, heritability analyses, and gene-by-environment interaction analyses were performed. RESULTS: Genome-wide association studies stratified by exposure to adversity revealed three novel loci associated with major depression only in study participants with no history of adversity. Significant gene-by-environment interactions were seen between adversity and genotype at all three loci, and 13.2% of major depression liability can be attributed to genome-wide interaction with adversity exposure. The genetic risk in major depression for participants who reported major adverse life events (27%) was partially shared with that in participants who did not (73%; genetic correlation=+0.64). Together with results from simulation studies, these findings suggest etiologic heterogeneity within major depression as a function of environmental exposures. CONCLUSIONS: The genetic contributions to major depression may differ between women with and those without major adverse life events. These results have implications for the molecular dissection of major depression and other complex psychiatric and biomedical diseases.

American Journal of Psychiatry, The

Scientific Article

2017

Speed, D. ; Cai, N. ; Johnson, M.R. ; Nejentsev, S. ; Balding, D.J.

Reevaluation of SNP heritability in complex human traits.

Nat. Genet. 49, 986-992 (2017)

SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.

Nature Genetics

Scientific Article

2015

Cai, N. ; Chang, S. ; Li, Y. ; Li, Q. ; Hu, J. ; Liang, J. ; Song, L. ; Kretzschmar, W. ; Gan, X. ; Nicod, J. ; Rivera, M. ; Deng, H. ; Du, B. ; Li, K. ; Sang, W. ; Gao, J. ; Gao, S. ; Ha, B. ; Ho, H.Y. ; Hu, C. ; Hu, Z. ; Huang, G. ; Jiang, G. ; Jiang, T. ; Jin, W. ; Li, G. ; Lin, Y.T. ; Liu, L. ; Liu, T. ; Liu, Y. ; Lu, Y. ; Lv, L. ; Meng, H. ; Qian, P. ; Sang, H. ; Shen, J. ; Shi, J. ; Sun, J. ; Tao, M. ; Wang, G. ; Wang, J. ; Wang, L. ; Wang, X. ; Yang, H. ; Yang, L. ; Yin, Y. ; Zhang, J. ; Zhang, K. ; Sun, N. ; Zhang, W. ; Zhang, X. ; Zhang, Z. ; Zhong, H. ; Breen, G. ; Marchini, J. ; Chen, Y. ; Xu, Q. ; Xu, X. ; Mott, R. ; Huang, G.J. ; Kendler, K. ; Flint, J.

Molecular signatures of major depression.

Curr. Biol. 25, 1146-1156 (2015)

Adversity, particularly in early life, can cause illness. Clues to the responsible mechanisms may lie with the discovery of molecular signatures of stress, some of which include alterations to an individual's somatic genome. Here, using genome sequences from 11,670 women, we observed a highly significant association between a stress-related disease, major depression, and the amount of mtDNA (p = 9.00 × 10^-42, odds ratio 1.33 [95% confidence interval [CI] = 1.29-1.37]) and telomere length (p = 2.84 × 10^-14, odds ratio 0.85 [95% CI = 0.81-0.89]). While both telomere length and mtDNA amount were associated with adverse life events, conditional regression analyses showed the molecular changes were contingent on the depressed state. We tested this hypothesis with experiments in mice, demonstrating that stress causes both molecular changes, which are partly reversible and can be elicited by the administration of corticosterone. Together, these results demonstrate that changes in the amount of mtDNA and telomere length are consequences of stress and entering a depressed state. These findings identify increased amounts of mtDNA as a molecular marker of MD and have important implications for understanding how stress causes the disease.

Current Biology

Scientific Article

CONVERGE consortium (Cai, N.)

Sparse whole-genome sequencing identifies two loci for major depressive disorder.

Nature 523, 588-591 (2015)

Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide, poses a major challenge to genetic analysis. To date, no robustly replicated genetic loci have been identified, despite analysis of more than 9,000 cases. Here, using low-coverage whole-genome sequencing of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified, and subsequently replicated in an independent sample, two loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P = 2.53 × 10^-10), the other in an intron of the LHPP gene (P = 6.45 × 10^-12). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness.

Nature

Scientific Article

Cai, N. ; Li, Y. ; Chang, S. ; Liang, J. ; Lin, C. ; Zhang, X. ; Liang, L. ; Hu, J. ; Chan, W. ; Kendler, K.S. ; Malinauskas, T. ; Huang, G.J. ; Li, Q. ; Mott, R. ; Flint, J.

Genetic control over mtDNA and its relationship to major depressive disorder.

Curr. Biol. 25, 3170-3177 (2015)

Control over the number of mtDNA molecules per cell appears to be tightly regulated, but the mechanisms involved are largely unknown. Reversible alterations in the amount of mtDNA occur in response to stress, suggesting that control over the amount of mtDNA is involved in stress-related diseases including major depressive disorder (MDD). Using low-coverage sequence data from 10,442 Chinese women to compute the normalized numbers of reads mapping to the mitochondrial genome as a proxy for the amount of mtDNA, we identified two loci that contribute to mtDNA levels: one within the TFAM gene on chromosome 10 (rs11006126, p value = 8.73 × 10^-28, variance explained = 1.90%) and one over the CDK6 gene on chromosome 7 (rs445, p value = 6.03 × 10^-16, variance explained = 0.50%). Both loci replicated in an independent cohort. CDK6 is thus a new molecule involved in the control of mtDNA. We identify increased rates of heteroplasmy in women with MDD, and show from an experimental paradigm using mice that the increase is likely due to stress. Furthermore, at least one heteroplasmic variant is significantly associated with changes in the amount of mtDNA (position 513, p value = 3.27 × 10^-9, variance explained = 0.48%), suggesting site-specific heteroplasmy as a possible link between stress and increase in amount of mtDNA. These findings indicate the involvement of mitochondrial genome copy number and sequence in an organism's response to stress.

Current Biology

Scientific Article