Am J Hum Genet. 2019 Dec 5;105(6):1193-1212.
doi: 10.1016/j.ajhg.2019.10.012. Epub 2019 Nov 27.

Integrating Clinical Data and Imputed Transcriptome from GWAS to Uncover Complex Disease Subtypes: Applications in Psychiatry and Cardiology

Liangying Yin et al. Am J Hum Genet. 2019.

Abstract

Classifying subjects into clinically and biologically homogeneous subgroups will facilitate understanding of disease pathophysiology and the development of targeted prevention and intervention strategies. Traditionally, disease subtyping is based on clinical characteristics alone, but subtypes identified this way may not correspond closely to the underlying biological mechanisms. Very few studies have integrated genomic profiles (e.g., those from GWASs) with clinical symptoms for disease subtyping. Here we propose an analytic framework that identifies subgroups of complex diseases by combining GWAS-predicted gene expression levels with clinical data in a multi-view bicluster analysis. Because this approach connects SNPs to genes through their effects on expression, the analysis is more biologically relevant and interpretable than a purely SNP-based analysis. Transcriptomes of different tissues can also be readily modeled. We also propose several evaluation metrics for assessing clustering performance. Our framework subtyped individuals with schizophrenia into diverse subgroups with different prognoses and treatment responses. We also applied the framework to the Northern Finland Birth Cohort (NFBC) 1966 dataset and, in a gender-stratified analysis, identified subgroups with high and low cardiometabolic risk. The prediction strength estimated by cross-validation was generally greater than 80%, suggesting good stability of the clustering model. Our results support a more data-driven and biologically informed approach to defining metabolic syndrome and subtyping psychiatric disorders. Moreover, we found that the genes "blindly" selected by the algorithm are significantly enriched for known susceptibility genes discovered in GWASs of schizophrenia or cardiovascular diseases. The proposed framework opens up a new approach to subject stratification.

Keywords: cardiovascular disease; clustering; disease subtyping; gene expression; genome-wide association study; schizophrenia.
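The framework links SNPs to genes through their predicted effect on expression. As a minimal sketch, assuming a linear prediction model of the kind used by PrediXcan-style methods (per-gene, per-tissue SNP weights applied to genotype dosages), the imputed expression of one gene for each individual is a weighted sum of allele dosages. The function name, SNP IDs, and weight values below are hypothetical; the abstract does not specify the exact imputation model or weights used in the paper.

```python
from typing import Dict, List


def impute_expression(
    dosages: Dict[str, List[float]],  # SNP id -> allele dosage (0..2) for each individual
    weights: Dict[str, float],        # SNP id -> hypothetical expression weight for one gene in one tissue
) -> List[float]:
    """Predicted expression of one gene per individual: sum over SNPs j of w_j * dosage_ij.

    Assumes a linear (PrediXcan-style) model; this is an illustration, not the paper's code.
    """
    # Only SNPs present in both the genotype data and the weight set contribute.
    shared = [snp for snp in weights if snp in dosages]
    if not shared:
        raise ValueError("no overlapping SNPs between genotypes and weights")
    n_individuals = len(dosages[shared[0]])
    predicted = [0.0] * n_individuals
    for snp in shared:
        w = weights[snp]
        for i, d in enumerate(dosages[snp]):
            predicted[i] += w * d
    return predicted


if __name__ == "__main__":
    dosages = {"rs0001": [0, 1, 2], "rs0002": [2, 1, 0], "rs0003": [1, 1, 1]}
    weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0999": 1.0}  # rs0999 is not genotyped, so it is ignored
    print(impute_expression(dosages, weights))  # [-0.5, 0.25, 1.0]
```

Repeating this for every gene (and, if desired, every tissue model) yields an individuals-by-genes matrix that can serve as the expression "view" alongside the clinical view in a multi-view clustering.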

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Outline of Proposed Method for Disease Subtypes Discovery
Figure 2. Illustration of Distance Calculation for Prediction Strength
Figure 3. Comparison across Input Clinical and Neurocognitive Features by Subgroups for SCZ-Affected Individuals. (A) Distribution of categorical features by group. (B) Mean and standard error of cognitive features by group (error bars indicate the corresponding standard errors). (C) Mean and standard error of AgeOnset and DUP by group (error bars indicate the corresponding standard errors).
Figure 4. Comparison across Outcome-Related Variables by Subgroups for SCZ-Affected Individuals. (A) Violence. (B) Self harm. (C) Treatment response. (D) Episode. (E) PANSS scores (error bars indicate the corresponding standard errors).
Figure 5. Comparison across Input Clinical Features by Male Subgroups from Multi-view Clustering. (A) WHR, CRP, FG, and INS. (B) TC, HDL, LDL, TG, and HOMA_IR. (C) BMI, SBP, and DBP. In all panels, error bars indicate the corresponding standard errors.
Figure 6. Comparison across Input Clinical Features by Female Subgroups from Multi-view Clustering. (A) WHR, CRP, FG, and INS. (B) TC, HDL, LDL, TG, and HOMA_IR. (C) BMI, SBP, and DBP. In all panels, error bars indicate the corresponding standard errors.
