Sci Transl Med. 2014 Apr 30;6(234):234ra57.
doi: 10.1126/scitranslmed.3007191.

Disease risk factors identified through shared genetic architecture and electronic medical records

Li Li et al. Sci Transl Med.

Abstract

Genome-wide association studies have identified genetic variants for thousands of diseases and traits. We evaluated the relationships between specific risk factors (for example, blood cholesterol level) and diseases on the basis of their shared genetic architecture in a comprehensive human disease-single-nucleotide polymorphism association database (VARIMED), analyzing the findings from 8962 published association studies. Similarity between traits and diseases was statistically evaluated on the basis of their association with shared gene variants. We identified 120 disease-trait pairs that were statistically similar, and of these, we tested and validated five previously unknown disease-trait associations by searching electronic medical records (EMRs) from three independent medical centers for evidence of the trait appearing in patients within 1 year of first diagnosis of the disease. We validated that the mean corpuscular volume is elevated before diagnosis of acute lymphoblastic leukemia; both have associated variants in the gene IKZF1. Platelet count is decreased before diagnosis of alcohol dependence; both are associated with variants in the gene C12orf51. Alkaline phosphatase level is elevated in patients with venous thromboembolism; both share variants in ABO. Similarly, we found that prostate-specific antigen and serum magnesium levels were altered before the diagnosis of lung cancer and gastric cancer, respectively. Disease-trait associations identify traits that could serve as future prognostics, if validated through EMR and subsequent prospective trials.
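The idea of scoring a trait and a disease by the genes they share can be sketched as follows. The abstract does not say which statistic was used, so this example swaps in a one-sided hypergeometric overlap test followed by Benjamini-Hochberg q-values, purely as an illustrative assumption. The function names, the background size of 20,000 genes, and every gene symbol other than IKZF1 are hypothetical.

```python
from math import comb


def overlap_p_value(trait_genes, disease_genes, n_background):
    """One-sided hypergeometric P(X >= k) for the number of shared genes.

    Assumption: genes are drawn from a background of n_background genes;
    this is a stand-in statistic, not the published method.
    """
    trait_genes, disease_genes = set(trait_genes), set(disease_genes)
    k = len(trait_genes & disease_genes)
    n_t, n_d = len(trait_genes), len(disease_genes)
    total = comb(n_background, n_d)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_t, i) * comb(n_background - n_t, n_d - i)
        for i in range(k, min(n_t, n_d) + 1)
    )
    return k, tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Toy gene sets; only IKZF1 is taken from the abstract, the rest are made up.
    trait = {"IKZF1", "GENE_A", "GENE_B"}
    disease = {"IKZF1", "GENE_C"}
    shared, p = overlap_p_value(trait, disease, n_background=20000)
    print(shared, p)
```

In a screen over many trait-disease pairs, the p-values from `overlap_p_value` would be passed together to `benjamini_hochberg`, and pairs could then be filtered at a threshold such as the q ≤ 0.01 cutoff reported for the network in Figure 2.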

Conflict of interest statement

Competing interests: Dr. Butte is a founder and consultant of Personalis, Inc., a genetic testing company. Rong Chen is an employee of Personalis, Inc. The rest of the authors declare that they have no competing interests.

Figures

Figure 1. Diagram for identifying significant disease-trait genetic associations.
Figure 2. Disease-trait network of 120 significant pairs
The network consists of the 120 significant disease-trait pairs with q ≤ 0.01. Diseases (blue circles) and traits (orange circles) are connected by gray lines (a single connection between a trait and a disease) or red lines (a connection from one node to a group of diseases or traits). T1-T7 indicate trait modules (light orange circles) connected to a disease or disease module by red lines. D1-D8 indicate disease modules (light blue circles) connected to a trait or trait module by red lines. The network was visualized with Cytoscape 2.6.0 (48) and the CyOog plugin (49).
Figure 3. Three ways traits and diseases can temporally interrelate
Traits (i.e. risk factors) can manifest prior to disease, at the same time as disease diagnosis, or represent consequences occurring after diagnosis. Genetic variants were either directly observed in traits and diseases (solid edges) or indirectly observed or potentially influenced by a preceding trait or disease (dotted edges). Arrow direction indicates the timing of the interrelation.
Figure 4. Violin plots for clinical validations of five new findings
Violin plots (a combination of box plots and kernel density plots) for clinical validation of the five new findings in three independent cohorts from SHC, MSMC, and CUMC. The five new findings are MCV associated with ALL at SHC and MSMC (4A), MGN associated with GCA at MSMC and CUMC (4B), PSA associated with LCA at SHC and MSMC (4C), ALP associated with VTE at MSMC and CUMC (4D), and PLT counts associated with ADS at all three centers (4E), each based on lab tests performed within one year before the first diagnosis. In the black box plots, the bold black line boundaries indicate the 25th and 75th percentiles of the lab values, and the white center squares indicate the median. The horizontal lines indicate the reference ranges of the lab values. The grey shapes indicate the density of samples. P-values are from the Wilcoxon rank-sum test.
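The Figure 4 caption names the Wilcoxon rank-sum test for comparing lab values between groups. A minimal sketch of that test is shown here, assuming a normal approximation with no tie or continuity correction; the function names and the lab values are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for cases, z score, two-sided p-value).
    Assumption: no tie correction and no continuity correction are applied.
    """
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    rank_sum_cases = sum(ranks[:n1])
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


if __name__ == "__main__":
    # Made-up lab values for illustration only.
    cases = [98.0, 101.5, 103.2, 99.8, 105.1]
    controls = [88.4, 91.0, 90.2, 93.7, 89.9, 92.5]
    print(rank_sum_test(cases, controls))
```

For small samples or heavily tied data, an exact or tie-corrected implementation from a statistics library would be a safer choice than this approximation.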

References

    1. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007 Jun 7;447:661.
    2. Johnson AD, O'Donnell CJ. An open access database of genome-wide association results. BMC medical genetics. 2009;10:6.
    3. Steinbrecher UP, Lougheed M. Scavenger receptor-independent stimulation of cholesterol esterification in macrophages by low density lipoprotein extracted from human aortic intima. Arteriosclerosis and thrombosis : a journal of vascular biology / American Heart Association. 1992 May;12:608.
    4. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences of the United States of America. 2009 Jun 9;106:9362.
    5. Li H, Lee Y, Chen JL, Rebman E, Li J, Lussier YA. Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory. Journal of the American Medical Informatics Association : JAMIA. 2012 Mar-Apr;19:295.
