2018 Aug 30;9(1):3522.
doi: 10.1038/s41467-018-05624-4.

A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers

Jonathan D Mosley et al. Nat Commun. 2018.

Abstract

Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals additional known and previously undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.
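
The prediction step described in the abstract can be sketched as a weighted sum of SNP dosages. This is a minimal illustration and not the authors' pipeline: the function names, the toy weights, and the dosage coding (0, 1, or 2 copies of the effect allele) are assumptions, and real weights would come from the BSLMM fit in the ARIC data.

```python
def predicted_biomarker(dosages, weights):
    """Weighted sum of SNP dosages for one individual.

    dosages: effect-allele counts per SNP (assumed coding: 0, 1, or 2).
    weights: per-SNP weights (hypothetical values; in the study these
             are estimated by BSLMM in the ARIC cohort).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(values):
    """Rescale predictors to mean 0 and sample SD 1, so that a downstream
    odds ratio can be read per standard deviation of the predictor."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


# Toy example: three individuals, three SNPs, made-up weights.
toy_weights = [0.5, -0.2, 0.1]
toy_genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
scores = [predicted_biomarker(g, toy_weights) for g in toy_genotypes]
z_scores = standardize(scores)
```

Each standardized score would then serve as the exposure in a per-phenotype logistic regression, which is not shown here.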

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Overview. a Overview of the study design. Bayesian sparse linear mixed modelling (BSLMM) was used to compute SNP weights for 53 biomarkers from the ARIC study. These weights were used to compute genetically predicted biomarkers in the EHR data set, and phenome-wide scanning (PheWAS) was used to identify clinical phenotypes associated with each genetically predicted biomarker. b Circos plot showing the 116 significant associations (Bonferroni p < 0.05) between the genetic predictors of the ARIC biomarkers and PheWAS phenotypes. Associations are denoted by lines. Coloring is used to highlight similar groups of biomarkers and PheWAS phenotypes
Fig. 2
Associations with positive controls. Positive control biomarker-phenotype pairs were identified a priori for 42 ARIC biomarkers. The histogram quantifies the percentage of pairs with Bonferroni p < 0.05, rank order value ≤ 5, false discovery rate (FDR) q < 0.1 or not seen by any of the criteria. Some pairs may fall into multiple categories
Fig. 3
Comparison of an FDR versus Bonferroni p value selection threshold. a Scatter plot summarizing pheWAS analyses for a genetic predictor of systolic blood pressure (SBP). Each point indicates a logistic regression association analysis, adjusted for birth decade, sex, and 3 PCs, between genetically predicted SBP and a pheWAS phenotype. Odds ratios are per standard deviation increase in the genetic predictor. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. Only selected points are labelled for clarity. b Count of the number of associations, binned by disease, meeting the Bonferroni and FDR selection thresholds. c PheWAS associations for a waist circumference genetic predictor and d count of disease associations significant by Bonferroni or FDR criteria. e Frequency histogram of the skewness (see Methods for calculations) of the pheWAS beta coefficients for each of the 53 biomarkers. The red arrow points to the value for waist circumference. HTN: hypertension; PVD: peripheral vascular disease; T2D: type 2 diabetes
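
The two selection thresholds compared in Fig. 3 can be illustrated with a short sketch. The source states only "Bonferroni p < 0.05" and "FDR q < 0.1"; the use of the Benjamini-Hochberg procedure for the q-values, the function names, and the toy p values are assumptions made for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values (assumed FDR method, not confirmed by the source).

    For the p value of rank r (1 = smallest) among m tests, the q-value is
    the minimum of p_(j) * m / j over all ranks j >= r.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the running minimum
    # enforces monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p values for four hypothetical phenotype associations.
toy_p = [0.001, 0.02, 0.03, 0.5]
n_bonferroni = sum(p < 0.05 for p in bonferroni(toy_p))
n_fdr = sum(q < 0.1 for q in benjamini_hochberg(toy_p))
```

In this toy input, one association passes the Bonferroni threshold and three pass the FDR threshold, mirroring the pattern in which the FDR criterion admits more associations.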
Fig. 4
Associations for selected biomarkers. Scatter plots summarizing pheWAS analyses for genetic predictors of a triglyceride levels, b pack-years of smoking, c serum magnesium levels, and d serum von Willebrand factor levels. Odds ratios are from logistic regression analyses, adjusting for birth decade, sex, and 3 PCs. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. PVD: peripheral vascular disease; AAA: abdominal aortic aneurysm; IHD: ischemic heart disease; DVT: deep vein thrombosis; PE: pulmonary embolism; GI: gastrointestinal
Fig. 5
Associations with LDL cholesterol (LDL-C). a Scatter plot summarizing pheWAS analyses for a genetic predictor of LDL-C. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. b Association analysis between the LDL-C genetic predictor and the PheWAS septicemia phenotype, stratified by type 2 diabetes (T2D) status. Error bars represent 95% confidence intervals of odds-ratio estimates. c Epidemiological association between Low (LDL-C < 60 mg/dl) versus Normal (LDL-C between 90 and 130 mg/dl) LDL-C and septicemia in an independent EHR cohort. Odds ratios were determined by multivariable logistic regression adjusting for age, gender, and race, and stratified by T2D status. Error bars represent 95% confidence intervals. T2D: type 2 diabetes
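
The odds ratios and error bars described in the captions can be derived from a logistic regression coefficient and its standard error. The sketch below assumes a Wald-type interval, which the source does not specify; the function name and the example numbers are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and approximate 95% Wald confidence interval.

    beta: logistic regression coefficient (log odds ratio).
    se:   standard error of beta.
    z:    normal quantile; 1.96 gives an approximate 95% interval.
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient and standard error, for illustration only.
or_est, ci_low, ci_high = odds_ratio_ci(beta=-0.10, se=0.04)
```

An interval that excludes 1 corresponds to a nominally significant association at the 0.05 level, before any multiple-testing correction.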
