Cell Genom. 2023 Jan 31;3(2):100257.
doi: 10.1016/j.xgen.2023.100257. eCollection 2023 Feb 8.

The Michigan Genomics Initiative: A biobank linking genotypes and electronic clinical records in Michigan Medicine patients

Matthew Zawistowski et al. Cell Genom.

Abstract

Biobanks of linked clinical patient histories and biological samples are an efficient strategy to generate large cohorts for modern genetics research. Biobank recruitment varies by factors such as geographic catchment and sampling strategy, which affect biobank demographics and research utility. Here, we describe the Michigan Genomics Initiative (MGI), a single-health-system biobank currently consisting of >91,000 participants recruited primarily during surgical encounters at Michigan Medicine. The surgical enrollment results in a biobank enriched for many diseases and ideally suited for a disease genetics cohort. Compared with the much larger population-based UK Biobank, MGI has higher prevalence for nearly all diagnosis-code-based phenotypes and larger absolute case counts for many phenotypes. Genome-wide association study (GWAS) results replicate known findings, thereby validating the genetic and clinical data. Our results illustrate that opportunistic biobank sampling within single health systems provides a unique and complementary resource for exploring the genetics of complex diseases.

Conflict of interest statement

G.R.A. and A.P. work for Regeneron Pharmaceuticals. C.J.W. took a position at Regeneron Pharmaceuticals after the initial submission of this manuscript.

Figures

Figure 1
Overview of the Michigan Genomics Initiative (MGI) resource and analysis. MGI currently consists of ∼91,000 participants recruited while seeking care at the Michigan Medicine health system. Recruitment is predominantly through the Department of Anesthesiology during inpatient surgical encounters. Participants agree to link a blood sample obtained during consent with their electronic health records for broad research purposes. Genotypes for ∼570,000 genetic variants are obtained from DNA extracted from the blood sample using a customized Illumina Infinium CoreExome-24 array. In this article, we describe the MGI “Freeze 3” cohort, which consists of ∼57,000 samples that passed sample-level quality control filtering and were imputed for >50 million variants using the TOPMed reference panel. We extracted all available International Classification of Diseases (ICD) diagnosis codes from patient electronic health records and mapped them to broader dichotomous phecode traits using the PheWAS software. We performed GWASs within a subset of ∼51,000 European-inferred samples from the Freeze 3 cohort using a linear mixed-effect regression model implemented in the SAIGE software. We report results and share GWAS summary statistics for 1,547 traits with ≥60 cases.
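The phenotype step in this caption (ICD codes mapped to dichotomous phecode traits, then restricted to traits with ≥60 cases) can be illustrated with a minimal Python sketch. It is not the PheWAS software: the mapping table, the function names, and the rule that a single mapped code makes a participant a case are all simplifying assumptions made here, and control definitions are omitted entirely.

```python
from collections import defaultdict

# Illustrative placeholder map (an assumption for this sketch); the real
# ICD-to-phecode tables used by the PheWAS software are far larger.
ICD_TO_PHECODE = {
    "E11.9": "250.2",
    "E11.65": "250.2",
    "I10": "401.1",
}

MIN_CASES = 60  # threshold stated in the Figure 1 caption


def phecode_cases(diagnoses, icd_to_phecode):
    """Group participants by phecode.

    diagnoses: iterable of (participant_id, icd_code) pairs.
    Returns {phecode: set of participant IDs with at least one mapped code}.
    Assumption: one mapped ICD code is enough to call a participant a case.
    """
    cases = defaultdict(set)
    for participant_id, icd_code in diagnoses:
        phecode = icd_to_phecode.get(icd_code)
        if phecode is not None:  # unmapped ICD codes are ignored
            cases[phecode].add(participant_id)
    return dict(cases)


def traits_for_gwas(cases, min_cases=MIN_CASES):
    """Keep only phecodes whose case count reaches the threshold."""
    return sorted(p for p, ids in cases.items() if len(ids) >= min_cases)
```

Because cases are stored as sets of participant IDs, repeated diagnosis codes for the same person count once, which matches the dichotomous (case or not) nature of a phecode trait.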
Figure 2
MGI recruitment, demographics, and clinical follow-up. (A) MGI recruitment over time. The solid line is overall recruitment, and the dashed line is participants with self-reported race other than White. (B) Age and sex distribution of MGI participants. (C) Clinical follow-up time for MGI participants. Follow-up is the amount of time between a participant’s first and most recent diagnosis codes in the Michigan Medicine electronic health records (EHRs). (D) The distribution of ages for MGI participants is nearly identical across follow-up times.
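The follow-up definition in panel C (time between a participant's first and most recent diagnosis codes) amounts to a simple date difference. The sketch below is illustrative only: the function name and the conversion to years using 365.25 days are assumptions made here, not details given in the source.

```python
from datetime import date


def follow_up_years(diagnosis_dates):
    """Years between the earliest and latest diagnosis-code dates.

    diagnosis_dates: non-empty iterable of datetime.date objects for one
    participant. A participant with a single diagnosis date has 0.0 years.
    """
    dates = list(diagnosis_dates)
    if not dates:
        raise ValueError("at least one diagnosis date is required")
    span = max(dates) - min(dates)
    return span.days / 365.25  # assumption: average year length


# Example: three encounters recorded out of chronological order.
example = follow_up_years([date(2018, 6, 1), date(2015, 1, 1), date(2020, 1, 1)])
```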
Figure 3
MGI clinical data. (A) Most common phecode traits among MGI participants. (B) Number of phecode case assignments per sample increases with participant age. (C) Number of phecode case assignments per sample increases with participant follow-up time. Outlier values were excluded from boxplots to improve readability.
Figure 4
Comparison of phecode case counts between MGI and UK Biobank (UKB) by disease category. MGI has phecode traits with more cases than UKB across all disease categories.
Figure 5
Summary of genetically inferred ancestry and relatedness in MGI participants. (A) Comparison of self-reported race/ethnicity and genetically inferred ancestry. MGI samples are projected in the principal component (PC) reference space created by worldwide samples from the Human Genome Diversity Project (HGDP). Each panel shows all MGI participants, with participants colored by the indicated self-reported race or ethnicity. (B) Unique genetically inferred familial configurations containing parent-offspring and full-sibling relationships among MGI participants. The numbers are the observed count for each configuration. (C) Comparison of TOPMed and Haplotype Reference Consortium (HRC) imputation accuracy by inferred ancestry groups. TOPMed provides more accurate imputation in all populations, with notable gains among non-European participants.
