Genome Med. 2022 Sep 9;14(1):104.
doi: 10.1186/s13073-022-01106-x.

Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative

Ruth Johnson et al. Genome Med.

Erratum in

Abstract

Background: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N=36,736).

Methods: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHRs to perform ancestry-specific and multi-ancestry genome-wide and phenome-wide scans across a broad set of disease phenotypes.
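One way to quantify the relationship between membership in a GIA group and a single phecode is an odds ratio with a 95% confidence interval. The sketch below is a simplification, not the study's pipeline: it uses no covariates, in which case the odds ratio from a 2x2 table equals the exponentiated coefficient of a logistic regression of case status on group membership. The function name `odds_ratio_ci` and all counts are hypothetical.

```python
import math

def odds_ratio_ci(cases_in, controls_in, cases_out, controls_out, z=1.96):
    """Odds ratio and Wald confidence interval for one phecode in one group.

    With a single binary predictor and no covariates, this matches exp(beta)
    and its Wald interval from an unadjusted logistic regression.
    """
    counts = (cases_in, controls_in, cases_out, controls_out)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((cases_in * controls_out) / (controls_in * cases_out))
    # Standard error of the log odds ratio from the four cell counts.
    se = math.sqrt(sum(1.0 / c for c in counts))
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical counts for illustration only; these are not ATLAS data.
or_, lo, hi = odds_ratio_ci(cases_in=120, controls_in=880,
                            cases_out=300, controls_out=4700)
print(f"OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

An adjusted analysis (for example, one that accounts for SIRE, age, or sex) would need a full regression fit rather than this closed form.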

Results: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value=2.32×10^-16, EAA p-value=6.73×10^-11). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group.

Conclusions: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.

Keywords: Biobank; Electronic health records; Genetic ancestry; Genome-wide association studies; Phenome-wide association studies.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Self-identified race/ethnicity (SIRE) and genetically inferred ancestry (GIA) are not analogous. We show a Sankey diagram visualizing the sample size breakdown of individuals in each genetically inferred ancestry group and SIRE groups for all individuals in ATLAS (N = 36,736)
Fig. 2
Global PCA reflects self-identified race/ethnicity and language of ATLAS participants. (A) Genetic PCs 1 and 2 of individuals in ATLAS (N=36,736) shaded by continental GIA as inferred from 1000 Genomes. (B, C) The first two genetic PCs of the ATLAS participants shaded by SIRE and preferred language, respectively. To improve visualization in (C), only languages with >10 responses were assigned a color
Fig. 3
PCA of individuals with inferred East Asian American, European American, and Hispanic Latino American genetic ancestry in ATLAS captures fine-scale subcontinental ancestry groupings. PCA was performed separately within each continental GIA in ATLAS with the corresponding subcontinental ancestry samples from 1000 Genomes: (A) East Asian American, (B) European American, (C) Hispanic Latino American. Cluster annotation labels were determined using a combination of samples from 1000 Genomes and self-identified race, ethnicity, and language information from the EHR
Fig. 4
IBD sharing between ATLAS participants. InfoMap community membership is indicated by color for all communities with >100 individuals (20 communities total) and individuals with a degree >30. Community membership indicates elevated shared IBD within that community. Community identity is labeled adjacent to the network plot in the corresponding color
Fig. 5
Disease associations vary across continental genetically inferred ancestry groups in ATLAS. We show the odds ratio computed from associating each phenotype with individuals’ genetically inferred ancestry in ATLAS (N=36,736) under a logistic regression model. Error bars represent 95% confidence intervals
Fig. 6
Global ancestry correlates with disease prevalence in admixed individuals. Individuals by SIRE who have had a diagnosis of (A) chronic nonalcoholic liver disease, (B) uterine leiomyoma, or (C) liver/intrahepatic bile duct cancer are binned by their proportions of either European, African, Native American, or East Asian ancestry estimated using ADMIXTURE. Within each bin, we plot the prevalence of the diagnoses and provide standard errors (±1.96 SE) of the computed frequencies
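The per-bin prevalence and ±1.96 SE error bars described for Fig. 6 can be sketched as follows. This is a minimal illustration assuming equal-width bins and the usual binomial standard error sqrt(p(1-p)/n); the number of bins, the function name `prevalence_by_bin`, and the toy inputs are assumptions, not details given in the caption.

```python
import math

def prevalence_by_bin(proportions, has_diagnosis, n_bins=5):
    """Bin individuals by ancestry proportion in [0, 1] and summarize each bin.

    Returns (bin_start, bin_end, prevalence, half_width) tuples, where
    half_width is 1.96 times the binomial standard error. Empty bins are skipped.
    """
    counts = [0] * n_bins
    cases = [0] * n_bins
    for p, d in zip(proportions, has_diagnosis):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
        cases[idx] += int(d)
    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue
        prev = cases[i] / counts[i]
        se = math.sqrt(prev * (1 - prev) / counts[i])
        rows.append((i / n_bins, (i + 1) / n_bins, prev, 1.96 * se))
    return rows

# Toy inputs for illustration only; these are not ATLAS data.
props = [0.05, 0.10, 0.15, 0.50, 0.55, 0.90, 0.95, 1.0]
dx = [0, 0, 1, 0, 1, 1, 1, 0]
rows = prevalence_by_bin(props, dx)
for start, end, prev, half in rows:
    print(f"[{start:.1f}, {end:.1f}): prevalence={prev:.2f} +/- {half:.2f}")
```

With very small bins the normal approximation behind ±1.96 SE is poor, so in practice sparse bins are usually merged or reported with an exact interval.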
Fig. 7
Recapitulating known associations for chronic nonalcoholic liver disease in ancestry-specific and multi-ancestry meta-analyses in ATLAS. GWAS Manhattan plots for chronic nonalcoholic liver disease in the (A) European American, (B) Hispanic Latino American, (C) African American, (D) East Asian American GIA groups in ATLAS, and (E) the meta-analysis performed across all 4 GIA groups. The red dashed line denotes genome-wide significance (p-value < 5×10^-8). We recapitulate a known association at the 22q13.31 locus
Fig. 8
Identifying correlated phenotypes at rs2294915 in both the Hispanic Latino American and European American GIA groups in ATLAS. We show a PheWAS plot at rs2294915 for the Hispanic Latino American (top) and European American (bottom) GIA groups. The red dashed line denotes p-value=4.09×10^-5, the significance threshold after adjusting for the number of tested phenotypes. The red dotted line denotes the significance threshold after correcting for both genome-wide significance and the number of tested phenotypes (p-value=4.09×10^-11)
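The two thresholds in the Fig. 8 caption are related by a Bonferroni-style division, which the following sketch checks. It assumes a family-wise alpha of 0.05 for the PheWAS threshold; that value and the implied number of tested phenotypes are inferences and are not stated in the caption. The genome-wide level of 5×10^-8 comes from the Fig. 7 caption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests at family-wise level alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

phewas_threshold = 4.09e-5  # dashed line in Fig. 8
gwas_alpha = 5e-8           # genome-wide significance level from Fig. 7

# Assumption: the PheWAS threshold is 0.05 divided by the number of phenotypes.
implied_n_phenotypes = 0.05 / phewas_threshold

# Applying the same correction to the genome-wide level gives the dotted line.
combined_threshold = bonferroni_threshold(gwas_alpha, implied_n_phenotypes)

print(f"implied number of phenotypes: about {implied_n_phenotypes:.0f}")
print(f"combined threshold: {combined_threshold:.3g}")
```

Because the caption reports the threshold to three significant figures, the implied phenotype count is only approximate.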
