Am J Hum Genet. 2019 Oct 3;105(4):763-772.
doi: 10.1016/j.ajhg.2019.08.012. Epub 2019 Sep 26.

Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies

Huaying Fang et al. Am J Hum Genet. 2019.

Abstract

Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.

Keywords: biobank; ethnicity-specific trait loci; genetic ancestry; multi-ethnic cohort; self-reported race/ethnicity; stratified analysis; trans-ethnic GWAS.

Conflict of interest statement

S.L.D. has received research grants from the following for-profit organizations in the last three years: AbbVie Inc., Anolinx LLC, Astellas Pharma Inc., AstraZeneca Pharmaceuticals LP, Boehringer Ingelheim International GmbH, Celgene Corporation, Eli Lilly and Company, Genentech Inc., Genomic Health, Inc., Gilead Sciences Inc., GlaxoSmithKline PLC, Innocrin Pharmaceuticals Inc., Janssen Pharmaceuticals, Inc., Kantar Health, Myriad Genetic Laboratories, Inc., Novartis International AG, and PAREXEL International Corporation through the University of Utah or Western Institute for Biomedical Research. S.M.D. has received research grants from RenalytixAI and CytoVas through the University of Pennsylvania.

Figures

Figure 1
Decision Tree for HARE Assignment. For each individual, $P_{L_1} \geq P_{L_2} \geq \cdots \geq P_{L_K}$ denote the support vector machine predicted probabilities, arranged in decreasing order from the most likely stratum, $L_1$, to the least likely stratum, $L_K$. If the individual’s SIRE is not missing, $P_{SIRE}$ denotes the support vector machine predicted probability corresponding to SIRE; otherwise $P_{SIRE}$ is undefined. For analyses reported in this study, $t_1 = 40$ and $t_2 = 20$.
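The caption gives the inputs and thresholds of the decision tree but not its branch conditions. The sketch below is one plausible reading, not the authors' implementation. It assumes that when SIRE is present, HARE keeps SIRE unless the most likely stratum is at least t1 times more probable than the SIRE stratum, and that when SIRE is missing, HARE takes the most likely stratum only if it is more than t2 times as probable as the runner-up. The function name `assign_hare`, the stratum labels, and the ratio-based rules are all assumptions made for illustration.

```python
from typing import Dict, Optional

T1 = 40.0  # t1 from the Figure 1 caption
T2 = 20.0  # t2 from the Figure 1 caption


def assign_hare(
    probs: Dict[str, float],
    sire: Optional[str],
    t1: float = T1,
    t2: float = T2,
) -> Optional[str]:
    """Return a HARE stratum, or None when HARE is set to missing.

    probs maps each stratum label to its SVM predicted probability.
    The branch rules are ASSUMED; the caption states only the symbols.
    """
    # Rank strata from most likely (L1) to least likely (LK).
    ranked = sorted(probs, key=probs.get, reverse=True)
    l1, l2 = ranked[0], ranked[1]
    p_l1, p_l2 = probs[l1], probs[l2]

    if sire is not None:
        p_sire = probs[sire]
        # ASSUMED rule: trust SIRE unless genetics strongly disagrees.
        if p_sire > 0 and p_l1 / p_sire < t1:
            return sire
        return None

    # ASSUMED rule: without SIRE, require a clear genetic winner.
    if p_l2 == 0 or p_l1 / p_l2 > t2:
        return l1
    return None


# Hypothetical individual lying between two ancestry clusters.
probs = {"EUR": 0.90, "HIS": 0.08, "AFR": 0.015, "ASN": 0.005}
for sire in ("EUR", "HIS", "ASN", None):
    print(sire, "->", assign_hare(probs, sire))
```

Under these assumed rules, the same predicted probabilities yield different HARE values depending on SIRE, which matches the hypothetical individual described in the Figure 2 caption.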
Figure 2
The First Two Principal Components of Genetically Inferred Ancestry and HARE Assignments for Individuals Whose SIRE Is Non-missing and Consistent across Records. Colored points represent individuals whose HARE agrees with SIRE. Black points highlight individuals whose genetically inferred ancestry strongly disagrees with SIRE; HARE for these individuals is therefore set to missing. All other MVP participants are denoted in gray. The gold triangle indicates a hypothetical individual whose HARE could be non-Hispanic European, Hispanic, or missing, depending on her SIRE. Shown are non-Hispanic white (A), non-Hispanic black (B), Hispanic (C), and non-Hispanic Asian (D).
Figure 3
The First Two Principal Components of Genetically Inferred Ancestry and HARE Assignments for Individuals Whose SIRE Is Missing or Inconsistent across Records. Colored points represent individuals whose HARE is assigned to one of the strata. Shown are non-Hispanic white (A), non-Hispanic black (B), Hispanic (C), and non-Hispanic Asian (D).
Figure 4
Simulation Results Comparing Statistical Power for Detecting Minority-Specific Causal Variants Using Mega-analysis (x axis) versus Stratified Analysis (y axis). Black dots indicate causal SNPs predominantly occurring in non-Hispanic blacks; red triangles indicate causal SNPs predominantly occurring in Hispanics. Shown are (A) rare variants with MAF ≤ 0.01 and (B) common variants with MAF ≥ 0.1. Causal SNPs detected by both methods with power of 0 or 1 are omitted. Comparison of power for fixed levels of genetic variance explained by the causal variants can be found in Figure S4.
Figure 5
Number of Genome-wide Significant Height Loci in Each HARE Group. Due to the relatively small sample size of non-Hispanic Asians, no genomic region reached genome-wide significance in this group, and therefore this HARE group is not included. SNPs with $p < 5 \times 10^{-8}$ are considered significant; SNPs within 1 Mb are grouped into a single locus.
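The locus-counting rule in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes "within 1 Mb" means that a significant SNP joins the current locus when it lies no more than 1 Mb from the previous significant SNP on the same chromosome (single-linkage chaining). The function name `count_loci` and the example SNPs are hypothetical.

```python
from typing import List, Tuple

GWS_THRESHOLD = 5e-8      # genome-wide significance level from the caption
LOCUS_WINDOW = 1_000_000  # 1 Mb, in base pairs

# Each SNP is (chromosome, base-pair position, p-value).
Snp = Tuple[str, int, float]


def count_loci(
    snps: List[Snp],
    p_threshold: float = GWS_THRESHOLD,
    window: int = LOCUS_WINDOW,
) -> int:
    """Count loci among significant SNPs.

    ASSUMPTION: consecutive significant SNPs on the same chromosome
    that are at most `window` bp apart belong to one locus.
    """
    # Keep significant SNPs only, sorted by chromosome then position.
    significant = sorted((c, pos) for c, pos, p in snps if p < p_threshold)

    loci = 0
    previous = None
    for chrom, pos in significant:
        starts_new_locus = (
            previous is None
            or chrom != previous[0]
            or pos - previous[1] > window
        )
        if starts_new_locus:
            loci += 1
        previous = (chrom, pos)
    return loci


# Made-up example data for illustration only.
example = [
    ("1", 1_000_000, 1e-9),
    ("1", 1_500_000, 2e-10),  # 0.5 Mb from the previous SNP: same locus
    ("1", 3_000_000, 1e-8),   # 1.5 Mb gap: new locus
    ("2", 1_200_000, 3e-9),   # different chromosome: new locus
    ("2", 9_000_000, 1e-6),   # not genome-wide significant: ignored
]
print(count_loci(example))
```

Other grouping conventions, such as fixed windows centered on each lead SNP, can give different counts for the same data, so the chaining rule here should be read as one reasonable choice.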

