BMC Genomics. 2016 May 4;17:325.
doi: 10.1186/s12864-016-2654-x.

Selecting SNPs informative for African, American Indian and European Ancestry: application to the Family Investigation of Nephropathy and Diabetes (FIND)

Robert C Williams et al. BMC Genomics.

Abstract

Background: The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full-heritage Pima Indians). We estimate IGA and present an algorithm for its standard errors, compare IGA to principal components, emphasize the importance of balancing information in the AIMs, and test the association of IGA with diabetic nephropathy in the combined sample.

Results: A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principal components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy.

Conclusions: The identified set of AIMs, which includes American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.

Keywords: Diabetic nephropathy; Individual genetic ancestry; Population structure; SNP.
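
The abstract describes estimating IGA by maximum likelihood with parental allele frequencies held fixed. The sketch below is one common way to do that, an EM iteration over ancestry proportions under a binomial genotype model; it is an illustrative assumption, not the authors' published algorithm, and the function name `estimate_ancestry`, its arguments, and the iteration count are hypothetical.

```python
def estimate_ancestry(genotypes, parental_freqs, n_iter=200):
    """EM estimate of ancestry proportions with fixed parental allele frequencies.

    genotypes: allele-1 counts (0, 1, or 2), one per SNP, for one individual.
    parental_freqs: one list per SNP holding the allele-1 frequency in each
        ancestral population (e.g. [EU, AF, AI]); these are never updated.
    Returns a list of ancestry proportions that sums to 1.
    """
    k = len(parental_freqs[0])
    q = [1.0 / k] * k          # start from equal ancestry proportions
    eps = 1e-9                 # keeps the divisions below away from zero
    for _ in range(n_iter):
        new_q = [0.0] * k
        for g, p in zip(genotypes, parental_freqs):
            # Expected allele-1 frequency for this individual at this SNP.
            f1 = sum(qj * pj for qj, pj in zip(q, p))
            f1 = min(max(f1, eps), 1.0 - eps)
            f0 = 1.0 - f1
            for j in range(k):
                # Expected number of this individual's two alleles that
                # derive from ancestral population j.
                new_q[j] += g * q[j] * p[j] / f1
                new_q[j] += (2 - g) * q[j] * (1.0 - p[j]) / f0
        total = sum(new_q)
        q = [x / total for x in new_q]
    return q
```

Called with one individual's genotype vector and a matching table of parental frequencies, the function returns the estimated European, African, and American Indian proportions in the column order of `parental_freqs`. It does not compute the standard errors mentioned in the abstract.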

Figures

Fig. 1
Information contrasts. For a 3-ancestral population model there are three information contrasts that are represented by the absolute value of the difference of the respective allele frequencies for allele 1 of the SNP: |P1-P2|, |P1-P3|, and |P2-P3|, a value that is usually given the symbol δ. The variable In is the information-for-assignment statistic. Accurate individual ancestry estimates depend upon balancing the information between these 3 contrasts
Fig. 2
Mean ancestry when estimated with three sets of SNPs, each set maximized for information in one contrast. Each of the ancestral populations was modeled by samples from HapMap or from the Pima Indian GWAS. Three sets of SNPs were each maximized for information in one of the three contrasts and then used to estimate the respective mean ancestry (CEU, European (EU); LWK and YRI, African (AF); Pima, American Indian (AI)) in each sample, with the expectation of a mean of 1.0. When the ancestry of the sample was not represented in the maximized contrast set, then the estimates of individual ancestry become unstable with large error
Fig. 3
Mean heritage for persons who self-identify in the FIND study. Legend: Mean estimates are presented for the three components of individual ancestry in the FIND samples. For European Americans, American Indians, and African Americans the expected largest component is >0.8, while for Mexican Americans the European and American Indian components are similar. EU: European Ancestry; AI: American Indian Ancestry; AF: African Ancestry
Fig. 4
Mean standard error of individual heritage estimates in four FIND samples by number of SNP Loci. The mean standard error of the individual ancestry estimates was calculated across the 4 FIND samples at 1300 points, adding each successive SNP to the calculation in chromosome and position order (EU, dotted line; AI, dashed line; AF, solid line). After the addition of about 200 informative SNPs, the standard error falls below 0.02 and decreases further at a slower rate with each additional locus. It takes approximately 700 SNPs in the estimates to have a mean standard error <0.01
Fig. 5
Estimates of individual heritage for the FIND Mexican American sample with and without the Pima genotypes. Panel a has the estimates from STRUCTURE while using the 1300 genotypes from the Pima, CEU, LWK, and YRI samples. These are very similar to the estimates obtained from the maximum likelihood method that is presented in Panel c. When the Pima genotypes were removed from the STRUCTURE analysis, the amount of American Indian ancestry was overestimated in the Mexican sample in Panel b. In the latter situation, maximum likelihood is recommended because it returns the better estimates of individual heritage.
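
The three information contrasts defined in the Fig. 1 caption, |P1-P2|, |P1-P3|, and |P2-P3|, can be computed per SNP and summed over a marker panel to check roughly how balanced the panel is. The sketch below assumes allele-1 frequencies ordered as European, African, American Indian; the function names, the labels, and the use of summed δ as a balance check are illustrative assumptions, and the In statistic from the caption is not computed here.

```python
from itertools import combinations

LABELS = ("EU", "AF", "AI")  # assumed column order of the frequency lists


def delta_contrasts(freqs, labels=LABELS):
    """Return delta = |Pi - Pj| for every pair of ancestral populations at one SNP."""
    return {
        (labels[i], labels[j]): abs(freqs[i] - freqs[j])
        for i, j in combinations(range(len(freqs)), 2)
    }


def panel_information(panel, labels=LABELS):
    """Sum delta per contrast over a panel of SNPs (one frequency list per SNP)."""
    totals = {}
    for freqs in panel:
        for pair, delta in delta_contrasts(freqs, labels).items():
            totals[pair] = totals.get(pair, 0.0) + delta
    return totals
```

A panel whose summed δ is much larger for one contrast than for the other two would be unbalanced in the sense the captions describe.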
