Eur J Hum Genet. 2013 May;21(5):554-62.
doi: 10.1038/ejhg.2012.258. Epub 2012 Dec 5.

Genetic ancestry inference using support vector machines, and the active emergence of a unique American population

Ryan J Haasl et al. Eur J Hum Genet. 2013 May.

Erratum in

  • Eur J Hum Genet. 2013 May;21(5):578

Abstract

We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, which is an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10% in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. Values of F_ST as well as Mantel tests for correlation between genetic and European geographic distances indicate minimal divergence between Europe and the local Wisconsin population. However, we find that individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. Simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
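The classification step described in the abstract (training an SVM on the PC positions of reference individuals, then assigning query individuals to a regional class) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the genotypes are synthetic, there are three regional classes rather than the eight used in the study, the RBF kernel and its parameters are arbitrary choices, and all variable names are hypothetical. It is not the authors' pipeline and does not include their correction for misclassification error.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for reference genotypes (assumption): three "regions"
# whose allele frequencies differ slightly from a shared baseline.
n_per_region, n_snps = 60, 500
base_freq = rng.uniform(0.1, 0.9, size=n_snps)
region_freqs = [
    np.clip(base_freq + rng.normal(0.0, 0.08, size=n_snps), 0.01, 0.99)
    for _ in range(3)
]

# Genotypes coded as 0, 1, or 2 copies of an allele per SNP.
X_ref = np.vstack(
    [rng.binomial(2, f, size=(n_per_region, n_snps)) for f in region_freqs]
)
y_ref = np.repeat(np.arange(3), n_per_region)

# Query individuals with unknown ancestry (here drawn from region 0).
X_query = rng.binomial(2, region_freqs[0], size=(20, n_snps))

# "Combined" PCA: reference and query individuals are decomposed together.
pca = PCA(n_components=2)
pcs = pca.fit_transform(np.vstack([X_ref, X_query]).astype(float))
pcs_ref, pcs_query = pcs[: len(X_ref)], pcs[len(X_ref):]

# Multiclass SVM trained on PC1/PC2 scores of the reference individuals.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
cv_accuracy = cross_val_score(svm, pcs_ref, y_ref, cv=5).mean()
svm.fit(pcs_ref, y_ref)

# Inferred regional class for each query individual.
inferred = svm.predict(pcs_query)
print(f"cross-validated accuracy: {cv_accuracy:.2f}")
print("inferred classes:", inferred)
```

The cross-validated accuracy on the reference set plays the role of an estimate of SVM misclassification error; how that error is propagated into the comparison with self-reported ancestry is not shown here.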

Figures

Figure 1
(a) The resolution of ancestry inference in a local US population is dependent on the level of admixture in the population. For example, in a population of European Americans, the resolution of ancestry inference might range from continent (low resolution), when the local population is panmictic, to country (high resolution), when there is little or no admixture within the local population. (b) Initially, sink (US) and source (European) populations are coincident in genetic space as determined by methods such as PCA. (c) After some period of time, sink and source populations grow divergent because of genetic drift and admixture; the actual pattern of divergence in genetic space will vary depending on, at least, the level of admixture within the sink population.
Figure 2
Empirical PCA results. (a) The PC1/PC2 biplot from combined PCA corroborates the Northern European ancestry of most European-American participants of the PMRP. Shown are Europeans from POPRES (filled circles, black for individuals from one of the nine countries reported in the PMRP sample, gray circles for others), and Wisconsinites (open, magenta circles). Abbreviations mark the mean position of POPRES_Europe ancestries (IRE: Ireland; FIN: Finland; POL: Poland; SPA/POR: Spain+Portugal; ITA: Italy). (b) Scores for PCs 1–4 from combined PCA are equally variable in Europeans (black bars) and Wisconsinites (magenta bars); however, Wisconsinites show visibly greater variation along PCs 5–10. (c) PMRP_abridged individuals (open, red circles) projected onto PCs 1 and 2 computed from POPRES_Europe data only. (d) The same as (c) after PMRP PC scores were corrected for projection bias following Lee et al.
Figure 3
Multiclass SVM models trained on PC1 and PC2 scores from combined and projection PCA. (a) SVM trained on combined PCA results. The underlying contour map roughly outlines the decision boundaries for the eight regional classes. Data points are colored according to the class of the training datum. Filled circles are support vectors. (b) Positions of PMRP_insular-abridged individuals claiming German insular ancestry superimposed on the SVM based on combined PCA. (c) SVM trained on projection PCA results. (d) Positions of PMRP_unabridged individuals claiming German insular ancestry superimposed on the SVM based on projection PCA.
Figure 4
Simulation PCA results (admixture plus drift). (a) PC1/PC2 biplot of source (black) and sink (magenta) populations immediately after the founding of the sink population. At this point, sink and source populations are largely coincident with differences likely due to founder effects. (b) PC1/PC2 biplot of source and sink populations 75 generations after colonization. Source and sink populations are divergent, forming discrete clusters of individuals. (c) PC6 through time. PC6 scores from sink and source individuals are equally variable upon founding of the sink population. At 5, 25, and 50 generations after colonization, however, the sink population shows much greater variation, reminiscent of patterns in Figure 2b. By 75 generations after colonization, sink and source populations once again show equal variation along PC6.
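The comparison of dispersion along individual PCs described in the Figure 2b and Figure 4c captions can be sketched as a per-PC variance ratio between sink and source individuals. This is a minimal illustration under stated assumptions: the PC scores are simulated Gaussian values, the inflation of the sink population along PCs 5–10 is imposed by hand, and the function name `per_pc_variance_ratio` is hypothetical. It does not reproduce the paper's data or simulations.

```python
import numpy as np


def per_pc_variance_ratio(scores, is_sink):
    """Return sink/source variance of PC scores, one ratio per PC column."""
    scores = np.asarray(scores, dtype=float)
    is_sink = np.asarray(is_sink, dtype=bool)
    var_sink = scores[is_sink].var(axis=0, ddof=1)
    var_source = scores[~is_sink].var(axis=0, ddof=1)
    return var_sink / var_source


rng = np.random.default_rng(1)

# Toy PC scores (assumption): 200 source and 200 sink individuals, 10 PCs.
source = rng.normal(0.0, 1.0, size=(200, 10))
sink = rng.normal(0.0, 1.0, size=(200, 10))

# Hand-imposed extra dispersion of the sink group along PCs 5-10.
sink[:, 4:] *= 2.0

scores = np.vstack([source, sink])
is_sink = np.concatenate([np.zeros(200, dtype=bool), np.ones(200, dtype=bool)])

ratio = per_pc_variance_ratio(scores, is_sink)
for pc, r in enumerate(ratio, start=1):
    print(f"PC{pc}: sink/source variance ratio = {r:.2f}")
```

Ratios near 1 correspond to the "equally variable" case in the captions, while ratios well above 1 correspond to greater dispersion in the sink population.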
