Front Genet. 2013 Jul 4;4:127.
doi: 10.3389/fgene.2013.00127. eCollection 2013.

Principal component analysis reveals the 1000 Genomes Project does not sufficiently cover the human genetic diversity in Asia

Dongsheng Lu et al. Front Genet. 2013.

Abstract

The 1000 Genomes Project (1KG) aims to provide a comprehensive resource on human genetic variation. By sequencing 2,500 individuals, 1KG is expected to cover the majority of human genetic diversity worldwide. In this study, we used population structure analysis based on genome-wide single nucleotide polymorphism (SNP) data to examine and evaluate how well the 1KG samples cover genetic diversity, using the available genome-wide SNP data of 3,831 individuals representing 140 population samples worldwide. We developed a method to quantitatively measure and evaluate the genetic diversity revealed by population structure analysis. Our results showed that 1KG does not sufficiently cover human genetic diversity in Asia, especially in Southeast Asia. We suggest that better coverage of Southeast Asian populations be considered in 1KG, or that a regional effort be initiated, to provide a more comprehensive characterization of human genetic diversity in Asia, which is important for future evolutionary and medical studies.
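As a rough illustration of the kind of population structure analysis the abstract refers to, the sketch below runs a principal component analysis on a genotype matrix. It is not the authors' pipeline: the data are synthetic, the function names `genotype_pca` and `simulate_two_populations` are hypothetical, and the per-SNP scaling by the binomial standard deviation is an assumption borrowed from common practice in SNP-based PCA.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, or 2).
    Returns an (n_individuals, n_components) array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency of each SNP across all individuals.
    p = g.mean(axis=0) / 2.0
    # Monomorphic SNPs carry no information and would divide by zero.
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its binomial standard deviation
    # (assumed normalization, not taken from the source).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # SVD of the standardized matrix; PC scores are U scaled by S.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_two_populations(n_per_pop=50, n_snps=500, seed=0):
    """Synthetic genotypes for two populations with drifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return np.vstack([pop_a, pop_b]), labels


genotypes, labels = simulate_two_populations()
pcs = genotype_pca(genotypes)
```

Plotting `pcs[:, 0]` against `pcs[:, 1]`, colored by `labels`, gives the kind of scatter shown in the PCA figures, with one dot per individual. In this toy setting the two simulated populations should fall into separate clusters along PC1.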

Keywords: 1000 Genomes Project; Human Genome Diversity Project; Pan-Asian SNP Project; human genetic diversity; population structure; principal component analysis; single nucleotide polymorphisms.

Figures

FIGURE 1
The geographic location of each population. The left panel shows the documented geographic locations of all 140 populations used in our analysis; the right panel is a magnified view of East Asia and Southeast Asia.
FIGURE 2
(A) The PCA result for all 140 populations worldwide from 1KG, HGDP, and PASNP without sampling. The x-axis denotes the value of PC1 and the y-axis the value of PC2; each dot represents one individual. Individuals from 1KG, HGDP, and PASNP are colored red, sea green, and green, respectively. (B) The first two components of the PCA result based on randomly selected individuals from Africa, Europe, America, East Asia, and Southeast Asia, drawn from whichever of 1KG or HGDP and PASNP has the larger sample size in the corresponding geographic area. The colors for each dataset are the same as in (A). (C) The PCA result using individuals from CHB, CEU, Oceania, West Asia, Central and South Asia (CenSouthAsia), and Southeast Asia. (D) The PCA result when pooling individuals from CHB, CHS, CDX, KHV, and individuals from Thailand except for TH-MA and TH-TN.
FIGURE 3
The first two components of the PCA result with the sampling approach under different contexts: worldwide population context (A), East Asian and Pacific islander context (B), East Asian and Southeast Asian context (C), and Southeast Asian context (D). Individuals shown in red are from 1KG and those in sea green are from the other two datasets.
FIGURE 4
The value of Dd as a function of m for populations from Southeast Asia (A), Europe (B), and Native America (C), based on 100 PCAs using the sampling approach under the worldwide population context; bars indicate the standard deviation (SD) of Dd. Boxplots of the value of Dd derived from 1000 PCAs using the sampling approach under different population contexts, with m = 10 (D) and m = 1 (E). The nine population contexts are worldwide (WW), non-Africans (NA), Eurasians and Oceanians (EAO), Asians and Oceanians (AO), non-Western Asians and Oceanians (NWAO), East Asians and Pacific islanders (EPI), East Asians and Southeast Asians (ESEA), Southeast Asians and Oceanians (SEAO), and Southeast Asians (SEA).
