Genome Biol. 2019 Oct 22;20(1):215.
doi: 10.1186/s13059-019-1838-5

PGG.SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations

Chao Zhang et al. Genome Biol. 2019.

Abstract

Despite the tremendous growth of DNA sequencing data in the last decade, our understanding of the human genome is still in its infancy. To understand the implications of genetic variants in the light of population genetics and molecular evolution, we developed a database, PGG.SNV (https://www.pggsnv.org), which gives much higher weight to previously under-investigated indigenous populations in Asia. PGG.SNV archives 265 million SNVs across 220,147 present-day genomes and 1018 ancient genomes, including 1009 newly sequenced genomes, representing 977 global populations. Moreover, PGG.SNV provides estimates of population genetic diversity and evolutionary parameters, a feature that distinguishes it from other databases.

Keywords: Disease risk allele; Evolutionary conservation; Human diversity; Indigenous populations; Natural selection; Population genetics and genomics; Population prevalence; Single nucleotide variations; Variant annotation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Analysis framework for data generation, collection, integration, and annotation. The ellipsis in the right hexagon represents other population genomics analyses that are not included in the current version of the database but would be performed in later versions.
Fig. 2
Comparison of the number of genomes and populations between PGG.SNV and other frequently used data sets. a Geographical distributions of the population samples included in PGG.SNV. Each dot represents an ethnic group, and each bar denotes the number of genomes of the corresponding ancestry. b A comparison of the numbers of genomes included in the 1000 Genomes Project (1KGP), the Exome Sequencing Project (ESP), the Genome Aggregation Database (gnomAD), and PGG.SNV. Each color represents an ancestry that was used in a. c A comparison of the numbers of populations or ethnic groups included in different databases. PGG.SNV includes 852 present-day populations and 125 ancient populations, which are defined based on geography and time period.
Fig. 3
Construction of the PGG.SNV database. SQL, Structured Query Language; API, Application Programming Interface; App, Application.
Fig. 4
An example of the user-friendly method for visualizing and accessing data. a Basic information for the selected SNV. Alt. Allele Frequency denotes the frequency of alternative alleles in the PGG.SNV database, with the alternative allele counts and total allele counts shown in brackets. The Modern Human Population Count represents the number of ethnic groups whose genomic data contain the selected variant in the PGG.SNV database. The Ancient Genome Count denotes the number of ancient genomic data sets that contain the selected variant in the PGG.SNV database. At the bottom of a, there are nine annotation cards for the selected variant. Users can switch between them to view the corresponding annotation. b Allele frequencies of the variant across worldwide populations. The figure is interactive on the web: each population's allele frequencies are drawn as a pie chart placed at that population's geographic location on a world map. Scrolling the mouse zooms in and out, hovering over a slice shows detailed information, and a toggle switches between the figure and a table. c Custom pop-up windows for selecting populations, ancestries, and data sets. Note that the choices made with the population, ancestry, and data set buttons are interdependent. d Allele frequencies of a variant in different data sets. e WeChat Quick Response (QR) code for accessing information, including that in the PGG.SNV database. Users can scan the code and follow the PGGbase official account to access data via a smartphone.
Fig. 5
Prevalence and differentiation of Mendelian disease variants across populations. a Allele frequency spectrum of Mendelian-inherited disease variants. Mutations are grouped into five categories by their severity (see “Population and ancestry assignment”). b Rarity and population and ancestry differentiation of Mendelian-inherited disease variants. c An example of a pathogenic variant (rs78838117) that shows high DAF in some Southeast Asian populations such as the Bateq people (DAF = 0.33), Jakun people (DAF = 0.15), and Mendriq people (DAF = 0.125) in Malaysia. The corresponding PGG.SNV link is https://www.pggsnv.org/searchinfo.html?key=11-2930440-G-A. d An example of a pathogenic variant (rs41469351) that shows high DAF in some West African populations such as the Gambian people (DAF = 0.36). The corresponding PGG.SNV link is https://www.pggsnv.org/searchinfo.html?key=3-46412262-C-T. e An example of a variant (rs12917189) that shows large differentiation between Africans and non-Africans. The corresponding PGG.SNV link is https://www.pggsnv.org/searchinfo.html?key=15-43023482-T-C. f An example of a variant (rs10828415) that shows large differentiation between East Asian and other ancestral populations. The corresponding PGG.SNV link is https://www.pggsnv.org/searchinfo.html?key=10-23482850-G-A.
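The four PGG.SNV links in the Fig. 5 caption share one pattern: a `key` query parameter made of four hyphen-separated fields. The sketch below builds and parses such links in Python. It assumes the fields are chromosome, position, reference allele, and alternative allele, which matches the examples but is not stated in the caption. The `Variant` type and both function names are hypothetical helpers, not part of any PGG.SNV API.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, urlencode, urlparse

# Base address taken from the links in the Fig. 5 caption.
BASE_URL = "https://www.pggsnv.org/searchinfo.html"


class Variant(NamedTuple):
    """Hypothetical container; the field meanings are an assumption."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_url(v: Variant) -> str:
    """Build a search link of the form ...?key=CHROM-POS-REF-ALT."""
    key = f"{v.chrom}-{v.pos}-{v.ref}-{v.alt}"
    return f"{BASE_URL}?{urlencode({'key': key})}"


def parse_variant_url(url: str) -> Variant:
    """Recover the four fields from the key parameter of a search link."""
    key = parse_qs(urlparse(url).query)["key"][0]
    chrom, pos, ref, alt = key.split("-")
    return Variant(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    # rs78838117, the example shown in panel c.
    print(variant_url(Variant("11", 2930440, "G", "A")))
```

Splitting on hyphens only works for single-nucleotide variants written like the four examples; any key with a different layout would need its own handling.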
