BMC Bioinformatics. 2008 Oct 10;9:428.
doi: 10.1186/1471-2105-9-428.

SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access



Jorge Amigo et al. BMC Bioinformatics. 2008.

Abstract

Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions can be drawn by querying each database individually, the lack of flexibility for combining data within and between databases prevents the calculation of key population variability statistics.

Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data are mined, summarized into the standard statistical reference indices, and stored in a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the underlying data to be browsed by population and populations to be combined, giving an intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are pre-processed in the data mart to speed up data browsing and any computational treatment requested.
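The summarize-then-combine idea described above can be illustrated with a minimal sketch. It assumes genotypes arrive as two-letter strings such as 'AG'; the function names, population labels and toy genotypes are hypothetical and are not taken from SPSmart itself, whose actual schema and pipeline are not described here.

```python
from collections import Counter


def allele_counts(genotypes):
    """Summarize diploid genotype strings (e.g. 'AG') into allele counts."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each genotype contributes two alleles
    return counts


def combine(summaries, group):
    """Merge the pre-computed allele counts of several populations."""
    total = Counter()
    for population in group:
        total += summaries[population]
    return total


def frequencies(counts):
    """Convert allele counts into allele frequencies."""
    n = sum(counts.values())
    return {allele: count / n for allele, count in counts.items()}


# Hypothetical toy data: four individuals per population at one SNP.
raw = {
    "CEU": ["AA", "AG", "GG", "AG"],
    "YRI": ["AA", "AA", "AG", "AA"],
}

# Pre-processing step: store only the per-population summaries.
summaries = {pop: allele_counts(g) for pop, g in raw.items()}

# Query step: build a user-defined group from the stored summaries.
merged = combine(summaries, ["CEU", "YRI"])
print(frequencies(merged))
```

Because only counts are stored, a user-defined group can be assembled by adding summaries together without revisiting the individual genotypes, which is one plausible reason pre-processing speeds up such queries.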

Conclusion: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through allele frequencies displayed as standard pie charts. In addition, a full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
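Two of the metrics named above can be sketched from allele frequencies alone. The example below assumes expected heterozygosity H = 1 - Σ p_i² and a simple unweighted Fst = (H_T - H_S) / H_T computed from population allele frequencies; the estimator SPSmart actually uses is not stated here and may differ (for example by weighting for sample size). The function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_freqs):
    """Unweighted Fst = (H_T - H_S) / H_T over a list of frequency dicts."""
    alleles = set().union(*pop_freqs)
    k = len(pop_freqs)
    # Mean allele frequencies across populations give the total heterozygosity.
    mean = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = expected_heterozygosity(mean)
    # Average within-population heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # A monomorphic SNP has no variation to partition.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# A fixed difference between two populations gives the maximum value.
print(fst([{"A": 1.0}, {"G": 1.0}]))

# Identical frequencies give no differentiation.
print(fst([{"A": 0.5, "G": 0.5}, {"A": 0.5, "G": 0.5}]))
```

Under this definition, a SNP fixed for different alleles in two populations reaches the upper bound of the statistic, while populations with identical frequencies score zero.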


Figures

Figure 1
Flowchart of processes implemented in SPSmart. The underlying SPSmart processing engine is capable of dealing with virtually any database that contains genotypes grouped by populations. Each dataset is summarized into common population statistical indices, and then combined with additional information from dbSNP in order to improve the online data browsing experience.
Figure 2
Finding discrepancies among databases. SNP rs6824418 data from Perlegen and HapMap indicating discrepant allele frequency estimates for populations EUA and CEU (European American and CEPH European respectively).
Figure 3
Comparing similar populations in different databases. SNP rs2789823 data from Perlegen and HapMap illustrating a fixed difference SNP that shows the degree of European:African admixture in the African American population sample of Perlegen (AFA) compared to the HapMap African population: the Yoruba of Ibadan, Nigeria (YRI).
Figure 4
Inspecting a chromosome region. A chromosome region search is used to find an alternative SNP marker with better-quality flanking sequence in the same linkage disequilibrium block (rs10012227 as a better substitute for rs4698702).

