PLoS One. 2012;7(7):e40294.
doi: 10.1371/journal.pone.0040294. Epub 2012 Jul 11.

Limitations of the human reference genome for personalized genomics

Jeffrey A Rosenfeld et al. PLoS One. 2012.

Abstract

Data from the 1000 Genomes Project (1KGP) and Complete Genomics (CG) have dramatically increased the number of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the number of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population, and that significant efforts are needed to create resources that can accurately assess personal genomics for health and disease and predict treatment outcomes.
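
The dataset comparison described above can be illustrated with a minimal sketch. It assumes that "unique to one dataset" means a variant appears in exactly one of the two call sets, and that the fraction is taken over the union of both sets; the abstract does not state the exact definition, so this is one plausible reading rather than the authors' method. The `Variant` type, the `unique_fraction` function, and the toy calls are all hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A single nucleotide variant keyed by position and alleles (hypothetical type)."""
    chrom: str
    pos: int  # 1-based position on the reference genome
    ref: str
    alt: str


def unique_fraction(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Fraction of the union of two call sets that appears in only one set.

    Assumption: this is one plausible definition of "unique to one dataset";
    the source does not specify the denominator.
    """
    union = calls_a | calls_b
    if not union:
        return 0.0
    only_one = calls_a ^ calls_b  # symmetric difference: in A or B, not both
    return len(only_one) / len(union)


# Toy, made-up calls for the same genome from two hypothetical pipelines.
set_a = {
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("2", 50, "G", "A"),
}
set_b = {
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("2", 75, "T", "C"),
}

# Two of the four distinct variants are seen by only one pipeline.
print(unique_fraction(set_a, set_b))
```

Matching on chromosome, position, and both alleles is a deliberately strict choice; real comparisons would also need to agree on a common reference build and variant normalization before the sets are compared.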

Conflict of interest statement

Competing Interests: TS is an employee of Perkin Elmer. This does not alter the authors’ adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Location of SNPs relative to the probed base in microarrays.
Histogram showing the number of SNPs in upstream and downstream positions relative to the probed SNP on the Illumina 1M array. The red line indicates the location of the probed SNP.
Figure 2. Problematic probes on microarrays.
The number of probes on the Affymetrix Axiom CEU (blue) and Illumina 2.5M (red) arrays that are found to contain an un-probed SNP for sub-samples of the 1KGP SNPs.
Figure 3. LD block lengths.
The mean length of LD blocks as the number of genotyped markers increases.
Figure 4. The LD block structure of two genes for the HapMap data and the 1KGP data.
A. The BRCA1 gene using the HapMap data. B. The BRCA1 gene using the 1KGP data. C. The JAK2 gene using the HapMap data. D. The JAK2 gene using the 1KGP data.
Figure 5. Venn diagrams illustrating the overlap in SNP calls between the 1KGP and CG.
A. For the full call sets. B. For the matched set of 32 genomes.
