BMC Genomics. 2012 Aug 5:13:375.
doi: 10.1186/1471-2164-13-375.

Pacific Biosciences sequencing technology for genotyping and variation discovery in human data

Mauricio O Carneiro et al. BMC Genomics. 2012.

Abstract

Background: Pacific Biosciences technology provides a fundamentally new data type with the potential to overcome some limitations of current next-generation sequencing platforms: it offers significantly longer reads, single-molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.

Results: We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%), with few apparent miscalls (~1%). In this work, we define a custom processing pipeline for analyzing Pacific Biosciences human data.

Conclusion: Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, and the approach can be extended to other organisms with a reference genome.


Figures

Figure 1
Characterization of Pacific Biosciences data. a) Base error mode rate for deletions, insertions and mismatches. b) Length distribution of reads in the Pacific Biosciences discovery dataset (here some raw reads are as long as 5,000 bases). c) Pacific Biosciences error rate by position. Shown are all errors (mismatch, insertion and deletion) by base position, including every base sequenced despite any previously known variation (this is why the average is slightly higher than 15%). Because the number of reads with bases beyond position 1000 diminishes, only positions up to 1000 are plotted. d-f) GC bias of the Pacific Biosciences instrument represented by the genomes of P. falciparum (low GC), E. coli (average GC) and R. sphaeroides (high GC), showing good balance in GC coverage where there is sufficient data in the genome, regardless of GC content.
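As an illustration of the per-mode error rates summarized in panel a, the following is a minimal sketch and not the authors' pipeline. It assumes alignments are available as extended CIGAR strings that use `=` for matches and `X` for mismatches, and it defines each rate as the number of bases in that mode divided by all match, mismatch, insertion and deletion bases. The function name `error_mode_rates` and that choice of denominator are assumptions made for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_mode_rates(cigars):
    """Tally mismatch, insertion and deletion rates over extended CIGAR strings.

    Assumption: '=' marks matches and 'X' marks mismatches. A plain 'M'
    cannot separate the two, so it is rejected rather than guessed at.
    """
    counts = Counter()
    for cigar in cigars:
        for length, op in CIGAR_OP.findall(cigar):
            counts[op] += int(length)

    if counts["M"]:
        raise ValueError("'M' hides mismatches; supply CIGARs that use '=' and 'X'")

    # Denominator (an assumption): every aligned, inserted or deleted base.
    total = counts["="] + counts["X"] + counts["I"] + counts["D"]
    if total == 0:
        return {"mismatch": 0.0, "insertion": 0.0, "deletion": 0.0}

    return {
        "mismatch": counts["X"] / total,
        "insertion": counts["I"] / total,
        "deletion": counts["D"] / total,
    }
```

Soft clips, hard clips and skipped regions are parsed but left out of the denominator, since they do not represent sequenced bases compared against the reference.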
Figure 2
Error profile of Pacific Biosciences data. a) A chart showing the number of observations of the alternate allele in all heterozygous sites and how reference bias pulls the median significantly below the expected 0.5. This combination creates multiple possible alignments with the highest alignment score, allowing the aligner in some cases to hide the true alternate allele inside an insertion to maximize the alignment score at the cost of reference bias. b) IGV browser (http://www.broadinstitute.org/igv/) screenshot of the validation dataset showing an example of aligner-created reference bias on Pacific Biosciences RS data. The true SNPs (C) are correctly called in individual reads. c) An IGV browser [18,19] screenshot of a region in the discovery dataset where Illumina HiSeq data suffers from context-specific errors that make the region appear to be a true heterozygous site, whereas Pacific Biosciences RS data (with errors nearly random, though more frequent) clearly shows no event in this region.
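The reference bias described in panel a can be quantified, in a simplified way, as the gap between the expected alternate-allele fraction at heterozygous sites (0.5) and the observed median. The sketch below is illustrative only: the input format (one `(ref_count, alt_count)` pair per heterozygous site) and the function names are assumptions for this example, not part of the published analysis.

```python
from statistics import median


def alt_allele_fractions(sites):
    """Return the alternate-allele fraction for each covered site.

    `sites` is an iterable of (ref_count, alt_count) pairs, one per
    heterozygous site (hypothetical input format). Sites with no
    coverage are skipped.
    """
    fractions = []
    for ref_count, alt_count in sites:
        depth = ref_count + alt_count
        if depth > 0:
            fractions.append(alt_count / depth)
    return fractions


def reference_bias(sites, expected=0.5):
    """Expected fraction minus the observed median alternate-allele fraction.

    A positive value means alternate alleles are under-observed, which is
    the direction of bias described in the caption.
    """
    fractions = alt_allele_fractions(sites)
    if not fractions:
        raise ValueError("no covered heterozygous sites")
    return expected - median(fractions)
```

For example, sites with read counts `(60, 40)`, `(55, 45)` and `(70, 30)` have alternate-allele fractions of 0.40, 0.45 and 0.30, so the median is 0.40 and the bias is 0.10.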

References

    1. Durbin RM, Altshuler DL, Durbin RM, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
    2. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494.
    3. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–276. doi: 10.1038/nature08250.
    4. Musunuru K, Pirruccello JP, Do R, Peloso GM, Guiducci C, Sougnez C, Garimella KV, Fisher S, Abreu J, Barry AJ, Fennell T, Banks E, Ambrogio L, Cibulskis K, Kernytsky A, Gonzalez E, Rudzicz N, Engert JC, DePristo MA, Daly MJ, Cohen JC, Hobbs HH, Altshuler D, Schonfeld G, Gabriel SB, Yue P, Kathiresan S. Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia. N Engl J Med. 2010;363:2220–2227. doi: 10.1056/NEJMoa1002926.
    5. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19:R145–51. doi: 10.1093/hmg/ddq333.
