Proc Natl Acad Sci U S A. 2017 Mar 7;114(10):E1923-E1932.
doi: 10.1073/pnas.1618065114. Epub 2017 Feb 21.

PEMapper and PECaller provide a simplified approach to whole-genome sequencing

H Richard Johnston et al. Proc Natl Acad Sci U S A.

Abstract

The analysis of human whole-genome sequencing data presents significant computational challenges. The sheer size of datasets places an enormous burden on computational, disk array, and network resources. Here, we present an integrated computational package, PEMapper/PECaller, that was designed specifically to minimize the burden on networks and disk arrays, create output files that are minimal in size, and run in a highly computationally efficient way, with the single goal of enabling whole-genome sequencing at scale. In addition to improved computational efficiency, we implement a statistical framework with a base-by-base error model, which lets this package perform as well as or better than the widely used Genome Analysis Toolkit (GATK) in all key measures of performance on human whole-genome sequences.

Keywords: GATK; SNP calling; genome sequencing; sequence mapping; software.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Theta across all 97 samples based on the calls from PEMapper/PECaller, GATK PASS, and GATK Tranche99.9. PEMapper/PECaller and GATK PASS samples sit between 0.00075 and 0.0009 variants per base as expected. Tranche99.9 calls are much lower.
Fig. 2.
Comparison of Ts/Tv ratios for PEMapper/PECaller, GATK PASS, and GATK Tranche99.9 called variants. PEMapper/PECaller and GATK PASS are virtually identical, at near 2.04 and 2.05 per sample, respectively, indicating excellent-quality calls. GATK Tranche99.9 is much lower, between 1.3 and 1.5 per sample, indicating much lower-quality calls.
Fig. 3.
Theta in all sample exomes based on PEMapper/PECaller, GATK PASS, and Tranche99.9 calls. GATK PASS and PEMapper/PECaller samples are near 0.00045 as expected, with PEMapper/PECaller calling slightly more variants.
Fig. 4.
Ts/Tv ratio across all sample exomes based on PEMapper/PECaller, GATK PASS, and Tranche99.9 calls. All samples called by PEMapper/PECaller and GATK PASS are near three as expected. Tranche99.9 calls are much lower again.
Fig. 5.
Silent to replacement (S/R) ratio across all sample exomes based on PEMapper/PECaller, GATK PASS, and Tranche99.9 calls. All samples called by PEMapper/PECaller and GATK PASS are between 1.05 and 1.15 as expected. Again, Tranche99.9 calls are significantly lower.
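
The Ts/Tv ratio reported in Figs. 2 and 4 is the number of transition substitutions (A<->G, C<->T) divided by the number of transversions (all other single-base substitutions). The following is a minimal sketch of that calculation, not code from PEMapper/PECaller or GATK; the function name `ts_tv_ratio` and the input format (a list of reference/alternate base pairs) are assumptions made for illustration.

```python
# Illustrative only: input format and function name are hypothetical,
# not taken from PEMapper/PECaller or GATK.
BASES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Return transitions / transversions for (ref, alt) single-base pairs.

    Pairs that are not single-base substitutions between A, C, G, T
    (for example indels or ref == alt) are skipped.
    """
    ts = 0
    tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a single-nucleotide substitution
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Toy input: four transitions and two transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(calls)
```

On this toy input the ratio is 4 / 2 = 2.0, which happens to be close to the whole-genome values near 2.04 to 2.05 quoted in the Fig. 2 caption; real call sets would of course contain millions of sites.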
