Genome Med. 2016 Mar 2;8(1):24.
doi: 10.1186/s13073-016-0269-0.

Medical implications of technical accuracy in genome sequencing

Rachel L Goldfeder et al. Genome Med. 2016.

Abstract

Background: As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome.

Methods: We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation.
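The two metrics named in the Methods can be illustrated with a minimal sketch. It assumes variants are represented as exact `(chrom, pos, ref, alt)` tuples and that both sets have already been restricted to benchmark regions. The function name and the toy data are hypothetical, and real benchmarking tools also reconcile differing variant representations, which this sketch does not attempt.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def sensitivity_and_ppv(benchmark: Set[Variant], calls: Set[Variant]) -> Tuple[float, float]:
    """Compare a pipeline's calls with benchmark variants by exact match.

    sensitivity = TP / (TP + FN)
    positive predictive value (PPV) = TP / (TP + FP)
    """
    tp = len(benchmark & calls)   # called and in the benchmark
    fn = len(benchmark - calls)   # in the benchmark but missed
    fp = len(calls - benchmark)   # called but absent from the benchmark
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy data (invented for illustration only, not taken from the paper)
benchmark = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 400, "G", "GA"),
    ("chr2", 900, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 400, "G", "GA"),
    ("chr3", 50, "A", "T"),    # false positive
    ("chr3", 75, "G", "C"),    # false positive
}

sens, ppv = sensitivity_and_ppv(benchmark, calls)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

Computing the metrics separately per variant class (for example, SNVs versus indels) only requires filtering both sets before the call.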

Results: We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering while most false negative WES variants relate to poor coverage. We find that only 74.6% of the exonic bases in ClinVar and OMIM genes and 82.1% of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions while 593 of 3,300 ClinVar/OMIM genes have less than 50% of their total exonic base pairs in high-confidence regions. We find that more than 77% of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases. Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2.
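The per-gene figures above rest on one quantity: the fraction of a gene's exonic bases that overlap high-confidence regions. A minimal sketch of that calculation follows, assuming half-open `(start, end)` intervals on a single chromosome in the style of BED files. The function names and the example coordinates are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style (an assumption)


def merge(intervals: Sequence[Interval]) -> List[List[int]]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bases(a: Sequence[Interval], b: Sequence[Interval]) -> int:
    """Count bases shared by two interval sets with a two-pointer sweep."""
    a_m, b_m = merge(a), merge(b)
    i = j = total = 0
    while i < len(a_m) and j < len(b_m):
        lo = max(a_m[i][0], b_m[j][0])
        hi = min(a_m[i][1], b_m[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first
        if a_m[i][1] < b_m[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_in_high_confidence(exons: Sequence[Interval],
                                confident: Sequence[Interval]) -> float:
    """Fraction of exonic bases that lie inside high-confidence regions."""
    exonic = sum(end - start for start, end in merge(exons))
    if exonic == 0:
        return float("nan")
    return overlap_bases(exons, confident) / exonic


# Invented coordinates for one hypothetical gene
exons = [(100, 200), (300, 400)]
confident = [(150, 350)]
print(fraction_in_high_confidence(exons, confident))
```

A gene would count as lying entirely within high-confidence regions when this fraction equals 1.0, and as one of the poorly covered genes when it falls below 0.5.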

Conclusions: Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.

Figures

Fig. 1
Complexity of the Genome. a The genome consists of several (overlapping) regions. Eighty-six percent of 35 bp sequences and 95 % of 100 bp sequences are unique to one location in the reference genome. b A total of 50.6 % of the non-N reference genome falls into a repeat (data from RepeatMasker). c There is great variation in exon count and number of exonic bases per gene (data from RefSeq). d An unrooted phylogenetic tree derived from multiple alignment of cDNA sequences of 10 voltage-gated sodium channel genes within the human genome illustrates the complex evolutionary relationships among paralogous sequences, which complicate the process of short-read alignment in next-generation sequencing. A related voltage-gated calcium channel, CACNA1L, is included as an outgroup
Fig. 2
a The fraction of each ACMG gene within GIAB high-confidence regions. b Violin plots showing the distribution of the fraction of each gene in the GIAB high-confidence regions for NA12878 for relevant gene sets: ACMG reportable genes, genes with variants in OMIM or ClinVar, and all genes. c Boxplots showing the distribution of the fraction of the first, second, middle, penultimate, and last exons of ClinVar or OMIM genes in the GIAB high-confidence regions
Fig. 3
a The number of sites in the genome where each 35 bp sequence appears for Genome in a Bottle high-confidence and low-confidence regions. b The fraction of each RepeatMasker repeat class in high-confidence regions
Fig. 4
Bar graphs displaying the fraction of ClinVar pathogenic or likely pathogenic SNVs in high-confidence regions, unique sequences (35 bp), and alignable sequences (100 bp). The black line represents the genome-wide value
Fig. 5
ClinVar variants within ACMG genes in the ExAC database. Depth of coverage in log2 space versus the number of samples in which that variant could not be called. The size of the points is relative to quality scores from GATK during joint calling. Orange indicates that the variant is in a high-confidence NA12878 region, while blue indicates that it is in a low-confidence region. Triangles highlight variants that failed VQSR filtering
