Nucleic Acids Res. 2012 Jan;40(1):e2.
doi: 10.1093/nar/gkr861. Epub 2011 Oct 19.

Ultrasensitive detection of rare mutations using next-generation targeted resequencing

Patrick Flaherty et al. Nucleic Acids Res. 2012 Jan.

Abstract

With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify rare, low-prevalence mutations in heterogeneous clinical samples. However, the mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation. This represents accurate detection of one mutant per every 1000 wild-type alleles. To achieve this sensitive level of mutation detection, we integrate a high accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. In validation on a known 0.1% sample fraction admixture of two synthetic DNA samples, our method is highly specific (99%) and sensitive (100%). As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.

Figures

Figure 1.
Method flowchart. The method for detecting rare variants compares the baseline error rate from multiple reference replicates to the sample error rate at each position. Sample and reference DNA are independently prepared and tagged with indexed adapters. The reference and sample libraries are pooled and sequenced on the same lane. The reads are aligned and preprocessed to filter out strand-specific errors. The parameters of a Beta-Binomial model are fit to the reference sequence data to obtain a null hypothesis error rate distribution for each position. Finally, the error rate of the sample sequencing data is compared to the null distribution to call rare variants.
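The last two steps of the flowchart (fit a Beta-Binomial null to the reference replicates, then compare the sample against it) can be sketched for a single position as follows. This is a minimal illustration, not the authors' estimation procedure: the moment-matching fit, the function names, and the read counts are all assumptions made for the example.

```python
from math import exp, lgamma
from statistics import mean, variance


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for K ~ Beta-Binomial(n, a, b), computed in log space."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def fit_reference(error_counts, depths):
    """Moment-match Beta(a, b) to the replicate error rates at one position.

    Simplification (assumption): binomial sampling noise in each observed
    rate is ignored, so this is cruder than a full likelihood fit.
    """
    rates = [e / d for e, d in zip(error_counts, depths)]
    mu, var = mean(rates), variance(rates)
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("replicate rates do not admit a moment-matched Beta fit")
    precision = mu * (1.0 - mu) / var - 1.0
    return mu * precision, (1.0 - mu) * precision


def upper_tail_p_value(k, n, a, b):
    """P(K >= k) under the fitted null: chance of at least k error reads."""
    return min(1.0, sum(beta_binomial_pmf(j, n, a, b) for j in range(k, n + 1)))


# Hypothetical counts for one position: four reference replicates, one sample.
a, b = fit_reference([10, 12, 9, 11], [10000, 10000, 10000, 10000])
p_value = upper_tail_p_value(25, 10000, a, b)  # sample: 25 error reads of 10 000
```

A small `p_value` means the sample error count is unlikely under the reference-derived null at that position, which is the condition for calling a variant in this sketch.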
Figure 2.
Position-specific error rate distribution. The average sequence error rate variance across positions is significantly greater than the average variability at each position. The across-position distribution is shown on the right side in dark blue and a sample of five within-position density estimates is shown below it. The empirical within-position and across-position distribution estimates show that a small number of outlying positions contribute to the excessive variance in the across-position distribution.
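The comparison in this caption, within-position versus across-position variability, can be made concrete with a short sketch. The error rates below are invented for illustration (five positions, three replicates each, the last position being an outlier), and `variance_summary` is a hypothetical helper, not something defined in the paper.

```python
from statistics import mean, variance


def variance_summary(rates_by_position):
    """Return (average within-position variance, across-position variance).

    rates_by_position: one list of replicate error rates per position.
    """
    within = mean(variance(rates) for rates in rates_by_position)
    across = variance([mean(rates) for rates in rates_by_position])
    return within, across


# Hypothetical replicate error rates; the last position is an outlier.
rates = [
    [0.0010, 0.0011, 0.0009],
    [0.0012, 0.0011, 0.0013],
    [0.0008, 0.0009, 0.0008],
    [0.0010, 0.0010, 0.0011],
    [0.0050, 0.0052, 0.0049],
]

within, across = variance_summary(rates)
# Dropping the outlying position shows how much of the across-position
# variance it was responsible for.
_, across_without_outlier = variance_summary(rates[:-1])
```

In this toy data the across-position variance is far larger than the average within-position variance, and most of the excess disappears once the single outlying position is removed, which mirrors the pattern the caption describes.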
Figure 3.
Variant positions in the 0.1% mixture sample of synthetic DNA are identified by the statistical model. The x-axis is the reference error rate as estimated by the model and the y-axis is the sample error rate (error read depth/total read depth). True negatives (black), true positives (blue) and false positives (red) for three replicates are identified in both samples. For each of the three replicates, the model finds 14 of 14 true positives; 5, 4 and 1 additional calls (false positives), respectively, are made. Requiring a consensus call of all three replicates eliminates these false positives.
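The consensus rule in the last sentence amounts to intersecting the call sets of the replicates. The sketch below uses the counts from the caption (14 true positives plus 5, 4 and 1 extra calls), but the position numbers themselves are invented, and it assumes that no false positive is shared by all three replicates, as the caption implies.

```python
def consensus_calls(replicate_calls):
    """Return the positions that were called in every replicate."""
    call_sets = [set(calls) for calls in replicate_calls]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Hypothetical position indices; only the counts follow the caption.
true_variants = set(range(100, 114))          # 14 true variant positions
replicate_calls = [
    true_variants | {7, 21, 35, 48, 62},      # replicate 1: 5 extra calls
    true_variants | {9, 21, 77, 90},          # replicate 2: 4 extra calls
    true_variants | {250},                    # replicate 3: 1 extra call
]

consensus = consensus_calls(replicate_calls)
```

Position 21 is called in two replicates here, yet it is still dropped because the rule requires agreement in all three.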
Figure 4.
Detection power depends on both read depth and experimental precision. We show here that the statistical power of the model, that is, the probability of detecting a true positive at a given effect size (level of prevalence), increases with read depth and sample preparation precision, up to asymptotic limits. (a) Read depth (n) is held constant at an example level of 10 000 and it can be seen that power increases with experimental precision up to a limit of approximately 0.4 for an effect size of 0.1%. (b) When the experimental precision is held constant at 10 000, power increases with read depth (n) up to a limit of approximately 0.4 for an effect size of 0.1%. (c) For a fixed false positive and false negative rate, the detectable effect size decreases with both increasing sample preparation precision and read depth (n). A greater gain is achieved by improving sample preparation precision than by increasing read depth if the experimental variation is large. (d) The ROC curve for a fixed effect size and sample preparation precision improves rapidly as the read depth increases. Read depth limits the sensitivity at all false positive rates when low, but when read depth is high the ROC curve approaches an asymptotic curve controlled by the experimental variation.
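The qualitative behaviour in panels (a) and (b) can be explored with a rough power calculation. The sketch below assumes a one-sided test in which both the null and the alternative error counts are Beta-Binomial with the same precision, and it uses an invented baseline error rate of 0.2%. It is not the authors' calculation and is not expected to reproduce the limit of approximately 0.4 quoted in the caption.

```python
from math import exp, lgamma


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for K ~ Beta-Binomial(n, a, b), computed in log space."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_ratio = (lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
                 - lgamma(a) - lgamma(b) + lgamma(a + b))
    return exp(log_choose + log_ratio)


def critical_count(n, mu0, precision, alpha):
    """Smallest k such that P(K >= k) <= alpha under the null."""
    a, b = mu0 * precision, (1.0 - mu0) * precision
    cdf = 0.0
    for k in range(n + 1):
        # At this point cdf = P(K < k), so the upper tail at k is 1 - cdf.
        if 1.0 - cdf <= alpha:
            return k
        cdf += beta_binomial_pmf(k, n, a, b)
    return n + 1  # no count is extreme enough to reject


def detection_power(n, mu0, effect, precision, alpha=0.05):
    """P(reject) when the true error rate is mu0 + effect (assumed model)."""
    k_crit = critical_count(n, mu0, precision, alpha)
    mu1 = mu0 + effect
    a, b = mu1 * precision, (1.0 - mu1) * precision
    below = sum(beta_binomial_pmf(k, n, a, b) for k in range(min(k_crit, n + 1)))
    return max(0.0, 1.0 - below)


# Hypothetical settings: baseline error 0.2%, effect size 0.1%.
power_low_depth = detection_power(1000, 0.002, 0.001, 1e6)
power_high_depth = detection_power(20000, 0.002, 0.001, 1e6)
power_low_precision = detection_power(20000, 0.002, 0.001, 1e3)
```

Under these assumptions, power rises with read depth when precision is high, while at low precision extra depth helps much less because replicate-to-replicate variation dominates.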
Figure 5.
Sequencing results of clinical samples of H1N1 influenza A. (a) A red dot indicates a position that is called as a mutant and has a sample fraction >0.1%; green dots indicate an estimated sample fraction >1%. (b) A detail display of 10 positions in sample BN3 shows the difference between the reference and sample sequencing error rates for called mutations in two replicate lanes. The non-reference base composition for both lanes (in sequence logo format) shows that the three mutations are T to C pyrimidine transitions. (c) We identified the H275Y mutation responsible for oseltamivir resistance in one clinical sample (BN9). Across all of the H1N1 clinical samples, we display a breakdown of the individual sequencing error rate for the non-reference bases at codon position 1. The mutation in sample BN9 is readily apparent. The dotted line indicates the expected base error rate from a uniform distribution across bases using the total sequencing error rate.
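The dotted-line baseline in panel (c) can be read as the total non-reference rate split evenly over the three alternative bases. The sketch below follows that reading, which is an interpretation of the caption rather than a formula taken from the paper; the base counts are invented and the function names are hypothetical.

```python
def non_reference_rates(base_counts, reference_base):
    """Per-base error rate: reads of each non-reference base / total depth."""
    depth = sum(base_counts.values())
    if depth == 0:
        raise ValueError("position has no coverage")
    return {base: count / depth
            for base, count in base_counts.items()
            if base != reference_base}


def uniform_expected_rate(base_counts, reference_base):
    """Total error rate divided evenly among the 3 non-reference DNA bases."""
    rates = non_reference_rates(base_counts, reference_base)
    return sum(rates.values()) / 3


# Hypothetical read counts at one position with reference base C.
counts = {"A": 60, "C": 99700, "G": 40, "T": 200}
rates = non_reference_rates(counts, "C")
expected = uniform_expected_rate(counts, "C")
enriched = [base for base, rate in rates.items() if rate > expected]
```

A base whose rate sits well above `expected` stands out from uniform sequencing error, which is the visual cue the panel relies on.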
