Philos Trans R Soc Lond B Biol Sci. 2013 Feb 4;368(1614):20120205.
doi: 10.1098/rstb.2012.0205. Print 2013 Mar 19.

Viral population analysis and minority-variant detection using short read next-generation sequencing

Simon J Watson et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

RNA viruses within infected individuals exist as a population of evolutionarily related variants. Owing to evolutionary change affecting the composition of this population, the frequency and/or occurrence of individual viral variants can fluctuate markedly or subtly. Since the development of massively parallel sequencing platforms, such viral populations can be investigated at unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr), designed specifically for short-read analysis of virus genomes, which minimizes sequencing errors from multiple deep-sequencing platforms and enables post-mapping analysis of minority variants within the viral population. QUASR significantly reduces error-related noise in deep-sequencing datasets, increasing mapping accuracy and reducing erroneous mutations. Using QUASR, we determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza, sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.
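The abstract credits cross-platform concordance with making minority-variant calls unambiguous. A minimal Python sketch of one way to express that idea follows; it is not QUASR's code. It assumes per-position base frequencies have already been computed for each platform, and both the 0.01 frequency threshold and the simple set-intersection rule are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Assumed input: genome position -> {base: frequency} for one platform.
FreqTable = Dict[int, Dict[str, float]]


def minority_variants(freqs: FreqTable, min_freq: float) -> Set[Tuple[int, str]]:
    """Return (position, base) pairs for non-majority bases at or above min_freq."""
    calls: Set[Tuple[int, str]] = set()
    for pos, base_freqs in freqs.items():
        if not base_freqs:
            continue  # uncovered position: nothing to call
        majority = max(base_freqs, key=base_freqs.get)
        for base, freq in base_freqs.items():
            if base != majority and freq >= min_freq:
                calls.add((pos, base))
    return calls


def concordant_variants(platform_a: FreqTable, platform_b: FreqTable,
                        min_freq: float = 0.01) -> Set[Tuple[int, str]]:
    """Keep only minority variants called independently on both platforms."""
    return minority_variants(platform_a, min_freq) & minority_variants(platform_b, min_freq)


# Example: the G at position 100 is seen on both platforms; the T at 200 on only one.
roche_454 = {100: {"A": 0.95, "G": 0.05}, 200: {"C": 0.99, "T": 0.01}}
illumina = {100: {"A": 0.93, "G": 0.07}, 200: {"C": 1.0}}
print(concordant_variants(roche_454, illumina))  # {(100, 'G')}
```

Requiring agreement between two technologies trades some sensitivity for specificity: a platform-specific artefact (such as a 454 homopolymer error) is unlikely to recur at the same position and base on Illumina.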

Figures

Figure 1.
(a) Read numbers (black line) and the mean of the per-read mean quality scores (grey line), with error bars indicating the standard deviation about this mean, for the four 454 read-sets and (b) the four corresponding Illumina read-sets. Increasing the per-read quality score (PRQC) cut-off has a marked effect on the 454 read-sets, discarding up to 59% of reads and giving a nearly 6-fold increase in base-calling confidence. PRQC has a negligible effect on the Illumina read-sets until the cut-off reaches 34, at which point 88% of reads are discarded for a 1.35-fold increase in confidence.
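A per-read filter of the kind summarised in Figure 1 can be sketched as follows. This is an illustration, not the QUASR implementation: it assumes a read's quality score is the mean of its Phred base scores and that qualities arrive as FASTQ strings with a Phred+33 offset. It reports the mean and standard deviation of the per-read means that pass the cut-off, the quantities plotted in the figure.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def per_read_qc(qualities: Sequence[str], cutoff: float) -> Tuple[List[str], float, float]:
    """Keep reads whose mean Phred score is at least `cutoff`.

    Returns the retained quality strings, the mean of their per-read mean
    scores, and the standard deviation about that mean.
    """
    kept = [q for q in qualities if q and mean(phred_scores(q)) >= cutoff]
    read_means = [mean(phred_scores(q)) for q in kept]
    if not read_means:
        return kept, 0.0, 0.0
    spread = stdev(read_means) if len(read_means) > 1 else 0.0
    return kept, float(mean(read_means)), float(spread)


reads = ["IIIIIIII", "IIII####", "########"]  # 'I' = Q40, '#' = Q2
kept, avg, sd = per_read_qc(reads, cutoff=20)
print(len(kept), avg)  # 2 30.5
```

Raising `cutoff` shrinks `kept` while pushing `avg` up, which is the trade-off between read numbers and confidence that the figure plots.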
Figure 2.
(a) Median coverage depth for the 454 read-sets and (b) the Illumina read-sets, with error bars indicating the median absolute deviation (MAD) about the median. The asterisk over PRQC34 indicates that genome coverage dropped below 100%. (a) The trends match those of figure 3a, with mapped depth dropping by 61% to 125-fold coverage at PRQC40. The mean MAD decreases from 100 to 42, indicating that most of the discarded reads come from regions of high coverage. (b) The Illumina mapped depth likewise mirrors the read numbers: at a PRQC of 34, mapped depth drops by 91%, resulting in incomplete genome coverage. PRQC values of 40 and 33 are therefore used for subsequent 454 and Illumina analyses, respectively.
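The statistics in Figure 2 (median depth, MAD and whether coverage stays at 100%) can be computed from mapped read intervals roughly as below. The interval representation (0-based, end-exclusive) and the helper names are hypothetical, and the MAD here is unscaled.

```python
from statistics import median
from typing import List, Sequence, Tuple


def coverage_depth(genome_length: int, intervals: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-position depth from mapped (start, end) intervals, 0-based and end-exclusive."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[start] += 1  # a read starts covering here
        diff[end] -= 1    # and stops covering here
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def median_and_mad(depth: Sequence[int]) -> Tuple[float, float]:
    """Median depth and the (unscaled) median absolute deviation about it."""
    med = median(depth)
    mad = median([abs(d - med) for d in depth])
    return med, mad


def has_full_coverage(depth: Sequence[int]) -> bool:
    """True if every genome position is covered by at least one read."""
    return all(d > 0 for d in depth)


depth = coverage_depth(10, [(0, 6), (4, 10), (2, 8)])
print(depth, median_and_mad(depth), has_full_coverage(depth))
# [1, 1, 2, 2, 3, 3, 2, 2, 1, 1] (2.0, 1.0) True
```

The median and MAD are used rather than mean and standard deviation because coverage across a viral genome is typically skewed by a few very deep regions, which these robust statistics down-weight.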
Figure 3.
(a) Median mapped coverage depth for the 454 read-sets and (b) the Illumina read-sets at increasingly stringent per-base quality control (PBQC) values, with PRQC fixed at 40 for 454 and 33 for Illumina. Asterisks indicate where genome coverage drops below 100%. (a) Full genome coverage is retained up to PBQC20; beyond this, gaps appear in the genome at homopolymer stretches and in regions of low coverage. (b) Median mapped coverage decreases by 11% at PBQC32 while full genome coverage is retained. Increasing the cut-off to 33 makes little difference, but at 34 most bases are removed, resulting in incomplete genome coverage.
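The caption does not spell out how PBQC removes bases. The sketch below assumes one common approach, trimming each read from its 3' end until the last base meets the cut-off, and reports the fraction of bases retained. The trimming rule and the `min_len` parameter are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Read = Tuple[str, List[int]]  # (bases, Phred score per base)


def trim_3prime(read: Read, min_q: int) -> Read:
    """Drop bases from the 3' end until the last remaining base has quality >= min_q."""
    seq, quals = read
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def per_base_qc(reads: Sequence[Read], min_q: int, min_len: int = 1) -> Tuple[List[Read], float]:
    """Trim every read, discard reads shorter than min_len, and report the fraction of bases kept."""
    total = sum(len(seq) for seq, _ in reads)
    trimmed = (trim_3prime(r, min_q) for r in reads)
    kept = [r for r in trimmed if len(r[0]) >= min_len]
    retained = sum(len(seq) for seq, _ in kept)
    return kept, (retained / total if total else 0.0)


reads = [("ACGTAC", [35, 34, 33, 30, 12, 8]), ("GGGG", [5, 5, 5, 5])]
kept, frac = per_base_qc(reads, min_q=20)
print(kept, frac)  # [('ACGT', [35, 34, 33, 30])] 0.4
```

Under this assumed rule, a cut-off close to the platform's typical base quality removes most bases at once, consistent in spirit with the sharp drop the caption reports at PBQC34.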
Figure 4.
(a) Minority base frequencies for the PB2 segment of sample 3 sequenced on the 454 platform without QC and (b) after PRQC40 and PBQC20, and (c) on the Illumina platform without QC and (d) after PRQC33 and PBQC33. In each minority base profile, the region between nucleotides 72 and 378 is expanded for clarity. QC on the 454 samples removes the many low-frequency (≤0.01), low-quality variants (I, II), thereby removing technical errors; the fewer remaining minority variants have stronger support for further investigation. Importantly, some minority variants present at frequencies ≥0.1 before QC disappear after QC, and some that appear insignificant before QC become significant afterwards. After QC, the minority-variant profiles of the two platforms (II and IV) are much more similar than before (I and III).
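Minority base frequencies like those in Figure 4 come from a pileup of base calls at each genome position. The sketch below assumes ungapped alignments given as (start, sequence) pairs, ignores ambiguous bases, and defines a position's minority frequency as the combined frequency of all non-majority bases. These are simplifying assumptions, not QUASR's documented method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def base_frequencies(aligned: Sequence[Tuple[int, str]], genome_length: int) -> List[Dict[str, float]]:
    """Per-position A/C/G/T frequencies from ungapped (0-based start, sequence) alignments."""
    counts = [Counter() for _ in range(genome_length)]
    for start, seq in aligned:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < genome_length and base in "ACGT":  # skip N and overhangs
                counts[pos][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: n / total for b, n in c.items()} if total else {})
    return freqs


def minority_frequency(freq: Dict[str, float]) -> float:
    """Combined frequency of all non-majority bases (0.0 for an uncovered position)."""
    return 1.0 - max(freq.values()) if freq else 0.0


pileup = [(0, "ACGT"), (0, "ACGA"), (0, "ACGT"), (1, "CGT")]
freqs = base_frequencies(pileup, genome_length=4)
print(freqs[3], minority_frequency(freqs[3]))  # {'T': 0.75, 'A': 0.25} 0.25
```

Running the same computation on reads before and after QC is what would expose the disappearing low-frequency variants the caption describes.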
Figure 5.
(a) Base frequencies, displayed as stacked histograms, for nine genome positions sequenced on 454 and (b) Illumina. For each position, base frequencies are shown across four time points, so the dynamics of the minority variants at that position can be followed. Base frequencies differ little between the two platforms and the dynamics are almost identical, indicating that the observed changes in minority variants are independent of the sequencing platform despite the platforms' technological differences. Two distinct population dynamics are visible in the time series: at five positions the majority base is replaced, while the other four show minority variants that appear transiently and are subsequently lost.
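The two patterns described for Figure 5, majority-base replacement and transient minorities, could be flagged automatically with a rule like the following. The rule and its 0.05 threshold are hypothetical, chosen only to make the distinction concrete.

```python
from typing import Dict, Sequence

Freqs = Dict[str, float]  # base -> frequency at one time point


def majority_base(freq: Freqs) -> str:
    return max(freq, key=freq.get)


def classify_dynamics(series: Sequence[Freqs], min_freq: float = 0.05) -> str:
    """Label one genome position's base frequencies across ordered time points.

    'replacement': the majority base at the last time point differs from the first.
    'transient':   the majority base is unchanged, but another base reaches min_freq
                   at an intermediate time point and is below min_freq at the end.
    'stable':      neither pattern.
    """
    first, last = majority_base(series[0]), majority_base(series[-1])
    if first != last:
        return "replacement"
    for point in series[1:-1]:
        for base, freq in point.items():
            if base != first and freq >= min_freq and series[-1].get(base, 0.0) < min_freq:
                return "transient"
    return "stable"


replaced = [{"A": 0.9, "G": 0.1}, {"A": 0.6, "G": 0.4}, {"G": 0.8, "A": 0.2}, {"G": 1.0}]
transient = [{"C": 1.0}, {"C": 0.8, "T": 0.2}, {"C": 0.9, "T": 0.1}, {"C": 1.0}]
print(classify_dynamics(replaced), classify_dynamics(transient))  # replacement transient
```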
