Microb Inform Exp. 2014 Jan 15;4(1):1.
doi: 10.1186/2042-5783-4-1.

Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

Kerensa McElroy et al. Microb Inform Exp. 2014.

Abstract

Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads.
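To make the idea of detecting low frequency variants in the context of sequencing error concrete, the sketch below tests whether the number of reads supporting a variant at one site exceeds what a uniform per-base error rate would be expected to produce. It is only an illustration under stated assumptions: the function names, the fixed error rate, and the significance threshold are hypothetical choices, not the method of any tool covered by the review, and real callers typically also model base quality, strand bias, and multiple testing.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of the binomial probability mass at i
        log_pmf = (
            math.lgamma(n + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n - i + 1)
            + i * log_p
            + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_candidate_variant(alt_reads, depth, error_rate=0.001, alpha=0.01):
    """Flag a site when alt_reads is unlikely to arise from error alone.

    error_rate and alpha are assumed illustrative values, not
    recommendations from the source text.
    """
    return binomial_upper_tail(alt_reads, depth, error_rate) < alpha


# At 10,000x depth and an assumed 0.1% error rate, about 10 erroneous
# reads are expected at a site, so 100 supporting reads stand out
# while 10 do not.
print(is_candidate_variant(100, 10000))
print(is_candidate_variant(10, 10000))
```

The per-site test treats errors as independent and identically distributed, which is a simplification; it is meant to show why high read depth helps separate rare variants from error, not to replace a dedicated variant caller.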

Figures

Figure 1
Flowchart detailing pipeline steps required for deep sequencing projects. After extracting genomic material, PCR amplification may be required prior to library preparation. For sequencing of a target region (‘amplicon sequencing’), multiple, ‘nested’ PCR rounds may be performed. Sequencing adapters and primers may be included in the primer for the final round, or may be annealed to the ends of fragments after amplification. For whole genome sequencing, multiple, overlapping PCR products are randomly sheared before annealing of sequencing adapters and primers. Alternatively, if sufficient genomic material is available, shearing and annealing may be performed directly without PCR amplification. If sequencing RNA, RT must be performed before library preparation. For amplicon sequencing, this may take the form of an initial RT-PCR. Choice of sequencing technology is dependent on the project’s aims: for instance, the longer reads of Roche-454 may be more appropriate for reconstructing haplotypes, while the high data volume afforded by Illumina is more suitable for detecting very low frequency SNVs. After sequencing, reads must be aligned, either via multiple sequence alignment or to a reference. Choice of reference is critical; if available, a published reference or references may be used; alternatively, a consensus sequence may be used, generated through de novo assembly, or by alignment to a published reference followed by replacement of fixed variants, or by Sanger sequencing the same sample as submitted for deep sequencing. Following alignment, a number of bioinformatic tools are available for SNV calling, haplotype reconstruction, and downstream analysis.
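One of the reference options described in the caption, alignment to a published reference followed by replacement of fixed variants, can be sketched roughly as follows. This is a minimal illustration, assuming gap-free reads given as hypothetical `(start, sequence)` pairs and handling substitutions only; the function name, the `min_fraction` cutoff used to treat a variant as fixed, and the input format are assumptions for the example, not part of the described pipeline.

```python
from collections import Counter


def consensus_from_alignment(reference, aligned_reads, min_fraction=0.95, min_depth=1):
    """Replace reference bases with variants that appear fixed in the reads.

    aligned_reads is a list of (start, sequence) pairs with 0-based start
    positions on the reference. Indels are not handled in this sketch.
    """
    # Count the bases observed at each reference position
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference) and base in "ACGT":
                counts[pos][base] += 1

    consensus = []
    for ref_base, column in zip(reference, counts):
        depth = sum(column.values())
        if depth and depth >= min_depth:
            base, n = column.most_common(1)[0]
            # Substitute only when a non-reference base dominates the column
            if base != ref_base and n / depth >= min_fraction:
                consensus.append(base)
                continue
        consensus.append(ref_base)
    return "".join(consensus)


reads = [(0, "ACTT"), (0, "ACTT"), (2, "TTAC"), (4, "ACGT")]
print(consensus_from_alignment("ACGTACGT", reads))
```

Positions where the reads are mixed keep the reference base here, so variation within the population is left for the later SNV calling step rather than being absorbed into the consensus.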
