Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):20166-71.
doi: 10.1073/pnas.1110064108. Epub 2011 Nov 30.

Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID


Cassandra B Jabara et al. Proc Natl Acad Sci U S A. 2011.

Abstract

Viruses can create complex genetic populations within a host, and deep sequencing technologies allow extensive sampling of these populations. Limitations of these technologies, however, can bias this sampling, particularly when a PCR step precedes the sequencing protocol. Typically, an unknown number of templates is used to initiate PCR amplification, which can lead to unrecognized resampling of sequences and create apparent homogeneity; in addition, PCR-mediated recombination can disrupt linkage, and differential amplification can skew allele frequencies. Finally, nucleotide misincorporation during PCR and errors during the sequencing protocol can inflate diversity. We have solved these problems by including a random sequence tag in the initial primer so that each template receives a unique Primer ID. After sequencing, repeated identification of a Primer ID reveals sequence resampling. These resampled sequences are then used to create an accurate consensus sequence for each template, correcting for recombination, allelic skewing, and misincorporation/sequencing errors. The resulting population of consensus sequences directly represents the initial sampled templates. We applied this approach to the HIV-1 protease (pro) gene to view the distribution of sequence variation of a complex viral population within a host. We identified major and minor polymorphisms at coding and noncoding positions. In addition, we observed dynamic genetic changes within the population during intermittent drug exposure, including the emergence of multiple resistant alleles. These results provide an unprecedented view of a complex viral population in the absence of PCR resampling.
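The resampling check described in the abstract comes down to counting how many reads carry each Primer ID: every distinct ID stands for one tagged template, and any ID seen more than once was resampled. The Python sketch below is an illustration under simplifying assumptions, not the authors' pipeline. It assumes the Primer IDs have already been extracted from each read and ignores the sample barcode; the function name `summarize_resampling` and the example IDs are hypothetical.

```python
from collections import Counter


def summarize_resampling(primer_ids):
    """Tally reads per Primer ID (hypothetical helper).

    Each distinct Primer ID is treated as one tagged cDNA template;
    an ID observed in more than one read indicates PCR resampling.
    """
    counts = Counter(primer_ids)
    n_reads = len(primer_ids)
    n_templates = len(counts)
    resampled = [pid for pid, c in counts.items() if c > 1]
    return {
        "reads": n_reads,
        "templates": n_templates,
        "resampled_ids": len(resampled),
        # Average number of reads derived from each tagged template
        "reads_per_template": n_reads / n_templates if n_templates else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 8-nt Primer IDs extracted from six reads
    reads = ["ACGTACGT", "ACGTACGT", "TTGACCAA",
             "ACGTACGT", "GGCATTCA", "TTGACCAA"]
    print(summarize_resampling(reads))
```

Here six reads trace back to only three templates. Without the tag, those six reads would look like six independent samples of the population.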


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Tagging viral RNA templates with a Primer ID before PCR amplification and sequencing allows direct removal of artifactual errors and identifies resampling. (A) A primer was designed to bind downstream of the protease coding domain. In the 5′ tail of the primer, a degenerate string of eight nucleotides formed the Primer ID, allowing 65,536 (4^8) unique combinations. A three-nucleotide barcode, selected a priori, served as the sample ID. Finally, a heterologous string of nucleotides with low affinity for the HIV-1 genome was included at the far 5′ end as the priming site for PCR amplification. (B) PCR biases and sequencing errors are introduced during amplification and sequencing of viral templates. Repeated identification of the barcode and Primer ID allows each templating event to be traced back to a single tagged cDNA. Because errors are minor components within each Primer ID population, forming a consensus sequence removes them directly and corrects for PCR resampling. (C) HIV-1 RNA templates isolated from plasma samples taken at two time points before and one time point after intermittent ritonavir therapy were tagged, amplified, and deep sequenced. Tagged sequences containing full-length protease were used to build a population of consensus sequences, with a consensus formed only when at least three sequences shared an identical barcode and Primer ID.
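The consensus step in panels B and C can be sketched as grouping reads by (barcode, Primer ID) and taking a per-position majority within each group. This is a minimal sketch, assuming reads are already trimmed to protease and aligned to equal length. The three-read cutoff comes from the caption; the tie rule (emit `N`), the function names `majority_consensus` and `build_consensus`, and the example reads are assumptions, not details from the paper.

```python
from collections import Counter, defaultdict

PRIMER_ID_LENGTH = 8
MIN_READS = 3  # caption: at least three reads sharing barcode + Primer ID

# Four possible bases at each of the eight degenerate positions
assert 4 ** PRIMER_ID_LENGTH == 65_536


def majority_consensus(seqs):
    """Per-position majority rule over equal-length, pre-aligned reads.

    Positions where the top two bases are tied are emitted as 'N'
    (an assumed tie-breaking rule).
    """
    consensus = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


def build_consensus(reads, min_reads=MIN_READS):
    """reads: iterable of (barcode, primer_id, aligned_sequence) tuples.

    Returns {(barcode, primer_id): consensus} for every tagged template
    supported by at least `min_reads` reads; smaller groups are dropped.
    """
    groups = defaultdict(list)
    for barcode, primer_id, seq in reads:
        groups[(barcode, primer_id)].append(seq)
    return {
        key: majority_consensus(seqs)
        for key, seqs in groups.items()
        if len(seqs) >= min_reads
    }


if __name__ == "__main__":
    # Hypothetical reads: (barcode, Primer ID, aligned fragment)
    reads = [
        ("ACG", "AAAACCCC", "GTCATT"),
        ("ACG", "AAAACCCC", "GTCATT"),
        ("ACG", "AAAACCCC", "GCCATT"),  # PCR/sequencing error at position 2
        ("ACG", "GGGGTTTT", "GCCATT"),  # this template has only two reads
        ("ACG", "GGGGTTTT", "GCCATT"),
    ]
    print(build_consensus(reads))
```

With this input, the three reads tagged `AAAACCCC` collapse to `GTCATT`, which removes the single-read error at the second position. The `GGGGTTTT` group is discarded because it falls below the three-read cutoff.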
Fig. 2.
Frequency of codon variation across all 99 positions in protease over three time points. Within a codon position, the first two bars represent untreated time points 1 and 2, respectively. Bars 3 and 4 represent the third time point, split by the presence or absence of ritonavir resistance mutations. Bar 3 is the population of susceptible genotypes (defined as lacking V82A, I84V, and L90M), and bar 4 is the population carrying the major resistant variant, V82A. Upward-facing bars are nonsynonymous changes (scale in regular typeface), and downward-facing bars are synonymous changes (scale in bold typeface). Within a codon position, different shading represents different SNPs.
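Fig. 2 separates nonsynonymous from synonymous changes. One simple way to make that call for a single codon is to translate the reference and variant codons with the standard genetic code and compare the resulting amino acids. The sketch assumes the standard code (NCBI translation table 1) and unambiguous A/C/G/T bases. The function name `classify_codon_change` and the example codons are hypothetical and are not taken from the HIV-1 reference sequence.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base outermost. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(position, ref_codon, alt_codon):
    """Return (mutation label, kind) for one codon change.

    `position` is the 1-based codon number in protease (1-99); the label
    follows the ref-position-alt convention used in the captions (e.g. V82A).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    label = f"{ref_aa}{position}{alt_aa}"
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return label, kind


if __name__ == "__main__":
    # Hypothetical codons chosen only to illustrate the two outcomes
    print(classify_codon_change(82, "GTC", "GCC"))  # valine -> alanine
    print(classify_codon_change(82, "GTC", "GTT"))  # valine -> valine
```

In the figure's terms, the first call would contribute to an upward-facing bar and the second to a downward-facing bar at codon 82.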
Fig. 3.
Phylogenetic representation of the protease population derived from deep sequencing with a Primer ID. A neighbor-joining tree was constructed from sequences from all three time points and colored by susceptibility to ritonavir. Blue taxa represent susceptible variants (defined as lacking V82A/I/L/F, I84V, and L90M). Red taxa represent variants containing the major ritonavir-resistant variant, V82A. Pink taxa represent the minor resistant variants V82I/L/F. Green and orange taxa represent the minor resistant alleles L90M and I84V, respectively. Within each color, brightness is correlated with sample time. Dark green and red arrows point to low-abundance pre-ritonavir (pre-RTV) sequences that clonally expanded into their respective clades.
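The susceptibility categories in Fig. 3 can be expressed as a lookup of the residues at protease positions 82, 84, and 90. This sketch uses the Fig. 3 definition, which also counts V82I/L/F, unlike the narrower Fig. 2 definition. It assumes a translated 99-residue protease sequence numbered from position 1, and it reports every matching allele rather than assigning a single color, because the caption does not say how sequences carrying several alleles were colored. The names `RESISTANCE_ALLELES` and `ritonavir_resistance` and the placeholder sequence are hypothetical.

```python
# Resistance alleles named in the Fig. 3 caption, keyed by 1-based protease
# position and then by the mutant residue observed at that position.
RESISTANCE_ALLELES = {
    82: {"A": "V82A", "I": "V82I", "L": "V82L", "F": "V82F"},
    84: {"V": "I84V"},
    90: {"M": "L90M"},
}


def ritonavir_resistance(protease_aa):
    """Return the caption's resistance alleles present in a protease sequence.

    An empty list means 'susceptible' under the Fig. 3 definition. Only the
    mutant residue is checked; the wild-type residue is not verified.
    """
    if len(protease_aa) != 99:
        raise ValueError("expected a 99-residue protease sequence")
    found = []
    for pos, alleles in sorted(RESISTANCE_ALLELES.items()):
        residue = protease_aa[pos - 1]  # convert 1-based position to index
        if residue in alleles:
            found.append(alleles[residue])
    return found


if __name__ == "__main__":
    # Placeholder sequence (not real HIV-1 protease) with two mutations set
    residues = ["G"] * 99
    residues[81] = "A"  # position 82
    residues[89] = "M"  # position 90
    print(ritonavir_resistance("".join(residues)))
```

A sequence that returns an empty list would be blue in Fig. 3. A sequence such as the placeholder above, carrying both V82A and L90M, shows why a precedence rule would be needed before assigning a single color.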

Comment in

  • Degenerate Primer IDs and the birthday problem.
    Sheward DJ, Murrell B, Williamson C. Proc Natl Acad Sci U S A. 2012 May 22;109(21):E1330; author reply E1331. doi: 10.1073/pnas.1203613109. Epub 2012 Apr 19. PMID: 22517746.
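The comment by Sheward et al. raises the birthday problem: with a finite pool of Primer IDs, two templates can receive the same tag by chance. A rough estimate assumes that each template independently draws one of the 65,536 (4^8) possible IDs with equal probability, which real degenerate oligonucleotide synthesis may not satisfy. Under that assumption, the sketch below computes the probability that at least two templates share an ID and the expected number of distinct IDs. It is an illustration, not the calculation from the comment or the authors' reply, and the function names are hypothetical.

```python
import math

N_PRIMER_IDS = 4 ** 8  # 65,536 possible 8-nt Primer IDs


def prob_any_shared_id(n_templates, n_ids=N_PRIMER_IDS):
    """P(at least two of n templates receive the same Primer ID).

    Assumes independent, uniform draws over all IDs. Computed in log space
    as 1 - prod_{i<n} (1 - i/N) to stay accurate for large n.
    """
    if n_templates > n_ids:
        return 1.0  # pigeonhole: a collision is certain
    log_p_all_unique = sum(math.log1p(-i / n_ids) for i in range(n_templates))
    return 1.0 - math.exp(log_p_all_unique)


def expected_distinct_ids(n_templates, n_ids=N_PRIMER_IDS):
    """Expected number of distinct Primer IDs among n uniformly tagged templates."""
    return n_ids * (1.0 - (1.0 - 1.0 / n_ids) ** n_templates)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, round(prob_any_shared_id(n), 4), round(expected_distinct_ids(n), 1))
```

Under this model, 1,000 tagged templates already give a shared-ID probability above 0.99, even though the expected number of distinct IDs (roughly 992) is only about 8 short of 1,000. Collisions are therefore likely but affect a small fraction of templates at that depth.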
