J Data Mining Genomics Proteomics. 2016 Jan;7(1):182.
doi: 10.4172/2153-0602.1000182. Epub 2015 Nov 8.

Towards Better Precision Medicine: PacBio Single-Molecule Long Reads Resolve the Interpretation of HIV Drug Resistant Mutation Profiles at Explicit Quasispecies (Haplotype) Level

Da Wei Huang et al. J Data Mining Genomics Proteomics. 2016 Jan.

Abstract

Development of HIV-1 drug resistance mutations (HDRMs) is one of the major reasons for the clinical failure of antiretroviral therapy. Treatment success rates can be improved by applying personalized anti-HIV regimens based on a patient's HDRM profile. However, the sensitivity and specificity of the HDRM profile are limited by the methods used for detection. Sanger-based sequencing technology has traditionally been used for determining HDRM profiles at the single nucleotide variant (SNV) level, but it detects only variants present at a frequency of ≥ 20% in a patient's HIV population. Next Generation Sequencing (NGS) technologies offer greater detection sensitivity (~ 1%) and higher throughput (hundreds of samples per run). However, NGS technologies produce reads that are too short to reveal the physical linkage of individual SNVs across the haplotype of each HIV strain present. In this article, we demonstrate that the single-molecule long reads generated using the Third Generation Sequencer (TGS), PacBio RS II, along with an appropriate bioinformatics analysis method, can resolve the HDRM profile at a more advanced quasispecies level. Case studies on patients' HIV samples showed that the quasispecies view produced using the PacBio method offered greater detection sensitivity and a more comprehensive understanding of HDRM situations, complementing both Sanger and NGS technologies. In conclusion, the PacBio method, which provides a promising new quasispecies level of HDRM profiling, may effect an important change in the field of HIV drug resistance research.

Keywords: HIV-1 drug resistance mutation; Haplotype; Linkage; Next generation sequencing; PacBio; Quasispecies; Single nucleotide variant; Tag-sequence; Third generation sequencing.

Figures

Figure 1. Illustration of Single Nucleotide Variant (SNV) and Quasispecies-level Analysis. Upper-case letters in red denote mutations; lower-case letters in black denote wildtype (WT) nucleotides.
A. The HIV population has two subpopulations (quasispecies). The SNV-level report lists the two mutations separately, at position 108 of PR and position 536 of RT. In contrast, the quasispecies-level report lists two tag-sequences, “..A..t…” and “..g..C..”, each representing the corresponding HIV quasispecies, formed by linking the nucleotide calls at the two positions of interest.
B. The HIV population has a different quasispecies composition, with a distinct subpopulation of WT HIV. The SNV-level report is identical to that of A even though the two populations differ in composition, because it lacks haplotype information. In contrast, the quasispecies-level report captures the difference precisely.
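The contrast in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the read representation (a dict of base calls per tag position), the function names, and the 50/50 population compositions are all hypothetical, chosen only so that both populations give the same SNV-level report.

```python
from collections import Counter

# Hypothetical tag panel: (position label, wildtype base) for each position of interest.
TAG_PANEL = [("PR108", "g"), ("RT536", "t")]

def tag_sequence(read):
    """Link the calls at the tag positions: wildtype lower-case, mutation upper-case."""
    letters = []
    for pos, wt in TAG_PANEL:
        base = read[pos]
        letters.append(wt if base.lower() == wt else base.upper())
    return "".join(letters)

def snv_report(reads):
    """Mutation frequency per position, each position counted independently."""
    n = len(reads)
    return {pos: sum(1 for r in reads if r[pos].lower() != wt) / n
            for pos, wt in TAG_PANEL}

def quasispecies_report(reads):
    """Frequency of each tag-sequence, i.e. of each linked combination of calls."""
    n = len(reads)
    counts = Counter(tag_sequence(r) for r in reads)
    return {tag: c / n for tag, c in counts.items()}

# Hypothetical population A: each quasispecies carries one of the two mutations.
population_a = ([{"PR108": "A", "RT536": "t"}] * 50
                + [{"PR108": "g", "RT536": "C"}] * 50)
# Hypothetical population B: a double mutant plus a wildtype subpopulation.
population_b = ([{"PR108": "A", "RT536": "C"}] * 50
                + [{"PR108": "g", "RT536": "t"}] * 50)

print(snv_report(population_a), snv_report(population_b))
print(quasispecies_report(population_a))
print(quasispecies_report(population_b))
```

Under these assumed compositions the two SNV reports are identical, while the tag-sequence reports differ, which is the distinction the figure draws.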
Figure 2. Case study 1 shows the dynamics of HIV drug resistance mutations (HDRMs) across two time points.
A. Schematic of the hypothesized HDRM changes across the two time points. Red triangles denote drug resistance mutations at a position; black triangles denote wildtype. The gray area marks frequencies <20%, which are below the detection sensitivity of traditional Sanger-based TruGene sequencing. The question is whether the HDRMs seen at time point 2 already existed at time point 1.
B. The SNV-level results are derived from TruGene and MiSeq. MiSeq detects 3 additional mutations (613, 615 and 846) at low frequencies of ~ 1% at time point 1. The HDRMs appear to exist at very low frequencies, but this cannot be stated with confidence because the frequencies are at the boundary of MiSeq detection sensitivity.
C. The quasispecies-level analysis of the HIV population is based on PacBio datasets for the two time points. The positions of 6 drug resistance mutations are selected into the tag panel. The tag-sequences show that the HDRMs clearly exist at very low frequency at time point 1 and become the dominant quasispecies at time point 2. Of note, the table lists all potential quasispecies. Tag-sequences with a difference value <2 and a frequency <0.4% should be considered noise.
Figure 3. The concepts of signal-noise separation in SNV-analysis vs. quasispecies-analysis. Red dots denote the relatively frequent sequencing errors on PacBio long reads. Blue and green marks represent two true mutations.
A. The PacBio reads contain a mixture of signals from the true mutations (blue and green) and many errors (red). Of note, the errors are randomly distributed.
B. Individual SNV-analysis measures the frequency of the variant (relative to the wildtype reference sequence) at each position independently, one position at a time. The per-position frequencies can be combined and presented in an SNV plot. The background noise (red dots) is at a level of ~ 1% for both MiSeq and PacBio. If an individual true mutation (blue or green marks) falls within the noise range, it is difficult to distinguish from the background noise.
C. Because PacBio errors are randomly distributed, a given pair of errors rarely co-occurs on the same read by chance. In contrast, the co-occurrence of two true mutations (blue and green) at a relatively high frequency can be easily distinguished from noise. The co-occurrence pattern of quasispecies is the key foundation of the tag-based quasispecies analysis.
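The co-occurrence argument in panel C can be made concrete with a short calculation. The sketch below rests on two assumptions that the caption does not state in these terms: that the ~ 1% background noise can be treated as a per-position error rate, and that errors at different positions are independent. The function name is hypothetical.

```python
def chance_cooccurrence(per_position_error_rate, n_positions):
    """Expected fraction of reads carrying an error at every one of
    n_positions, assuming errors are independent across positions."""
    return per_position_error_rate ** n_positions

# Assumed per-position noise level, taken from the ~1% figure in the caption.
noise = 0.01
for k in (1, 2, 3):
    print(f"{k} position(s): {chance_cooccurrence(noise, k):.6%}")
```

Under these assumptions, a single error appears on about 1% of reads, but the same pair of errors appears together on only about 0.01% of reads, so a linked pair of mutations observed at a much higher frequency is unlikely to be noise.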
Figure 4. Benchmark study shows PacBio’s quasispecies-analysis is more comprehensive than MiSeq’s SNV-analysis.
A. SNV plots derived from MiSeq SNV analysis of benchmark admixture samples of 1.25%, 0.625% and 0.125%. The background noise (blue) is generally low (<1%). The 16 expected SNVs (red) are well separated from the background in the 1.25% benchmark sample. The separation of signal (red) and noise (blue) starts to blur in the 0.625% benchmark sample and is finally lost in the 0.125% benchmark sample.
B. Tag-sequences derived from PacBio’s quasispecies analysis of the same benchmark samples, using the 16 SNVs as the tag panel. After the positions of the 16 expected SNVs are selected into the tag panel (signature) to construct artificial tag-sequences representing the quasispecies, the quasispecies profile is simply presented by the tag-sequences. In a tag-sequence, a lower-case letter in black denotes the wildtype (wt) nucleotide at the position, and an upper-case letter in red denotes a mutation at the position. The ‘difference’ column measures the number of mutations co-occurring on the same quasispecies by comparing it to the most frequent tag-sequence. According to the co-occurrence concept illustrated, tag-sequences with a difference value <2 and a frequency <0.4% should be considered noise or low-confidence records. With these criteria, the minor quasispecies can easily stand out in the 0.165% admixture sample (Figure 3).
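The noise criterion stated in panel B can be sketched as a simple filter. This is an illustration under assumptions, not the authors' code: it reads the criterion literally as a conjunction (difference <2 and frequency <0.4%), it counts the difference as the number of tag positions that differ from the most frequent tag-sequence, and the four-position tag-sequences and their frequencies are made-up example data.

```python
def difference(tag, dominant):
    """Number of tag positions at which `tag` differs from the dominant tag-sequence."""
    return sum(1 for a, b in zip(tag, dominant) if a != b)

def split_signal_noise(tag_freqs, min_difference=2, min_frequency=0.004):
    """Label a tag-sequence as noise when its difference is below min_difference
    AND its frequency is below min_frequency; everything else is kept as signal."""
    dominant = max(tag_freqs, key=tag_freqs.get)
    signal, noise = {}, {}
    for tag, freq in tag_freqs.items():
        is_noise = difference(tag, dominant) < min_difference and freq < min_frequency
        (noise if is_noise else signal)[tag] = freq
    return signal, noise

# Hypothetical tag-sequence frequencies (lower-case = wildtype, upper-case = mutation).
example = {
    "acgt": 0.9900,  # dominant wildtype tag-sequence
    "GTAC": 0.0060,  # four linked mutations at 0.6%
    "aTgt": 0.0030,  # single change at 0.3%
    "acgC": 0.0010,  # single change at 0.1%
}
signal, noise = split_signal_noise(example)
print("signal:", signal)
print("noise:", noise)
```

With this example data, the two single-change tag-sequences fall below both thresholds and are set aside, while the dominant tag-sequence and the linked four-mutation tag-sequence are retained.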
Figure 5. Case study 2 shows the distinct HIV drug resistance mutation (HDRM) profiles associated with different quasispecies.
A. Two hypothesized scenarios for how the HDRMs could be distributed among the HIV quasispecies. Red marks denote HDRMs; black marks denote wildtype.
B. MiSeq and TruGene’s SNV-level analyses show that 10 out of 18 HDRMs co-exist with wildtype sequences in the RT region. Without linkage information, the individual SNV-level analysis cannot determine whether the HDRMs are distributed across quasispecies as in scenario 1 or scenario 2.
C. PacBio’s quasispecies analysis gives much clearer HDRM profiles at the quasispecies level. The results support scenario 1 as shown in A.
