J Virol. 2014 Mar;88(5):2977-90.
doi: 10.1128/JVI.03128-13. Epub 2013 Dec 26.

Developing high-throughput HIV incidence assay with pyrosequencing platform

Sung Yong Park et al. J Virol. 2014 Mar.

Abstract

Human immunodeficiency virus (HIV) incidence is an important measure for monitoring the epidemic and evaluating the efficacy of intervention and prevention trials. This study developed a high-throughput, single-measure incidence assay by implementing a pyrosequencing platform. We devised a signal-masking bioinformatics pipeline, which yielded a process error rate of 5.8 × 10^-4 per base. The pipeline was then applied to analyze 18,434 envelope gene segments (HXB2 7212 to 7601) obtained from 12 incident and 24 chronic patients who had documented HIV-negative and/or -positive tests. The pyrosequencing data were cross-checked by using the single-genome-amplification (SGA) method to independently obtain 302 sequences from 13 patients. Using two genomic biomarkers that probe for the presence of similar sequences, the pyrosequencing platform correctly classified all 12 incident subjects (100% sensitivity) and 23 of 24 chronic subjects (96% specificity). The one misclassified chronic subject was correctly classified when the same analysis was conducted with SGA data. The biomarkers were statistically associated across the two platforms, suggesting the assay's reproducibility and robustness. Sampling simulations showed that the biomarkers were tolerant of sequencing errors and template resampling, the two factors most likely to affect the accuracy of pyrosequencing results. We observed comparable biomarker scores between AIDS and non-AIDS chronic patients (multivariate analysis of variance [MANOVA], P = 0.12), indicating that the stage of HIV disease itself does not affect the classification scheme. The high-throughput genomic HIV incidence assay marks a significant step toward determining incidence from a single measure in cross-sectional surveys.

Importance: Annual HIV incidence, the number of newly infected individuals within a year, is the key measure for monitoring the epidemic's rise and decline. Developing reliable assays that differentiate recent from chronic infections has been a long-standing quest in the HIV community. Over the past 15 years, these assays have traditionally measured various HIV-specific antibodies, but recent technological advancements have broadened the range of proposed tools that are accurate, user-friendly, and financially viable. Here we designed a high-throughput genomic HIV incidence assay based on the signature imprinted in the HIV gene sequence population. By combining next-generation sequencing techniques with bioinformatics analysis, we demonstrated that genomic fingerprints are capable of distinguishing recently infected patients from chronically infected patients with high precision. Our high-throughput platform is expected to allow us to process many patients' samples in a single experiment, permitting the assay to be cost-effective for routine surveillance.

Figures

FIG 1
Design scheme for the pyrosequencing platform of the genomic incidence assay. (A and B) An example of pyrosequencing flowgram data (A) of each patient's sample was processed by the bioinformatics pipeline based on differences in the flow intensity and base quality score between correct and incorrect base calls (B). The pipeline applied a mask to the data that retained 99.987% of all correct base calls from 13,550 control reads. Only 5% of the correct reads were removed by masking, achieving a 5% level of significance and a process error rate of 5.8 × 10^-4 per base. (C) The processed reads were examined by a frequency-based Hamming distance (HD) distribution. The HD distributions from patient PI9189 and patient PY9507 shown here provide representative examples of incident and chronic infections, respectively. (D) To quantify the amount of closely related sequences (the signature of incident infections), two biomarkers, genome similarity index 5 (GSI5) and the 25% quantile of the HD distribution (Q25), were defined. Incident and chronic samples were predominantly segregated in the two-biomarker plane. The border line between the two stages was determined by receiver operating characteristic (ROC) analysis: samples above the border line were classified as incident infections, with all others classified as chronic infections.
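The two biomarkers in panels C and D can be illustrated with a minimal Python sketch. It assumes that reads are already aligned to equal length, that GSI5 is the fraction of read pairs whose Hamming distance is at most 5 (a reading consistent with the caption but not spelled out in it), and that Q25 uses a nearest-rank quantile. The function names and the toy reads are hypothetical and are not taken from the authors' pipeline.

```python
import math
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned, equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hd(reads):
    """Hamming distance for every unordered pair of reads.

    Duplicate reads are kept, so each distinct sequence is weighted by how
    often it was observed.
    """
    return [hamming(a, b) for a, b in combinations(reads, 2)]


def gsi(distances, k=5):
    """ASSUMED definition: fraction of read pairs with HD <= k."""
    return sum(d <= k for d in distances) / len(distances)


def q25(distances):
    """25% quantile of the HD distribution (nearest-rank method, an assumption)."""
    ordered = sorted(distances)
    rank = max(1, math.ceil(0.25 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical toy sample: three identical reads and one single-base variant.
toy_reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA"]
toy_hd = pairwise_hd(toy_reads)
print("GSI0:", gsi(toy_hd, k=0), "GSI5:", gsi(toy_hd, k=5), "Q25:", q25(toy_hd))
```

Under these assumptions, a sample dominated by near-identical reads scores a high GSI5 and a low Q25, which is the pattern the caption associates with incident infections.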
FIG 2
Biomarker sensitivity to sequencing error rate. An increase in sequencing error rate (ε) for 100 simulated reads of identical 400-base-long segments introduces false heterogeneity to the original population. (A) As the simulations incorporated more random substitutions, deletions, and insertions into the initial sequences, the GSI5 score did not decline as precipitously as did its counterpart, GSI0, maintaining the true value of 1 for error rates of less than around 10^-3. (B) As the simulations introduced random errors into 100 identical sequences, Q25 remained at 0 for error rates of less than 10^-3. The two genomic biomarkers selected, GSI5 and Q25, remained at their true values at error rates equal to and below the pyrosequencing platform's process error rate (marked by dotted vertical lines), demonstrating their tolerance of sequencing errors.
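A simulation in the spirit of this figure can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it introduces substitutions only (the caption also mentions insertions and deletions), uses a random hypothetical template, and assumes GSIk is the fraction of read pairs with HD at most k and Q25 is a nearest-rank quantile.

```python
import math
import random
from itertools import combinations

BASES = "ACGT"


def mutate(seq, eps, rng):
    """Replace each base with a different base, independently, with probability eps."""
    out = []
    for base in seq:
        if rng.random() < eps:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def biomarkers(reads):
    """Return (GSI0, GSI5, Q25) under the assumed definitions."""
    dists = sorted(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(reads, 2)
    )
    n = len(dists)
    gsi0 = sum(d == 0 for d in dists) / n
    gsi5 = sum(d <= 5 for d in dists) / n
    q25 = dists[max(1, math.ceil(0.25 * n)) - 1]
    return gsi0, gsi5, q25


def simulate(eps, n_reads=100, length=400, seed=0):
    """Copy one random template n_reads times, adding substitution errors at rate eps."""
    rng = random.Random(seed)
    template = "".join(rng.choice(BASES) for _ in range(length))
    reads = [mutate(template, eps, rng) for _ in range(n_reads)]
    return biomarkers(reads)


# Error rates to scan; 5.8e-4 is the process error rate reported for the pipeline.
results = {eps: simulate(eps) for eps in (0.0, 5.8e-4, 1e-2)}
for eps, (gsi0, gsi5, q25) in results.items():
    print(f"eps={eps:g}  GSI0={gsi0:.3f}  GSI5={gsi5:.3f}  Q25={q25}")
```

Because GSI0 counts only exactly identical pairs, a single error in either read removes a pair from it, whereas GSI5 tolerates up to five mismatches per pair. That difference is why GSI5 is expected to hold its value longer as ε grows.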
FIG 3
Biomarker sensitivity to sequence resampling and template amount. Monte Carlo simulations were performed using the SGA sequence data from patient IJ7234, whose sample contained 20 different viral templates with GSI5 = 0.081 and Q25 = 26. Each simulation consisted of randomly drawing from a given number of unique templates and resampling as necessary until N reads were obtained. From the frequency-based HD distribution of N reads, the two biomarkers were measured. This procedure was repeated 100 times for each number of unique templates, and the averages and 95% confidence intervals of GSI5 and Q25 were obtained. As shown in the comparisons in panels A, B, and C, the two biomarkers did not vary notably with changes in the total number of resampled reads, N, indicating insensitivity to sequence resampling. The number of initial unique templates was a sensitive parameter only at limited quantities: GSI5 and Q25 scores did not deviate from their true values until the number of unique templates was reduced to fewer than around 10, indicating tolerance of a limited number of templates.
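The resampling procedure described in this caption can be sketched in Python as below. The patient's SGA sequences are not available here, so `make_templates` generates a hypothetical pool of 20 diverged sequences as a stand-in. The biomarker definitions (GSI5 as the fraction of pairs with HD at most 5, Q25 as a nearest-rank quantile) and the normal-approximation 95% confidence interval of the mean are assumptions of this sketch.

```python
import math
import random
import statistics
from itertools import combinations


def hd(a, b):
    """Hamming distance between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def make_templates(n=20, length=120, divergence=0.05, seed=1):
    """HYPOTHETICAL stand-in for one patient's unique viral templates."""
    rng = random.Random(seed)
    root = [rng.choice("ACGT") for _ in range(length)]
    return [
        "".join(rng.choice("ACGT") if rng.random() < divergence else b for b in root)
        for _ in range(n)
    ]


def biomarkers(read_ids, dist):
    """GSI5 and Q25 from reads given as template indices and a distance matrix."""
    d = sorted(dist[i][j] for i, j in combinations(read_ids, 2))
    gsi5 = sum(x <= 5 for x in d) / len(d)
    q25 = d[max(1, math.ceil(0.25 * len(d))) - 1]
    return gsi5, q25


def resample(templates, n_unique, n_reads, reps=100, seed=0):
    """Monte Carlo: pick n_unique templates, resample them up to n_reads reads."""
    if n_reads < n_unique:
        raise ValueError("n_reads must be at least n_unique")
    rng = random.Random(seed)
    dist = [[hd(a, b) for b in templates] for a in templates]
    scores = []
    for _ in range(reps):
        chosen = rng.sample(range(len(templates)), n_unique)
        # Every chosen template appears once; the remainder is drawn with replacement.
        reads = chosen + [rng.choice(chosen) for _ in range(n_reads - n_unique)]
        scores.append(biomarkers(reads, dist))
    summary = {}
    for name, values in (("GSI5", [s[0] for s in scores]),
                         ("Q25", [s[1] for s in scores])):
        mean = statistics.mean(values)
        half = 1.96 * statistics.stdev(values) / math.sqrt(reps)
        summary[name] = (mean, mean - half, mean + half)
    return summary


pool = make_templates()
for k in (20, 10, 5, 1):
    print(k, resample(pool, n_unique=k, n_reads=100))
```

Working with template indices and a precomputed distance matrix keeps each replicate cheap, since resampled reads are copies of templates whose pairwise distances are already known.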
FIG 4
Masking pipeline for pyrosequencing data analysis. (A) The distributions of the flow intensity and base quality score of correct base calls (blue) and incorrect base calls (red). The flow intensity of each base call of the pyrosequencing platform is the light intensity signal which is proportional to the number of nucleotides incorporated; for example, a flow intensity of 100 during the adenine flow would result in a base call of a single “A” and a flow intensity of 200 during the same nucleotide flow would result in a base call of “AA.” The base quality score is a logarithmic measure inversely related to the probability that the given base call is incorrect; a base quality score of 40 indicates that the base call has a 1 in 10,000 chance of being wrong. Here, signals and errors were designated after the 13,550 reads containing 4,787,859 base calls were aligned to each of 7 control sequences. The z axis represents the number of observed correct or incorrect base calls with a given flow intensity and base quality score. At flow intensities of 100, 200, 300, 400, and 500, signals outnumber errors by several orders of magnitude. Also note that signal density increases along with base quality score, resulting in peak signal density at the highest possible score, 40. Meanwhile, errors are preferentially located at intermediate flow intensities and tend to have lower base quality scores. (B) The masking boundary (black) applied to the density plot of correct base calls (blue) and incorrect base calls (red). Based on the tendency of signals and errors to segregate into different regions as shown in panel A, the masking algorithm determined border lines that enclosed 99.987% of all correct base calls. As a result, the pipeline discarded only 5% of all reads identical to each corresponding reference sequence (5% level of significance) and reduced the process error rate from 1.1 × 10^-3 to 5.8 × 10^-4 per base.
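The two per-base quantities in panel A can be made concrete with a short Python sketch. The quality-score conversion is the standard Phred relation, Q = -10 log10(P), which matches the caption's example of a score of 40 meaning a 1 in 10,000 chance of error. The homopolymer conversion assumes 100 intensity units per incorporated nucleotide, as in the caption's example. The function names are hypothetical, and the masking boundary itself is not reproduced because the caption does not give its coordinates.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with quality score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Quality score corresponding to an error probability p."""
    return -10 * math.log10(p)


def homopolymer_length(flow_intensity: float, unit: float = 100.0) -> int:
    """Number of identical nucleotides called for one flow.

    ASSUMES `unit` intensity per incorporated nucleotide, following the
    caption's example (100 -> "A", 200 -> "AA").
    """
    return round(flow_intensity / unit)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4g}")
print("flow 100 ->", "A" * homopolymer_length(100))
print("flow 200 ->", "A" * homopolymer_length(200))
```

Flow intensities that fall between multiples of the unit are ambiguous under this rounding, which fits the caption's observation that errors concentrate at intermediate flow intensities.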
FIG 5
The genomic incidence assay's classification scheme. (A) Biomarker score distributions of HIV envelope gene segments (HXB2 7212 to 7601) obtained from all 12 incident infections and 24 chronic infections by the pyrosequencing platform. The dotted lines represent the boundaries between incident (red) and chronic (blue) infections obtained by the ROC analysis; any line enclosed within the region marked by two dotted lines discriminates between incident and chronic infections with the maximum sum of sensitivity and specificity. Each cutoff line places incident infections above it and chronic infections below it. The classification achieved 100% sensitivity and 96% specificity. (B) Applying the same classification scheme to the same region of HIV envelope gene segments of 182 incident and 43 chronic infections, which were obtained by SGA, resulted in 97% sensitivity and 100% specificity. The SGA data sets were collected from references , , and .
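The reported sensitivity and specificity follow directly from the classification counts, and the cutoff criterion (maximum sum of sensitivity and specificity) can be illustrated with a threshold scan. The Python sketch below is a simplification: the assay draws a border line in the two-biomarker plane, whereas this example scans a single hypothetical score, and the example scores are invented for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of incident infections classified as incident."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of chronic infections classified as chronic."""
    return true_neg / (true_neg + false_pos)


# Counts reported for the pyrosequencing platform:
# 12 of 12 incident and 23 of 24 chronic subjects classified correctly.
pyro_sens = sensitivity(12, 0)
pyro_spec = specificity(23, 1)
print(f"sensitivity={pyro_sens:.0%}  specificity={pyro_spec:.0%}")


def best_cutoff(incident_scores, chronic_scores):
    """Scan thresholds on ONE score; scores >= cutoff are called incident.

    Returns (sensitivity + specificity, cutoff, sensitivity, specificity) for
    the first cutoff that maximizes the sum.
    """
    best = None
    for c in sorted(set(incident_scores) | set(chronic_scores)):
        sens = sum(s >= c for s in incident_scores) / len(incident_scores)
        spec = sum(s < c for s in chronic_scores) / len(chronic_scores)
        if best is None or sens + spec > best[0]:
            best = (sens + spec, c, sens, spec)
    return best


# Hypothetical scores, for illustration only.
example = best_cutoff([0.9, 0.8, 0.7], [0.1, 0.2, 0.75])
print("best cutoff:", example[1])
```

Several cutoffs can tie for the maximum sum, which corresponds to the caption's remark that any line within the region between the two dotted lines performs equally well.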
FIG 6
Replicability of pyrosequencing results. Data represent three replicates of pyrosequencing data sets for each of the two chronic samples, JW8291 (circles) and TP0539 (asterisks), adjacent to the classification border (dotted lines). By conducting separate PCR runs for each replicate, three amplicon libraries for both patients' samples were prepared independently and a pyrosequencing run was performed by assigning a separate identification number (ID) to each replicate. The six resulting biomarker scores were located below the dotted cutoff lines, yielding correct classification results for all replicates. The similarities in biomarker scores and classification results indicate consistency within pyrosequencing data among independently compiled samples from a single subject.
FIG 7
Comparison between biomarkers obtained by pyrosequencing and SGA. The GSI5 (A) and Q25 (B) biomarkers obtained from 3 incident samples (PI9189, WN9587, and VH8724) and 10 chronic samples (PL7408, IJ7234, UD9992, QZ8149, LE2707, CX7332, NK9147, EC8287, KI6633, and CF6610) were tested on both the pyrosequencing and the SGA platforms. Two independent measures by the two platforms for both GSI5 and Q25 were statistically significantly correlated (F statistics, r = 0.88, F = 35.9, and P = 9.0 × 10^-5 for GSI5 and r = 0.63, F = 7.06, and P = 0.022 for Q25).
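The relation between a correlation coefficient and an F statistic can be sketched in Python as follows. This assumes that each F value comes from a simple linear regression of one platform's score on the other's across the n = 13 samples, in which case F = r²(n − 2)/(1 − r²) with 1 and n − 2 degrees of freedom. The caption does not state the exact test, so this is only one plausible reading, and `f_from_r` is a hypothetical helper name.

```python
def f_from_r(r: float, n: int) -> float:
    """F statistic (1 and n - 2 degrees of freedom) implied by a correlation r.

    ASSUMES a simple linear regression with one predictor and n paired samples.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 samples")
    r2 = r * r
    return r2 * (n - 2) / (1.0 - r2)


N_SAMPLES = 13  # 3 incident + 10 chronic samples measured on both platforms
for label, r in (("GSI5", 0.88), ("Q25", 0.63)):
    print(f"{label}: r={r}  implied F={f_from_r(r, N_SAMPLES):.2f}")
```

Because r is reported to two decimals, values recomputed this way are expected to be close to, but not identical with, the F values in the caption.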

References

    1. Brookmeyer R. 1991. Reconstruction and future trends of the AIDS epidemic in the United States. Science 253:37–42. doi:10.1126/science.2063206
    2. Busch MP, Pilcher CD, Mastro TD, Kaldor J, Vercauteren G, Rodriguez W, Rousseau C, Rehle TM, Welte A, Averill MD, Garcia Calleja JM. 2010. Beyond detuning: 10 years of progress and new challenges in the development and application of assays for HIV incidence estimation. AIDS 24:2763–2771. doi:10.1097/QAD.0b013e32833f1142
    3. Mastro TD. 2013. Determining HIV incidence in populations: moving in the right direction. J. Infect. Dis. 207:204–206. doi:10.1093/infdis/jis661
    4. Brookmeyer R, Quinn TC. 1995. Estimation of current human immunodeficiency virus incidence rates from a cross-sectional survey using early diagnostic tests. Am. J. Epidemiol. 141:166–172.
    5. Janssen RS, Satten GA, Stramer SL, Rawal BD, O'Brien TR, Weiblen BJ, Hecht FM, Jack N, Cleghorn FR, Kahn JO, Chesney MA, Busch MP. 1998. New testing strategy to detect early HIV-1 infection for use in incidence estimates and for clinical and prevention purposes. JAMA 280:42–48. doi:10.1001/jama.280.1.42