Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons

PLoS Comput Biol. 2018 Dec 13;14(12):e1006498. doi: 10.1371/journal.pcbi.1006498. eCollection 2018 Dec.

Affiliations

¹ Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA.
² Department of Biology and Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
³ Division of Medical Virology, Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, Western Cape, South Africa.
⁴ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁵ Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁶ Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla, CA, USA.
⁷ Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.

PMID: 30543621
PMCID: PMC6314628
DOI: 10.1371/journal.pcbi.1006498

Kemal Eren et al. PLoS Comput Biol. 2018.

Abstract

Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, which encodes the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of a pipeline (optionally run on a high-performance cluster) and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show that FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of the FLEA pipeline, broken into conceptual sub-pipelines.**
The *Quality* and *Consensus* sub-pipelines process each time point separately. Duplicate steps in other time points are grayed out. CCS stands for “circular consensus sequences”, QCS for “quality-controlled sequences”, and HQCS for “high-quality consensus sequences”.

**Fig 2. Quality and consensus sub-pipelines.**
These steps are repeated independently on each time point. Numbers are reported from the analysis of sequences from the first time point (V03) of donor P018, which is three months post infection. Percentages give the fraction of sequences retained after filtering. Each task is marked if it uses the third-party tools USEARCH or MAFFT.

**Fig 3. Hidden Markov model used for trimming poly-A and poly-T heads and tails.**
The poly-A head and tail states have a small (p = 0.01) probability of emitting non-A bases, and the poly-T head and tail states likewise for non-T bases. The *body* state emits all four bases with equal probability. The *start* and *stop* states emit nothing.
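
The trimming idea in Fig 3 can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not FLEA's implementation: the transition probabilities and the uniform start distribution are invented for illustration, the 0.01 non-A probability is assumed to be split evenly across the three other bases, and the silent *start* and *stop* states are replaced by a start distribution and an unconstrained end. The function names are hypothetical.

```python
import math

NEG_INF = float("-inf")
STATES = ["head", "body", "tail"]

# Assumed values (not from the paper): start distribution and transitions.
LOG_START = {"head": math.log(0.5), "body": math.log(0.5), "tail": NEG_INF}
LOG_TRANS = {
    ("head", "head"): math.log(0.9),
    ("head", "body"): math.log(0.1),
    ("body", "body"): math.log(0.9),
    ("body", "tail"): math.log(0.1),
    ("tail", "tail"): math.log(1.0),
}


def emission_logp(state, base, polymer="A", p_other=0.01):
    """Log emission probability; p_other = 0.01 follows the figure caption."""
    if state == "body":
        return math.log(0.25)  # body emits all four bases equally
    if base == polymer:
        return math.log(1.0 - p_other)
    return math.log(p_other / 3.0)  # assumption: split evenly over other bases


def viterbi(seq, polymer="A"):
    """Return the most likely state label for every base in seq."""
    if not seq:
        return []
    score = {s: LOG_START[s] + emission_logp(s, seq[0], polymer) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            prev = max(
                STATES,
                key=lambda p: score[p] + LOG_TRANS.get((p, cur), NEG_INF),
            )
            new_score[cur] = (
                score[prev]
                + LOG_TRANS.get((prev, cur), NEG_INF)
                + emission_logp(cur, base, polymer)
            )
            pointer[cur] = prev
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


def trim_poly(seq, polymer="A"):
    """Keep only the bases that the decoder labels as body."""
    path = viterbi(seq.upper(), polymer)
    return "".join(base for base, state in zip(seq, path) if state == "body")
```

Because isolated A bases inside the read are cheaper to explain with the *body* state than with a transition into *tail*, only the terminal homopolymer runs are removed; running the same decoder with `polymer="T"` handles poly-T ends.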

**Fig 4. Screenshot of the multidimensional scaling plot.**
The embedding in two dimensions preserves pairwise evolutionary distances between HQCSs. Node area is proportional to copy number, and color corresponds to time point. The increasing genetic diversity of the population is visible as time goes on.

**Fig 5. Screenshot of the evolutionary trajectory report.**
Four evolutionary metrics (dS divergence, dN divergence, total divergence, and total diversity) and two phenotype metrics (length and possible N-linked glycosylation sites) are shown for gp160.
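
The two phenotype metrics are simple to compute from a translated sequence. The sketch below counts potential N-linked glycosylation sites using the standard sequon N-X-S/T with X not proline; whether FLEA applies exactly this definition is an assumption, and the function names are hypothetical. Gap characters from the alignment are removed before counting.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all counted.
PNGS_PATTERN = re.compile(r"(?=N[^P][ST])")


def ungapped_length(aligned_aa):
    """Sequence length in residues, ignoring alignment gaps."""
    return len(aligned_aa.replace("-", ""))


def count_pngs(aligned_aa):
    """Count N-X-[ST] sequons (X != P) in an amino acid sequence."""
    ungapped = aligned_aa.replace("-", "").upper()
    return sum(1 for _ in PNGS_PATTERN.finditer(ungapped))
```

For example, `count_pngs("AN-GSA")` counts one site because the gap is dropped before matching, while `count_pngs("NPT")` counts none.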

**Fig 6. Screenshot of the amino acid sequence viewer.**
Sequences are grouped by identity, with aggregate copy number and population percentage shown to the right. An overview of the amplicon, optionally annotated with region names, provides fast access to different locations of the alignment. Selecting columns of the alignment interactively updates the amino acid dynamics plot, showing the dynamics of the selected motif over time. In this case, the trajectory shows changes in the N332 glycan supersite. Sites inferred by FUBAR to be undergoing positive selection are selectable.
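
The grouping described for the sequence viewer can be sketched as follows. The input format, a list of `(amino_acid_sequence, copy_number)` pairs for one time point, and the function name are assumptions for illustration; distinct nucleotide sequences can collapse into one group when they differ only by synonymous changes.

```python
from collections import defaultdict


def group_by_identity(records):
    """Aggregate copy numbers of identical sequences.

    Returns (sequence, total_copies, percent_of_population) tuples,
    most abundant first.
    """
    totals = defaultdict(int)
    for sequence, copies in records:
        totals[sequence] += copies
    population = sum(totals.values())
    rows = [
        (sequence, copies, 100.0 * copies / population)
        for sequence, copies in totals.items()
    ]
    # Sort by descending copy number, then alphabetically for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```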

**Fig 7. Screenshots of the interactive three-dimensional Env structure, colored according to JS divergence (left) and dN/dS values (right).**
Positions imputed to be undergoing more positive selection (dN/dS > 1) are darker red, and positions undergoing more purifying selection (dN/dS < 1) are darker blue. The right structure also shows motif positions highlighted in the sequence viewer.
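
For readers unfamiliar with JS (Jensen-Shannon) divergence, the sketch below computes it between two amino acid frequency distributions at a single alignment column. Which distributions FLEA compares is not stated in the caption; comparing one time point against another is an assumption here, and the function names are hypothetical. With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared residues).

```python
import math
from collections import Counter


def column_frequencies(residues):
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency dicts."""
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        # Terms with zero probability contribute nothing to the sum.
        return sum(
            prob * math.log2(prob / mixture[k])
            for k, prob in dist.items()
            if prob > 0
        )

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
```

A column that is entirely N at one time point and entirely D at another gives the maximum value, while a column whose composition does not change gives zero.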

**Fig 8. Screenshot of dN/dS values mapped to protein positions and separated by time point.**

**Fig 9. Screenshot of the phylogenetic tree viewer.**
Leaf node size corresponds to sequence copy number. Node color corresponds to time point. Since ancestral sequences have been inferred, ancestral nodes are colored according to the selected motif, which in this case is the N332 glycan supersite.

**Fig 10. Comparison of true sequence abundances versus copy numbers inferred by FLEA for each time point of the simulated P018 data.**
Each node represents one sequence, with the area denoting its relative abundance in the population. The true population (top) is colored green. For each true sequence, the matching HQCS appears below it in blue. Red nodes denote false negatives and false positives. The most common false negative for each time point is annotated with its abundance.

Cited by

Long-read amplicon denoising.
Kumar V, Vollbrecht T, Chernyshev M, Mohan S, Hanst B, Bavafa N, Lorenzo A, Kumar N, Ketteringham R, Eren K, Golden M, Oliveira MF, Murrell B. Nucleic Acids Res. 2019 Oct 10;47(18):e104. doi: 10.1093/nar/gkz657. PMID: 31418021.
Rapid and Focused Maturation of a VRC01-Class HIV Broadly Neutralizing Antibody Lineage Involves Both Binding and Accommodation of the N276-Glycan.
Umotoy J, Bagaya BS, Joyce C, Schiffner T, Menis S, Saye-Francisco KL, Biddle T, Mohan S, Vollbrecht T, Kalyuzhniy O, Madzorera S, Kitchin D, Lambson B, Nonyane M, Kilembe W; IAVI Protocol C Investigators; IAVI African HIV Research Network; Poignard P, Schief WR, Burton DR, Murrell B, Moore PL, Briney B, Sok D, Landais E. Immunity. 2019 Jul 16;51(1):141-154.e6. doi: 10.1016/j.immuni.2019.06.004. PMID: 31315032.
Vaccine-Induced Protection from Homologous Tier 2 SHIV Challenge in Nonhuman Primates Depends on Serum-Neutralizing Antibody Titers.
Pauthner MG, Nkolola JP, Havenar-Daughton C, Murrell B, Reiss SM, Bastidas R, Prévost J, Nedellec R, von Bredow B, Abbink P, Cottrell CA, Kulp DW, Tokatlian T, Nogal B, Bianchi M, Li H, Lee JH, Butera ST, Evans DT, Hangartner L, Finzi A, Wilson IA, Wyatt RT, Irvine DJ, Schief WR, Ward AB, Sanders RW, Crotty S, Shaw GM, Barouch DH, Burton DR. Immunity. 2019 Jan 15;50(1):241-252.e6. doi: 10.1016/j.immuni.2018.11.011. Epub 2018 Dec 11. PMID: 30552025.
High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.
Callahan BJ, Wong J, Heiner C, Oh S, Theriot CM, Gulati AS, McGill SK, Dougherty MK. Nucleic Acids Res. 2019 Oct 10;47(18):e103. doi: 10.1093/nar/gkz569. PMID: 31269198.
