BMC Bioinformatics. 2019 Jun 4;20(1):305.
doi: 10.1186/s12859-019-2878-2.

TAR-VIR: a pipeline for TARgeted VIRal strain reconstruction from metagenomic data

Jiao Chen et al. BMC Bioinformatics. 2019.

Abstract

Background: Strain-level RNA virus characterization is essential for developing prevention and treatment strategies. Viral metagenomic data, which can contain sequences of both known and novel viruses, provide new opportunities for characterizing RNA viruses. Although there are a number of pipelines for analyzing viruses in metagenomic data, they have different limitations. First, viruses that lack closely related reference genomes cannot be detected with high sensitivity. Second, strain-level analysis is usually missing.

Results: In this study, we developed a hybrid pipeline named TAR-VIR that reconstructs viral strains without relying on complete or high-quality reference genomes. It is optimized for identifying RNA viruses from metagenomic data by combining an effective read classification method and our in-house strain-level de novo assembly tool. TAR-VIR was tested on both simulated and real viral metagenomic data sets. The results demonstrated that TAR-VIR competes favorably with other tested tools.

Conclusion: TAR-VIR can be used as a standalone tool for viral strain reconstruction from metagenomic data. Alternatively, its read recruiting stage can be combined with other de novo assembly tools for superior viral functional and taxonomic analyses. The source code and documentation of TAR-VIR are available at https://github.com/chjiao/TAR-VIR.

Keywords: RNA virus; Read classification; Strain assembly; Viral metagenomics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Enriching SARS-CoV reads using the bat coronavirus genome as the reference. a and b show the profiles of the aligned and recruited reads, respectively. The dataset was aligned by BWA with the default parameters ("-B 4, -A 1"). BWA was chosen so that more locally aligned reads are included in (a). The reads were recruited using an overlap cutoff of 150 bp. c displays the sequence identity between SARS-CoV and the bat coronavirus. The profile was generated using VISTA [41].
Fig. 2
Two scenarios. a. The reference is a gene or a functional site (long green bar). The reads are represented by short lines. Short green lines can be mapped to the reference sequence and define the set of seed reads. The first iteration of overlap detection will identify new reads (blue lines) overlapping with the seed reads. The second iteration of overlap detection will identify more reads (red lines). b. The reference is a remotely related genome (long green bar). The seed reads can be mapped to the reference genome and are represented by short green lines. Two iterations of overlap detection will recruit new reads (blue lines and red lines, respectively)
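The iterative recruitment described in this caption can be illustrated with a minimal Python sketch. It is not the TAR-VIR implementation: the function names (`overlap_len`, `recruit_reads`), the toy genome, and the brute-force all-against-all comparison are assumptions made for illustration, and the seed reads are taken as given rather than obtained by mapping reads to a reference. A practical tool would use an indexed overlap search rather than pairwise string comparison.

```python
from typing import Iterable, List, Optional, Set, Tuple


def overlap_len(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def recruit_reads(
    reads: Iterable[str],
    seeds: Iterable[str],
    min_overlap: int,
    max_iters: Optional[int] = None,
) -> Tuple[Set[str], List[Set[str]]]:
    """Grow the seed set by repeatedly adding reads that overlap the newest reads.

    Returns the full recruited set and the reads added in each iteration.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    recruited: Set[str] = set(seeds)
    frontier: Set[str] = set(seeds)          # reads added in the previous iteration
    remaining: Set[str] = set(reads) - recruited
    rounds: List[Set[str]] = []
    while frontier and remaining:
        if max_iters is not None and len(rounds) >= max_iters:
            break
        # A read is recruited if it overlaps any frontier read in either direction.
        new = {
            r for r in remaining
            if any(
                max(overlap_len(r, s), overlap_len(s, r)) >= min_overlap
                for s in frontier
            )
        }
        if not new:
            break
        rounds.append(new)
        recruited |= new
        remaining -= new
        frontier = new
    return recruited, rounds


# Toy data (hypothetical): 8 bp reads tiled every 2 bp, so neighbours overlap by 6 bp.
toy_genome = "ACGTTGCAAGGCTTAACCGGTATCGA"
toy_reads = [toy_genome[i:i + 8] for i in range(0, len(toy_genome) - 7, 2)]

# Two iterations, as in the figure: the first recruits the "blue" reads,
# the second the "red" reads.
recruited, rounds = recruit_reads(toy_reads, seeds=[toy_reads[0]], min_overlap=6, max_iters=2)
print(len(recruited), [len(r) for r in rounds])
```

Leaving `max_iters` unset runs the loop until no further reads can be recruited, which corresponds to extending the seed set as far as the overlaps allow.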
Fig. 3
Chimeric reads may introduce contamination. The reference is a gene or a functional site (long green bar). The reads are represented by short lines. Green reads are sequenced from the reference. Red color represents sequences from another species. a. When the overlap cutoff is small, a chimeric read can be extended and thus recruits reads from other species. b. When the overlap cutoff is bigger than half of the read size, a chimeric read could be recruited but will not be extended in the following iterations
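The effect of the overlap cutoff on chimeric reads can be reproduced with a small Python sketch. Everything in it is hypothetical: the two toy genomes (built from disjoint alphabets so that no accidental overlaps occur), the 10 bp read length, the chimeric read, and the helper names `overlap_len` and `recruit`. The reasoning it illustrates is that a chimeric read recruited with a cutoff above half the read length must carry more than half a read of target sequence, so its foreign part is shorter than the cutoff and cannot recruit foreign reads.

```python
def overlap_len(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def recruit(reads, seeds, cutoff):
    """Iteratively add reads overlapping the newest recruits by at least `cutoff` bases."""
    recruited, frontier = set(seeds), set(seeds)
    remaining = set(reads) - recruited
    while frontier and remaining:
        new = {
            r for r in remaining
            if any(max(overlap_len(r, s), overlap_len(s, r)) >= cutoff for s in frontier)
        }
        recruited |= new
        remaining -= new
        frontier = new  # an empty frontier ends the loop
    return recruited


READ_LEN = 10
target = "ACCAACACCCAACCACAAAC"    # hypothetical target sequence (A/C only)
foreign = "GTTGGTGTTTGGTTGTGGGT"   # hypothetical other species (G/T only)


def tile(seq, step=2):
    """Cut `seq` into READ_LEN reads every `step` bases."""
    return [seq[i:i + READ_LEN] for i in range(0, len(seq) - READ_LEN + 1, step)]


target_reads = tile(target)
foreign_reads = tile(foreign)
# Chimeric read: 6 bases from the target end followed by 4 bases of foreign sequence.
chimera = target[-6:] + foreign[:4]
pool = target_reads + foreign_reads + [chimera]
seeds = [target_reads[0]]

small = recruit(pool, seeds, cutoff=3)   # cutoff below half the read length
large = recruit(pool, seeds, cutoff=6)   # cutoff above half the read length

print("foreign reads recruited, cutoff 3:", len(small & set(foreign_reads)))
print("foreign reads recruited, cutoff 6:", len(large & set(foreign_reads)))
print("chimera recruited with cutoff 6:", chimera in large)
```

With the larger cutoff the chimeric read itself is still recruited, which matches panel b: the contamination is limited to that one read instead of spreading through the foreign genome.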

References

    1. Woolhouse ME, Rambaut A, Kellam P. Lessons from Ebola: Improving infectious disease surveillance to inform outbreak management. Sci Transl Med. 2015;7(307):307–53075. doi: 10.1126/scitranslmed.aab0191.
    2. Sharma D, Priyadarshini P, Vrati S. Unraveling the web of viroinformatics: computational tools and databases in virus research. J Virol. 2015;89(3):1489–501. doi: 10.1128/JVI.02027-14.
    3. Yutin N, Makarova KS, Gussow AB, Krupovic M, Segall A, Edwards RA, Koonin EV. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol. 2018;3(1):38. doi: 10.1038/s41564-017-0053-y.
    4. Li L, Deng X, Da Costa AC, Bruhn R, Deeks SG, Delwart E. Virome analysis of antiretroviral-treated HIV patients shows no correlation between T-cell activation and anelloviruses levels. J Clin Virol. 2015;72:106–13. doi: 10.1016/j.jcv.2015.09.004.
    5. Lim ES, Zhou Y, Zhao G, Bauer IK, Droit L, Ndao IM, Warner BB, Tarr PI, Wang D, Holtz LR. Early life dynamics of the human gut virome and bacterial microbiome in infants. Nat Med. 2015;21(10):1228–34. doi: 10.1038/nm.3950.