VirRep: a hybrid language representation learning framework for identifying viruses from human gut metagenomes
- PMID: 38965579
- PMCID: PMC11229495
- DOI: 10.1186/s13059-024-03320-9
Abstract
Identifying viruses from metagenomes is a common step to explore the virus composition in the human gut. Here, we introduce VirRep, a hybrid language representation learning framework, for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.
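The abstract mentions k-mer patterns as one signal used for sequence representation. As a rough illustration only, the sketch below shows one common way to turn a DNA sequence into overlapping k-mer tokens and a normalized k-mer frequency profile. It is not VirRep's tokenizer or encoder; the function names `kmer_tokens` and `kmer_profile`, the default `k=4`, and the choice to skip k-mers containing ambiguous bases are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_tokens(seq, k=4):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper: k-mers containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    ]


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over all 4**k possible k-mers.

    The vector follows lexicographic k-mer order (AAAA, AAAC, ...).
    An input with no valid k-mers yields an all-zero vector.
    """
    counts = Counter(kmer_tokens(seq, k))
    total = sum(counts.values())
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]
```

Calling `kmer_tokens` on a contig gives the token sequence a language-style encoder could consume, while `kmer_profile` gives a fixed-length composition vector; how VirRep itself encodes sequences is described in the full paper, not here.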
Keywords: Human gut metagenomes; Language representation learning; Virus identification.
© 2024. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017 Jul 6;5(1):69. doi: 10.1186/s40168-017-0283-5. PMID: 28683828 Free PMC article.
- Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis. BMC Bioinformatics. 2016 Jan 16;17:38. doi: 10.1186/s12859-015-0875-7. PMID: 26774270 Free PMC article.
- The Chinese gut virus catalogue reveals gut virome diversity and disease-related viral signatures. Genome Med. 2025 Mar 26;17(1):30. doi: 10.1186/s13073-025-01460-6. PMID: 40140988 Free PMC article.
- Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc. 2021 Apr;16(4):1785-1801. doi: 10.1038/s41596-020-00480-3. Epub 2021 Mar 1. PMID: 33649565 Review.
- Benchmarking Metagenomics Tools for Taxonomic Classification. Cell. 2019 Aug 8;178(4):779-794. doi: 10.1016/j.cell.2019.07.010. PMID: 31398336 Free PMC article. Review.
Cited by
- Complementary insights into gut viral genomes: a comparative benchmark of short- and long-read metagenomes using diverse assemblers and binners. Microbiome. 2024 Dec 20;12(1):260. doi: 10.1186/s40168-024-01981-z. PMID: 39707560 Free PMC article.
- Highly accurate prophage island detection with PIDE. Genome Biol. 2025 Aug 20;26(1):254. doi: 10.1186/s13059-025-03733-0. PMID: 40836306 Free PMC article.
- ViraLM: empowering virus discovery through the genome foundation model. Bioinformatics. 2024 Nov 28;40(12):btae704. doi: 10.1093/bioinformatics/btae704. PMID: 39579086 Free PMC article.