ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data
- PMID: 37322495
- PMCID: PMC10273728
- DOI: 10.1186/s13073-023-01196-1
Abstract
Background: Genomics-informed pathogen surveillance strengthens public health decision-making and plays an important role in the prevention and control of infectious diseases. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often relies on the visual exploration of (large) phylogenetic trees and associated metadata, which is time-consuming and difficult to reproduce.
Results: We developed ReporTree, a flexible bioinformatics pipeline that allows diving into the complexity of pathogen diversity to rapidly identify genetic clusters at any (or all) distance threshold(s) or cluster stability regions and to generate surveillance-oriented reports based on the available metadata, such as timespan, geography, or vaccination/clinical status. ReporTree is able to maintain cluster nomenclature in subsequent analyses and to generate a nomenclature code combining cluster information at different hierarchical levels, thus facilitating the active surveillance of clusters of interest. By handling several input formats and clustering methods, ReporTree is applicable to multiple pathogens, constituting a flexible resource that can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through a comprehensive benchmarking of (i) the cg/wgMLST workflow with large datasets of four foodborne bacterial pathogens and (ii) the alignment-based SNP workflow with a large dataset of Mycobacterium tuberculosis. To further validate this tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree is able to rapidly identify the main species genogroups and characterize them with key surveillance metadata, such as antibiotic resistance data. By providing examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how this tool is currently a useful asset in genomics-informed routine surveillance and outbreak detection of a wide variety of species.
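The idea of identifying clusters at several distance thresholds and combining them into a hierarchical nomenclature code can be illustrated with a minimal sketch. It is not ReporTree's implementation: it assumes single-linkage clustering on pairwise allelic (Hamming) distances between cg/wgMLST-like profiles, and the function names (`hamming`, `single_linkage`, `nomenclature`), the `C<n>` label format, and the toy profiles are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count loci whose alleles differ; loci missing (None) in either profile are skipped (an assumption)."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage(profiles, threshold):
    """Group samples linked by chains of pairwise distances <= threshold (union-find)."""
    parent = {s: s for s in profiles}

    def find(s):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(sorted(profiles), 2):
        if hamming(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    # Number clusters by order of first appearance in sorted sample order.
    labels, result = {}, {}
    for s in sorted(profiles):
        root = find(s)
        labels.setdefault(root, len(labels) + 1)
        result[s] = labels[root]
    return result


def nomenclature(profiles, thresholds):
    """Join cluster labels from the coarsest to the finest threshold into one code per sample."""
    levels = [single_linkage(profiles, t) for t in sorted(thresholds, reverse=True)]
    return {s: "-".join(f"C{level[s]}" for level in levels) for s in sorted(profiles)}


# Toy allelic profiles (hypothetical data, five loci each).
profiles = {
    "s1": [1, 1, 1, 1, 1],
    "s2": [1, 1, 1, 1, 2],
    "s3": [1, 1, 1, 3, 3],
    "s4": [9, 9, 9, 9, 9],
}

print(nomenclature(profiles, thresholds=[1, 2]))
# {'s1': 'C1-C1', 's2': 'C1-C1', 's3': 'C1-C2', 's4': 'C2-C3'}
```

In this sketch, samples sharing the first label belong to the same cluster at the looser threshold, and the second label separates them at the stricter one, so a code such as `C1-C2` can be read from left to right as increasingly fine groupings.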
Conclusions: In summary, ReporTree is a pan-pathogen tool for automated and reproducible identification and characterization of genetic clusters that contributes to sustainable and efficient genomics-informed pathogen surveillance in public health. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree.
Keywords: Automated pipeline; Genetic clustering; Genomic surveillance; Public health; ReporTree.
© 2023. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.