PLoS Comput Biol. 2025 Apr 4;21(4):e1012593.
doi: 10.1371/journal.pcbi.1012593. eCollection 2025 Apr.

TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics

Chengze Shen et al. PLoS Comput Biol.

Abstract

We present TIPP3 and TIPP3-fast, new tools for abundance profiling in metagenomic datasets. Like its predecessor, TIPP2, the TIPP3 pipeline uses a maximum likelihood approach to place reads into labeled taxonomies using marker genes, but it achieves superior accuracy to TIPP2 by enabling the use of much larger taxonomies through improved algorithmic techniques. We show that TIPP3 is generally more accurate than leading methods for abundance profiling in two important contexts: when reads come from genomes not already in a public database (i.e., novel genomes) and when reads contain sequencing errors. We also show that TIPP3-fast has slightly lower accuracy than TIPP3, but is also generally more accurate than other leading methods and uses a small fraction of TIPP3's runtime. Additionally, we highlight the potential benefits of restricting abundance profiling methods to those reads that map to marker genes (i.e., using a filtered marker-gene based analysis), which we show typically improves accuracy. TIPP3 is freely available at https://github.com/c5shen/TIPP3.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the TIPP pipeline.
TIPP3 follows the same pipeline structure as TIPP and TIPP2 but differs in how some steps are performed in order to achieve higher accuracy and scalability. The common pipeline structure has three stages. Stage 1: Metagenomic reads are first binned to marker genes with BLAST. Stage 2: The query reads are added to the selected marker gene’s multiple sequence alignment, and a phylogenetic placement method is used to place reads into corresponding taxonomic trees using these alignments. Stage 3: Taxonomic labels are inferred from the placements and aggregated for the final abundance profile computation.
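Stage 3 of the pipeline above can be illustrated with a minimal sketch. It assumes each classified read already carries one taxonomic label at the rank of interest and that unlabeled reads are simply dropped; the function name `abundance_profile` and the input format are hypothetical and are not taken from the TIPP3 code base.

```python
from collections import Counter


def abundance_profile(read_labels):
    """Turn per-read taxonomic labels into relative abundances.

    read_labels: iterable of taxon names, one per read; None marks a read
    that received no label at this rank (assumption: such reads are ignored).
    """
    counts = Counter(label for label in read_labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize counts so the profile sums to 1.
    return {taxon: n / total for taxon, n in counts.items()}


# Toy input: five reads, one of them unlabeled.
labels = ["Escherichia", "Escherichia", "Bacillus", None, "Escherichia"]
profile = abundance_profile(labels)
```

A real profiler would also have to combine evidence across marker genes, which this sketch leaves out.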
Fig 2. The impact of filtering reads on Kraken2 and Bracken for abundance profiling accuracy.
(a) Abundance profiling accuracy, measured by normalized Hellinger distance (lower means more accurate), for two ways of running Kraken2 and Bracken on Illumina and PacBio reads from three mock microbial communities (50 known, 100 mixed, and 50 novel genomes). Dashed lines correspond to using filtered reads, and solid lines correspond to using all (unfiltered) reads. (b) Scatter plot of species-specific abundance estimation errors (PacBio reads) against genome size for the 50 known genomes, shown for Bracken and Kraken2 with either filtered or all reads as input. The estimation error for each taxon is calculated as the fractional difference between its estimated abundance and the reference abundance (y-axis). A Robust Linear Model with Huber Loss [44] was used to fit a regression line for each method. The shaded area around each fitted line represents the 95% confidence interval for the corresponding method.
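The two error measures named in this caption can be sketched as follows. The caption does not spell out either formula, so both are assumptions: "normalized" is taken to mean the Hellinger distance scaled to the range [0, 1] by dividing by the square root of 2, and "fractional difference" is taken to mean (estimated - reference) / reference. Profiles are assumed to be dictionaries of relative abundances that sum to 1.

```python
import math


def normalized_hellinger(p, q):
    """Hellinger distance between two abundance profiles, scaled to [0, 1].

    p, q: dicts mapping taxon -> relative abundance (each summing to 1).
    A taxon missing from one profile is treated as having abundance 0.
    """
    taxa = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2 for t in taxa
    )
    return math.sqrt(squared) / math.sqrt(2)


def fractional_error(estimated, reference):
    """Signed error relative to the reference abundance (assumed definition).

    Positive values mean overestimation, negative values underestimation.
    """
    return (estimated - reference) / reference


identical = normalized_hellinger({"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5})
disjoint = normalized_hellinger({"A": 1.0}, {"B": 1.0})
over = fractional_error(0.15, 0.10)
```

Under these assumptions, identical profiles give a distance of 0 and profiles with no shared taxa give a distance of 1.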
Fig 3. Normalized Hellinger distance of TIPP3, TIPP3-fast, and TIPP3-small profiling reads from mock microbial communities with known, mixed, and novel genomes.
Both TIPP3 and TIPP3-small use WITCH to add query reads to marker gene MSAs, and TIPP3-fast uses BLAST to compute query read alignments to marker gene MSAs. TIPP3 uses pplacer with the taxtastic package for placement and a support value of 90%. TIPP3-fast uses BSCAMPP for placement and a support value of 95%. TIPP3-small uses pplacer for query placement and a support value of 95%, the same setup as in TIPP2 [17].
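One way to read the support values above is as a threshold on accumulated placement weight: a read is labeled with the most specific taxon whose placements together carry at least that much weight. The sketch below illustrates this reading only. It is an assumption about how a support threshold could be applied, not the TIPP3 implementation, and the function name, input format, and toy lineages are hypothetical.

```python
from collections import defaultdict


def label_with_support(placements, threshold):
    """Pick the most specific lineage prefix with enough placement support.

    placements: list of (lineage, weight) pairs for one read, where lineage
    is a tuple of taxon names ordered from the root downward and the weights
    sum to 1. Returns the deepest lineage prefix whose summed weight is at
    least `threshold`, or an empty tuple if none qualifies.
    """
    support = defaultdict(float)
    for lineage, weight in placements:
        # Every ancestor of a placement inherits that placement's weight.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight
    eligible = [prefix for prefix, w in support.items() if w >= threshold]
    if not eligible:
        return ()
    # For thresholds above 0.5, at most one prefix per depth can qualify.
    return max(eligible, key=len)


placements = [
    (("Bacteria", "Enterobacteriaceae", "Escherichia", "E. coli"), 0.70),
    (("Bacteria", "Enterobacteriaceae", "Escherichia", "E. albertii"), 0.22),
    (("Bacteria", "Enterobacteriaceae", "Salmonella", "S. enterica"), 0.08),
]
label_90 = label_with_support(placements, 0.90)
label_95 = label_with_support(placements, 0.95)
```

In this toy example the 90% threshold yields a genus-level label, while the stricter 95% threshold backs off to the family, which shows how a higher support value trades specificity for confidence.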
Fig 4. Normalized Hellinger distance of methods profiling reads from mock microbial communities with known, mixed, and novel genomes.
For PacBio read datasets, mOTUsv3 did not produce any classification or profile and thus is absent.
Fig 5. Normalized Hellinger distance of TIPP3, TIPP3-fast, and MetaPhlAn4 profiling Illumina reads from mock microbial communities with known, mixed, and novel genomes.
Fig 6. Normalized Hellinger distance of methods profiling Illumina and PacBio reads from the CAMI-II Marine dataset.
Metabuli(filtered), mOTUsv3, and MetaPhlAn4 did not produce a profile for CAMI-II Marine PacBio reads.
Fig 7. Species-specific abundance estimation error of methods profiling reads from a mock microbial community with known genomes.
(a) Illumina reads. (b) PacBio reads. (c) Nanopore reads. mOTUsv3 is excluded because it either produced no profile or had high abundance profiling errors except for Illumina reads. The estimation error is shown on the y-axis. For each comparison, a taxon is shown if and only if it is present in the reference and at least one method has an estimation error strictly greater than 10% in magnitude. Species are sorted left-to-right by TIPP3’s error, from overestimation to underestimation. Full results for all datasets at species and genus levels can be found in Sect G in S1 Appendix.
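The selection and ordering rule described in this caption can be sketched in a few lines. The data layout is hypothetical, and treating a taxon that a method did not report as having an error of 0 is an assumption made only for this illustration.

```python
def taxa_to_plot(reference, errors, sort_by="TIPP3", cutoff=0.10):
    """Select and order taxa following the rule stated in the caption.

    reference: dict taxon -> reference abundance.
    errors: dict method -> dict taxon -> signed estimation error.
    A taxon is kept if it is present in the reference and at least one
    method has an error strictly greater than `cutoff` in magnitude.
    Kept taxa are ordered by the `sort_by` method's error, from the largest
    overestimation to the largest underestimation.
    """
    shown = [
        taxon
        for taxon, ref in reference.items()
        if ref > 0
        and any(abs(per_taxon.get(taxon, 0.0)) > cutoff for per_taxon in errors.values())
    ]
    return sorted(shown, key=lambda t: errors[sort_by].get(t, 0.0), reverse=True)


reference = {"A": 0.2, "B": 0.3, "C": 0.5}
errors = {
    "TIPP3": {"A": 0.05, "B": -0.20, "C": 0.12},
    "Other": {"A": 0.10, "B": 0.0, "C": 0.0},
}
order = taxa_to_plot(reference, errors)
```

Taxon "A" is dropped here because its largest error is exactly 10%, which does not satisfy the strict inequality.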
Fig 8. Runtimes of abundance profiling methods on Testing-1 datasets.

References

    1. Gilbert JA, Jansson JK, Knight R. The Earth Microbiome project: successes and aspirations. BMC Biol. 2014;12(1). doi: 10.1186/s12915-014-0069-1
    2. Zeevi D, Korem T, Godneva A, Bar N, Kurilshikov A, Lotan-Pompan M, et al. Structural variation in the gut microbiome associates with host health. Nature. 2019;568(7750):43–8. doi: 10.1038/s41586-019-1065-y
    3. Fan Y, Pedersen O. Gut microbiota in human metabolic health and disease. Nat Rev Microbiol. 2021;19(1):55–71. doi: 10.1038/s41579-020-0433-9
    4. Talmor-Barkan Y, Bar N, Shaul AA, Shahaf N, Godneva A, Bussi Y, et al. Metabolomic and microbiome profiling reveals personalized risk factors for coronary artery disease. Nat Med. 2022;28(2):295–302. doi: 10.1038/s41591-022-01686-6
    5. Klappenbach JA, Saxman PR, Cole JR, Schmidt TM. rrndb: the Ribosomal RNA operon copy number database. Nucleic Acids Res. 2001;29(1):181–4. doi: 10.1093/nar/29.1.181
