Sci Rep. 2025 Jun 2;15(1):19270.
doi: 10.1038/s41598-025-98875-3.

CleanSeqU algorithm for decontamination of catheterized urine 16S rRNA sequencing data

Sung Min Yoon et al. Sci Rep.

Abstract

Contamination in low-biomass samples, such as urine, presents a major challenge for 16S rRNA gene sequencing, as extraneous DNA from reagents and the environment often obscures microbial signals. Existing in silico decontamination algorithms are limited in how accurately they identify and remove these contaminants. To address this issue, we developed CleanSeqU, a novel decontamination algorithm designed to enhance the accuracy of 16S rRNA gene sequencing data for catheterized urine samples. The approach is grounded in the principle that the compositional pattern of potential contaminant taxa remains similar between biological samples and blank controls. The algorithm also identifies potential contaminants based on ecological plausibility and a custom blacklist. We evaluated CleanSeqU's performance using vaginal microbiome dilution experiments as a proxy for low-biomass urine samples and compared it with the Decontam, Microdecon, and SCRuB algorithms. CleanSeqU consistently outperformed Decontam, Microdecon, and SCRuB across various contamination levels, with superior accuracy and F1-scores and reduced beta-dissimilarity. CleanSeqU improved specificity and positive predictive value by correctly identifying and removing a higher number of contaminant amplicon sequence variants (ASVs). Furthermore, the reduced alpha diversity in the decontaminated datasets suggests more precise contaminant elimination. With its practical use of a single blank extraction control per batch and adjustable decontamination rules, CleanSeqU provides an efficient and scalable solution. Our findings highlight its potential to significantly advance urine microbiome research by delivering more accurate microbial profiles.
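
The evaluation metrics named in the abstract (accuracy, F1-score, specificity, and positive predictive value) can be illustrated with a minimal sketch. It assumes the per-ASV convention given in the Fig. 6 caption, where genuine community ASVs are the positive class and contaminant ASVs the negative class. The function name and the example counts are hypothetical and are not taken from the paper or from the CleanSeqU code.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics from per-ASV counts.

    Assumed convention (following the Fig. 6 caption):
      tp: genuine ASVs correctly kept
      tn: contaminant ASVs correctly removed
      fp: contaminant ASVs wrongly kept as genuine
      fn: genuine ASVs wrongly removed
    """
    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),            # positive predictive value
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts for one decontaminated sample (illustration only).
m = classification_metrics(tp=8, tn=80, fp=10, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under this convention, removing more contaminant ASVs moves counts from fp to tn, which raises both specificity and positive predictive value, consistent with the improvement the abstract reports.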

Keywords: 16S rRNA gene sequencing; Blank extraction control; Decontamination algorithms; Low biomass samples; Microbial contamination; Urine microbiome.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: All procedures involving leftover human vaginal samples, and the study itself, were approved by the Ethics Committee of GC Labs (GCL-2023-1075-02). All authors have provided their consent for publication.

Figures

Fig. 1
Flowchart of the CleanSeqU decontamination processes. Top 5 ASVs refers to the five ASVs identified at the highest abundance in the blank extraction control.
Fig. 2
The proportion of total contaminants increases with decreasing amount of bacterial input material.
Fig. 3
Stacked bar plot representing the bacteria identified in dilution series of each set. The expected bacteria from the undiluted vaginal microbial community are displayed in color, while contaminant bacteria are in grayscale. Bacteria that existed at a prevalence of less than 5% in the undiluted vaginal microbial community were designated as the other genuine bacteria.
Fig. 4
The proportion of removed ASV reads in the dilution series samples in each batch, determined using Decontam, Microdecon, SCRuB, and CleanSeqU.
Fig. 5
Alpha diversity, calculated as the Chao1 estimate of species richness, showed that CleanSeqU usually removed more types of ASVs than Decontam, Microdecon, and SCRuB across all dilution stages. Three asterisks (***) indicate Wilcoxon rank P < 0.001.
Fig. 6
Stacked bar plots of ASV classification for each decontamination algorithm. (A) Decontam stacked bar plot, (B) Microdecon stacked bar plot, (C) SCRuB stacked bar plot, (D) CleanSeqU stacked bar plot. TP (true positive), undiluted vaginal community ASVs correctly classified; TN (true negative), contaminant ASVs correctly classified; FP (false positive), contaminant ASVs incorrectly classified; FN (false negative), undiluted vaginal community ASVs incorrectly classified.
Fig. 7
Comparison of accuracy, F1-score, and Bray–Curtis dissimilarity of the four algorithms across the dilution samples in each dilution series set. (A) Accuracy, (B) F1-score, (C) Bray–Curtis dissimilarity. CleanSeqU had higher accuracy and F1-score and produced results closer to the ground truth than the other algorithms in most dilution samples.
Fig. 8
Trends in accuracy, F1-score, and Bray–Curtis dissimilarity as a function of contaminant proportion, using all dilution samples. (A) Accuracy, (B) F1-score, (C) Bray–Curtis dissimilarity. Both accuracy and F1-score gradually decreased and beta-dissimilarity gradually increased as the contaminant proportion increased. In particular, F1-score and beta-dissimilarity tended to change sharply in the more highly contaminated samples.
Fig. 9
Difference in F1-score and beta-dissimilarity between Decontam, Microdecon, SCRuB, and CleanSeqU, with samples grouped by a contaminant proportion threshold of 90%. (A) F1-score, (B) Bray–Curtis dissimilarity. CleanSeqU showed a significantly better F1-score and beta-dissimilarity in the group with a contaminant proportion below 90%; in contrast, there was no significant difference in those parameters in the group with a contaminant proportion above 90%. The asterisks (*), (***), and (****) indicate Wilcoxon rank P < 0.1, P < 0.001, and P < 0.0001, respectively.
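
The Bray–Curtis dissimilarity reported in Figs. 7 to 9 compares a decontaminated profile with the ground-truth community. The sketch below uses the standard definition on relative abundances, 1 - 2 * sum(min(u_i, v_i)) / (sum(u) + sum(v)). The taxon labels and read counts are made up for illustration, and the authors' own computation may differ in detail (for example in normalization or the software used).

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two read-count profiles.

    Profiles are normalized to relative abundance first, so the result
    is 0 for identical compositions and 1 when no taxa are shared.
    """
    a = relative_abundance(profile_a)
    b = relative_abundance(profile_b)
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical profiles: "C" plays the role of a reagent contaminant.
truth = {"A": 60, "B": 40}
contaminated = {"A": 30, "B": 20, "C": 50}
decontaminated = {"A": 30, "B": 20}

bc_before = bray_curtis(truth, contaminated)
bc_after = bray_curtis(truth, decontaminated)
print(f"before: {bc_before:.2f}, after: {bc_after:.2f}")
```

In this toy case, removing the contaminant taxon restores the original proportions of the genuine taxa, so the dissimilarity to the ground truth falls, which is the direction of change the captions describe as an improvement.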
