BMC Biol. 2023 Nov 23;21(1):269.
doi: 10.1186/s12915-023-01737-5.

Benchmarking MicrobIEM - a user-friendly tool for decontamination of microbiome sequencing data


Claudia Hülpüsch et al. BMC Biol.

Abstract

Background: Microbiome analysis is becoming a standard component in many scientific studies, but also requires extensive quality control of the 16S rRNA gene sequencing data prior to analysis. In particular, when investigating low-biomass microbial environments such as human skin, contaminants distort the true microbiome sample composition and need to be removed bioinformatically. We introduce MicrobIEM, a novel tool to bioinformatically remove contaminants using negative controls.

Results: We benchmarked MicrobIEM against five established decontamination approaches in four 16S rRNA amplicon sequencing datasets: three serially diluted mock communities (10^8 to 10^3 cells, 0.4-80% contamination) with even or staggered taxon compositions, and a skin microbiome dataset. Results depended strongly on user-selected algorithm parameters. Overall, sample-based algorithms separated mock and contaminant sequences best in the even mock, whereas control-based algorithms performed better in the two staggered mocks, particularly in low-biomass samples (≤ 10^6 cells). We show that correct decontamination benchmarking requires realistic staggered mock communities and unbiased evaluation measures such as Youden's index. In the skin dataset, the Decontam prevalence filter and MicrobIEM's ratio filter effectively reduced common contaminants while keeping skin-associated genera.

Conclusions: MicrobIEM's ratio filter for decontamination performs as well as or better than established bioinformatic decontamination tools. In contrast to established tools, MicrobIEM additionally provides interactive plots and supports selecting appropriate filtering parameters via a user-friendly graphical user interface. Therefore, MicrobIEM is the first quality control tool for microbiome experts without coding experience.

Keywords: 16S rRNA gene sequencing; Bioinformatic decontamination; Decontam; Low-biomass microbiome; Negative control; SourceTracker; Youden’s index.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the decontamination benchmarking study design. Three mock datasets were used for decontamination benchmarking, one with an even, and two with a staggered community structure. Mock communities were available as dilution series covering a wide range of bacterial biomass per sample (10^8 to 5.55 × 10^3 bacterial cells). Two sample-based and five control-based decontamination algorithms were compared based on their classification performance into mock and contaminant reads, evaluated by Youden’s index and other evaluation scores. The same parameters and tools were also evaluated in a low-biomass environmental dataset from the skin. Additional information about the decontamination filters implemented in MicrobIEM can be found in Additional file 1: Supplementary Figure 1.
Fig. 2
Sample composition and level of contamination by dilution in the mock communities used for benchmarking. The proportion of contaminants increases with decreasing amount of bacterial input material, both in the even mock community (A) and in the two staggered mock communities A and B (B, C). The even mock community (A) contains species of 6–22% expected relative abundance, and comprises threefold serial dilutions from 1.5 × 10^9 to 2.3 × 10^5 bacterial input cells and one pipeline negative control (NEG). The staggered mock community A (B) contains species of 0.18–18% expected relative abundance, and comprises tenfold serial dilutions from 1 × 10^9 to 1 × 10^2 bacterial input cells and three pipeline negative controls (NEG). The staggered mock community B (C) contains species of 3–84.3% expected relative abundance, and comprises 20-fold serial dilutions from 1.1 × 10^5 to 5.55 × 10^3 bacterial input cells and four pipeline negative controls (NEG). Each bar in B shows the mean composition per triplicate (10^3: duplicate) per dilution, and each bar in C shows the mean composition over four replicates per dilution. Reads not matching expected sequences were defined as contaminants (see details in “Methods”). Dilutions highlighted in bold are selected for decontamination benchmarking.
Fig. 3
Benchmarking of decontamination algorithms in mock communities. In the even mock community (A), sample-based decontamination algorithms perform best (frequency filter, Decontam frequency filter), whereas in the staggered mock communities A and B (B, C), control-based decontamination algorithms perform better (Decontam prevalence filter, SourceTracker, presence filter, MicrobIEM span filter, MicrobIEM ratio filter). MicrobIEM’s span filter of “1 of all” is equivalent to the presence filter, and the number of available thresholds for MicrobIEM’s span filter depends on the number of negative controls per dataset (A: 1, B: 3, C: 4 pipeline negative controls). Each algorithm was evaluated by its ability to distinguish expected mock reads from contaminating reads (defined as reads not matching expected sequences), from high-biomass (10^8) to low-biomass samples (10^3 bacterial cells). The performance per algorithm was quantified by Youden’s index, ranging from 1 (perfect classification) through 0 (random classification) to −1 (reversed labels). Algorithms were run separately per dilution, except for the Decontam frequency filter in A and SourceTracker in all datasets. Values in B represent mean values over triplicates per dilution, and values in C represent mean values over four replicates per dilution. Freq. = frequency, prev. = prevalence.
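The Youden's index scale described in this caption can be illustrated with a minimal sketch. It uses the standard definition J = sensitivity + specificity − 1 and assumes, for illustration only, that each item (read or feature) counts once; the function name `youden_index` and the boolean input layout are hypothetical and not taken from the study, whose exact weighting may differ.

```python
def youden_index(is_mock, is_kept):
    """Youden's J = sensitivity + specificity - 1.

    is_mock: ground truth per item, True if it belongs to the mock community.
    is_kept: filter decision per item, True if the filter kept the item.
    Assumption: every item counts once (no weighting by abundance).
    """
    pairs = list(zip(is_mock, is_kept))
    tp = sum(1 for mock, kept in pairs if mock and kept)          # mock, kept
    fn = sum(1 for mock, kept in pairs if mock and not kept)      # mock, removed
    tn = sum(1 for mock, kept in pairs if not mock and not kept)  # contaminant, removed
    fp = sum(1 for mock, kept in pairs if not mock and kept)      # contaminant, kept
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one mock item and one contaminant item")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


truth = [True, True, False, False]
print(youden_index(truth, [True, True, False, False]))  # perfect classification
print(youden_index(truth, [True, True, True, True]))    # nothing removed
print(youden_index(truth, [False, False, True, True]))  # reversed labels
```

In this sketch a filter that removes nothing scores 0, the same as random classification, because it keeps every mock item (sensitivity 1) but also every contaminant (specificity 0).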
Fig. 4
Effect of decontamination algorithms on major skin inhabitants and contaminants in a low-biomass skin microbiome dataset. The effect of six bioinformatic decontamination algorithms with tool-specific thresholds was evaluated on three typical skin inhabitants (Corynebacterium, Cutibacterium, and Staphylococcus) and three potential contaminants (Acinetobacter, Comamonas, Pseudomonas). While sample-based decontamination algorithms (frequency filter, Decontam frequency filter) had little effect on the relative abundance of the top 10 genera of the low-biomass skin microbiome dataset, control-based decontamination algorithms (Decontam prevalence filter, SourceTracker, presence filter, MicrobIEM span filter, MicrobIEM ratio filter) specifically reduced Pseudomonas and Comamonas. MicrobIEM’s span filter of “1 of all” is equivalent to the presence filter. Horizontal black lines indicate the relative abundance per genus before applying the bioinformatic decontamination approaches. Freq. = frequency, prev. = prevalence.
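The stated equivalence between the span filter at “1 of all” and the presence filter can be sketched as follows. This is a simplified illustration, not MicrobIEM's implementation: it assumes the span filter flags a feature once it is detected in at least a chosen number of negative controls. The function name `span_filter`, the dictionary layout, and the example counts are all hypothetical.

```python
def span_filter(control_counts, min_controls):
    """Return the IDs of features flagged as contaminants.

    control_counts: dict mapping feature ID -> list of read counts,
        one entry per negative control (hypothetical data layout).
    min_controls: flag a feature if it is detected in at least this
        many negative controls (assumed meaning of the span threshold).
    """
    flagged = set()
    for feature_id, counts in control_counts.items():
        n_detected = sum(1 for count in counts if count > 0)
        if n_detected >= min_controls:
            flagged.add(feature_id)
    return flagged


# Made-up read counts in three negative controls.
controls = {
    "ASV_1": [12, 0, 3],   # detected in 2 of 3 controls
    "ASV_2": [0, 0, 0],    # never detected in a control
    "ASV_3": [5, 9, 1],    # detected in all 3 controls
    "ASV_4": [0, 2, 0],    # detected in 1 of 3 controls
}

print(sorted(span_filter(controls, 1)))  # "1 of all": any feature seen in a control
print(sorted(span_filter(controls, 3)))  # strictest setting with three controls
```

Under this assumption, a dataset with n negative controls offers n possible thresholds, which matches the caption's note that the number of available span thresholds depends on the number of negative controls.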
Fig. 5
Screenshots of interactive graphical support for contamination removal with MicrobIEM. The interactive graphical user interface of MicrobIEM supports the user by displaying which features are removed (in orange, A) or kept (in blue, B) with the current filter threshold indicated as a vertical black line. In this example, filtering is based on the pipeline negative control NEG2. Each datapoint represents one feature (ASV or OTU) present in the selected control type (NEG2), and bubble area indicates the mean relative abundance per feature in the samples. Interactive hover texts (orange box in A, blue box in B) provide further information per feature, such as ID, taxonomy and mean relative abundance over samples.
Fig. 6
Screenshots of interactive graphical outputs from MicrobIEM's basic microbiome analysis options. The interactive graphical user interface of MicrobIEM facilitates basic microbiome analysis. Implemented are alpha diversity analysis (A), beta diversity analysis (B), and analysis of the taxonomic composition (C) based on metadata and an easy and dynamic sample selection within the tool. As an example, differences in microbiome alpha diversity at week 0 and week 8 (A) and in global microbiome structure (B) are shown by lesional (LS) versus non-lesional (NL) skin, while C displays the sample composition per patient at genus level at a selected timepoint (week 8). Dots in A and B indicate individual samples. Boxes in A denote the median and interquartile range (IQR, distance between the 25th and 75th percentiles), and whiskers represent values up to 1.5 times the IQR. Ellipses in B denote 95% confidence intervals around cluster centroids based on a multivariate t-distribution. Bars in C show the microbiome composition of the ten most abundant genera at one timepoint per patient, while remaining genera are summarized as “Others”.
