NPJ Biofilms Microbiomes. 2025 Jul 7;11(1):130.
doi: 10.1038/s41522-025-00762-2.

Benefits and challenges of host depletion methods in profiling the upper and lower respiratory microbiome

Chun Wang et al. NPJ Biofilms Microbiomes.

Abstract

Metagenomic sequencing for respiratory pathogen detection faces two challenges: efficient depletion of host DNA, and the question of how well upper respiratory samples represent lower respiratory tract infections. In this study, we benchmarked seven host depletion methods, including a new method (F_ase), using bronchoalveolar lavage fluid (BALF), oropharyngeal swab (OP), and mock samples. All methods significantly increased microbial reads, species richness, gene richness, and genome coverage, but they also reduced bacterial biomass, introduced contamination, and altered microbial abundance. Some commensals and pathogens, including Prevotella spp. and Mycoplasma pneumoniae, were significantly diminished. F_ase demonstrated the most balanced performance. High-resolution microbiome profiling revealed distinct microbial niche preferences and microbiome disparities between the upper and lower respiratory tract. In pneumonia patients, 16.7% of high-abundance species (>1%) in BALF were underrepresented (<0.1%) in OP, highlighting the limitations of OP samples as proxies for the lower respiratory tract. This study underscores both the potential and the challenges of metagenomic sequencing in advancing microbial ecology and clinical research.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Respiratory sample characteristics and the schematic overview of host DNA depletion methods.
A The amount of bacterial DNA. B The amount of host DNA. C The ratio of microbe-to-host read numbers in metagenomic data. Reads from contaminating microbes were discarded. In (A–C), each dot represents a sample, and samples from the same individual are connected with lines; the center line of the boxplot represents the median, box limits represent the upper and lower quartiles, and whiskers represent 1.5× the interquartile range. **p < 0.01, ***p < 0.001, Wilcoxon matched pairs tests. n = 35 for BALF and n = 34 for OP. D Schematic overview of host DNA depletion methods evaluated in this study.
Fig. 2. Comparison of effectiveness between host DNA depletion methods.
Host DNA concentration (pg/ml BALF or pg/swab) in BALF (A) and OP (B) samples treated with different host DNA depletion methods. Bacterial retention rate in BALF (C) and OP (D) samples after host removal. Data in (A–D) were measured using qPCR assays. The proportion of non-contaminant microbial reads in raw sequencing data for BALF (E) and OP (F). Fold change of species richness (G) and gene family richness (H) between host-depleted and raw BALF samples. Fold change of species richness (I) and gene family richness (J) between host-depleted and raw OP samples. Species richness is the number of species, and gene family richness is the number of gene families. Correlation between the proportion of microbial reads in Raw samples and the improvement in species richness (K) and gene family richness (L) after host depletion. The y-axis represents the difference in richness between host-depleted and Raw samples. Loess regression curves are shown for each method. In (A–J), the center line of the boxplot represents the median, box limits represent the upper and lower quartiles, and whiskers represent 1.5× the interquartile range. Different letters at the top indicate statistically significant differences among methods (Wilcoxon matched pairs tests followed by FDR adjustment). In (G–L), data from the eight samples from the same patient were rarefied to the lowest sequencing depth among them. n = 35 for BALF and n = 34 for OP.
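The rarefaction and richness fold change described in this caption can be illustrated with a minimal sketch. It assumes each sample is a plain species-to-read-count table and that rarefaction means random subsampling without replacement; the function names, the seed, and the toy counts are hypothetical and are not taken from the study's pipeline.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {species: reads} table."""
    # Expand the table into one entry per read, then draw a random subset.
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the total number of reads")
    rng = random.Random(seed)
    subsample = {}
    for species in rng.sample(pool, depth):
        subsample[species] = subsample.get(species, 0) + 1
    return subsample


def richness(counts):
    """Species richness: the number of species with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Toy (made-up) microbial read counts for one patient.
raw = {"sp_a": 90, "sp_b": 10}
depleted = {"sp_a": 120, "sp_b": 60, "sp_c": 15, "sp_d": 5}

# Rarefy both samples to the lowest depth among them before comparing.
depth = min(sum(raw.values()), sum(depleted.values()))
raw_r = rarefy(raw, depth)
dep_r = rarefy(depleted, depth)

fold_change = richness(dep_r) / richness(raw_r)
print(f"depth={depth}, richness fold change={fold_change:.2f}")
```

Rarefying first matters because richness grows with sequencing depth, so comparing samples at unequal depths would inflate the apparent gain from host depletion.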
Fig. 3. Influence of host DNA depletion on the fidelity of microbiome profiling.
Similarity in microbial composition between R_ase and other methods, and proportion of contaminating microbial reads in total microbial reads in BALF (A) and OP (B) samples. Dots indicate median values of different methods, and error bars indicate standard deviations. JSD (Jensen-Shannon distance) was calculated excluding identified contaminating components. The gray area on the left indicates the range of JSD between technical replicates, while the gray area on the right indicates the 5th to 95th percentile range of JSD among samples from different individuals. Absolute and relative abundance changes of common species in BALF (C) and OP (D) samples. Fold changes in abundances were calculated between different methods and R_ase. Asterisks indicate significant differences in abundances between different methods and R_ase, *p.adj < 0.05, **p.adj < 0.01, ***p.adj < 0.001, Wilcoxon matched pairs tests. n = 35 for BALF and n = 34 for OP.
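For readers unfamiliar with the metric, the following is a minimal sketch of the Jensen-Shannon distance between two microbial profiles. It assumes the base-2 definition (the square root of the Jensen-Shannon divergence, bounded between 0 and 1) and profiles given as species-to-abundance dictionaries; the caption does not state the log base or the software used, so both are assumptions, and the profiles are made up.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2) between two {species: abundance} profiles."""
    p_total = sum(p.values())
    q_total = sum(q.values())
    divergence = 0.0
    for species in set(p) | set(q):
        # Normalize to relative abundances; absent species count as zero.
        pi = p.get(species, 0.0) / p_total
        qi = q.get(species, 0.0) / q_total
        mi = (pi + qi) / 2
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Toy (made-up) profiles: a reference sample and a host-depleted sample.
reference = {"sp_a": 0.6, "sp_b": 0.3, "sp_c": 0.1}
treated = {"sp_a": 0.7, "sp_b": 0.1, "sp_c": 0.2}
print(round(jensen_shannon_distance(reference, treated), 3))
```

Identical profiles give a distance of 0 and profiles with no species in common give 1, which is what makes the gray reference ranges in the figure (technical replicates versus different individuals) directly comparable.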
Fig. 4. Integrated evaluation of host DNA depletion methods.
Radar charts showing the performance of host DNA depletion methods on five evaluation metrics in BALF (A) and OP (B) samples. Accuracy was defined as 1 minus the JSD between the microbial composition obtained with each method and that obtained with R_ase. The contamination level was defined as 1 minus the proportion of contaminants in total microbial reads. Microbial proportion and accuracy were estimated excluding exogenous contamination. Min-max normalization was used for data scaling. The two methods with the best performance (largest radar area) are labeled in bold. n = 35 for BALF and n = 34 for OP.
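The scoring described in this caption (min-max scaling per metric, then ranking by radar area) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the method names `method_x`, `method_y`, and `method_z` and all metric values are hypothetical, and the radar axes are assumed to be evenly spaced.

```python
import math


def min_max_scale(values):
    """Scale a list of values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def radar_area(radii):
    """Area of a radar polygon with evenly spaced axes and the given radii."""
    k = len(radii)
    # Each pair of neighboring axes spans a triangle of area
    # 0.5 * r_i * r_(i+1) * sin(2*pi/k).
    wedge = math.sin(2 * math.pi / k) / 2
    return wedge * sum(radii[i] * radii[(i + 1) % k] for i in range(k))


# Hypothetical scores on five metrics (higher is better), one row per method.
scores = {
    "method_x": [0.90, 0.95, 0.40, 0.70, 0.60],
    "method_y": [0.80, 0.85, 0.80, 0.50, 0.90],
    "method_z": [0.70, 0.99, 0.60, 0.90, 0.30],
}
methods = list(scores)
n_metrics = 5

# Scale each metric across methods so that all axes share the range [0, 1].
scaled = {m: [] for m in methods}
for j in range(n_metrics):
    column = min_max_scale([scores[m][j] for m in methods])
    for m, value in zip(methods, column):
        scaled[m].append(value)

areas = {m: radar_area(scaled[m]) for m in methods}
best_two = sorted(areas, key=areas.get, reverse=True)[:2]
print(areas, best_two)
```

Note that the radar area depends on the order of the axes, because it is built from products of neighboring radii; a ranking by area should therefore be read together with the individual metrics.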
Fig. 5. Evaluation of host DNA depletion methods using a mock microbial community.
A Microbial composition of the mock community obtained with different methods. Hierarchical clustering was performed based on JSD. B Fold changes in microbial absolute abundances between different methods and R_ase. Significant differences between different methods and R_ase are indicated with colored dots (paired Student’s t-tests followed by FDR adjustment). n = 3 replicates.
Fig. 6. Comparison between the upper and lower respiratory tract microbiomes.
A Principal coordinates analysis (PCoA) plot based on JSD of microbial compositions. R2 and p values from PERMANOVA are shown. BALF and OP samples from the same individual are connected with lines. Inset boxplots show the JSD between different sample types, and different letters at the top indicate statistically significant differences between groups. B 2D density plot showing the correlation of species abundances between BALF and OP. A total of 209 species with relative abundances higher than 0.01 in at least one sample were included. Pearson correlation coefficient and p value are shown. C Volcano plot showing species with differential relative abundances between BALF and OP samples. The horizontal dashed line indicates the adjusted p value of 0.05. Differential species are color-coded and labeled with their names. In (A, B), Wilcoxon matched pairs test followed by FDR adjustment was used. D The proportion of OP-BALF-shared species abundance to the total abundance of species. Comparison of the proportion of OP-BALF-shared species between pneumonia and non-pneumonia cases, by abundance (E) and number (F) of species. In (D–F), only abundant species with relative abundances higher than 0.01 were included. G The relationship between strains detected in the upper and lower respiratory tracts. The number, color, and size of the dots indicate the number of strains. Symbols at the top indicate the following relationships: strains in BALF were a subset of those in OP, BALF and OP had the same strains, BALF and OP had partially overlapping strains, BALF and OP had completely different strains, strains in OP were a subset of those in BALF. The boxplots on the left indicate the read number of species. In (E–G), asterisks indicate significant differences, *p.adj < 0.05, **p.adj < 0.01, ***p.adj < 0.001, Wilcoxon matched pairs tests. In (A) and (E–G), the center line of the boxplot represents the median, box limits represent the upper and lower quartiles, and whiskers represent 1.5× the interquartile range. F_ase-treated samples were used. n = 34 for BALF and OP.
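The two paired-sample summaries used here and in the abstract (the share of abundant species that OP and BALF have in common, and the fraction of abundant BALF species that are underrepresented in OP) can be sketched as below. The thresholds follow the text (0.01 for abundant, 0.001 for underrepresented), but the exact definition of "shared" and whether values were computed per patient or pooled are not stated in the source, so the choices in this sketch are assumptions, and the profiles are made up.

```python
def underrepresented_fraction(balf, op, high=0.01, low=0.001):
    """Fraction of abundant BALF species (> high) that fall below `low` in OP."""
    abundant = [s for s, a in balf.items() if a > high]
    if not abundant:
        return 0.0
    missed = [s for s in abundant if op.get(s, 0.0) < low]
    return len(missed) / len(abundant)


def shared_abundance_proportion(profile, other, threshold=0.01):
    """Share of abundant-species abundance in `profile` that is also abundant in `other`.

    Assumption: a species counts as shared when it exceeds `threshold` in both samples.
    """
    abundant = {s: a for s, a in profile.items() if a > threshold}
    total = sum(abundant.values())
    if total == 0:
        return 0.0
    shared = sum(a for s, a in abundant.items() if other.get(s, 0.0) > threshold)
    return shared / total


# Toy (made-up) relative abundances for one paired BALF/OP sample.
balf = {"sp_a": 0.50, "sp_b": 0.30, "sp_c": 0.195, "sp_d": 0.005}
op = {"sp_a": 0.60, "sp_b": 0.0005, "sp_c": 0.02, "sp_e": 0.3795}

print(underrepresented_fraction(balf, op))
print(shared_abundance_proportion(balf, op))
```

In this toy pair, one of the three abundant BALF species (sp_b) is nearly absent from OP, which is the kind of mismatch the abstract's 16.7% figure summarizes.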

