2022 Aug 16;19(16):10107.
doi: 10.3390/ijerph191610107.

Compositional Data Analysis of 16S rRNA Gene Sequencing Results from Hospital Airborne Microbiome Samples

Maria Rita Perrone et al. Int J Environ Res Public Health.

Abstract

Compositional analysis of 16S rRNA gene sequencing datasets is applied to characterize the bacterial structure of airborne samples collected at different locations in a hospital infectious disease department hosting COVID-19 patients, and to investigate the relationships among bacterial taxa at the genus and species level. Principal component analysis of the centered log-ratio (CLR) transformed data, performed via singular value decomposition, showed that the collected samples separated observably according to monitoring location. More specifically, two main sample clusters were identified for bacterial genera (species), consisting of samples mostly collected in rooms with and without COVID-19 patients, respectively. Human pathogenic genera (species) associated with nosocomial infections were mostly found in samples from areas hosting patients, while non-pathogenic genera (species) mainly isolated from soil were detected in the other samples. Propionibacterium acnes, Staphylococcus pettenkoferi, Corynebacterium tuberculostearicum, and Corynebacterium jeikeium were the main pathogenic species detected in COVID-19 patients' rooms. On average, samples from these locations had lower richness/evenness and diversity than the other samples, at both the genus and species level. Finally, the ρ metrics revealed that pairwise positive associations occurred either between pathogenic taxa or between non-pathogenic taxa.

Keywords: 16S rRNA gene sequencing; Aitchison distance; CLR transformation; airborne microbiome; alpha-diversity; compositional data; singular value decomposition; ρ metrics.
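
The ρ metric named in the abstract is commonly defined as a proportionality measure on log-ratio transformed data: ρ(i, j) = 1 − var(z_i − z_j) / (var(z_i) + var(z_j)), where z_i is the CLR-transformed abundance of taxon i across samples. The following is a minimal Python sketch under that assumption; the function name `rho_matrix` and the toy counts are hypothetical and are not taken from the paper, whose exact software and settings are not stated here.

```python
import numpy as np

def rho_matrix(clr_matrix):
    """Pairwise proportionality (rho) between taxa.

    clr_matrix: samples x taxa array of CLR-transformed values.
    Assumed definition: rho = 1 - var(z_i - z_j) / (var(z_i) + var(z_j)).
    """
    z = np.asarray(clr_matrix, dtype=float)
    var = z.var(axis=0, ddof=1)           # per-taxon variance across samples
    n_taxa = z.shape[1]
    rho = np.ones((n_taxa, n_taxa))       # rho of a taxon with itself is 1
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):
            vlr = (z[:, i] - z[:, j]).var(ddof=1)  # variance of the log-ratio
            rho[i, j] = rho[j, i] = 1.0 - vlr / (var[i] + var[j])
    return rho

# Toy counts (hypothetical): taxon 1 is always twice taxon 0.
counts = np.array([
    [10, 20, 50, 7],
    [20, 40, 5, 30],
    [40, 80, 60, 12],
    [80, 160, 7, 45],
    [30, 60, 20, 9],
], dtype=float)
log_counts = np.log(counts)
clr_values = log_counts - log_counts.mean(axis=1, keepdims=True)
rho = rho_matrix(clr_values)
```

Values of ρ near 1 indicate taxa whose abundances stay proportional across samples; in this toy example the first two taxa are exactly proportional, so their log-ratio has zero variance.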

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
(a) Heatmap based on the centered log-ratio (CLR) transformed values of the 25 selected bacterial genera; (b,c) show the Aitchison distance-based dendrograms highlighting the relatedness between different samples and genera, respectively. The red arrows in (b) mark the two main sample clusters defined by the dendrogram, where Cluster 1 includes the samples A_HR, DD1, B_R2, B_BAT, and B_R1, and Cluster 2 consists of all the other samples. WPS in the genus legend stands for WPSUnclassified1_genera_incertae_sedis.
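
As an illustration of the two quantities behind this figure, the sketch below computes CLR values and pairwise Aitchison distances (the Euclidean distance between CLR-transformed samples) with NumPy. The pseudocount used to handle zero counts and the toy count matrix are assumptions made for the example; the zero-handling strategy used in the paper is not given in this record.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount is an assumed, simple way of avoiding log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value (log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

def aitchison_distance(counts, pseudocount=0.5):
    """Pairwise Aitchison distances between samples (rows)."""
    z = clr(counts, pseudocount)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical counts: 3 samples x 4 taxa.
counts = np.array([
    [120, 30, 0, 50],
    [10, 200, 40, 5],
    [115, 35, 2, 45],
])
z = clr(counts)
d = aitchison_distance(counts)
```

A distance matrix like `d` could then be passed to a hierarchical clustering routine to build dendrograms of the kind shown in panels (b) and (c).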
Figure 2
Boxplots displaying the (a) Shannon and (b) Simpson index values calculated at the genus level for the samples belonging to Cluster 1 (A_HR, DD1, B_R2, B_BAT, B_R1) and Cluster 2 (A_R1, HR1, PSY, RO1, RO2, HR2, DD2, R4, R3, B+C_R2, MED, B+C_R1). For each boxplot, the line within the box and the white dot represent the median and mean value, respectively. The bottom and top boundaries of each box indicate the 25th and 75th percentiles, respectively. The whiskers mark the 5th and 95th percentiles.
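
For reference, the two alpha-diversity indices can be sketched as follows. This example assumes the natural-log Shannon index, H = −Σ p_i ln p_i, and the Gini-Simpson form of the Simpson index, 1 − Σ p_i²; the record does not state which variants or logarithm base the authors used, and the function names are hypothetical.

```python
import numpy as np

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()
    return float(-(p * np.log(p)).sum())

def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p^2) (assumed variant)."""
    c = np.asarray(counts, dtype=float)
    p = c / c.sum()
    return float(1.0 - (p ** 2).sum())

# Hypothetical samples: one perfectly even, one dominated by a single taxon.
even_sample = [10, 10, 10, 10]
dominated_sample = [97, 1, 1, 1]
h_even = shannon_index(even_sample)
h_dom = shannon_index(dominated_sample)
s_even = simpson_index(even_sample)
s_dom = simpson_index(dominated_sample)
```

Both indices are higher for the even sample than for the dominated one, which is the sense in which lower values indicate reduced evenness and diversity.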
Figure 3
Two-dimensional principal component analysis biplot made via a singular value decomposition of the CLR-transformed values for the 25 selected bacterial genera. The reported biplot illustrates the relationships between samples (score plot, red dots) and genera (loading plot, black arrows). The percentages of the total variance explained by the first and second principal components are also reported. Note that WPS represents WPSUnclassified1_genera_incertae_sedis.
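
A PCA of this kind can be sketched by centering the CLR matrix by column and taking its singular value decomposition: sample scores are U·S, taxon loadings are the rows of Vᵀ, and the explained variance follows from the squared singular values. The code below is a minimal sketch with randomly generated toy counts; the function name `svd_pca` and the data are hypothetical, and any biplot scaling used in the paper is not reproduced.

```python
import numpy as np

def svd_pca(clr_matrix, n_components=2):
    """PCA of a samples x taxa CLR matrix via singular value decomposition."""
    z = np.asarray(clr_matrix, dtype=float)
    centered = z - z.mean(axis=0, keepdims=True)   # center each taxon
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components].T                    # taxon directions
    explained = (s ** 2) / (s ** 2).sum()             # variance fractions
    return scores, loadings, explained[:n_components]

# Toy strictly positive counts (hypothetical): 6 samples x 5 taxa.
rng = np.random.default_rng(0)
counts = rng.integers(1, 100, size=(6, 5)).astype(float)
log_counts = np.log(counts)
z = log_counts - log_counts.mean(axis=1, keepdims=True)  # CLR transform

scores, loadings, explained = svd_pca(z)
```

Plotting `scores` as points and `loadings` as arrows on the same axes gives a biplot, with `explained` supplying the percentages for the axis labels.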
Figure 4
(a) Heatmap based on the centered log-ratio (CLR) transformed values of the 20 selected bacterial species reads; (b,c) show the Aitchison distance-based dendrograms highlighting the relatedness between different samples and species, respectively. The two main sample clusters defined by the corresponding dendrogram are also indicated in (b). Legend: S. (Staphylococcus) pettenkoferi, C. (Corynebacterium) tuberculostearicum, A. (Acinetobacter) lwoffii, C. (Corynebacterium) jeikeium, S. (Staphylococcus) cohnii, C. (Corynebacterium) vitaeruminis, P. (Propionibacterium) acnes, M. (Methyloversatilis) universalis, G. (Gemmatirosa) kalamazoonesis, R. (Rubellimicrobium) roseum, u. (uncultured) Eubacterium, B. (Blastococcus) aggregatus, N. (Nitrolancea) hollandica, S. (Solirubrobacter) ginsenosidimutans, u. (uncultured) Acidobacteria (EF457480), M. (Modestobacter) lapidis, G. (Gemmatimonas) phototrophica, M. (Microvirga) lupini, and u. (uncultured) Acidobacteria (EF457419).
Figure 5
Boxplots displaying the (a) Shannon and (b) Simpson index values calculated at the species level for the samples belonging to Cluster 1 (B_R2, B_BAT, B_R1, DD1, A_R1, A_HR, HR1) and Cluster 2 (RO1, R3, HR2, DD2, RO2, R4, B+C1_R1, PSY, MED, B+C_R2). For each boxplot, the line within the box and the white dot represent the median and mean value, respectively. The bottom and top boundaries of each box indicate the 25th and 75th percentiles, respectively. The whiskers mark the 5th and 95th percentiles.
Figure 6
Two-dimensional SVD-PCA biplot of the CLR-transformed values for the 20 selected bacterial species, showing the score plot (red dots) and the loading plot (black arrows). The percentages of the total variance explained by the first and second principal components are also reported. Legend: M.un (Methyloversatilis universalis), P.ac (Propionibacterium acnes), C.vi (Corynebacterium vitaeruminis), S.pe (Staphylococcus pettenkoferi), C.tu (Corynebacterium tuberculostearicum), C.je (Corynebacterium jeikeium), S.co (Staphylococcus cohnii), A.lw (Acinetobacter lwoffii), M.lu (Microvirga lupini), G.ph (Gemmatimonas phototrophica), N.ho (Nitrolancea hollandica), u.Eu (uncultured Eubacterium), R.ro (Rubellimicrobium roseum), B.ag (Blastococcus aggregatus), u.Aci (uncultured Acidobacteria EF457480), S.sp. (Solirubrobacter sp.), S.gi (Solirubrobacter ginsenosidimutans), G.ka (Gemmatirosa kalamazoonesis), M.la (Modestobacter lapidis), and u.Ac (uncultured Acidobacteria EF457419).
