Int J Environ Res Public Health. 2022 Aug 16;19(16):10107.

doi: 10.3390/ijerph191610107.

Compositional Data Analysis of 16S rRNA Gene Sequencing Results from Hospital Airborne Microbiome Samples

Maria Rita Perrone¹, Salvatore Romano¹, Giuseppe De Maria², Paolo Tundo², Anna Rita Bruno², Luigi Tagliaferro², Michele Maffia³, Mattia Fragola¹

Affiliations

¹ Department of Mathematics and Physics, University of Salento, 73100 Lecce, Italy.
² Presidio Ospedaliero Santa Caterina Novella, Azienda Sanitaria Locale Lecce, 73013 Galatina, Italy.
³ Department of Biological and Environmental Sciences and Technologies, University of Salento, 73100 Lecce, Italy.

PMID: 36011742
PMCID: PMC9408509
DOI: 10.3390/ijerph191610107

Maria Rita Perrone et al. Int J Environ Res Public Health. 2022.

Abstract

Compositional analysis of 16S rRNA gene sequencing datasets is applied to characterize the bacterial structure of airborne samples collected at different locations of a hospital infectious disease department hosting COVID-19 patients, and to investigate the relationships among bacterial taxa at the genus and species level. Exploration of the centered log-ratio (CLR) transformed data by principal component analysis (PCA) via singular value decomposition (SVD) showed that the collected samples segregated into observably separate groups depending on the monitoring location. More specifically, two main sample clusters were identified with regard to bacterial genera (species), consisting mostly of samples collected in rooms with and without COVID-19 patients, respectively. Human pathogenic genera (species) associated with nosocomial infections were mostly found in samples from areas hosting patients, while non-pathogenic genera (species) mainly isolated from soil were detected in the other samples. Propionibacterium acnes, Staphylococcus pettenkoferi, Corynebacterium tuberculostearicum, and C. jeikeium were the main pathogenic species detected in COVID-19 patients' rooms. On average, samples from these locations had lower richness/evenness and diversity than the other samples, at both the genus and species level. Finally, the ρ metrics revealed that positive pairwise associations occurred either among pathogenic taxa or among non-pathogenic taxa.
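The CLR transformation named in the abstract replaces each count with the log of its ratio to the geometric mean of all counts in the same sample, clr(x)_i = ln(x_i / g(x)). The Python sketch below illustrates this under stated assumptions: the abstract does not say how zero counts were handled, so the pseudocount (0.5 by default) and the function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) is added to every count so that
    zero counts do not make the logarithm undefined.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    log_gmean = sum(logs) / len(logs)  # log of the geometric mean
    return [v - log_gmean for v in logs]


def clr_table(samples, pseudocount=0.5):
    """Apply clr() to every sample (row) of a {sample: counts} table."""
    return {name: clr(row, pseudocount) for name, row in samples.items()}


# Hypothetical counts of three taxa in two samples; S2 has 10x the reads of S1
table = {"S1": [10, 20, 40], "S2": [100, 200, 400]}
print(clr_table(table, pseudocount=0))
```

Both hypothetical samples map to the same CLR vector (about -0.693, 0, 0.693) even though one has ten times the reads of the other, which is why CLR values can be compared across samples of different sequencing depth. With a nonzero pseudocount this invariance holds only approximately.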

Keywords: 16S rRNA gene sequencing; Aitchison distance; CLR transformation; airborne microbiome; alpha-diversity; compositional data; singular value decomposition; ρ metrics.
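The ρ metric listed in the keywords is, in the compositional-data literature, usually the proportionality coefficient ρ(x, y) = 1 - var(clr(x) - clr(y)) / (var(clr(x)) + var(clr(y))), computed across samples for two taxa x and y. Assuming that definition (the abstract does not spell it out), a minimal sketch follows: ρ = 1 means the two taxa keep a constant ratio in every sample, while values near zero or below indicate no proportionality. Function names, the pseudocount, and the example table are hypothetical.

```python
import math


def clr_rows(table, pseudocount=0.5):
    """CLR-transform each row (sample) of a samples x taxa count table."""
    out = []
    for row in table:
        logs = [math.log(c + pseudocount) for c in row]
        log_gmean = sum(logs) / len(logs)
        out.append([v - log_gmean for v in logs])
    return out


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def rho(clr_table, i, j):
    """Proportionality coefficient between taxa (columns) i and j."""
    xi = [row[i] for row in clr_table]
    xj = [row[j] for row in clr_table]
    diff = [a - b for a, b in zip(xi, xj)]
    return 1 - sample_variance(diff) / (sample_variance(xi) + sample_variance(xj))


# Hypothetical table: taxon 1 is always twice taxon 0; taxon 2 is constant
counts = [[10, 20, 5], [20, 40, 5], [40, 80, 5]]
t = clr_rows(counts, pseudocount=0)
print(rho(t, 0, 1), rho(t, 0, 2))
```

Here taxa 0 and 1 give ρ = 1 (up to rounding), while taxon 0 and the constant taxon 2 give a negative value (-0.8). Because the same n - 1 normalization appears in the numerator and denominator, using population rather than sample variance would give the same ρ.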


Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
(a) Heatmap of the centered log-ratio (CLR) transformed values of the 25 selected bacterial genera; (b) and (c) Aitchison distance-based dendrograms showing the relatedness among samples and among genera, respectively. The red arrows in (b) mark the two main sample clusters identified by the dendrogram: Cluster 1 includes samples A_HR, DD1, B_R2, B_BAT, and B_R1, and Cluster 2 consists of all the other samples. WPS in the genus legend stands for WPSUnclassified1_genera_incertae_sedis.
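The Aitchison distance behind these dendrograms is the Euclidean distance between two samples' CLR vectors. The caption does not state the linkage rule used to build the trees, so the sketch below assumes average linkage (UPGMA) as one common choice; the helper names, the pseudocount, the sample counts, and the naive merge loop (adequate for a few dozen samples) are illustrative only.

```python
import math


def clr(counts, pseudocount=0.5):
    logs = [math.log(c + pseudocount) for c in counts]
    log_gmean = sum(logs) / len(logs)
    return [v - log_gmean for v in logs]


def aitchison(x, y, pseudocount=0.5):
    """Aitchison distance = Euclidean distance between CLR vectors."""
    cx, cy = clr(x, pseudocount), clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def average_linkage(samples, pseudocount=0.5):
    """Naive UPGMA clustering of {name: counts}; returns the merge sequence."""
    names = list(samples)
    dist = {(a, b): aitchison(samples[a], samples[b], pseudocount)
            for a in names for b in names if a != b}
    clusters = {n: [n] for n in names}  # cluster label -> member sample names
    merges = []
    while len(clusters) > 1:
        labels = list(clusters)
        best = None
        for k, p in enumerate(labels):
            for q in labels[k + 1:]:
                # average distance over all cross-cluster sample pairs
                d = sum(dist[(a, b)] for a in clusters[p] for b in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        clusters[f"({p},{q})"] = clusters.pop(p) + clusters.pop(q)
        merges.append((p, q, d))
    return merges


# Hypothetical counts of three genera in four samples
samples = {"A": [10, 20, 40], "B": [11, 19, 42], "C": [40, 20, 10], "D": [42, 21, 9]}
for left, right, height in average_linkage(samples):
    print(f"merge {left} + {right} at distance {height:.3f}")
```

With real data, a library routine such as scipy.cluster.hierarchy.linkage applied to the CLR matrix with method="average" and metric="euclidean" would build the same kind of tree far more efficiently.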

**Figure 2**
Boxplots of the (a) Shannon and (b) Simpson index values calculated at the genus level for the samples belonging to Cluster 1 (A_HR, DD1, B_R2, B_BAT, B_R1) and Cluster 2 (A_R1, HR1, PSY, RO1, RO2, HR2, DD2, R4, R3, B+C_R2, MED, B+C_R1). In each boxplot, the line within the box and the white dot represent the median and mean value, respectively. The bottom and top of each box indicate the 25th and 75th percentiles, and the whiskers extend to the 5th and 95th percentiles.
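The caption does not give the exact index formulas. The sketch below assumes the Shannon index with natural logarithms, H = -Σ p_i ln p_i, and the Gini-Simpson form 1 - Σ p_i², both computed from relative abundances (not from CLR values); for both, higher values indicate a richer, more even community. Function names and the per-cluster counts are hypothetical.

```python
import math
from statistics import mean, median


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2); an assumed form of the Simpson index."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def summarize(values):
    """Median and mean, the two statistics marked on each boxplot."""
    return {"median": median(values), "mean": mean(values)}


# Hypothetical genus counts for samples of two clusters
cluster_1 = [[90, 5, 3, 2], [80, 10, 5, 5]]
cluster_2 = [[25, 25, 25, 25], [30, 30, 20, 20]]
for name, cluster in [("Cluster 1", cluster_1), ("Cluster 2", cluster_2)]:
    print(name, summarize([shannon(s) for s in cluster]),
          summarize([gini_simpson(s) for s in cluster]))
```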

**Figure 3**
Two-dimensional principal component analysis (PCA) biplot computed via singular value decomposition (SVD) of the CLR-transformed values of the 25 selected bacterial genera. The biplot illustrates the relationships between samples (score plot, red dots) and genera (loading plot, black arrows). The percentages of total variance explained by the first and second principal components are also reported. Note that *WPS* stands for *WPSUnclassified1_genera_incertae_sedis*.
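In a PCA computed by SVD of the column-centered CLR matrix (samples as rows, genera as columns), the loadings are the leading right singular vectors, the sample scores are each row's projections onto them, and each component's share of the total variance is its squared singular value divided by the total sum of squares. Column centering is assumed here, as in standard PCA. To stay dependency-free, the sketch below finds the two leading singular vectors by power iteration with deflation on XᵀX, which yields the same vectors as an SVD when the singular values are distinct; in practice one would call an SVD routine such as numpy.linalg.svd. All names and values are hypothetical.

```python
import math
import random


def center_columns(matrix):
    """Subtract each column (taxon) mean across samples."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def leading_components(x, k=2, iters=500, seed=0):
    """Top-k (squared singular value, unit loading vector) pairs of x.

    Uses power iteration on the Gram matrix X^T X with deflation.
    """
    rng = random.Random(seed)
    p = len(x[0])
    gram = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(gram[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # no variance left to extract
                break
            v = [t / norm for t in w]
        # Rayleigh quotient v^T G v = squared singular value of this component
        lam = sum(v[a] * gram[a][b] * v[b] for a in range(p) for b in range(p))
        comps.append((lam, v))
        # Deflate: remove this component before searching for the next one
        gram = [[gram[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return comps


def svd_pca(clr_matrix, k=2):
    """Scores, loadings, and explained-variance fractions for the first k PCs."""
    x = center_columns(clr_matrix)
    total = sum(v * v for row in x for v in row)  # total sum of squares
    result = []
    for lam, v in leading_components(x, k):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        result.append({"loadings": v, "scores": scores, "explained": lam / total})
    return result


# Hypothetical, already CLR-transformed values (4 samples x 3 taxa)
clr_values = [[2, 0, -2], [-2, 0, 2], [0.5, -1, 0.5], [-0.5, 1, -0.5]]
for i, pc in enumerate(svd_pca(clr_values), start=1):
    print(f"PC{i}: {100 * pc['explained']:.1f}% of variance")
```

For this toy matrix the two components explain about 84.2% and 15.8% of the variance. The sign of each component is arbitrary, so a biplot drawn from these scores and loadings may appear mirrored relative to another implementation.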

**Figure 4**
(a) Heatmap of the centered log-ratio (CLR) transformed read counts of the 20 selected bacterial species; (b) and (c) Aitchison distance-based dendrograms showing the relatedness among samples and among species, respectively. The two main sample clusters defined by the dendrogram are indicated in (b). Legend: S. (*Staphylococcus*) *pettenkoferi*, C. (*Corynebacterium*) *tuberculostearicum*, A. (*Acinetobacter*) *lwoffii*, C. (*Corynebacterium*) *jeikeium*, S. (*Staphylococcus*) *cohnii*, C. (*Corynebacterium*) *vitaeruminis*, P. (*Propionibacterium*) *acnes*, M. (*Methyloversatilis*) *universalis*, G. (*Gemmatirosa*) *kalamazoonensis*, R. (*Rubellimicrobium*) *roseum*, u. (*uncultured*) *Eubacterium*, B. (*Blastococcus*) *aggregatus*, N. (*Nitrolancea*) *hollandica*, S. (*Solirubrobacter*) *ginsenosidimutans*, u. (*uncultured*) *Acidobacteria* (EF457480), M. (*Modestobacter*) *lapidis*, G. (*Gemmatimonas*) *phototrophica*, M. (*Microvirga*) *lupini*, and u. (*uncultured*) *Acidobacteria* (EF457419).

**Figure 5**
Boxplots of the (a) Shannon and (b) Simpson index values calculated at the species level for the samples belonging to Cluster 1 (B_R2, B_BAT, B_R1, DD1, A_R1, A_HR, HR1) and Cluster 2 (RO1, R3, HR2, DD2, RO2, R4, B+C_R1, PSY, MED, B+C_R2). In each boxplot, the line within the box and the white dot represent the median and mean value, respectively. The bottom and top of each box indicate the 25th and 75th percentiles, and the whiskers extend to the 5th and 95th percentiles.

**Figure 6**
Two-dimensional SVD-based PCA biplot of the CLR-transformed values of the 20 selected bacterial species, showing the score plot (red dots) and the loading plot (black arrows). The percentages of total variance explained by the first and second principal components are also reported. Legend: *M.un* (*Methyloversatilis universalis*), *P.ac* (*Propionibacterium acnes*), *C.vi* (*Corynebacterium vitaeruminis*), *S.pe* (*Staphylococcus pettenkoferi*), *C.tu* (*Corynebacterium tuberculostearicum*), *C.je* (*Corynebacterium jeikeium*), *S.co* (*Staphylococcus cohnii*), *A.lw* (*Acinetobacter lwoffii*), *M.lu* (*Microvirga lupini*), *G.ph* (*Gemmatimonas phototrophica*), *N.ho* (*Nitrolancea hollandica*), *u.Eu* (*uncultured Eubacterium*), *R.ro* (*Rubellimicrobium roseum*), *B.ag* (*Blastococcus aggregatus*), *u.Aci* (*uncultured Acidobacteria* EF457480), *S.sp* (*Solirubrobacter* sp.), *S.gi* (*Solirubrobacter ginsenosidimutans*), *G.ka* (*Gemmatirosa kalamazoonensis*), *M.la* (*Modestobacter lapidis*), and *u.Ac* (*uncultured Acidobacteria* EF457419).


