BMC Bioinformatics. 2013 Sep 23;14:279.

doi: 10.1186/1471-2105-14-279.

Archetypal analysis of diverse Pseudomonas aeruginosa transcriptomes reveals adaptation in cystic fibrosis airways

Juliane Charlotte Thøgersen¹, Morten Mørup, Søren Damkiær, Søren Molin, Lars Jelsbak

PMID: 24059747
PMCID: PMC3870984
DOI: 10.1186/1471-2105-14-279

Juliane Charlotte Thøgersen et al. BMC Bioinformatics. 2013.

Affiliation

¹ Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark. LJ@bio.dtu.dk.

Abstract

Background: Analysis of global gene expression by DNA microarrays is widely used in experimental molecular biology. However, the complexity of such high-dimensional data sets makes it difficult to fully understand the underlying biological features present in the data. The aim of this study is to introduce a method for DNA microarray analysis that provides an intuitive interpretation of data through dimension reduction and pattern recognition. We present the first "Archetypal Analysis" of global gene expression. The analysis is based on microarray data from five integrated studies of Pseudomonas aeruginosa isolated from the airways of cystic fibrosis patients.

Results: Our analysis clustered samples into distinct groups with comprehensible characteristics since the archetypes representing the individual groups are closely related to samples present in the data set. Significant changes in gene expression between different groups identified adaptive changes of the bacteria residing in the cystic fibrosis lung. The analysis suggests a similar gene expression pattern between isolates with a high mutation rate (hypermutators) despite accumulation of different mutations for these isolates. This suggests positive selection in the cystic fibrosis lung environment, and changes in gene expression for these isolates are therefore most likely related to adaptation of the bacteria.

Conclusions: Archetypal analysis succeeded in identifying adaptive changes of P. aeruginosa. The combination of clustering and matrix factorization made it possible to reveal minor similarities among different groups of data, which other analytical methods failed to identify. We suggest that this analysis could be used to supplement current methods used to analyze DNA microarray data.
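The "combination of clustering and matrix factorization" mentioned above can be illustrated with a small sketch. It assumes the commonly used formulation of archetypal analysis, in which the data matrix X is approximated as X·C·S, with every column of C and S non-negative and summing to 1. The abstract does not spell out the formulation, so the matrix names C and `archetypes`, the helper `matmul`, and all numbers below are illustrative assumptions; no fitting is performed here, the factors are simply chosen by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Toy expression matrix X: 2 genes (rows) x 4 samples (columns). Values are invented.
X = [[0.0, 4.0, 0.0, 2.0],
     [0.0, 0.0, 4.0, 1.0]]

# C (samples x archetypes): each column is a convex mix of samples that defines
# one archetype. Here each archetype coincides with one observed sample.
C = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]

# S (archetypes x samples): each column is a convex mix of archetypes that
# rebuilds one sample. The fourth sample is a blend of all three archetypes.
S = [[1.0, 0.0, 0.0, 0.25],
     [0.0, 1.0, 0.0, 0.50],
     [0.0, 0.0, 1.0, 0.25]]

archetypes = matmul(X, C)      # genes x archetypes
X_hat = matmul(archetypes, S)  # reconstruction of X from the archetypes

print(archetypes)
print(X_hat)
```

Because the archetypes are built from the samples themselves, each one stays close to data that was actually observed, which is what makes the resulting groups straightforward to interpret.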

Figures

**Figure 1**
**Flow diagram of the archetypal analysis.** First, data is collected and pre-processed. Then, Archetypal Analysis is applied resulting in a clustering of samples based on the closest defined archetype. Finally, the archetypes are characterized and evaluated in a biological context.
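The clustering step in this flow can be sketched as follows. The sketch assumes that the "closest" archetype of a sample is the one with the largest coefficient in that sample's column of the coefficient matrix; this reading, the function name `assign_to_archetypes`, and the toy coefficients are assumptions made for illustration, not the authors' implementation.

```python
def assign_to_archetypes(S):
    """For each sample (column of S), return the index of its largest coefficient."""
    n_archetypes, n_samples = len(S), len(S[0])
    return [max(range(n_archetypes), key=lambda k: S[k][j])
            for j in range(n_samples)]

# Toy coefficient matrix: 3 archetypes (rows) x 4 samples (columns).
# Each column sums to 1; the values are invented.
S = [[0.9, 0.1, 0.2, 0.0],
     [0.1, 0.7, 0.2, 0.3],
     [0.0, 0.2, 0.6, 0.7]]

labels = assign_to_archetypes(S)

# Group sample indices by their dominant archetype.
groups = {}
for sample, k in enumerate(labels):
    groups.setdefault(k, []).append(sample)

print(labels)
print(groups)
```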

**Figure 2**
**Explained variance.** The explained variance plotted as a function of the number of components for principal component analysis (PCA), archetypal analysis (AA) and k-means clustering (K-means). The plotted values are the mean of 10 repeated iterations. The standard deviations are indicated with error bars for k-means clustering. The standard deviations for archetypal analysis are very small and therefore not visible.

**Figure 3**
**Heatmap of archetypal analysis results.** **A.** The relation of each sample to the seven different archetypes shown as a heat map of the coefficient matrix S. Each row represents one of the archetypes and each column represents a sample. The corresponding studies are listed above the heat map. The shading indicates how much the individual archetypes contribute to each sample. A strong correlation close to 100% is black whereas a low or no correlation is white. The white dots that appear in archetype 5 indicate mucoid samples and the white dots that appear in archetype 6 indicate samples that are hypermutators. **B.** Phenotypic data, i.e. adaptation state (early/late), mucoid/non-mucoid and hypermutability, are indicated. Reference strains (PAO1 and PA14 from study 1 and 2) are categorized as “Early”, but they are distinguished with a blue color. **C.** The values of Explained Sample Variance (*ESV*) are included to show how well the samples are described by the model.
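A per-sample fit measure of this kind can be sketched as below. The exact definition of *ESV* is not given in this caption, so the sketch assumes one plausible form: one minus the ratio of the residual sum of squares to the total sum of squares, computed separately for each sample (column). The function name `explained_sample_variance` and the toy matrices are hypothetical.

```python
def explained_sample_variance(X, X_hat):
    """Assumed ESV: 1 - ||x_j - x_hat_j||^2 / ||x_j||^2 for each sample column j."""
    n_genes, n_samples = len(X), len(X[0])
    esv = []
    for j in range(n_samples):
        residual = sum((X[i][j] - X_hat[i][j]) ** 2 for i in range(n_genes))
        total = sum(X[i][j] ** 2 for i in range(n_genes))
        # Guard against an all-zero sample, for which the ratio is undefined.
        esv.append(1.0 - residual / total if total > 0 else 0.0)
    return esv

# Toy data: 2 genes (rows) x 2 samples (columns); values are invented.
X     = [[3.0, 1.0],
         [4.0, 2.0]]
X_hat = [[3.0, 1.0],
         [4.0, 0.0]]  # the second sample is reconstructed poorly

esv = explained_sample_variance(X, X_hat)
print(esv)
```

Under this assumed definition, a value near 1 marks a sample that the model reproduces well, and a low value flags a sample that no combination of archetypes describes adequately.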

**Figure 4**
**Characterization of archetypes 1, 2 and 5.** Number of up- and down-regulated genes within 26 gene ontology classes for archetype 1 **(A)**, archetype 2 **(B)** and archetype 5 **(C)**. Enriched gene ontology classes are cross-hatched in green and red for up- and down-regulated genes, respectively. The values on the x-axes are numbers of genes.

**Figure 5**
**Characterization of archetypes 3, 6 and 7.** Number of up- and down-regulated genes within 26 gene ontology classes for archetype 3 **(A)**, archetype 6 **(B)** and archetype 7 **(C)**. Enriched gene ontology classes are cross-hatched in green and red for up- and down-regulated genes, respectively. The values on the x-axes are numbers of genes.

**Figure 6**
**Comparison between archetypal analysis, principal component analysis and k-means clustering.** Visual representation of a seven-component analysis using archetypal analysis (AA), principal component analysis (PCA) and k-means clustering (K-means). Explained sample variance (*ESV*) for each analysis is included. For each PCA component, the contribution to explained variance is indicated. The explained variance for a seven-component analysis is indicated in brackets for each analysis.

**Figure 7**
**Principal component analysis scatter plot.** Each sample is plotted with respect to the loadings of the first and second PCA components. The seven archetypes from archetypal analysis are transformed into the PCA space through a basis transformation. Each study is indicated with a specific color. Study 1: GREEN, study 2: CYAN (samples #48-74) and BLUE (samples #75-128), study 3: RED, study 4: MAGENTA, study 5: ORANGE. The phenotypes are indicated with symbols as “Early”, “Late”, “Mucoid” and “Hypermutator”. The reference strains PAO1 and PA14 from study 1 and 2 are indicated with a symbol as “Reference”.

Cited by

Inferring biological tasks using Pareto analysis of high-dimensional data.
Hart Y, Sheftel H, Hausser J, Szekely P, Ben-Moshe NB, Korem Y, Tendler A, Mayo AE, Alon U. Nat Methods. 2015 Mar;12(3):233-5, 3 p following 235. doi: 10.1038/nmeth.3254. Epub 2015 Jan 26. PMID: 25622107
Archetypal analysis of COVID-19 in Montana, USA, March 13, 2020 to April 26, 2022.
Stone E, Coombs S, Landguth E. PLoS One. 2024 Jan 3;19(1):e0283265. doi: 10.1371/journal.pone.0283265. eCollection 2024. PMID: 38170725 Free PMC article.
Tumour heterogeneity and the evolutionary trade-offs of cancer.
Hausser J, Alon U. Nat Rev Cancer. 2020 Apr;20(4):247-257. doi: 10.1038/s41568-020-0241-6. Epub 2020 Feb 24. PMID: 32094544 Review.
Environmental heterogeneity drives within-host diversification and evolution of Pseudomonas aeruginosa.
Markussen T, Marvig RL, Gómez-Lozano M, Aanæs K, Burleigh AE, Høiby N, Johansen HK, Molin S, Jelsbak L. mBio. 2014 Sep 16;5(5):e01592-14. doi: 10.1128/mBio.01592-14. PMID: 25227464 Free PMC article.
Analysis of Genome-scale Expression Network in Four Major Bacterial Residents of Cystic Fibrosis Lung.
Hosseinkhan N, Zarrineh P, Masoudi-Nejad A. Curr Genomics. 2014 Oct;15(5):408-18. doi: 10.2174/1389202915666140818205444. PMID: 25435803 Free PMC article.
