BMC Bioinformatics. 2013 Sep 23;14:279.
doi: 10.1186/1471-2105-14-279.

Archetypal analysis of diverse Pseudomonas aeruginosa transcriptomes reveals adaptation in cystic fibrosis airways

Juliane Charlotte Thøgersen et al. BMC Bioinformatics.

Abstract

Background: Analysis of global gene expression by DNA microarrays is widely used in experimental molecular biology. However, the complexity of such high-dimensional data sets makes it difficult to fully understand the underlying biological features present in the data. The aim of this study is to introduce a method for DNA microarray analysis that provides an intuitive interpretation of data through dimension reduction and pattern recognition. We present the first "Archetypal Analysis" of global gene expression. The analysis is based on microarray data from five integrated studies of Pseudomonas aeruginosa isolated from the airways of cystic fibrosis patients.

Results: Our analysis clustered samples into distinct groups with comprehensible characteristics since the archetypes representing the individual groups are closely related to samples present in the data set. Significant changes in gene expression between different groups identified adaptive changes of the bacteria residing in the cystic fibrosis lung. The analysis suggests a similar gene expression pattern between isolates with a high mutation rate (hypermutators) despite accumulation of different mutations for these isolates. This suggests positive selection in the cystic fibrosis lung environment, and changes in gene expression for these isolates are therefore most likely related to adaptation of the bacteria.

Conclusions: Archetypal analysis succeeded in identifying adaptive changes of P. aeruginosa. The combination of clustering and matrix factorization made it possible to reveal minor similarities among different groups of data, which other analytical methods failed to identify. We suggest that this analysis could be used to supplement current methods used to analyze DNA microarray data.
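
The combination of clustering and matrix factorization described above can be illustrated with a minimal sketch of archetypal analysis. This is not the authors' implementation: it assumes the common formulation X ≈ X C S, where the columns of C and S are constrained to the probability simplex, and it uses a simple alternating projected-gradient scheme. The function names, the toy triangle data and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def project_simplex_columns(M):
    """Euclidean projection of each column of M onto the probability simplex."""
    n_rows, n_cols = M.shape
    U = -np.sort(-M, axis=0)                    # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, n_rows + 1)[:, None]
    rho = (U - css / idx > 0).sum(axis=0)       # size of the support (at least 1)
    theta = css[rho - 1, np.arange(n_cols)] / rho
    return np.maximum(M - theta, 0.0)

def archetypal_analysis(X, k, n_iter=200, seed=0):
    """Approximate X (features x samples) as X @ C @ S.

    C (samples x k): each archetype is a convex combination of samples.
    S (k x samples): each sample is a convex combination of archetypes.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    C = project_simplex_columns(rng.random((n, k)))
    S = project_simplex_columns(rng.random((k, n)))
    G = X.T @ X                                 # Gram matrix of the samples
    for _ in range(n_iter):
        # Projected-gradient step on S with step size 1/L (L = Lipschitz constant).
        AtA = C.T @ G @ C
        L_s = 2.0 * np.linalg.norm(AtA, 2) + 1e-12
        S = project_simplex_columns(S - 2.0 * (AtA @ S - C.T @ G) / L_s)
        # Projected-gradient step on C.
        SSt = S @ S.T
        L_c = 2.0 * np.linalg.norm(G, 2) * np.linalg.norm(SSt, 2) + 1e-12
        C = project_simplex_columns(C - 2.0 * (G @ C @ SSt - G @ S.T) / L_c)
    return C, S

# Toy data (an assumption for illustration): 50 points inside a triangle.
rng = np.random.default_rng(1)
vertices = np.array([[0.0, 4.0, 2.0],
                     [0.0, 0.0, 3.0]])          # 2 features x 3 corners
X_demo = vertices @ project_simplex_columns(rng.random((3, 50)))
C, S = archetypal_analysis(X_demo, k=3)
archetypes = X_demo @ C                         # archetypes lie in the data's convex hull
```

Because each archetype is a convex combination of observed samples, it stays close to real data points, which is the property the Results paragraph credits for the interpretability of the groups.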

Figures

Figure 1
Flow diagram of the archetypal analysis. First, data is collected and pre-processed. Then, Archetypal Analysis is applied resulting in a clustering of samples based on the closest defined archetype. Finally, the archetypes are characterized and evaluated in a biological context.
Figure 2
Explained variance. The explained variance plotted as a function of the number of components for principal component analysis (PCA), archetypal analysis (AA) and k-means clustering (K-means). The plotted values are the mean of 10 repeated iterations. The standard deviations are indicated with error bars for k-means clustering. The standard deviations for archetypal analysis are very small and therefore not visible.
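
A curve like the one in this caption could be computed along the following lines. This is a hedged sketch: it assumes explained variance is defined as one minus the residual sum of squares divided by the total sum of squares of the feature-centered data, which the abstract does not state. The helper names are hypothetical. For k-means the reconstruction replaces each sample by its cluster mean; for archetypal analysis it would be X C S.

```python
import numpy as np

def explained_variance(X, X_hat):
    """1 - residual SS / total SS, with the total taken around the per-feature mean."""
    mean = X.mean(axis=1, keepdims=True)
    return 1.0 - np.sum((X - X_hat) ** 2) / np.sum((X - mean) ** 2)

def pca_reconstruction(X, k):
    """Rank-k PCA reconstruction of X (features x samples)."""
    mean = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]

def cluster_mean_reconstruction(X, labels):
    """Replace every sample by the mean of its cluster (k-means style)."""
    X_hat = np.empty_like(X, dtype=float)
    for lab in np.unique(labels):
        members = labels == lab
        X_hat[:, members] = X[:, members].mean(axis=1, keepdims=True)
    return X_hat

# Random stand-in data (an assumption; not the study's microarray data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 20))
pca_curve = [explained_variance(X_demo, pca_reconstruction(X_demo, k))
             for k in range(1, 6)]
one_cluster = explained_variance(
    X_demo, cluster_mean_reconstruction(X_demo, np.zeros(20, dtype=int)))
```

Under this definition PCA gives the highest explained variance attainable by any reconstruction of the form mean plus a rank-k term, so the PCA curve serves as an upper reference for the more constrained k-means models.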
Figure 3
Heatmap of archetypal analysis results. A. The relation of each sample to the seven different archetypes shown as a heat map of the coefficient matrix S. Each row represents one of the archetypes and each column represents a sample. The corresponding studies are listed above the heat map. The shading indicates how much the individual archetypes contribute to each sample. A strong correlation close to 100% is black whereas a low or no correlation is white. The white dots that appear in archetype 5 indicate mucoid samples and the white dots that appear in archetype 6 indicate samples that are hypermutators. B. Phenotypic data, i.e. adaptation state (early/late), mucoid/non-mucoid status and hypermutability, are indicated. Reference strains (PAO1 and PA14 from study 1 and 2) are categorized as “Early”, but they are distinguished with a blue color. C. The values of Explained Sample Variance (ESV) are included to show how well the samples are described by the model.
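
The two quantities shown in this figure, the dominant archetype of each sample and its explained sample variance, might be derived from a fitted model as sketched below. The ESV formula here (one minus the per-sample residual sum of squares over the per-sample sum of squares) is an assumption, since the abstract does not define it, and the function names and numbers are hypothetical.

```python
import numpy as np

def dominant_archetype(S):
    """Index of the archetype with the largest coefficient for each sample (column)."""
    return np.argmax(S, axis=0)

def explained_sample_variance(X, X_hat):
    """Per-sample 1 - ||x - x_hat||^2 / ||x||^2 (assumed definition of ESV)."""
    sse = np.sum((X - X_hat) ** 2, axis=0)
    sst = np.sum(X ** 2, axis=0)
    return 1.0 - sse / sst

# Tiny hand-made example: 2 archetypes, 2 samples.
S_demo = np.array([[0.9, 0.2],
                   [0.1, 0.8]])
X_demo = np.array([[3.0, 1.0],
                   [4.0, 2.0]])
X_hat_demo = np.array([[3.0, 1.0],
                       [0.0, 2.0]])             # first sample poorly fit, second exact
groups = dominant_archetype(S_demo)
esv = explained_sample_variance(X_demo, X_hat_demo)
```

Assigning each sample to the archetype with the largest coefficient corresponds to the clustering step in Figure 1, while a low ESV flags samples that no combination of archetypes describes well.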
Figure 4
Characterization of archetypes 1, 2 and 5. Number of up- and down-regulated genes within 26 gene ontology classes for archetype 1 (A), archetype 2 (B) and archetype 5 (C). Enriched gene ontology classes are cross-hatched in green and red for up- and down-regulated genes, respectively. The values on the x-axes are numbers of genes.
Figure 5
Characterization of archetypes 3, 6 and 7. Number of up- and down-regulated genes within 26 gene ontology classes for archetype 3 (A), archetype 6 (B) and archetype 7 (C). Enriched gene ontology classes are cross-hatched in green and red for up- and down-regulated genes, respectively. The values on the x-axes are numbers of genes.
Figure 6
Comparison between archetypal analysis, principal component analysis and k-means clustering. Visual representation of a seven-component analysis using archetypal analysis (AA), principal component analysis (PCA) and k-means clustering (K-means). Explained sample variance (ESV) for each analysis is included. For each PCA component the contribution to explained variance is indicated. The explained variance for a seven-component analysis is indicated in brackets for each analysis.
Figure 7
Principal component analysis scatter plot. Each sample is plotted with respect to the loadings of the first and second PCA components. The seven archetypes from archetypal analysis are transformed into the PCA space through a basis transformation. Each study is indicated with a specific color. Study 1: GREEN, study 2: CYAN (samples #48-74) and BLUE (samples #75-128), study 3: RED, study 4: MAGENTA, study 5: ORANGE. The phenotypes are indicated with symbols as “Early”, “Late”, “Mucoid” and “Hypermutator”. The reference strains PAO1 and PA14 from study 1 and 2 are indicated with a symbol as “Reference”.
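
The basis transformation mentioned in this caption could look like the sketch below, which assumes the PCA is computed by SVD of the feature-centered data and that archetypes are expressed as X C. Names and toy data are hypothetical. Centering the archetypes with the same mean and multiplying by the leading left singular vectors places them in the same coordinates as the samples.

```python
import numpy as np

def pca_scores(X, points, n_components=2):
    """Coordinates of `points` (features x m) in the PCA basis fitted on X (features x n)."""
    mean = X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    W = U[:, :n_components]                     # principal directions in feature space
    return W.T @ (points - mean)

# Random stand-in data and a hand-made convex coefficient matrix C (assumptions).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 15))
C_demo = np.zeros((15, 2))
C_demo[0, 0] = 1.0                              # archetype 0 is exactly sample 0
C_demo[1:3, 1] = 0.5                            # archetype 1 is the midpoint of samples 1 and 2
archetypes = X_demo @ C_demo

sample_scores = pca_scores(X_demo, X_demo)
archetype_scores = pca_scores(X_demo, archetypes)
```

Because the columns of C sum to one, each archetype's position in the scatter plot is the same convex combination of the sample positions, so archetypes always fall within the convex hull of the plotted samples.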
