PLoS One. 2013;8(1):e53544.
doi: 10.1371/journal.pone.0053544. Epub 2013 Jan 2.

Visualising the cross-level relationships between pathological and physiological processes and gene expression: analyses of haematological diseases


Masahiro Ono et al. PLoS One. 2013.

Abstract

The understanding of pathological processes is based on the comparison between physiological and pathological conditions, and transcriptomic analysis has been extensively applied to various diseases for this purpose. However, the way in which the transcriptomic data of pathological cells relate to the transcriptomes of their normal cellular counterparts has not been fully explored, although it may provide new and unbiased insights into the mechanisms of these diseases. Exploring these relationships requires a method that simultaneously analyses components across different levels, namely genes, normal cells, and diseases. Here we propose a multidimensional method, CCA on Microarray data (CCAM), which adapts Canonical Correspondence Analysis (CCA), a method developed in ecology and sociology, to microarray data and visualises the cross-level relationships between components at these three levels based on transcriptomic data of physiological and pathological processes. Using CCAM, we analysed transcriptomes of haematological disorders together with those of normal haematopoietic cell differentiation. First, in leukaemia data, CCAM visualised known relationships between leukaemia subtypes, cellular differentiation, and their characteristic genes, which confirmed the relevance of CCAM. Next, by analysing transcriptomes of myelodysplastic syndromes (MDS), we showed that CCAM is effective for both generating and testing hypotheses. CCAM showed that, among MDS patients, the transcriptomes of high-risk patients were more similar than those of low-risk patients to the transcriptomes of both haematopoietic stem cells (HSC) and megakaryocyte-erythroid progenitors (MEP), and it provided a prognostic model. Collectively, CCAM reveals hidden relationships between pathological and physiological processes and gene expression, providing clinically meaningful insights into haematological diseases that could not be obtained with other univariate and multivariate methods. Furthermore, CCAM was effective in identifying candidate genes that are correlated with cellular phenotypes of interest. We expect that CCAM will benefit a wide range of medical fields.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. CCAM of transcriptomic data of leukaemias with haematopoietic cell differentiation as explanatory variables.
Leukaemia data were analysed with those of haematopoietic cell populations at distinct differentiation states (Granulocyte-monocyte progenitor [GMP], Neutrophilic metamyelocyte, Pro-B cell, and Mature B cell class switched). (a) Schematic presentation of CCAM. Transcriptomic datasets of leukaemias (including AML, CML, ALL, and CLL) and haematopoietic cells were processed by CCA, and the cross-level relationships between components at three different levels, namely disease, cell, and gene, were analysed. (b) All three levels are shown on a single map (CCA triplot). Centroids of disease samples are shown by large closed circles, and 95% confidence intervals (CI) are indicated by ellipses. Genes are shown by closed grey circles, and well-known genes that are key for either leukaemia or haematopoietic cell differentiation are annotated. Haematopoietic cells are represented by blue arrows; genes and diseases that are closely related to a given cell population aggregate in the direction of its arrow. (c, d) The disease and cell levels are shown. (c) Individual disease samples are shown in addition to the 95% CI. (d) Two-dimensional plot of disease samples and haematopoietic cell populations. The amount of information (eigenvalue) retained by Axes 1 and 2 is 68% and 18%, respectively, of the total variation (more precisely, of the constrained inertia; see Methods). (e) Three-dimensional plot of disease samples and haematopoietic cell populations. The amount of information (eigenvalue) retained by Axes 1, 2, and 3 is 68%, 18%, and 11%, respectively, of the constrained inertia. See legend for symbols and colours in (d) and (e).
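The 95% CI ellipses around group centroids can be illustrated with a short sketch. This is one common construction, not necessarily the one described in the paper's Methods: it assumes the 2-D sample scores are already available, treats them as unweighted, and assumes the centroid is approximately bivariate normal, so the ellipse is the covariance of the mean scaled by the chi-square quantile with 2 degrees of freedom (which has the closed form -2 ln(1 - level)). The function name `centroid_confidence_ellipse` is hypothetical.

```python
import math


def centroid_confidence_ellipse(points, level=0.95):
    """Centroid and confidence ellipse for the mean of 2-D ordination scores.

    Assumes the centroid is approximately bivariate normal (an assumption,
    not taken from the paper). Returns (centre, (semi_major, semi_minor),
    angle of the major axis in radians).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance of the scores divided by n: covariance of the centroid.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1) / n
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1) / n
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Orientation of the principal (major) axis.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Chi-square quantile for 2 degrees of freedom.
    q = -2 * math.log(1 - level)
    semi_axes = (math.sqrt(q * lam1), math.sqrt(q * max(lam2, 0.0)))
    return (mx, my), semi_axes, angle
```

Because the ellipse describes uncertainty in the centroid rather than the spread of individual samples, it shrinks as the number of samples per disease group grows.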
Figure 2. CCAM of transcriptomic data of CD34+ cells from MDS, non-MDS anaemia, and normal bone marrow (BM), analysed together with those of haematopoietic cell differentiation.
Microarray data of CD34+ cells from MDS, non-MDS anaemia, and healthy controls were analysed by CCAM using five haematopoietic cell populations (Haematopoietic stem cell [HSC], Megakaryocyte-erythroid progenitor [MEP], Common myeloid progenitor [CMP], Granulocyte-monocyte progenitor [GMP], and Pro-B cell) as explanatory variables. Genes were filtered on the MDS data using an empirical Bayes t-statistic (formula image). (a) CCA triplot. Centroids of MDS and normal CD34+ cells are shown by large closed circles, and 95% confidence intervals (CI) are indicated by ellipses. Genes are shown by closed grey circles, and well-known genes that are key for either MDS or the corresponding haematopoietic cells are annotated. Axis 1 is the direction along which the variation (inertia) is largest (87% of the total variation [constrained inertia]); Axis 2 has the second largest inertia (7%). (b) Individual disease samples are shown without genes to display the relationships between disease samples and haematopoietic cell populations more clearly.
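The gene filter relies on an empirical Bayes t-statistic. A widely used form is the moderated t-statistic of Smyth's limma model, in which each gene's pooled variance is shrunk towards a prior variance; the sketch below assumes that form. The prior degrees of freedom `d0`, prior variance `s0_sq`, and the cut-off `threshold` are hypothetical placeholders (limma estimates the priors from all genes, and the paper's threshold is not recoverable here), as are the function names.

```python
import math


def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated (empirical Bayes) t-statistic for one gene.

    The pooled variance s^2 with d residual degrees of freedom is shrunk
    towards a prior variance s0_sq carrying d0 degrees of freedom.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    d = n1 + n2 - 2
    s_sq = ss / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # posterior variance
    return (m1 - m2) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))


def filter_genes(expr, labels, threshold, d0=4.0, s0_sq=0.05):
    """Keep genes whose |moderated t| exceeds a (hypothetical) threshold.

    expr: dict gene -> list of log-expression values;
    labels: parallel list of True (e.g. MDS) / False (e.g. control).
    """
    kept = []
    for gene, values in expr.items():
        g1 = [v for v, lab in zip(values, labels) if lab]
        g2 = [v for v, lab in zip(values, labels) if not lab]
        if abs(moderated_t(g1, g2, d0, s0_sq)) > threshold:
            kept.append(gene)
    return kept
```

With `d0 = 0` the statistic reduces to the ordinary pooled two-sample t; a positive `d0` stabilises variance estimates for genes measured in few samples.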
Figure 3. Hypothesis-testing by CCAM using another independent MDS dataset.
Microarray data of MDS patients (without normal BM) were analysed using five haematopoietic cell populations (HSC, MEP, CMP, GMP, and Pro-B cell) as explanatory variables. Genes were filtered using only the haematopoietic cell data (formula image); the analysis is therefore unsupervised with respect to the MDS disease data. (a) CCAM result showing the disease and cell levels. Axis 1 is the direction along which the variation (inertia) is largest (55% of the total variation [constrained inertia]); Axis 2 has the second largest inertia (21%). MDS patient samples are positioned according to their gene-expression correlations with the five haematopoietic cell populations. (b–f) The following clinical data of individual disease samples were superimposed on the map in (a): (b) cytopenia score; (c) blast score; (d) karyotype score; (e) IPSS category; and (f) disease classification.
Figure 4. Kaplan-Meier curves of overall survival for MDS patients according to the CCAM-derived HSC-CMP score (a, b) and to well-established classifications (c–g).
Patients were stratified into 2 to 6 groups by the following: (a) HSC-CMP score, two groups (1: formula image and 2: formula image). (b) HSC-CMP score, three groups (1: formula image, 2: formula image, and 3: formula image). (c) Cytopenia score. (d) Blast score. (e) Karyotype score. (f) IPSS score. (g) Disease classification. P values are from the log-rank test.
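The survival curves are Kaplan-Meier estimates. A minimal pure-Python sketch of the estimator is shown below; it assumes each patient has a follow-up time and an indicator that is 1 if the event (death here, or AML transformation for Figure 5) was observed and 0 if the patient was censored. Events at a given time are counted before censorings at the same time, the usual convention. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = event observed,
    0 = censored. Returns a list of (time, survival) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censorings leave the risk set after time t.
        at_risk -= removed
    return curve
```

Censored patients contribute to the risk set until they drop out, which is why the step at time 4 in a series such as `kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])` halves the survival: only two patients remain at risk.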
Figure 5. Kaplan-Meier curves of time to AML transformation for MDS patients according to the CCAM-derived HSC-CMP score (a, b) and to well-established classifications (c–g).
Patients were stratified into 2 to 6 groups by the following: (a) HSC-CMP score, two groups (1: formula image and 2: formula image). (b) HSC-CMP score, three groups (1: formula image, 2: formula image, and 3: formula image). (c) Cytopenia score. (d) Blast score. (e) Karyotype score. (f) IPSS score. (g) Disease classification. P values are from the log-rank test.
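The log-rank test compares observed and expected event counts across groups at each event time. The sketch below covers only the two-group case (as in panel (a)); comparisons of three or more groups use a chi-square statistic with k - 1 degrees of freedom built from a covariance matrix, which is not shown. Inputs follow the same assumed format as a Kaplan-Meier analysis, with a 0/1 group label per patient; the function name is hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = event observed,
    0 = censored; groups: 0 or 1 per patient.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0        # hypergeometric variance of the group-1 event count
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:              # still at risk just before t
                n += 1
                n1 += gi
                if ti == t and ei:   # event observed at t
                    d += 1
                    d1 += gi
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / var
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Two groups with identical follow-up give a statistic of 0 and a P value of 1, whereas a group whose events all occur early gives a small P value.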
Figure 6. Schematic representation of CCAM and instructions for its practical usage.
(a) Overview of CCAM. See Methods for the full instructions. (b) Schematic representation of the decomposition of variation (inertia). (1) Total inertia is divided into constrained and unconstrained inertia by regressing the main data on the explanatory variables. (2) The constrained inertia is distributed among axes by singular value decomposition.
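The two-step decomposition in panel (b) can be sketched in NumPy. This is a minimal illustration of standard CCA (chi-square standardisation of the main data, mass-weighted regression on the explanatory variables, then SVD of the fitted part), not the authors' implementation. It assumes the main data are a non-negative genes × disease-samples matrix and the explanatory variables a genes × cell-populations matrix, with no all-zero rows or columns; that orientation and the function name `cca_inertia` are assumptions.

```python
import numpy as np


def cca_inertia(Y, X):
    """Inertia decomposition of canonical correspondence analysis (sketch).

    Y: (n, p) non-negative main data, assumed genes x disease samples.
    X: (n, m) explanatory variables, assumed genes x cell populations.
    Returns (total, constrained, unconstrained, eigenvalues).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()                  # correspondence matrix, sums to 1
    r = P.sum(axis=1)                # row (gene) masses
    c = P.sum(axis=0)                # column (sample) masses
    # Standardised chi-square residuals; their squared sum is the total inertia.
    Q = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    total = float((Q ** 2).sum())

    # (1) Mass-weighted regression of Q on the centred explanatory variables.
    Xc = X - r @ X                   # r sums to 1, so r @ X is the weighted mean
    Xw = np.sqrt(r)[:, None] * Xc
    coef, *_ = np.linalg.lstsq(Xw, Q, rcond=None)
    Q_fit = Xw @ coef                # orthogonal projection onto span(Xw)
    constrained = float((Q_fit ** 2).sum())
    unconstrained = float(((Q - Q_fit) ** 2).sum())

    # (2) SVD distributes the constrained inertia over canonical axes.
    sv = np.linalg.svd(Q_fit, compute_uv=False)
    eigenvalues = sv ** 2
    return total, constrained, unconstrained, eigenvalues
```

Because the fitted values are an orthogonal projection, constrained plus unconstrained inertia equals the total, and the eigenvalues sum to the constrained inertia. Dividing `eigenvalues` by `constrained` gives per-axis shares of the kind quoted in the figure legends (e.g. 68%, 18%, and 11% in Figure 1).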
