BMC Genomics. 2014 Nov 27;15(1):1028.
doi: 10.1186/1471-2164-15-1028.

Visualisation of the T cell differentiation programme by Canonical Correspondence Analysis of transcriptomes

Masahiro Ono et al. BMC Genomics. 2014.

Abstract

Background: In the post-genomic era, immunology faces the challenge of translating mutant phenotypes into gene functions from high-throughput data while taking into account the classifications and functions of immune cells. This requires new methods.

Results: Here we propose a novel application of a multidimensional analysis, Canonical Correspondence Analysis (CCA), to reveal the molecular characteristics of undefined cells in terms of cellular differentiation programmes by analysing two transcriptomic datasets. Using two independent datasets, whether RNA-seq or microarray data, CCA successfully visualised the cross-level relationships between genes, cells, and differentiation programmes, and thereby identified the immunological features of mutant cells (Gata3-KO T cells and Stat3-KO T cells) in a data-oriented manner. With a new concept, the differentiation variable, CCA provides an automatic classification of cell samples, with high sensitivity and performance comparable to other classification methods. In addition, we elaborate on how CCA results can be interpreted, and reveal the features of CCA in comparison with other visualisation techniques.

Conclusions: CCA is a visualisation tool with a classification ability to reveal the cross-level relationships of genes, cells and differentiation programmes. This can be used for characterising the functional defect of cells of interest (e.g. mutant cells) in the context of cellular differentiation. The proposed approach fits with common hypothesis-oriented studies in immunology, and can be used for a wide range of molecular and genomic studies on cellular differentiation mechanisms.

Figures

Figure 1
Delineation of the proposed approach. Delineation of (a) current and (b) proposed approaches for studies using transcriptomic analysis. Suppose that the hypothesis for a transcriptomic experiment is that cell subset X is defective in the differentiation programme D. (a) Typical approach in immunological studies using transcriptomic analysis. Cell subset X and its controls are analysed by microarray analysis or RNA-seq. Note that the interpretation of the results of data analysis is made essentially by “current knowledge,” where considerable arbitrariness and bias can be introduced. (b) Proposed approach using Canonical Correspondence Analysis (CCA). The original hypothesis is decomposed into two parts, “cell subset X is defective…” and “…in the differentiation programme D,” based on which two transcriptomic datasets are prepared. Note that the same genes must be used in both matrices Z and X. X is standardised (S), and projected onto Z using a projection matrix Q. Thus, the projected space QS is the interpretable part of the main data X. SVD is applied to QS, producing sample and gene scores (‘X’ and ‘Gene’ in the new space). Differentiation programmes are visualised as regression coefficients between Z and the new axes. The results are visualised as a triplot that shows the relationships between cell subset X, genes, and differentiation programmes. The visualisation process ensures the transparency of the interpretation.
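The steps named in panel (b) can be sketched in a few lines of NumPy. This is a simplified illustration and not the authors' implementation: it assumes plain column-wise z-scoring for the standardisation step and an unweighted orthogonal projection, whereas a full CCA uses chi-square standardisation with row and column weights. The function name `cca_sketch`, the key names in the returned dictionary, and the random input data are all hypothetical.

```python
import numpy as np


def cca_sketch(X, Z):
    """Simplified, unweighted sketch of the projection-then-SVD steps.

    X: genes x samples matrix (main dataset).
    Z: genes x programmes matrix (explanatory dataset), same gene order as X.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if X.shape[0] != Z.shape[0]:
        raise ValueError("X and Z must contain the same genes (rows)")

    # Assumption: column-wise z-scoring stands in for the standardisation
    # step (S). Columns must not be constant, or this divides by zero.
    S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Centre Z and build the orthogonal projector Q onto its column space.
    Zc = Z - Z.mean(axis=0)
    Q = Zc @ np.linalg.pinv(Zc)

    # QS is the part of S that can be explained by the explanatory variables.
    QS = Q @ S

    # SVD of the projected data gives gene (row) and sample (column) scores.
    U, d, Vt = np.linalg.svd(QS, full_matrices=False)
    k = min(Zc.shape[1], S.shape[1])  # at most k constrained axes
    gene_scores = U[:, :k] * d[:k]
    sample_scores = Vt[:k].T

    # Regression coefficients of each new axis on the explanatory variables;
    # these would be drawn as the programme arrows in a triplot.
    programme_coefs, *_ = np.linalg.lstsq(Zc, gene_scores, rcond=None)

    return {
        "Q": Q,
        "QS": QS,
        "gene_scores": gene_scores,
        "sample_scores": sample_scores,
        "programme_coefs": programme_coefs,
    }


# Hypothetical data: 50 genes, 6 samples in X, 3 programmes in Z.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
Z = rng.normal(size=(50, 3))
result = cca_sketch(X, Z)
print(result["sample_scores"].shape)  # (6, 3)
```

Because QS lies in the column space of Z, it has at most as many non-trivial axes as there are explanatory variables, which is why only the first k axes are kept.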
Figure 2
PCA sample scores of the datasets that were used in CCA analysis. PCA was applied to (a) the Gata3 dataset and (b) the Th dataset. Sample relationships (sample scores) of the first 3 axes are shown. Sample scores in 2D plots (b) are deliberately shown by arrows, in order to emphasise that these samples correspond to the explanatory variables that are shown by blue arrows in Figure 3. Percentages indicate the proportion of the variance accounted for by the eigenvalue of each axis.
Figure 3
CCA results using the Gata3 dataset for the Th differentiation programmes. CCA was applied to the Gata3 dataset, using the microarray dataset that analysed Th1, Th2, Th17, and iTreg (the Th dataset) as explanatory variables for the Th differentiation programmes. (a) Sample relationships in the first three axes. The Th differentiation programmes are shown by black lines (pink text). (b) CCA triplot of Gata3-KO and WT samples (red, green, blue and cyan closed circles and squares), genes (grey closed circles), and the Th differentiation programmes (blue arrows). (c) Gene plot of the CCA solution in (a) and (b), showing Th-specific genes only. (d) CCA triplot using PCA gene scores (PC1-3) of the Th dataset as explanatory variables. (e, f) CCA sample scores using (e) Th2 and (f) Th1 differentiation variables.
Figure 4
Comparison of CCA with other classification methods using the Gata3 dataset. The classification ability of CCA was compared with other classification methods. The Th dataset was used as training data (explanatory variables for CCA), and WT data from the Gata3 dataset were used as test data. Sensitivity and accuracy of those methods are plotted for each T cell subset (shown on the left side), using various numbers of genes (n; between 10 and 30). The numbers of condition positive (‘correct’ Th samples) and condition negative (all other samples) are two and six, respectively, in all the analyses.
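As a reminder of how the two plotted metrics relate to the condition-positive and condition-negative counts, the following minimal sketch computes sensitivity and accuracy from binary labels. The predictions are invented for illustration and are not results from the paper; the function name `sensitivity_and_accuracy` is hypothetical.

```python
def sensitivity_and_accuracy(truth, predicted):
    """Return (sensitivity, accuracy) for two equal-length boolean sequences."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without condition positives")
    sensitivity = tp / (tp + fn)          # TP / condition positive
    accuracy = (tp + tn) / len(truth)     # correct calls / all samples
    return sensitivity, accuracy


# Two condition-positive and six condition-negative samples, as in the caption.
# The predictions below are hypothetical: both positives found, one false positive.
truth = [True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False]
print(sensitivity_and_accuracy(truth, predicted))  # (1.0, 0.875)
```

With only two condition-positive samples, sensitivity can take just three values (0, 0.5, 1), so small differences between methods should be read with care.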
Figure 5
Identification of the T cell differentiation programmes that are disturbed in Stat3-KO by CCA. The Stat3 dataset was analysed by CCA using the Th dataset as explanatory variables. (a) CCA biplot showing the relationships between samples (see legend) and Th differentiation programmes (arrows). Percentages indicate the proportion of the variance accounted for by the inertia of each axis. (b) CCA triplot showing samples (see legend in (a)), Th differentiation programmes (arrows), and genes (small grey closed circles). (c) Gene plot of the CCA solution in (a) and (b) showing the ‘Th17-signature genes’ and the CCA top-ranked genes (2% top genes in Axis 1) only. Genes in the intersection of these two gene lists are shown as ‘Both’ in the legend. (d, e) CCA sample scores using (d) Th17/iTreg and (e) Th2/Th1 differentiation variables. Differentially expressed genes in the explanatory dataset (the Th dataset) were selected by false discovery rate (FDR) <0.01 and fold change (top/bottom 1%) in the comparison of Th2 and Th17, or that of Th1 and iTreg. (f) Heatmap analysis of the top-ranked genes in (c). Expression of those genes in the Stat3 dataset (left) and in the Th dataset (right) was analysed separately by heatmap analysis, clustering columns (samples) only. Genes were ordered according to the CCA Axis 1 score. See Colour Key for expression levels.
Figure 6
Comparison of CCA with other classification methods using the Stat3 dataset. The classification ability of CCA was compared with other classification methods. The Th dataset was used as training data (explanatory variables for CCA), and WT data from the Stat3 dataset were used as test data. (a) Sensitivity and accuracy of those methods are plotted, using various numbers of genes (n; between 10 and 200). The numbers of condition positive (Th17 differentiated cells) and condition negative (all other samples) are four and twelve, respectively, in all the analyses. (b) The test dataset was resampled using a jackknife approach, and the classification methods were compared for sensitivity and accuracy. The number of genes used was either 20 (upper panels) or 200 (lower panels). Error bars indicate 95% confidence intervals.
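The caption does not state how the jackknife intervals were derived. One common construction, shown here only as an assumption, is the leave-one-out jackknife standard error combined with a normal approximation (estimate ± 1.96 × SE). The function name `jackknife_ci` and the per-sample correctness indicators are hypothetical.

```python
import numpy as np


def jackknife_ci(values, statistic=np.mean, z=1.96):
    """Leave-one-out jackknife estimate with a normal-approximation interval.

    Returns (estimate, lower, upper). Assumes a smooth statistic such as a mean.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = statistic(values)
    # Recompute the statistic with each observation removed in turn.
    loo = np.array([statistic(np.delete(values, i)) for i in range(n)])
    # Standard jackknife standard error.
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical per-sample indicators (1 = classified correctly) for 16 test
# samples; accuracy is their mean.
correct = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
estimate, lower, upper = jackknife_ci(correct)
print(f"accuracy = {estimate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For a proportion close to 0 or 1, this symmetric interval can extend outside the [0, 1] range, so it is best treated as a rough guide.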
