Dimension reduction techniques for the integrative analysis of multi-omics data

Chen Meng, Oana A Zeleznik, Gerhard G Thallinger, Bernhard Kuster, Amin M Gholami, Aedín C Culhane

PMID: 26969681
PMCID: PMC4945831
DOI: 10.1093/bib/bbv108

Review

Chen Meng et al. Brief Bioinform. 2016 Jul;17(4):628-41. doi: 10.1093/bib/bbv108. Epub 2016 Mar 11.

Abstract

State-of-the-art next-generation sequencing, transcriptomics, proteomics and other high-throughput 'omics' technologies enable the efficient generation of large experimental data sets. These data may yield unprecedented knowledge about molecular pathways in cells and their role in disease. Dimension reduction approaches have been widely used in exploratory analysis of single omics data sets. This review focuses on dimension reduction approaches for simultaneous exploratory analysis of multiple data sets. These methods extract the linear relationships that best explain the correlated structure across data sets and the variability both within and between variables (or observations), and they may highlight data issues such as batch effects or outliers. We explore dimension reduction techniques as one of the emerging approaches for data integration, and how these can be applied to increase our understanding of biological systems in normal physiological function and disease.

Keywords: dimension reduction; exploratory data analysis; integrative genomics; multi-assay; multi-omics data integration; multivariate analysis.

Figures

**Figure 1.**
Results of a PCA of mRNA gene expression data of melanoma (ME), leukemia (LE) and central nervous system (CNS) cell lines from the NCI-60 cell line panel. All variables were centered and scaled. Results show (A) a biplot where observations (cell lines) are points and gene expression profiles are arrows; (B) a heatmap showing the gene expression of the same 20 genes in the cell lines, where the red-to-blue scale represents high to low gene expression (light to dark gray represents high to low gene expression in the black-and-white figure); (C) a correlation circle; (D) a variance barplot of the first ten PCs. To improve the readability of the biplot, some labels of the variables (genes) in (A) have been moved slightly. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
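
The kind of analysis summarized in Figure 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a matrix with observations (cell lines) in rows and variables (genes) in columns, uses synthetic random numbers in place of the NCI-60 data, and the function name `pca_centered_scaled` is hypothetical. It centers and scales each variable, takes a singular value decomposition, and returns the observation scores (points in a biplot), the variable loadings (arrows in a biplot) and the proportion of variance per component (a variance barplot).

```python
import numpy as np

def pca_centered_scaled(X, n_components=2):
    """PCA via SVD on centered, unit-variance variables.

    X: array of shape (n_observations, n_variables).
    Assumes no variable is constant (a zero standard deviation would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    # Center each variable and scale it to unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Z = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # observation coordinates
    loadings = Vt[:n_components].T                   # variable directions
    explained = s**2 / np.sum(s**2)                  # variance proportion per PC
    return scores, loadings, explained

# Synthetic stand-in data: 12 "cell lines" by 20 "genes" (not real measurements)
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 20))
scores, loadings, explained = pca_centered_scaled(X)
print(scores.shape, loadings.shape)
print(np.round(explained[:10], 3))
```

Plotting `scores` as points and `loadings` as arrows on shared axes gives a biplot, and a bar chart of `explained` gives the variance barplot.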

**Figure 2.**
MCIA of mRNA, miRNA and proteomics profiles of melanoma (ME), leukemia (LE) and central nervous system (CNS) cell lines. (A) shows a plot of the first two components in sample space (sample 'type' is coded by the point shape: circles for mRNAs, triangles for proteins and squares for miRNAs). Each sample (cell line) is represented by a "star", where the three omics data for each cell line are connected by lines to a center point, which is the global score (F) for that cell line; the shorter the line, the higher the level of concordance between the data types and the global structure. (B) shows the variable space of MCIA. A variable that is highly expressed in a cell line will be projected with a high weight (far from the origin) in the direction of that cell line. Some miRNAs with a large distance from the origin are labeled, as these miRNAs are strongly associated with the cancer tissue of origin. (C) shows the correlation coefficients of the proteome profile of SR with the other cell lines. The proteome profile of the SR cell line is more strongly correlated with the melanoma cell lines. There may be a technical issue with the LE.SR proteomics data. (D) shows a scree plot of the eigenvalues and (E) a plot of the data weighting space. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
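
The check described for panel (C), correlating one cell line's profile with those of all other cell lines, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses Pearson correlation, the data are synthetic (two simulated groups of three samples each), and the names `profile_correlations`, `profiles` and `labels` are hypothetical.

```python
import numpy as np

def profile_correlations(data, query):
    """Pearson correlation between sample `query` and every other sample.

    data: array of shape (n_samples, n_features), one profile per row.
    Returns an array of length n_samples - 1 (the self-correlation is dropped).
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data)  # np.corrcoef treats each row as one variable
    return np.delete(corr[query], query)

# Synthetic stand-in data: each group shares a base profile plus small noise
rng = np.random.default_rng(1)
group_a = rng.normal(size=(1, 50)) + 0.3 * rng.normal(size=(3, 50))
group_b = rng.normal(size=(1, 50)) + 0.3 * rng.normal(size=(3, 50))
profiles = np.vstack([group_a, group_b])
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]

r = profile_correlations(profiles, query=0)
for name, value in zip(labels[1:], r):
    print(f"A1 vs {name}: r = {value:.2f}")
```

A sample that correlates more strongly with another group than with its own, as reported for LE.SR, would stand out in such a listing and may warrant a closer look at the underlying data.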

Grants and funding

P50 CA101942/CA/NCI NIH HHS/United States
