Review
Brief Bioinform. 2016 Jul;17(4):628-41.
doi: 10.1093/bib/bbv108. Epub 2016 Mar 11.

Dimension reduction techniques for the integrative analysis of multi-omics data


Chen Meng et al. Brief Bioinform. 2016 Jul.

Abstract

State-of-the-art next-generation sequencing, transcriptomics, proteomics and other high-throughput 'omics' technologies enable the efficient generation of large experimental data sets. These data may yield unprecedented knowledge about molecular pathways in cells and their role in disease. Dimension reduction approaches have been widely used in the exploratory analysis of single omics data sets. This review focuses on dimension reduction approaches for the simultaneous exploratory analysis of multiple data sets. These methods extract the linear relationships that best explain the correlated structure across data sets and the variability both within and between variables (or observations), and they may highlight data issues such as batch effects or outliers. We explore dimension reduction techniques as one of the emerging approaches to data integration, and discuss how they can be applied to increase our understanding of biological systems in normal physiological function and disease.
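The core idea of extracting the linear relationships that explain the correlated structure shared by two matched data sets can be illustrated with a bare-bones two-table co-inertia decomposition. This is a minimal sketch, not the authors' code: it assumes uniform sample weights, column-centred tables with unweighted columns, and NumPy; the function name `coinertia_axes` is hypothetical. Under those assumptions the shared axes are the singular vectors of the cross-covariance matrix between the two tables (the same computation is often called PLS-SVD).

```python
import numpy as np


def coinertia_axes(X, Y, n_axes=2):
    """Two-table co-inertia sketch (hypothetical helper).

    Assumptions: uniform row (sample) weights, no column weights.
    X is (n_samples, p), Y is (n_samples, q); rows are the same samples.
    Finds unit-norm axes u (for X) and v (for Y) maximising cov(X u, Y v)
    via the SVD of the cross-covariance matrix X^T Y / n.
    """
    Xc = X - X.mean(axis=0)          # column-centre each table
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    cross_cov = Xc.T @ Yc / n        # (p, q) cross-covariance matrix
    U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    u, v = U[:, :n_axes], Vt[:n_axes].T
    # Sample scores of each table on the shared axes, plus the covariances.
    return Xc @ u, Yc @ v, s[:n_axes]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=50)          # simulated shared latent factor
    X = np.outer(z, rng.normal(size=20)) + 0.1 * rng.normal(size=(50, 20))
    Y = np.outer(z, rng.normal(size=8)) + 0.1 * rng.normal(size=(50, 8))
    sx, sy, s = coinertia_axes(X, Y)
    print("covariance on shared axes:", s)
    print("corr of first scores:", np.corrcoef(sx[:, 0], sy[:, 0])[0, 1])
```

Because both simulated tables are driven by one latent factor, the first pair of score vectors should track each other closely; with real omics tables the same scores are what one would plot to look for shared structure, outliers or batch-related groupings.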

Keywords: dimension reduction; exploratory data analysis; integrative genomics; multi-assay; multi-omics data integration; multivariate analysis.


Figures

Figure 1.
Results of a principal component analysis (PCA) of mRNA gene expression data from melanoma (ME), leukemia (LE) and central nervous system (CNS) cell lines in the NCI-60 cell line panel. All variables were centered and scaled. Results show (A) a biplot in which observations (cell lines) are points and gene expression profiles are arrows; (B) a heatmap of the expression of the same 20 genes in the cell lines, where the red-to-blue scale represents high to low gene expression (light to dark gray in the black-and-white figure); (C) a correlation circle; and (D) a variance barplot of the first ten PCs. To improve the readability of the biplot, some labels of the variables (genes) in (A) have been moved slightly. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
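As a hedged illustration of the PCA workflow this caption describes (centring and scaling every gene, then reading off sample scores for the biplot, the variance of each PC for the barplot, and gene-PC correlations for the correlation circle), here is a minimal NumPy sketch based on the singular value decomposition. The function name `pca_centered_scaled` and its return values are illustrative assumptions, not the code used to produce the figure.

```python
import numpy as np


def pca_centered_scaled(X, n_components=2):
    """PCA of an (n_samples, n_genes) matrix after centring and scaling.

    Hypothetical helper. Returns sample scores (biplot points), gene
    loadings (biplot arrow directions), the fraction of variance for every
    PC (variance barplot) and correlation-circle coordinates.
    """
    n = X.shape[0]
    # Centre each gene and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # gene directions
    explained = s**2 / np.sum(s**2)                   # variance fraction per PC
    # For standardised data, corr(gene j, PC k) = V[j, k] * s[k] / sqrt(n - 1).
    circle = loadings * s[:n_components] / np.sqrt(n - 1)
    return scores, loadings, explained, circle


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 20))     # 30 simulated cell lines, 20 genes
    scores, loadings, explained, circle = pca_centered_scaled(X)
    print("variance explained by first ten PCs:", explained[:10].round(3))
```

Working from the SVD of the standardised matrix avoids forming the gene-by-gene correlation matrix explicitly, which matters when there are far more genes than cell lines. Each gene's correlation-circle point lies inside the unit circle because its squared correlations over all PCs sum to one.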
Figure 2.
Multiple co-inertia analysis (MCIA) of mRNA, miRNA and proteomics profiles of melanoma (ME), leukemia (LE) and central nervous system (CNS) cell lines. (A) shows a plot of the first two components in sample space (sample ‘type’ is coded by point shape: circles for mRNAs, triangles for proteins and squares for miRNAs). Each sample (cell line) is represented by a “star”, in which the three omics data sets for that cell line are connected by lines to a center point, the global score (F) for that cell line; the shorter the line, the higher the level of concordance between the data types and the global structure. (B) shows the variable space of MCIA. A variable that is highly expressed in a cell line is projected with a high weight (far from the origin) in the direction of that cell line. Some miRNAs far from the origin are labeled, as these miRNAs are strongly associated with the cancer tissue of origin. (C) shows the correlation coefficients of the proteome profile of the SR cell line with those of the other cell lines. The proteome profile of the SR cell line is more correlated with the melanoma cell lines. There may be a technical issue with the LE.SR proteomics data. (D) shows a scree plot of the eigenvalues and (E) a plot of the data weighting space. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
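The star representation in panel (A) rests on each omics table having its own sample scores that can be compared with a consensus score. MCIA itself is more involved, so the sketch below swaps in multiple factor analysis (MFA), a simpler multi-table method in which the global score is exactly the mean of the per-table partial scores. It is an illustrative stand-in under that assumption, not MCIA and not the authors' implementation. Tables are assumed to share the same rows (cell lines) in the same order, and the function name `mfa_star` is hypothetical. The distance from each partial score to the global score plays the role of a star ray length.

```python
import numpy as np


def mfa_star(tables, n_components=2):
    """MFA sketch used as a stand-in for MCIA's star plot (hypothetical).

    tables: list of (n_samples, p_k) arrays with matching rows.
    Each table is centred, scaled and divided by its first singular value,
    so no single omics layer dominates; the weighted tables are then
    concatenated and decomposed with a single SVD.
    Returns global scores F, a list of per-table partial scores and a
    (n_tables, n_samples) array of star ray lengths.
    """
    blocks = []
    for X in tables:
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        Z = Z / np.linalg.svd(Z, compute_uv=False)[0]  # MFA block weighting
        blocks.append(Z)
    big = np.hstack(blocks)
    _, _, Vt = np.linalg.svd(big, full_matrices=False)
    Q = Vt[:n_components].T          # loadings for all variables
    F = big @ Q                      # global (consensus) scores
    K = len(blocks)
    partial, start = [], 0
    for Z in blocks:
        stop = start + Z.shape[1]
        # Partial scores: project one table on its own slice of the loadings;
        # scaling by K makes F the mean (barycentre) of the partial scores.
        partial.append(K * Z @ Q[start:stop])
        start = stop
    rays = np.stack([np.linalg.norm(P - F, axis=1) for P in partial])
    return F, partial, rays


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    z = rng.normal(size=20)          # simulated shared structure
    tables = [np.outer(z, rng.normal(size=p)) + 0.5 * rng.normal(size=(20, p))
              for p in (30, 15, 10)]  # e.g. mRNA, protein, miRNA sizes
    F, partial, rays = mfa_star(tables)
    print("longest ray per table (sample index):", rays.argmax(axis=1))
```

In this setting a long ray for one table flags a sample whose profile in that table disagrees with the consensus, the kind of pattern that could prompt a follow-up check such as the correlation analysis in panel (C).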
