Brief Bioinform. 2019 Jan 18;20(1):130-143.
doi: 10.1093/bib/bbx080.

'Multi-omic' data analysis using O-miner

Ajanthah Sangaralingam et al. Brief Bioinform.

Abstract

Innovations in -omics technologies have driven advances in biomedical research. However, integrating and analysing the large volumes of data generated from different high-throughput -omics technologies remain a significant challenge to basic and clinical scientists without bioinformatics skills or access to bioinformatics support. To address this demand, we have significantly updated our previous O-miner analytical suite to incorporate several new features and data types, providing an efficient and easy-to-use Web tool for the automated analysis of data from '-omics' technologies. Created from a biologist's perspective, this tool allows for the automated analysis of large and complex transcriptomic, genomic and methylomic data sets, together with biological/clinical information, to identify significantly altered pathways and prioritize novel biomarkers/targets for biological validation. Our resource can be used to analyse both in-house data and the huge amount of publicly available information from array and sequencing platforms. Multiple data sets can be easily combined, allowing for meta-analyses. Here, we describe the analytical pipelines currently available in O-miner and present examples of use to demonstrate its utility and relevance in maximizing research output. The O-miner Web server is free to use and is available at http://www.o-miner.org.

Figures

Figure 1
Transcriptomics workflow. O-miner takes as input raw array data (CEL files) from Affymetrix array-based platforms and either normalized or unnormalized data from Illumina expression arrays. QC is performed on data from raw CEL files. Data are then normalized and filtered to remove redundant probes. Users performing meta-analysis have the option to apply the COMBAT algorithm to correct for batch effects when combining data from different studies. Tumour purity can be estimated for Affymetrix data using the ESTIMATE algorithm. Survival analysis can be run for data from all of the array-based platforms. The normalized expression matrix is then subjected to differential expression analysis using LIMMA to identify significantly DEGs between biological groups. Optionally, GO terms that are statistically over- or under-represented are identified using GOstats, and Venn diagrams may be generated. Results are displayed online in expandable tabs and can be downloaded as text and Excel files. (A) Heatmaps of the statistically significant DEGs identified for each of the comparisons are available to download. (B) A boxplot displaying the expression profiles across the biological conditions can be viewed. (C) A Venn diagram showing common and unique genes that are differentially expressed across the biological groups is displayed, if selected from the output options.
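GOstats bases its over-representation test on the hypergeometric distribution. The calculation for a single GO term can be sketched in Python as follows; the function name and the gene counts are illustrative assumptions, not O-miner code or results.

```python
from math import comb


def hypergeom_overrep_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  -- number of genes tested (hypothetical background size)
    annotated -- genes in the universe annotated with the GO term
    selected  -- genes called differentially expressed
    overlap   -- selected genes that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probability of every outcome at least as extreme as `overlap`.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 25 of 500 DEGs carry a term that annotates
# 200 of 10000 background genes.
p_value = hypergeom_overrep_p(10000, 200, 500, 25)
print(f"over-representation P = {p_value:.3g}")
```

A test for under-representation would sum the lower tail (0 to `overlap`) instead.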
Figure 2
RNA-Seq post-processing workflow. O-miner provides a workflow for the post-processing of data from RNA-Seq experiments. After the pre-processing stage, comprising QC and alignment steps, a matrix of either raw read counts or RPKM values for each sample is submitted to O-miner. A choice of differential expression analysis methods is available: LIMMA for raw read counts and RPKM values, and edgeR for raw read counts. As in the transcriptomics workflow, users can then select the output options that they wish to generate, including GO analysis and Venn diagrams. All the results can be downloaded as text and Excel files. The result options and presentation are identical to those of the transcriptomics workflow. (A) Unsupervised hierarchical clustering plot from raw read counts data, displaying similarity between gene expression profiles. (B) Venn diagram showing the number of unique and common DEGs between the biological groups.
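For readers unfamiliar with the two accepted input types, the relationship between raw read counts and RPKM (reads per kilobase of transcript per million mapped reads) can be sketched as below. This is a minimal illustration of the standard RPKM definition for one sample; the function name and the toy counts and gene lengths are assumptions, and the sketch takes the total mapped reads to be the sum of the supplied counts.

```python
def rpkm(counts, gene_lengths_bp):
    """Convert one sample's raw read counts to RPKM values.

    counts          -- raw read count per gene
    gene_lengths_bp -- gene (or transcript) length in bases, same order
    """
    total_reads = sum(counts)
    if total_reads == 0:
        raise ValueError("sample has no mapped reads")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return [
        count * 1e9 / (length * total_reads)
        for count, length in zip(counts, gene_lengths_bp)
    ]


# Toy example with three hypothetical genes.
print(rpkm([100, 300, 600], [1000, 2000, 3000]))
```

Because RPKM values are no longer integer counts, count-based methods such as edgeR require the raw counts, which is consistent with the method choices listed in the caption.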
Figure 3
Workflow for CBS analysis. The CBS pipeline generates information about regions of gain and loss. The CBS workflow comprises several steps, and which steps are run depends on the input type. Raw CEL files, log2ratios, and segmented or binary-coded data from a number of Affymetrix SNP arrays are accepted as input. Aroma.affymetrix is applied to the raw CEL files to estimate copy numbers and to perform data normalization and QC. Segmentation is applied using the CBS model. The quartile regression framework is applied to calculate the threshold used to call gains and losses. Regions of gain and loss are annotated from multiple sources. Minimal common regions can be generated using the CGHregions algorithm. (A) The results from each sample are displayed in expandable tabs. These tabs can be expanded further to obtain information about regions of loss and gain, with all findings available to download as an Excel file by clicking on the ‘xls’ link. Log2ratio plots based on filtered and unfiltered data are displayed and can be downloaded as PDFs by clicking on the ‘PDF’ icon. (B) For each of the biological groups, frequency plots from both filtered and unfiltered data can be viewed either across all chromosomes or for individual chromosomes. All the filtered frequency plots are available for download as a zipped file by clicking on the arrow on the right-hand side of the window displaying chromosome number. Unfiltered frequency plots can be downloaded as PDFs by clicking on the ‘PDF’ icon. Results shown are from the analysis of data set GSE42525.
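The final calling step described above, turning segment means into gains and losses, can be sketched as follows. This is only an illustration: O-miner derives its threshold from the regression framework named in the caption, whereas the fixed cut-offs of +0.2 and -0.2, the function name and the example segments here are hypothetical.

```python
def call_segments(segments, gain_threshold=0.2, loss_threshold=-0.2):
    """Label each segment as gain, loss or neutral.

    segments -- iterable of (chromosome, start, end, mean_log2ratio)
    The default thresholds are placeholders, not O-miner's values.
    """
    calls = []
    for chrom, start, end, mean_log2 in segments:
        if mean_log2 >= gain_threshold:
            status = "gain"
        elif mean_log2 <= loss_threshold:
            status = "loss"
        else:
            status = "neutral"
        calls.append((chrom, start, end, status))
    return calls


# Hypothetical segments, as a segmentation step might report them.
example = [
    ("chr1", 1, 5_000_000, 0.45),
    ("chr1", 5_000_001, 9_000_000, -0.02),
    ("chr8", 1, 3_000_000, -0.60),
]
for call in call_segments(example):
    print(call)
```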
Figure 4
Workflow for ASCAT analysis. Raw data files are accepted as input. Log2ratios (LRR) and BAFs are calculated using the R package CalMaTe. These are fitted to an ASPCF model. The ASCAT algorithm is used to estimate aberrant cell fraction, tumour ploidy and absolute allele-specific copy number calls. The results presented are from the analysis of the GSE7130 data set. (A) Raw LRR and BAF plots generated from ASCAT are shown for each sample. (B) Frequency plots of CNAs are also displayed for each biological group, with all frequency plots available for download as a zipped file. Frequency plots are shown across all the chromosomes and also for each individual chromosome. (C) Aberration plots are generated, showing regions of gain (red) and loss (blue) across each of the samples in the data set.
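The two quantities that feed ASCAT can be illustrated with their textbook definitions for a single SNP. The sketch below does not reproduce CalMaTe's calibration; it assumes already-calibrated allele-specific signals and a hypothetical reference total intensity, and the function name is illustrative.

```python
from math import log2


def lrr_and_baf(a_signal, b_signal, reference_total):
    """Return (log2 ratio, B-allele frequency) for one SNP.

    a_signal, b_signal -- allele-specific intensities (assumed calibrated)
    reference_total    -- expected total intensity for a normal diploid
                          sample (hypothetical reference value)
    """
    total = a_signal + b_signal
    if total <= 0 or reference_total <= 0:
        raise ValueError("intensities must be positive")
    lrr = log2(total / reference_total)  # 0 means no copy number change
    baf = b_signal / total               # 0.5 means a balanced heterozygote
    return lrr, baf


print(lrr_and_baf(1.0, 1.0, 2.0))  # balanced heterozygous SNP
print(lrr_and_baf(3.0, 1.0, 2.0))  # extra copies of the A allele
```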
Figure 5
Methylation workflow. Raw (IDAT) files and normalized data from Illumina methylation array platforms are accepted as input to the methylation workflow. QC analysis is performed using the ChAMP R package. One of three normalization methods (BMIQ, SWAN or PBC) can be chosen to normalize the data. After filtering of the normalized data, differentially methylated probes are identified using LIMMA, with user-defined thresholds applied to the delta beta value and adjusted P-values. Differentially methylated regions are annotated, and users can choose to identify statistically significant GO terms from the list of differentially methylated probes. Results shown are from the analysis of data set GSE69118. (A) Sample quality, QC plots and cluster diagrams are presented. Sample quality displays a table showing the sample name and % of failed probes for each sample. QC plots consist of four plots that are available for display and download: raw density plot, normalized density plot, raw MDS plot and normalized MDS plot. Cluster diagram displays an unsupervised hierarchical cluster based on normalized methylation data. (B) Each comparison is displayed within an expandable tab alongside information about probeset ID, chromosomal location, HGNC symbol, gene description, whether the region is differentially methylated, location of CpG island, delta beta value and adjusted P-values. A boxplot showing the difference in methylation values across biological groups can also be viewed for each probeset ID. (C) Individual comparisons are displayed as separate tabs. Each of the probes reported as differentially methylated is mapped to GO terms, with those that were found to be statistically over- and under-represented listed in tabular format.
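The thresholding step on delta beta and adjusted P-value can be sketched as below. Delta beta is taken here as the difference in mean beta value between two groups; the cut-offs of 0.2 and 0.05 are placeholder defaults standing in for the user-defined thresholds, and the function names and example values are assumptions.

```python
from statistics import mean


def delta_beta(betas_group_a, betas_group_b):
    """Difference in mean methylation (beta) between two groups for one probe."""
    return mean(betas_group_a) - mean(betas_group_b)


def is_differentially_methylated(betas_group_a, betas_group_b, adjusted_p,
                                 min_delta_beta=0.2, max_adjusted_p=0.05):
    """Apply both thresholds; the default cut-offs are placeholders."""
    large_change = abs(delta_beta(betas_group_a, betas_group_b)) >= min_delta_beta
    return large_change and adjusted_p <= max_adjusted_p


# Hypothetical probe: hypermethylated in group A relative to group B.
group_a = [0.80, 0.90, 0.85]
group_b = [0.20, 0.30, 0.25]
print(delta_beta(group_a, group_b))
print(is_differentially_methylated(group_a, group_b, adjusted_p=0.001))
```

Requiring both conditions avoids reporting probes whose difference is statistically significant but too small to be of biological interest.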
Figure 6
Application of the transcriptomics workflow for the multi-cohort analysis of BC data. Data Collection: A meta-analysis was conducted using O-miner to investigate the effect of basality on TN BCs. Two Affymetrix data sets, GSE48390 and GSE21653, were downloaded using ‘GEO data set’ as the data source option. The subset of samples defined as triple negative was selected from the File Organiser window. Analysis Parameters: Once all the sample characteristics and survival covariates were provided, the raw data were normalized using RMA and filtered using SD (top 10%). Samples belonging to each of the data sets were specified and the COMBAT algorithm applied to adjust for batch effects. The resulting normalized matrix was subjected to differential expression and survival analyses. All the results can be downloaded as text and Excel files. Results: (A) Unsupervised hierarchical clustering of the gene expression profiles suggests that TNBL BCs are more similar to each other than to TNnonBL BCs. The cluster is annotated with the sample names and biological groups. Each biological group has its own colour. (B) The GABRP gene was reported as differentially expressed between the two biological groups. The expression of GABRP in the TNBL and TNnonBL groups can be displayed as boxplots. (C) Survival: the 5-year KM survival plot suggests that the BLTN group has poorer overall survival relative to the BLnonTN group, but this relationship is not significant (P>0.05). (D) Statistically significant GO terms between BLTN and BLnonTN groups are displayed, with hyperlinks to external resources provided.
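The 'SD (top 10%)' filter mentioned under Analysis Parameters keeps the most variable probes across samples. A minimal sketch of such a filter follows; the function name, the dictionary layout and the toy data are assumptions rather than O-miner's implementation.

```python
from statistics import stdev


def top_sd_probes(expression, fraction=0.10):
    """Return the IDs of the most variable probes.

    expression -- dict mapping probe ID to its normalized values across
                  samples (at least two samples per probe)
    fraction   -- share of probes to keep, ranked by standard deviation
    """
    ranked = sorted(expression, key=lambda probe: stdev(expression[probe]),
                    reverse=True)
    keep = max(1, int(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]


# Toy matrix: ten hypothetical probes, two samples each; variability
# increases with the probe index.
toy = {f"probe{i}": [0.0, float(i)] for i in range(10)}
print(top_sd_probes(toy))
```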
Figure 7
Application of O-miner to the analysis of PCa sequencing data. Data collection: Sequencing data from the TCGA PRAD project were downloaded and subjected to the O-miner RNA-Seq post-processing workflow. Analysis parameters: Following pre-processing of data (QC and alignment steps), a matrix of raw read counts was generated. The matrix of normalized read counts was submitted to O-miner. LIMMA was used to identify DEGs, and statistically significant GO terms were identified. Users can choose to generate Venn diagrams. All of the results can be downloaded as text and Excel files. Results: (A) Significantly DEGs are displayed together with Ensembl gene ID, chromosomal location, fold-change and adjusted P-values. (B) Results of GO analysis of DEGs are displayed in tabular format. Over- and under-represented GO terms are listed together with GO IDs, P-values and GO term annotations.
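The caption reports adjusted P-values without naming the correction method. Assuming the Benjamini-Hochberg procedure, which is the default adjustment in LIMMA's topTable, the adjustment can be sketched as follows; the function name and the example P-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: BH is the correction in use; the source does not say so.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay
    # monotone and never exceed 1.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```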

