PLoS Comput Biol. 2016 Nov 10;12(11):e1005187.
doi: 10.1371/journal.pcbi.1005187. eCollection 2016 Nov.

MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways

Lefteris Koumakis et al. PLoS Comput Biol. 2016.

Abstract

Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways as plain lists of genes, taking into account neither the underlying pathway network topology nor the gene regulatory relations involved. These approaches, although computationally efficient and simple, treat pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take the underlying gene regulatory relations into account by examining their consistency with gene expression profiles and computing a score for each profile. Even so, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes these problems. MinePath decomposes pathways into their constituent sub-paths, transforming single relations into complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess their phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach.
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique ability to color regulatory relations between genes and reveal their phenotype inclination. This makes MinePath a valuable tool for in silico molecular biology experimentation, as it serves biomedical researchers' exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes.
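The decomposition step described above can be illustrated with a minimal sketch. It is not the MinePath implementation: it assumes that a pathway is a small directed graph whose edges are labeled as activation or inhibition, and that a sub-path is any simple directed chain of one or more relations. The toy pathway, the `max_len` limit, and the function names `enumerate_subpaths` and `format_subpath` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# (source gene, target gene, relation); "act" = activation, "inh" = inhibition.
Edge = Tuple[str, str, str]


def enumerate_subpaths(edges: List[Edge], max_len: int = 3) -> List[List[Edge]]:
    """Enumerate simple directed chains of 1..max_len relations (assumed definition)."""
    out_edges: Dict[str, List[Edge]] = {}
    for edge in edges:
        out_edges.setdefault(edge[0], []).append(edge)

    subpaths: List[List[Edge]] = []

    def extend(chain: List[Edge], visited: Set[str]) -> None:
        # Every prefix reached here is itself recorded as a sub-path.
        subpaths.append(list(chain))
        if len(chain) == max_len:
            return
        last_target = chain[-1][1]
        for nxt in out_edges.get(last_target, []):
            if nxt[1] not in visited:  # keep chains simple (no repeated genes)
                chain.append(nxt)
                visited.add(nxt[1])
                extend(chain, visited)
                visited.remove(nxt[1])
                chain.pop()

    for edge in edges:
        extend([edge], {edge[0], edge[1]})
    return subpaths


def format_subpath(chain: List[Edge]) -> str:
    """Render a chain as text, e.g. 'A --> B --| C'."""
    arrow = {"act": "-->", "inh": "--|"}
    parts = [chain[0][0]]
    for _src, dst, rel in chain:
        parts.append(arrow[rel])
        parts.append(dst)
    return " ".join(parts)


# Hypothetical toy pathway, not taken from the paper.
toy_edges: List[Edge] = [("A", "B", "act"), ("B", "C", "inh"), ("B", "D", "act")]
toy_subpaths = enumerate_subpaths(toy_edges)
formatted = [format_subpath(chain) for chain in toy_subpaths]
```

For the toy pathway this yields the three single relations plus the two longer chains that start at A, which shows how single relations become longer regulation sub-paths.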

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Limitations of analyzing solely gene expression profiles.
(A) A dummy pathway. (B) The input (artificial) gene expression profile. (C) Functional status of sub-paths; the shaded cells indicate that sub-path A B–| C is functional in the corresponding samples.
Fig 2. MinePath learning curve on three BrCa/ER datasets.
Fig 3. Venn diagram of the MinePath discriminant sub-paths that are shared among the three BrCa/ER datasets.
Fig 4. The MinePath identified sub-paths for the ‘3ER GSE2034-3494-7390’ merged dataset that discriminate between the ER+ and ER- phenotypes in the ErbB pathway.
Edges colored in red indicate regulatory relations functional for the ER- phenotype, green for the ER+ phenotype, and black for relations that are functional for both phenotypes.
Fig 5. Contrasted SPIA, GGEA and MinePath results for the p53 pathway using the GSE3494 ER+ vs. ER- dataset.
The legend shows the meaning of the edge colors used to contrast the results produced by the three pathway analysis methodologies.
Fig 6. Venn diagram of the selected (significant) pathways shared among SPIA, GGEA, MinePath and the six significant BrCa/ER pathways.
Fig 7. Part of the WNT signaling pathway for gastric cancer that shows the MinePath discriminant sub-paths.
Green edges indicate discriminant functional relations for GC cases; edges in black indicate discriminant functional relations for both GC and normal cases; undirected yellow edges denote binding/association relations.
Fig 8. Contrasting RNAseq and microarray gene expression profiling technologies at the pathway level.
Fig 9. The integrated network that reflects the CXCR4 mutation downstream signalling events.
Fig 10. Discretization of gene expression values in MinePath.
On the left, a ‘dummy’ gene expression profile is shown for five genes (rows) and six samples (columns); on the right, its binarized version is shown; in between are the discretization cut-off points computed for each gene.
Fig 11. Identification of functional sub-paths in gene expression sample profiles (matching operation).
(A) Identification of functional sub-paths and their matching with gene expression profiles; a ‘dummy’ binary gene expression profile is used with four genes and six samples assigned to two phenotype classes; ‘1’ represents the up-regulated and ‘0’ the down-regulated status of a gene. (B) The binary sub-path expression matrix produced by MinePath.
Fig 12. MinePath pathway visualization functionality and capabilities.
(A) Sorted (by p-value) list of pathways accompanied by other statistics computed by MinePath; the user may select the pathway to visualize. (B) MinePath ‘Controls’ panel. (C) MinePath pathway editing panel. (D) The original ErbB pathway with its KEGG layout preserved. (E) The edited and simplified ErbB pathway.
