Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles
- PMID: 30221015
- PMCID: PMC6135253
- DOI: 10.1007/s40484-017-0119-0
Abstract
Background: Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.
Methods: We first collect the genes that have both expression and methylation values in AML as well as PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This yields four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, "mRMR", is then applied to each part, producing a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) to the expression/methylation data for the top selected normally distributed genes and non-normally distributed genes, respectively. We then use a recent weighted ARM method, "RANWAR", to combine all/specific resultant genes and generate top oncogenic rules along with their respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database for in silico validation of the prioritized oncogenes as markers and for labeling the markers as existing or novel.
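The normality-based split and the parametric test described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the per-gene application of the test, the 0.05 significance level, the function names, and the toy gene labels are all assumptions made for illustration, and the mRMR, Shrink t-test, and RANWAR steps are not shown. The p-value uses the asymptotic chi-square approximation with 2 degrees of freedom.

```python
import math
from statistics import mean, variance


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for one non-constant profile."""
    n = len(values)
    m = mean(values)
    # Central moments with the 1/n (biased) normalisation used by the JB test.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 d.o.f.,
    # whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def split_by_normality(profiles, alpha=0.05):
    """Split {gene: values} into (normal, non_normal); alpha is an assumed cutoff."""
    normal, non_normal = {}, {}
    for gene, values in profiles.items():
        _, p_value = jarque_bera(values)
        target = normal if p_value >= alpha else non_normal
        target[gene] = values
    return normal, non_normal


def welch_t(group_a, group_b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    sa, sb = variance(group_a) / na, variance(group_b) / nb
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(sa + sb)
    dof = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t_stat, dof


# Hypothetical toy profiles (not data from the study).
toy_profiles = {
    "GENE_A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],  # symmetric, light-tailed
    "GENE_B": [0] * 19 + [100],                 # one extreme outlier
}
normal_part, non_normal_part = split_by_normality(toy_profiles)
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In this sketch each data type (expression, methylation) would be passed through `split_by_normality` separately, giving the four parts named in the Methods; converting the Welch statistic to a p-value needs a t-distribution CDF, which is left to a statistics library.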
Results: The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 up-regulated and KRT17 down-regulated) and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both hypo-methylated, and KRT17 and DYDC1 both down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 up-regulated and hypo-methylated, and APBA2 and C4orf31 both down-regulated and hyper-methylated).
Conclusion: The identified novel markers might have critical roles in AML as well as PC. The approach can be applied to other complex diseases.
Keywords: feature extraction; integrated markers; rule mining; statistical test.