Feature selection and classification over the network with missing node observations
- PMID: 34816464
- PMCID: PMC9773124
- DOI: 10.1002/sim.9267
Abstract
Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model that incorporates the network structure is developed for the class indicators. For posterior computation, we use the Swendsen-Wang algorithm to update the class indicators efficiently. BNC handles missing values naturally within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
Keywords: Bayesian nonparametrics; false discovery rate control; feature selection; gene networks.
© 2021 John Wiley & Sons Ltd.
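To make the abstract's reference to Swendsen-Wang updates of class indicators concrete, the following is a minimal sketch of one generic Swendsen-Wang sweep over a network. It is not the BNC sampler itself: the abstract does not specify the prior, so a Potts-type prior with smoothing strength `beta` is assumed here, and the names `swendsen_wang_step`, `labels`, `edges`, and `loglik` are hypothetical. Nodes with missing observations are assumed to contribute a log-likelihood of zero for every class, so their labels are driven only by their network neighbors.

```python
import numpy as np

def swendsen_wang_step(labels, edges, loglik, beta, rng):
    """One Swendsen-Wang sweep under an ASSUMED Potts-type prior.

    labels : (n,) int array, current class label of each node (0..K-1)
    edges  : iterable of (i, j) node-index pairs
    loglik : (n, K) array, loglik[i, k] = log p(y_i | class k);
             a row of zeros stands in for a node with a missing observation
    beta   : assumed smoothing strength (>= 0) of the Potts prior
    rng    : numpy.random.Generator
    """
    n, K = loglik.shape
    parent = np.arange(n)  # union-find forest used to build clusters

    def find(i):
        # Find the cluster root of node i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: bond neighboring nodes that share a label with prob 1 - exp(-beta).
    p_bond = 1.0 - np.exp(-beta)
    for i, j in edges:
        if labels[i] == labels[j] and rng.random() < p_bond:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Step 2: relabel each bonded cluster jointly, weighting each class by
    # the summed log-likelihood of the cluster's members.
    roots = np.array([find(i) for i in range(n)])
    new_labels = labels.copy()
    for r in np.unique(roots):
        members = np.flatnonzero(roots == r)
        score = loglik[members].sum(axis=0)
        prob = np.exp(score - score.max())  # subtract max for numerical stability
        prob /= prob.sum()
        new_labels[members] = rng.choice(K, p=prob)
    return new_labels
```

Because whole clusters are relabeled at once, a sampler of this kind can move between label configurations that single-node Gibbs updates reach only slowly when neighboring labels are strongly coupled.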