Stat Med. 2022 Mar 30;41(7):1242-1262.
doi: 10.1002/sim.9267. Epub 2021 Nov 23.

Feature selection and classification over the network with missing node observations

Zhuxuan Jin et al. Stat Med. 2022.

Abstract

Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model that incorporates the network structure is developed for the class indicators. For posterior computation, we use the Swendsen-Wang algorithm to update the class indicators efficiently. BNC handles missing values naturally within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.

Keywords: Bayesian nonparametrics; false discovery rate control; feature selection; gene networks.
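
The abstract names the Swendsen-Wang algorithm for updating class indicators but gives no details. As a rough illustration only, the following is a minimal sketch of one generic Swendsen-Wang sweep for a Potts-type prior on a network. It is not the BNC model itself. The function name, the `loglik` table, the coupling parameter `beta`, and the choice to give a node with a missing observation a flat (all-zero) log-likelihood row are all assumptions made for this example.

```python
import math
import random


def swendsen_wang_step(labels, edges, loglik, beta, rng):
    """One generic Swendsen-Wang sweep (illustrative, not the paper's model).

    labels : current class label of each node (list of ints)
    edges  : list of (i, j) node pairs in the network
    loglik : loglik[i][k] = log-likelihood of node i under class k;
             a node with a missing observation is assumed to have a flat row
    beta   : assumed Potts coupling strength (>= 0)
    """
    n = len(labels)
    n_classes = len(loglik[0])
    parent = list(range(n))  # union-find forest over nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Bond each same-label edge with probability 1 - exp(-beta).
    p_bond = 1.0 - math.exp(-beta)
    for i, j in edges:
        if labels[i] == labels[j] and rng.random() < p_bond:
            parent[find(i)] = find(j)

    # Group nodes into the clusters formed by the bonds.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Relabel each cluster jointly, with probability proportional to the
    # product of its members' likelihoods under each class.
    new_labels = list(labels)
    for members in clusters.values():
        scores = [sum(loglik[i][k] for i in members) for k in range(n_classes)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stabilised weights
        k_new = rng.choices(range(n_classes), weights=weights)[0]
        for i in members:
            new_labels[i] = k_new
    return new_labels


if __name__ == "__main__":
    # Path 0 - 1 - 2: node 1 has no observation (flat row), its neighbours
    # strongly favour class 1.
    loglik = [[-100.0, 0.0], [0.0, 0.0], [-100.0, 0.0]]
    rng = random.Random(0)
    print(swendsen_wang_step([0, 0, 0], [(0, 1), (1, 2)], loglik, 50.0, rng))
```

In this toy setup, the node without data contributes nothing to the cluster score, so when it is bonded to its neighbours it simply inherits the label their data favour. This is one way network structure can inform a missing node, under the assumptions stated above.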

Figures

FIGURE 1
The impact of missing gene nodes in the network. (A) The missing gene is itself an up-regulated gene; it would be excluded if missing genes were removed from the data analysis. (B) The missing gene serves as a “bridge” for information exchange. If it were simply removed, the light red node on the left side could not be identified as an up-regulated gene
FIGURE 2
An illustration of the distributions of test statistics under each simulation setting
FIGURE 3
Histogram of the test statistics, with estimated null density and frequencies of the selected genes. (A) Results by BNC; (B) results by locfdr with center matching estimation for a symmetric null. Local false discovery rate is controlled at 0.2 for both methods. Blue, low-risk genes; red, high-risk genes
FIGURE 4
Two example modules of selected genes. (A) An example module with 39 low-risk genes and 9 high-risk genes; (B) An example module with 23 high-risk genes and 17 low-risk genes
FIGURE 5
A module containing two nodes with missing observations being identified as low-risk genes by BNC
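
The Figure 3 caption refers to controlling the local false discovery rate at 0.2. As a hedged illustration of what that thresholding means, the sketch below uses the standard two-group formula, lfdr(z) = pi0 * f0(z) / f(z), with a single alternative component. The normal densities, the null proportion `pi0`, and all function names are assumptions for this example; neither BNC nor locfdr estimates these quantities this way.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, null_pdf, alt_pdf):
    """Two-group local fdr: posterior probability that z comes from the null."""
    f0 = pi0 * null_pdf(z)
    f1 = (1.0 - pi0) * alt_pdf(z)
    return f0 / (f0 + f1)


def select_by_local_fdr(z_scores, pi0, null_pdf, alt_pdf, threshold=0.2):
    """Indices of test statistics whose local fdr is at or below the threshold."""
    return [
        i
        for i, z in enumerate(z_scores)
        if local_fdr(z, pi0, null_pdf, alt_pdf) <= threshold
    ]


if __name__ == "__main__":
    # Assumed toy densities: N(0, 1) null and N(3, 1) alternative.
    null_pdf = lambda z: normal_pdf(z, 0.0, 1.0)
    alt_pdf = lambda z: normal_pdf(z, 3.0, 1.0)
    print(select_by_local_fdr([0.0, 1.5, 4.0], 0.9, null_pdf, alt_pdf))
```

The caption distinguishes low-risk and high-risk genes, which would correspond to more than one non-null component. The single-alternative version here is kept only for brevity.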
