Learning Instance-Specific Predictive Models

Shyam Visweswaran et al. J Mach Learn Res. 2010 Dec 1;11:3333-3369.

Abstract

This paper introduces a Bayesian algorithm for constructing predictive models from data, where each model is optimized to predict a target variable well for a particular instance. The algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict the target variable of the instance at hand, and employs an instance-specific heuristic to locate a suitable set of models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five performance measures, and its performance was compared to that of several commonly used predictive algorithms: naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Averaged over all the data sets, the ISMB algorithm outperformed every comparison algorithm on all five performance measures.
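For concreteness, here is a minimal Python sketch of the model-averaging step the abstract describes: each candidate model's prediction for the instance is weighted by its normalized posterior score. The Model container and its score and predict fields are illustrative assumptions, not the paper's implementation.

    # Minimal sketch of Bayesian model averaging over candidate models.
    # The Model structure and its fields are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Model:
        score: float                      # unnormalized posterior score of the model
        predict: Callable[[Dict], float]  # P(Z = T | x, model) for instance x

    def model_averaged_estimate(models: List[Model], x: Dict) -> float:
        """Weight each model's prediction by its normalized posterior score."""
        total = sum(m.score for m in models)
        return sum((m.score / total) * m.predict(x) for m in models)

    # Example with two hypothetical models:
    models = [Model(3.0, lambda x: 0.9), Model(1.0, lambda x: 0.5)]
    print(model_averaged_estimate(models, {}))  # 0.75 * 0.9 + 0.25 * 0.5 = 0.8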

Keywords: Bayesian model averaging; Bayesian network; Markov blanket; instance-specific.


Figures

Figure 1
A general characterization of the induction of and inference in population-wide (top panel) and instance-specific (bottom panel) models. In the bottom panel, there is an extra arc from instance to model, because the structure and parameters of the model are influenced by the features of the instance at hand.
Figure 2
A simple hypothetical Bayesian network for a medical domain. All the nodes represent binary variables taking values in the domain {T, F}, where T stands for True and F for False. The graph at the top represents the Bayesian network structure. Associated with each variable (node) is a conditional probability table representing the probability of each variable's value conditioned on its parent set. (Note: these probabilities are for illustration only; they are not intended to reflect the frequency of events in any actual patient population.)
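To make concrete how such a structure and its conditional probability tables define a joint distribution, the sketch below encodes a two-node network and evaluates the chain rule; the variable names and probabilities are hypothetical, not those shown in the figure.

    # Hypothetical two-node network: Smoking -> LungDisease.
    # Each CPT maps a tuple of parent values to P(node = T | parents).
    parents = {"Smoking": (), "LungDisease": ("Smoking",)}
    cpts = {
        "Smoking": {(): 0.2},                        # P(Smoking = T)
        "LungDisease": {("T",): 0.3, ("F",): 0.05},  # P(LungDisease = T | Smoking)
    }

    def joint(assignment):
        """Chain rule: P(assignment) = product of P(node | its parents)."""
        p = 1.0
        for node, value in assignment.items():
            key = tuple(assignment[pa] for pa in parents[node])
            p_true = cpts[node][key]
            p *= p_true if value == "T" else 1.0 - p_true
        return p

    # P(Smoking = T, LungDisease = F) = 0.2 * (1 - 0.3) = 0.14
    print(joint({"Smoking": "T", "LungDisease": "F"}))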
Figure 3
Example of a Markov blanket. The Markov blanket of the node X6 (shown stippled) comprises the set of parents, children and spouses of the node and is indicated by the shaded nodes. The nodes in the Markov blanket include X2 and X3 as parents, X8 and X9 as children, and X5 and X7 as spouses of X6. X1, X4, X10 and X11 are not in the Markov blanket of X6.
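This definition translates directly into code. The helper below recovers the blanket from a parents map; the arcs among nodes outside the blanket (for example, X1 → X2 and X4 → X5) are assumptions chosen to be consistent with the caption.

    def markov_blanket(parents, node):
        """Parents, children, and children's other parents (spouses) of node."""
        children = {c for c, ps in parents.items() if node in ps}
        spouses = {p for c in children for p in parents[c]} - {node}
        return set(parents[node]) | children | spouses

    # An encoding consistent with Figure 3; arcs not named in the caption are assumed.
    parents = {
        "X1": [], "X2": ["X1"], "X3": [], "X4": [], "X5": ["X4"],
        "X6": ["X2", "X3"], "X7": [], "X8": ["X5", "X6"],
        "X9": ["X6", "X7"], "X10": ["X8"], "X11": ["X9"],
    }
    print(markov_blanket(parents, "X6"))  # {X2, X3, X5, X7, X8, X9}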
Figure 4
Constraints on the Markov blanket operators. The nodes are categorized into five groups: T = target, P = parent, C = child, S = spouse, and O = other (not in the Markov blanket of T). The cells with check marks indicate valid operations and are the only ones that need to be considered in generating candidate structures. The cells with an asterisk indicate that the operation is valid only if the resulting graph is acyclic.
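For the asterisked cells, validity hinges on an acyclicity test. A standard way to check this, shown as an illustrative sketch rather than the paper's code, is a depth-first search over the tentatively modified graph:

    def has_cycle(parents):
        """Depth-first cycle detection on a graph given as a parents map."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {v: WHITE for v in parents}

        def visit(v):
            color[v] = GRAY
            for p in parents[v]:
                if color[p] == GRAY:  # back edge found: the graph has a cycle
                    return True
                if color[p] == WHITE and visit(p):
                    return True
            color[v] = BLACK
            return False

        return any(color[v] == WHITE and visit(v) for v in parents)

    print(has_cycle({"A": ["B"], "B": ["A"]}))  # True: A -> B -> A
    print(has_cycle({"A": [], "B": ["A"]}))     # False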
Figure 5
An example where the application of an operator leads to the additional removal of arcs to produce a valid Markov blanket structure. Deletion of the arc Z → X5 leads to removal of the arc X4 → X5, since X5 is no longer part of the Markov blanket of Z. Reversal of the same arc also leads to removal of the arc X4 → X5, since X5 is now a parent and is precluded from having incoming arcs; moreover, unless X4 → X5 is removed, a cycle results.
Figure 6
Pseudocode for the two-phase search procedure used by the ISMB algorithm. Phase 1 uses greedy hill-climbing search while phase 2 uses best-first search.
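The control flow that the pseudocode describes can be sketched as follows, under assumptions: score(m) stands for a Bayesian model score, neighbors(m) for the structures reachable by the Markov blanket operators, and the instance-specific heuristic of phase 2 is simplified here to plain best-first expansion.

    import heapq

    def phase1_greedy(start, score, neighbors):
        """Phase 1: greedy hill-climbing; stop when no neighbor improves."""
        current = start
        while True:
            candidates = list(neighbors(current))
            if not candidates:
                return current
            best = max(candidates, key=score)
            if score(best) <= score(current):
                return current
            current = best

    def phase2_best_first(start, score, neighbors, max_models=100):
        """Phase 2: best-first search, collecting models to average over."""
        tie = 0  # tiebreaker so heapq never compares models directly
        frontier = [(-score(start), tie, start)]
        selected = []
        while frontier and len(selected) < max_models:
            _, _, current = heapq.heappop(frontier)
            selected.append(current)
            for m in neighbors(current):
                tie += 1
                heapq.heappush(frontier, (-score(m), tie, m))
        return selected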
Figure 7
Training and test data sets derived from the deterministic function Z = A ∨ (B ∧ C ∧ D). The training set contains a total of 69 instances and the test set a total of three instances, as shown; the test instances are not present in the training set. The training set simulates low prevalence of A = T, since only five of the 69 instances have this variable-value combination.
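The conjunction in Z = A ∨ (B ∧ C ∧ D) is reconstructed from the garbled original caption; under that reading, the target function can be enumerated directly:

    from itertools import product

    def z(a, b, c, d):
        # Z = A OR (B AND C AND D)
        return a or (b and c and d)

    # Enumerate all 16 assignments of the four binary inputs.
    for a, b, c, d in product([True, False], repeat=4):
        print(f"A={a} B={b} C={c} D={d} -> Z={z(a, b, c, d)}")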
Figure 8
Plots of the model-averaged estimate of P(Z_t = T | x_t, D), abbreviated as P(Z = T), and the model score obtained by the ISMB and NISMB algorithms on the three test cases given in Figure 7. Each row represents a single test case, with the plot on the left obtained from the ISMB algorithm and the plot on the right obtained from the NISMB algorithm. The value of the final averaged estimate of P(Z_t = T | x_t, D) is the point where the darker curve meets the Y-axis on the right.
