Proc Natl Acad Sci U S A. 2004 Jul 20;101(29):10529-34.
doi: 10.1073/pnas.0403794101. Epub 2004 Jul 12.

Tree-structured supervised learning and the genetics of hypertension

Jing Huang et al. Proc Natl Acad Sci U S A.

Abstract

This paper presents FlexTree, an algorithm for general supervised learning. It extends the binary tree-structured approach (Classification and Regression Trees, CART), although it differs greatly in how it selects and combines predictors. It is particularly applicable to assessing interactions, gene by gene and gene by environment, as they bear on complex disease. One model for predisposition to complex disease involves many genes. Most of them are pure noise; for the minority of genes that contribute to the signal, each genotype other than the prevalent one carries a "score." Scores add, and individuals whose total score exceeds an unknown threshold are predisposed to the disease. For the additive score problem and simulated data, FlexTree has better cross-validated risk than the many cutting-edge technologies to which it was compared when small fractions of candidate genes carry the signal. For the model where only a precise list of aberrant genotypes is predisposing, there is no systematic pattern of absolute superiority; overall, however, FlexTree seems better than the other technologies. We tried the algorithm on data from 563 Chinese women (206 hypotensive, 357 hypertensive) with information on ethnicity, menopausal status, insulin-resistant status, and 21 loci. FlexTree and Logic Regression appear better than the others in terms of Bayes risk. However, the differences are not significant in the usual statistical sense.
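The additive score model described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' simulation design: genotypes are reduced to a binary "prevalent vs. non-prevalent" code, and the function names, the non-prevalent genotype frequency, the score values, and the threshold are all hypothetical choices made for illustration.

```python
import random

def total_score(genotypes, signal_scores):
    """Sum the scores of signal genes that carry a non-prevalent genotype.

    genotypes: list of 0/1 flags (0 = prevalent genotype, 1 = non-prevalent);
               this binary coding is a simplifying assumption.
    signal_scores: dict mapping gene index -> score (hypothetical values).
    Genes absent from signal_scores are pure noise and contribute nothing.
    """
    return sum(score for gene, score in signal_scores.items() if genotypes[gene] == 1)

def simulate(n_people, n_genes, signal_scores, threshold, nonprevalent_freq=0.3, seed=0):
    """Draw random genotypes and mark each person as predisposed or not.

    A person is predisposed when their additive score exceeds the threshold.
    The threshold is known here only because we are simulating; in the
    model described above it is unknown to the analyst.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        genotypes = [1 if rng.random() < nonprevalent_freq else 0 for _ in range(n_genes)]
        score = total_score(genotypes, signal_scores)
        people.append({"genotypes": genotypes, "score": score, "predisposed": score > threshold})
    return people

# Hypothetical setup: 21 genes, of which only 3 carry signal.
signal = {0: 1.0, 5: 2.0, 12: 1.5}
sample = simulate(n_people=200, n_genes=21, signal_scores=signal, threshold=2.0)
```

With most genes contributing nothing, a learner has to find the few signal-carrying genes and combine them additively, which is the setting in which the abstract reports FlexTree's advantage.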

Figures

Fig. 1.
Application of FlexTree to Chinese women in SAPPHIRe. The ovals and rectangles, respectively, indicate internal and terminal nodes. The label assigned to each node is determined so as to minimize the misclassification cost. The number to the left of the slash is the number of hypotensives; the number to the right is the number of hypertensives. Here we assume equal prior probabilities and misclassification cost for hypotensives twice that for hypertensives.
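The node-labeling rule in the caption can be sketched as follows. The sketch assumes the usual CART convention of reweighting node counts by prior probability divided by class size before applying costs; the caption does not spell out this detail, so treat it as an assumption. The function name and the example node counts are hypothetical and are not taken from the figure; only the class totals (206 and 357), the equal priors, and the 2:1 cost ratio come from the text.

```python
def node_label(n_hypo, n_hyper,
               total_hypo=206, total_hyper=357,
               prior_hypo=0.5, prior_hyper=0.5,
               cost_hypo=2.0, cost_hyper=1.0):
    """Pick the node label that minimizes expected misclassification cost.

    n_hypo / n_hyper: counts of hypotensives / hypertensives in the node.
    cost_hypo: cost of misclassifying a hypotensive (twice cost_hyper,
               as stated in the caption).
    """
    # Prior-adjusted weight of each class in this node (assumed CART-style rule).
    w_hypo = prior_hypo * n_hypo / total_hypo
    w_hyper = prior_hyper * n_hyper / total_hyper

    # Labeling the node "hypertensive" misclassifies its hypotensives, and vice versa.
    cost_if_labeled_hyper = cost_hypo * w_hypo
    cost_if_labeled_hypo = cost_hyper * w_hyper

    if cost_if_labeled_hypo <= cost_if_labeled_hyper:
        return "hypotensive"
    return "hypertensive"

# Hypothetical node counts, written as hypotensives/hypertensives:
print(node_label(10, 10))  # 10/10
print(node_label(1, 50))   # 1/50
```

Because hypotensives are both the smaller class and the more costly one to misclassify, a node with equal raw counts is labeled hypotensive under these assumptions; a node needs a substantial excess of hypertensives before the label flips.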
