Proc Natl Acad Sci U S A. 2008 Sep 30;105(39):14790-5.
doi: 10.1073/pnas.0807471105. Epub 2008 Sep 24.

Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

David Donoho et al. Proc Natl Acad Sci U S A. 2008.

Abstract

In important application fields today (genomics and proteomics are examples), selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For $i = 1, 2, \ldots, p$, let $\pi_i$ denote the two-sided P-value associated with the $i$th feature Z-score and $\pi_{(i)}$ denote the $i$th order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective $(i/p - \pi_{(i)})/\sqrt{(i/p)(1 - i/p)}$. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
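
The threshold rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `two_sided_p` and `hc_threshold` are hypothetical, the Z-scores are assumed to be standard normal under the null, and restricting the search to the smallest fraction `alpha0` of the P-values is an assumption of this sketch (it also keeps the denominator away from zero at i = p).

```python
import math


def two_sided_p(z):
    """Two-sided P-value of a Z-score under a standard normal null."""
    # P(|N(0,1)| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def hc_threshold(z_scores, alpha0=0.10):
    """Return (threshold, i_hat, hc_max) for a list of feature Z-scores.

    alpha0 is an assumed tuning parameter of this sketch: only the
    alpha0 * p smallest P-values are searched.
    """
    p = len(z_scores)
    if p < 2:
        raise ValueError("need at least two Z-scores")

    # Sorting |Z| in decreasing order sorts the P-values in increasing
    # order, so abs_z[i - 1] pairs with the i-th order statistic pi_(i).
    abs_z = sorted((abs(z) for z in z_scores), reverse=True)

    # Search range 1..k_max, never reaching i = p (zero denominator).
    k_max = max(1, min(p - 1, int(alpha0 * p)))

    best_i, best_hc = 1, -math.inf
    for i in range(1, k_max + 1):
        frac = i / p
        pi_i = two_sided_p(abs_z[i - 1])
        # HC objective from the abstract.
        hc = (frac - pi_i) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_i, best_hc = i, hc

    # The HC threshold is the absolute Z-score at the maximizing index.
    return abs_z[best_i - 1], best_i, best_hc


if __name__ == "__main__":
    # Toy data: three strong features among 100, the rest near zero.
    z = [5.0, 4.5, 4.0] + [0.1] * 97
    threshold, i_hat, hc_max = hc_threshold(z)
    selected = [j for j, zj in enumerate(z) if abs(zj) >= threshold]
    print(threshold, i_hat, selected)
```

In this sketch, features whose absolute Z-score is at least the returned threshold are the ones kept for the classifier; how the retained Z-scores are then weighted (hard, soft, or clipped thresholding) is a separate choice.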

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Illustration of HC thresholding. (A) The ordered |Z| scores. (B) The corresponding ordered P-values in a PP plot. (C) The HC objective function in Eq. 1; this is largest at î ≈ 0.01 N (x axes are i/N). Vertical lines indicate π(î) in B and |Z|(î) in A.
Fig. 2.
Monte Carlo performance of thresholding rules in the RW model. (A–D) P = 1,000, ε = 0.05, and x axes display τ. (A) MCR^{1/2}. (B) Average threshold. (C) Average FDR. (D) Average MDR. Threshold procedures used: HC (black), Bonferroni (green), FDR (q = .5) (blue), FDRT (q = .1) (red). Averages from 1,000 Monte Carlo realizations.
Fig. 3.
Comparison of HCT functional with ideal functional. In all of the panels, ε = 1/100, and x axes display τ. (A) MCR^{1/2}. (B) Threshold. (C) FDR. (D) MDR. Threshold procedures used: HC (black), Ideal (green). Curves for FDR thresholding with q = .5 (blue) and q = .1 (red) are also shown. In each measure, green and black curves are close for τ > 2. The discrepancy at small τ is caused by the limitation T_HC > t_0.
Fig. 4.
Receiver operating characteristic curves for threshold detectors, together with operating points of max-HCT (h), max-SEP (s), and max-Proxy1 (p). Also included are the operating points of FDR (F) thresholding with q = .5. Note that h, s, and p are quite close to each other, but F can be very different.
Fig. 5.
Comparison of error rates by using Shrunken Centroids, threshold choice by cross-validation, and linear classifiers by using HCT-based threshold selection. Simulation assuming the RW model. Black, HCT-soft; red, Shrunken Centroids; green, HCT-clip; blue, HCT-hard. x axis displays τ.
