Commun Stat Theory Methods. 2017;46(21):10823-10834.
doi: 10.1080/03610926.2016.1248783. Epub 2017 Aug 2.

An evaluation of common methods for dichotomization of continuous variables to discriminate disease status

Sybil L Prince Nelson et al. Commun Stat Theory Methods. 2017.

Abstract

Dichotomization of continuous variables to discriminate a dichotomous outcome is often useful in statistical applications. If a true threshold for a continuous variable exists, the challenge is identifying it. This paper examines common methods for dichotomization to identify which ones recover a true threshold. We provide mathematical and numeric proofs demonstrating that maximizing the odds ratio, Youden's statistic, Gini index, chi-square statistic, relative risk, and kappa statistic all theoretically recover a true threshold. A simulation study evaluates the ability of these statistics to recover a threshold when sampling from a population. It indicates that maximizing the chi-square statistic or the Gini index gives the smallest bias and variability when the probability of being larger than the threshold is small, while maximizing the kappa or Youden's statistic is best when this probability is larger. Maximizing the odds ratio is the most variable and biased of the methods.
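
As a rough illustration of what "maximizing a statistic to choose a threshold" can look like in practice, the sketch below scans candidate cutpoints and keeps the one with the largest Youden's statistic or chi-square statistic. It is a minimal sketch, not the authors' procedure: the function names, the rule "X > c" for the dichotomized variable, and the choice of observed values of X as candidate cutpoints are all assumptions made for this example.

```python
def two_by_two(x, y, c):
    """Cell counts of the 2x2 table for (x > c) against a 0/1 outcome y.

    The rule "x > c" is an assumed convention for this sketch.
    """
    tp = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 1)
    fp = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 0)
    fn = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 1)
    tn = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 0)
    return tp, fp, fn, tn


def youden(tp, fp, fn, tn):
    """Youden's statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def chi_square(tp, fp, fn, tn):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    return n * (tp * tn - fp * fn) ** 2 / denom if denom else 0.0


def best_threshold(x, y, statistic):
    """Return the candidate cutpoint that maximizes `statistic`.

    Candidates are the observed values of x except the largest, so that
    both sides of the split are non-empty. Assumes y contains both 0 and 1.
    Ties are broken in favor of the smallest cutpoint.
    """
    candidates = sorted(set(x))[:-1]
    return max(candidates, key=lambda c: statistic(*two_by_two(x, y, c)))


# Toy data with a perfectly separating threshold (hypothetical, not from the paper).
x = list(range(100))
y = [1 if xi > 60 else 0 for xi in x]
print(best_threshold(x, y, youden), best_threshold(x, y, chi_square))
```

Other statistics named in the abstract (odds ratio, relative risk, Gini index, kappa) could be passed to `best_threshold` in the same way, provided each is written as a function of the four cell counts and handles empty cells.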

Figures

Figure 1
Graphical representation of possible thresholds for X presented in equations 1–3.
Figure 2
Simulation results showing mean-squared error (MSE) by Bias² under the case-control study design for the estimated threshold obtained by maximizing the statistics: odds ratio, Youden’s, chi-square, Gini index, and kappa. Rows represent strength of association between X and Y and columns represent the probability that the independent variable X is greater than the true threshold T.
Figure 3
Simulation results showing mean-squared error (MSE) by Bias² under the case-control study design for the estimated threshold obtained by maximizing the statistics: Youden’s, chi-square, Gini index, and kappa, excluding P(X > T) = 0.05. Rows represent strength of association between X and Y and columns represent the probability that the independent variable X is greater than the true threshold T.
