PLoS One. 2015 Dec 8;10(12):e0144439.
doi: 10.1371/journal.pone.0144439. eCollection 2015.

Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features

Gregor Stiglic et al. PLoS One. 2015.

Abstract

Several studies have demonstrated the importance of comorbidities for understanding the origin and evolution of medical complications. This study focuses on improving the interpretability of predictive models by using simple logical features that represent comorbidities. We use group-lasso-based feature interaction discovery followed by a post-processing step in which simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represents a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. In the evaluation, the proposed method reduced the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve (AUC) metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improved comprehensibility of the final predictive model when simple comorbidity-based terms are used in logistic regression.
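The last two steps described in the abstract (adding simple logic terms, then pruning with lasso logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: it skips the group lasso interaction discovery and takes the candidate pair as given, uses only AND and OR as logic terms, fits the L1-penalised model with plain proximal gradient descent, and runs on made-up toy data. The names `add_logic_terms`, `fit_lasso_logistic`, the flag names `c0` to `c3`, and the penalty value `lam=0.05` are all assumptions made for illustration.

```python
import itertools
import math


def add_logic_terms(rows, pairs):
    """Append AND and OR terms for each (i, j) pair of binary comorbidity flags."""
    out = []
    for r in rows:
        extra = []
        for i, j in pairs:
            extra.append(r[i] & r[j])  # both conditions present
            extra.append(r[i] | r[j])  # at least one condition present
        out.append(list(r) + extra)
    return out


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_lasso_logistic(X, y, lam, lr=0.5, iters=5000):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n  # the intercept is not penalised
        w = [soft_threshold(wj - lr * g / n, lr * lam) for wj, g in zip(w, gw)]
    return w, b


# Toy data (hypothetical): every combination of four binary comorbidity flags;
# the outcome is 1 only when flags 0 and 1 co-occur.
rows = [list(r) for r in itertools.product([0, 1], repeat=4)]
y = [r[0] & r[1] for r in rows]
names = ["c0", "c1", "c2", "c3", "c0 AND c1", "c0 OR c1"]

X = add_logic_terms(rows, pairs=[(0, 1)])  # candidate pair assumed known
w, b = fit_lasso_logistic(X, y, lam=0.05)

# Keep only the features whose coefficients survived the L1 penalty.
selected = {name: round(wj, 3) for name, wj in zip(names, w) if abs(wj) > 1e-8}
print(selected)
```

On this toy data the penalty should leave the single logic term `c0 AND c1` as the only non-zero coefficient, which is the kind of compact, readable model the abstract describes. A real analysis would use a tested solver and choose the penalty by cross-validation.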

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Classification performance of the three observed approaches.
Four sets of boxplots show predictive performance, measured as area under the ROC curve (AUC), for 1-Standard Error (1SE), boosted C5.0 decision trees (C5.0), glinternet (GLI), and a model using the optimal lambda (OPT) setting obtained by cross-validation. Each set corresponds to a different setting of “Number of Discovered Interactions” (NDI): 5, 10, 15, or 20 interactions.
Fig 2. Complexity of the three observed approaches.
Comparison of model complexity, measured as number of selected features, for the three compared approaches and four different settings of “Number of Discovered Interactions” (NDI).
Fig 3. Risk of readmission with and without the interaction term.
Surface plots of the response (risk of readmission) from the model without (left) and with the interaction between length of stay (LOS_LOG) and number of chronic diseases (NCHRONIC).
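To make the role of an interaction term such as the one in Fig 3 concrete, the sketch below evaluates a logistic risk model with and without a LOS_LOG × NCHRONIC product term. All coefficient values are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names `risk` and `chronic_effect` are likewise assumptions.

```python
import math

# Hypothetical coefficients, for illustration only (not from the paper).
COEF = dict(b0=-2.0, b_los=0.4, b_chr=0.3)


def risk(los_log, nchronic, b0, b_los, b_chr, b_int=0.0):
    """Readmission risk from a logistic model with an optional interaction term."""
    z = b0 + b_los * los_log + b_chr * nchronic + b_int * los_log * nchronic
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def chronic_effect(los_log, b_int):
    """Change in log-odds when NCHRONIC goes from 0 to 1 at a given LOS_LOG."""
    with_one = risk(los_log, 1, b_int=b_int, **COEF)
    with_none = risk(los_log, 0, b_int=b_int, **COEF)
    return logit(with_one) - logit(with_none)


for los in (0.0, 1.0, 2.0):
    print(los, chronic_effect(los, b_int=0.0), chronic_effect(los, b_int=0.2))
```

Without the interaction term, one additional chronic disease shifts the log-odds by the same amount at every length of stay. With it, the shift grows or shrinks with LOS_LOG, which is what bends the response surface.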

