BMC Med Inform Decis Mak. 2022 Jan 15;22(1):14.
doi: 10.1186/s12911-022-01747-3.

Prediction and risk stratification from hospital discharge records based on Hierarchical sLDA

Guanglei Yu et al. BMC Med Inform Decis Mak. 2022.

Abstract

Background: Rapid advances in information technology have made risk stratification easier to adopt, which benefits both patients and clinicians. Risk stratification supports accurate, individualized prevention and therapeutic decision making. Hospital discharge records (HDRs) routinely include accurate diagnostic conclusions for each patient. In this paper, we therefore propose an improved supervised model for risk stratification based on HDRs for coronary heart disease (CHD).

Methods: We introduced an improved four-layer supervised latent Dirichlet allocation (sLDA) approach, the Hierarchical sLDA model, which encoded patient features in HDRs as one-hot patient feature-value pairs according to clinical guidelines for CHD lab tests. Random forests (RFs) and the synthetic minority oversampling technique (SMOTE) were used to address missing data and class imbalance, respectively. After TF-IDF processing of the datasets, a variational Bayes expectation-maximization method and a generalized linear model were used to recognize the latent clinical state of a patient (i.e., risk stratification) and to predict CHD. Accuracy, macro-F1, training time, and testing time were used to evaluate the model.
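The encoding and weighting steps described in the Methods can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `feature=value` tokens, the function names `one_hot` and `tf_idf`, and the TF-IDF variant (term frequency times ln(N/df)) are all hypothetical choices, since the abstract does not specify them.

```python
import math

# Hypothetical discretized records: each patient is a list of "feature=value"
# tokens. The features and cut-offs are invented for illustration only.
records = [
    ["LDL=high", "HDL=low", "smoker=yes"],
    ["LDL=normal", "HDL=low", "smoker=no"],
    ["LDL=high", "HDL=normal", "smoker=no"],
]

def one_hot(records):
    """Encode each record as a 0/1 vector over the feature-value vocabulary."""
    vocab = sorted({tok for rec in records for tok in rec})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for rec in records:
        row = [0] * len(vocab)
        for tok in rec:
            row[index[tok]] = 1
        rows.append(row)
    return vocab, rows

def tf_idf(rows):
    """Weight each entry by tf * ln(N / df); this variant is an assumption."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # Document frequency: number of records in which each pair occurs.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in rows:
        total = sum(row) or 1  # guard against an empty record
        weighted.append([
            (count / total) * math.log(n_docs / df[j]) if count else 0.0
            for j, count in enumerate(row)
        ])
    return weighted

vocab, rows = one_hot(records)
weights = tf_idf(rows)
```

In this toy data, a pair that appears in only one record (such as `smoker=yes`) receives a larger weight than a pair shared by two records, which is the usual motivation for TF-IDF weighting.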

Results: Based on the characteristics of our datasets, i.e., patient feature-value pairs, we constructed a supervised topic model by adding one more Dirichlet distribution hyperparameter to sLDA. Compared with the established supervised Multi-class sLDA model, our proposed approach reduces training time by 59.74% and testing time by 25.58%, with almost no loss of average prediction accuracy on our datasets.
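For readers unfamiliar with the baseline, the following is a rough sketch of the generative process commonly described for multi-class sLDA: a Dirichlet-distributed topic mixture per document, a topic per word, and a softmax response over the mean topic assignment. It does not reproduce the authors' four-layer Hierarchical sLDA or its additional Dirichlet hyperparameter; all names (`generate_document`, `eta`, `topics`) and the toy parameter values are assumptions.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # fallback for floating-point round-off

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate_document(n_words, alpha, topics, eta, rng):
    """Generate one document and its class label under a multi-class sLDA sketch."""
    theta = sample_dirichlet(alpha, rng)                      # topic mixture
    z = [sample_categorical(theta, rng) for _ in range(n_words)]
    words = [sample_categorical(topics[k], rng) for k in z]   # word per topic
    z_bar = [z.count(k) / n_words for k in range(len(alpha))]  # mean assignment
    scores = [sum(e * zb for e, zb in zip(eta_c, z_bar)) for eta_c in eta]
    label = sample_categorical(softmax(scores), rng)          # softmax response
    return words, z_bar, label

# Toy parameters (hypothetical): 2 topics, 3 vocabulary items, 2 classes.
rng = random.Random(0)
topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
eta = [[2.0, -2.0], [-2.0, 2.0]]
words, z_bar, label = generate_document(20, [0.5, 0.5], topics, eta, rng)
```

In the setting of the abstract, the "words" would correspond to patient feature-value pairs and the label to the CHD outcome; inference would run in the opposite direction, from observed pairs and labels to the latent topics.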

Conclusions: We proposed a model for risk stratification and prediction of CHD based on the sLDA model. Experimental results show that the proposed Hierarchical sLDA model is competitive in both time performance and accuracy. Hierarchical processing of patient features can substantially mitigate the low efficiency and time-consuming Gibbs sampling of the sLDA model.

Keywords: Hospital discharge records; Risk stratification; Supervised latent Dirichlet allocation; Topic models.

Conflict of interest statement

The authors report that they have no conflicts.

Figures

Fig. 1
Probabilistic graphical model. The probabilistic graphical model representation of Hierarchical sLDA (left); the graphical model representation of variational distribution (right)
Fig. 2
Original HDR. The original HDR in Chinese (left); The corresponding English version (right)
Fig. 3
Feature annotation of HDRs. The process of feature annotation of HDRs
Fig. 4
Comparison of performance. Comparison over all classes based on fivefold cross-validation: training time (left); testing time (middle); average accuracy (right)
Fig. 5
Comparison of confusion matrices. Comparison of confusion matrices for K = 70 topics: multi-class sLDA (left); Hierarchical sLDA (right)
Fig. 6
Selection of optimal hyperparameter α. With α from 0.1 to 1.5 in steps of 0.1 and the number of topics K from 10 to 70 in steps of 5, the 3-D representations show that hyperparameter α should be optimized using the fitted curve. The left view of the 3-D representation (left); the front view of the 3-D representation (right)
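
The search ranges in the Fig. 6 caption can be enumerated with a short sketch such as the one below. It only builds the α and K grids and picks the best cell by plain enumeration; it does not perform the curve fitting mentioned in the caption, and `toy_score` is a made-up placeholder standing in for a real cross-validated accuracy, not a result from the paper.

```python
def alpha_grid(start=0.1, stop=1.5, step=0.1):
    """Values of alpha from start to stop inclusive, rounded to one decimal."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n)]

def topic_grid(start=10, stop=70, step=5):
    """Topic counts K from start to stop inclusive."""
    return list(range(start, stop + 1, step))

def grid_search(score, alphas, topic_counts):
    """Return (best_score, alpha, K) over every (alpha, K) combination."""
    best = None
    for alpha in alphas:
        for k in topic_counts:
            s = score(alpha, k)
            if best is None or s > best[0]:
                best = (s, alpha, k)
    return best

def toy_score(alpha, k):
    # Hypothetical stand-in for model accuracy; peaks at alpha=0.7, K=40.
    return -((alpha - 0.7) ** 2) - ((k - 40) / 100) ** 2

alphas = alpha_grid()
topic_counts = topic_grid()
best_score, best_alpha, best_k = grid_search(toy_score, alphas, topic_counts)
```

The grid has 15 values of α and 13 values of K, so a full sweep trains and evaluates 195 configurations, which is one reason training time matters for this kind of model selection.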
