J Am Med Inform Assoc. 2013 Sep-Oct;20(5):906-14.
doi: 10.1136/amiajnl-2012-001334. Epub 2012 Dec 15.

Finding falls in ambulatory care clinical documents using statistical text mining

James A McCart et al. J Am Med Inform Assoc. 2013 Sep-Oct.

Abstract

Objective: To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter.

Materials and methods: 2241 patients were selected with a fall-related ICD-9-CM E-code or matched injury diagnosis code while being treated as an outpatient at one of four sites within the Veterans Health Administration. All clinical documents within a 48-h window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26 010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest-D).
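The stratified 70% sampling step described above can be illustrated with a minimal sketch. It assumes binary document labels (1 = fall, 0 = no fall); the function name `stratified_split`, the toy labels, and the random seed are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within each class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy labels: 20 'fall' documents and 80 'no fall' documents.
toy_labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(toy_labels)
```

Splitting within each class, rather than over the whole pool, keeps the proportion of fall documents the same in the training and held-out portions, which matters when the positive class is a minority.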

Results: All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest-D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944.
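As an illustration of how the reported measures relate to one another, the sketch below computes sensitivity, specificity, F-measure, and AUC from a small set of labels and classifier scores. The data, the 0.5 decision threshold, and the function names are hypothetical; the AUC is computed with the standard pairwise-ranking formulation, which may differ from the tooling the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summary_measures(y_true, y_pred):
    """Return (sensitivity, specificity, F-measure) for binary predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for eight documents (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sensitivity, specificity, f_measure = summary_measures(y_true, y_pred)
auc_value = auc(y_true, scores)
```

AUC depends only on how the scores rank the documents, whereas sensitivity, specificity, and F-measure depend on the chosen threshold, which is why a model can have a high AUC alongside a more modest F-measure.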

Discussion: The STM models performed well across a large heterogeneous collection of document titles. The models also generalized to other sites, including a traditionally bilingual site that had distinctly different grammatical patterns.

Conclusions: The results of this study suggest STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence shown here that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics.

Keywords: Accidental Falls; Ambulatory Care; Electronic Health Records; Text Mining.

Figures

Figure 1
Ten document types with the most documents labeled ‘fall’. ED, emergency department; E&M
Figure 2
Cost-sensitive classification on Atrain (support vector machine cost sensitive; SVM-cost). FN, false negative; FP, false positive. Access the article online to view this figure in colour.
Figure 3
Performance measure charts by model and dataset. AUC, area under the receiver operating characteristic curve; LR, logistic regression; NPV, negative predictive value; PPV, positive predictive value; SVM, support vector machine; SVM-cost, support vector machine cost sensitive. Access the article online to view this figure in colour.
Figure 4
Receiver operating characteristic curves by model and dataset. LR, logistic regression; SVM, support vector machine. Access the article online to view this figure in colour.
Figure 5
Misclassification counts by model across all test datasets (Atest–D). LR, logistic regression; SVM, support vector machine; SVM-cost, support vector machine cost sensitive. Access the article online to view this figure in colour.
