J Am Med Inform Assoc. 2016 Apr;23(e1):e113-7.
doi: 10.1093/jamia/ocv155. Epub 2015 Nov 13.

Classification of radiology reports for falls in an HIV study cohort

Jonathan Bates et al. J Am Med Inform Assoc. 2016 Apr.

Abstract

Objective: To identify patients in a human immunodeficiency virus (HIV) study cohort who have fallen by applying supervised machine learning methods to radiology reports of the cohort.

Methods: We used the Veterans Aging Cohort Study Virtual Cohort (VACS-VC), an electronic health record-based cohort of 146 530 veterans for whom radiology reports were available (N=2 977 739). We created a reference standard of radiology reports, represented each report by a feature set of words and Unified Medical Language System concepts, and then developed several support vector machine (SVM) classifiers for falls. We compared mutual information (MI) ranking and embedded feature selection approaches. The SVM classifier with MI feature selection was chosen to classify all radiology reports in VACS-VC.
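The MI ranking step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy reports, the token sets, and the function names `mutual_information` and `rank_features` are hypothetical, and each report is reduced to a binary present/absent indicator per word, whereas the study used both words and UMLS concepts.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information (in bits) between a binary feature and a binary label."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    feature_counts = Counter(feature_present)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, y), count in joint.items():
        p_joint = count / n
        # p(f, y) / (p(f) * p(y)) rewritten in terms of raw counts
        ratio = count * n / (feature_counts[f] * label_counts[y])
        mi += p_joint * math.log2(ratio)
    return mi


def rank_features(docs, labels):
    """Score every token by MI with the label; highest score first, ties by name."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {
        tok: mutual_information([tok in doc for doc in docs], labels)
        for tok in vocab
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy "reports", each already reduced to a set of tokens.
docs = [
    {"patient", "fell", "fracture"},
    {"fell", "stairs", "pain"},
    {"patient", "cough", "pain"},
    {"patient", "chest", "cough"},
]
labels = [1, 1, 0, 0]  # 1 = report mentions a fall (toy annotation)

ranking = rank_features(docs, labels)
```

In this toy data, "fell" and "cough" separate the two classes perfectly and rank highest, while "pain" appears equally often in both classes and carries no information. A classifier such as an SVM would then be trained on only the top-ranked features.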

Results: Our SVM classifier with MI feature selection achieved an area under the curve score of 97.04 on the test set. When applied to all the radiology reports in VACS-VC, 80 416 of these reports were classified as positive for a fall. Of these, 11 484 were associated with a fall-related external cause of injury code (E-code) and 68 932 were not, corresponding to 29 280 patients with potential fall-related injuries who could not have been found using E-codes.
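The report counts in the Results can be checked with simple arithmetic, and an area-under-the-curve score can be computed as the fraction of positive/negative pairs that a classifier orders correctly. The sketch below assumes the reported score refers to that pairwise (ROC-style) AUC expressed as a percentage, which the abstract does not state explicitly; the toy scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for five reports (not study data).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)

# Counts taken from the Results paragraph.
positive_reports = 80416
with_ecode = 11484
without_ecode = 68932

# The two subgroups should account for every positive report.
counts_consistent = with_ecode + without_ecode == positive_reports
share_without_ecode = without_ecode / positive_reports
```

The subgroup counts sum to the reported total, and roughly 86% of the reports classified as positive had no accompanying fall-related E-code.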

Discussion: Feature selection was crucial to improving the classifier's performance. Feature selection with MI allowed us to select the number of discriminative features to use for classification, in contrast to the embedded feature selection method, in which the number of features is chosen automatically.
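The contrast drawn in the Discussion can be illustrated with a small sketch. With a ranking such as MI, the analyst fixes the number of features `k` directly. With an embedded method, the number of surviving features is a by-product of training. The abstract does not say which embedded method was used, so the soft-thresholding rule below is only an assumed stand-in for sparsity-inducing training, and all scores, weights, and function names are hypothetical.

```python
def select_top_k(scores, k):
    """Filter-style selection: keep exactly the k highest-scoring features."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def select_by_sparsity(weights, penalty):
    """Embedded-style stand-in: shrink each weight by `penalty`.

    Features whose weight shrinks to zero drop out, so the number kept
    depends on the penalty rather than being chosen directly.
    """
    kept = []
    for name, w in sorted(weights.items()):
        if max(abs(w) - penalty, 0.0) > 0.0:
            kept.append(name)
    return kept


# Hypothetical feature scores / weights; "(fall)" mimics a UMLS concept feature.
values = {"fell": 0.9, "(fall)": 0.8, "fracture": 0.5, "pain": 0.1, "patient": 0.02}

top3 = select_top_k(values, k=3)
kept_mild = select_by_sparsity(values, penalty=0.3)
kept_strong = select_by_sparsity(values, penalty=0.6)
```

Here `select_top_k` always returns exactly three features, while the sparsity-based selection returns three features under the milder penalty and two under the stronger one.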

Conclusion: Machine learning is an effective method of identifying patients who have suffered a fall. The development of this classifier supplements the clinical researcher's toolkit and reduces dependence on under-coded structured electronic health record data.

Keywords: HIV; aging; falls; information retrieval; text mining.

Figures

Figure 1: Bag of words and concepts model. Unified Medical Language System (UMLS) concepts are indicated by parentheses.
Figure 2: Most predictive features. Parentheses around a word, e.g., “(fall)”, indicate a Unified Medical Language System (UMLS) concept, which we try to summarize inside the parentheses. Protected health information is redacted with “[PHI].”
