2024 May 9;19(5):e0303519.
doi: 10.1371/journal.pone.0303519. eCollection 2024.

Natural language processing augments comorbidity documentation in neurosurgical inpatient admissions

Rahul A Sastry et al. PLoS One.

Abstract

Objective: To establish whether a natural language processing technique could identify two common inpatient neurosurgical comorbidities using only text reports of inpatient head imaging.

Materials and methods: A training and testing dataset of reports for 979 CT or MRI scans of the brain was identified, covering patients admitted to the neurosurgery service of a single hospital in June 2021 or to the Emergency Department between July 1-8, 2021. A variety of machine learning and deep learning algorithms utilizing natural language processing were trained on the training set (84% of the total cohort) and tested on the remaining reports. A subset comparison cohort (n = 76) was then assessed to compare the output of the best algorithm against real-life inpatient documentation.
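The abstract does not include the study's code. A minimal sketch of the general approach it describes (TF-IDF features extracted from report text feeding a random forest classifier, here via scikit-learn) might look as follows; the reports, labels, and pipeline defaults below are illustrative assumptions, not study data.

```python
# Hedged sketch of the described approach: TF-IDF features from
# radiology report text feeding a random forest classifier.
# All reports and labels below are invented examples.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

reports = [
    "mass effect with midline shift and effacement of the sulci",
    "no acute intracranial abnormality identified",
    "vasogenic edema surrounding the lesion with local mass effect",
    "unremarkable noncontrast head ct",
    "subfalcine herniation with compression of the lateral ventricle",
    "stable postoperative changes without edema",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = comorbidity present, 0 = absent

# Pipeline: tokenize/weight report text with TF-IDF, then classify.
clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
clf.fit(reports, labels)
print(clf.predict(["midline shift with sulcal effacement"]))
```

In practice the study compared several classifiers and tokenization strategies (see Figs 2 and 3); the pipeline above shows only the shape of one candidate.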

Results: For "brain compression", a random forest classifier outperformed the other candidate algorithms with an accuracy of 0.81 and an area under the curve (AUC) of 0.90 in the testing dataset. For "brain edema", a random forest classifier again outperformed the other candidates with an accuracy of 0.92 and an AUC of 0.94 in the testing dataset. In the provider comparison dataset, for "brain compression," the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. For "brain edema," the algorithm again demonstrated better accuracy (0.92 vs 0.84) and AUC (0.45 vs 0.09) than provider documentation.

Discussion: A natural language processing-based machine learning algorithm can reliably and reproducibly identify selected common neurosurgical comorbidities from radiology reports.

Conclusion: This result may justify the use of machine learning-based decision support to augment provider documentation.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Example of frequency and TF-IDF tokenization strategies illustrating how TF-IDF controls for words that frequently occur in the corpus.
TF-IDF = Term Frequency Inverse Document Frequency.
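The contrast Fig 1 illustrates can be reproduced in a few lines. In the toy corpus below (invented for this sketch), raw counts treat "brain" and "edema" as equally important in the first report, while TF-IDF down-weights "brain" because it appears in every document.

```python
# Toy illustration of frequency vs TF-IDF weighting. Corpus invented.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["brain edema", "brain normal", "brain compression"]

counts = CountVectorizer().fit_transform(corpus).toarray()
tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(corpus).toarray()
vocab = tfidf_vec.vocabulary_

# Raw frequency: both words count once in the first report.
assert counts[0].sum() == 2
# TF-IDF: "brain" occurs in every document, so its weight in the
# first report falls below that of the document-specific "edema".
assert weights[0, vocab["brain"]] < weights[0, vocab["edema"]]
```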
Fig 2
Fig 2. Machine learning and deep learning model performance on brain compression data.
(A) Machine learning classifiers’ performance with both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) tokenization strategies. (B) Deep learning classifiers’ performance with both tokenization strategies. SVM = support vector machine; NB = Naïve Bayes; Log = logistic regression.
Fig 3
Fig 3. Machine learning and deep learning model performance on brain edema data.
(A) Machine learning classifiers’ performance with both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) tokenization strategies. (B) Deep learning classifiers’ performance with both tokenization strategies. SVM = support vector machine; NB = Naïve Bayes; Log = logistic regression.
Fig 4
Fig 4. Receiver operating characteristic (ROC) curves for random forest classifier with TF-IDF tokenization.
(A) Estimator trained for brain compression classification. (B) Estimator trained for brain edema classification. AUC = area under the curve.
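Fig 4 reports discrimination as the area under the ROC curve. As a quick reminder of what that number measures, AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one; the labels and scores below are illustrative, not study data.

```python
# AUC from predicted scores: the fraction of positive/negative pairs
# ranked correctly by the classifier. Values are illustrative only.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]           # ground-truth comorbidity labels
scores = [0.1, 0.4, 0.35, 0.8]  # classifier probability outputs
auc = roc_auc_score(y_true, scores)
print(auc)  # → 0.75 (3 of the 4 positive/negative pairs ranked correctly)
```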
Fig 5
Fig 5. Machine learning estimator and provider documentation comparison.
(A) Estimators for the compression dataset. (B) Estimators for the edema dataset. SVM = support vector machine; NB = Naïve Bayes; Log = logistic regression.

