Bioengineering (Basel). 2023 Nov 10;10(11):1307.
doi: 10.3390/bioengineering10111307.

Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database

Tim Dong et al. Bioengineering (Basel).

Abstract

Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse, because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.

Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.

Methods: A total of 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was then used to curate a moderate-fidelity database from extractions of the 133,889 application reports. The hold-out test set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.
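The abstract does not give the extraction rules themselves, so the following is only a minimal sketch of how a continuous outcome might be pulled from a semi-structured report line. The regular expression, the function name `extract_continuous` and the sample report text are all hypothetical; only the field names `varValue1`, `varValue2` and `varValue` come from the Figure 2 caption, and the choice to leave the range fields empty for single values is an assumption.

```python
import re
from typing import Optional

# Hypothetical number pattern: an integer or a decimal.
_NUM = r"(\d+(?:\.\d+)?)"


def extract_continuous(text: str, label: str) -> Optional[dict]:
    """Find `label` in `text` and return its value or range.

    Illustrative only: this is not the rule set used in the paper.
    """
    pattern = re.compile(
        r"\b" + re.escape(label)          # the outcome label, e.g. "EF"
        + r"\s*[:=]?\s*" + _NUM           # optional separator, then a number
        + r"(?:\s*(?:-|to)\s*" + _NUM + r")?",  # optional upper bound of a range
        re.IGNORECASE,
    )
    match = pattern.search(text)
    if match is None:
        return None

    first = float(match.group(1))
    second = float(match.group(2)) if match.group(2) else None

    if second is None:
        # Single value: assumed here to leave the range fields empty.
        return {"varValue1": None, "varValue2": None, "varValue": first}

    # Range: lower bound, upper bound, and their average.
    return {
        "varValue1": first,
        "varValue2": second,
        "varValue": (first + second) / 2,
    }


# Made-up report text for illustration.
report = "LVOT VTI: 21.5 cm\nEF 55-60 %\nTR Vmax 2.4 m/s"
ef = extract_continuous(report, "EF")
lvot = extract_continuous(report, "LVOT VTI")
```

Here `ef` holds the range 55 to 60 with an average of 57.5, while `lvot` holds the single value 21.5. A label that does not appear in the text returns `None`, which keeps missing measurements distinguishable from zero.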

Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability as measured by intra-class correlation (ICC = 1.00, p < 0.05), alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level of inter-rater reliability (ICC = 0.75–0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
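To make the two headline metrics concrete, the sketch below computes an intra-class correlation and a confusion-matrix accuracy in plain Python. The abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, and the function names and sample numbers are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n-subjects by k-raters table.

    Assumed form; the abstract does not specify the ICC variant.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def confusion_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of cases on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up data: each row is one report, columns are (clinician, NLP).
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
icc_value = icc_2_1(identical)

# Made-up 2x2 confusion matrix with 9 of 10 cases on the diagonal.
accuracy = confusion_accuracy([[5, 1], [0, 4]])
```

When the clinician and NLP values are identical for every report, the error and rater terms vanish and the ICC is 1, which is the situation the abstract reports for LVOT VTI, AV VTI and TR Vmax.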

Conclusions: The NLP-based technique performed well at extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.

Keywords: Big Data; data extraction; echo report; echocardiography analysis; electronic health records (EHR); natural language processing (NLP); unstructured data; validation.

Conflict of interest statement

All authors declare that there are no competing interests.

Figures

Figure 1. Design of the NLP Echo extraction system.
Figure 2. Graphical user interface of the system, showing the explainability of extraction for the continuous variable EF. varValue1 shows the lower range value if this exists; varValue2 shows the upper range value; varValue shows the average of lower and upper range values if these exist.
Figure 3. In-depth analysis of clinician and system extractions for AV max PG; the green and pink regions indicate the features involved in the annotation process.
Figure 4. Bubble plot analysis of magnitude and calibration for continuous variable sets. The x-axis and y-axis show the total magnitude of extracted outcome measures for clinician and NLP, respectively. The size of the circles represents the frequency of each variable extracted by the NLP system.
Figure 5. Confusion matrix analysis of classification performance for discrete variable sets. The system (Predicted) is shown on the y-axis and the clinicians (Actual) on the x-axis. Accuracies are shown in green and errors in red. % represents percentage of total.
