Bioengineering (Basel). 2023 Nov 10;10(11):1307.

doi: 10.3390/bioengineering10111307.

Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database

Affiliations

¹ Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK.
² School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK.
³ Faculty of Medicine, University of Porto, 4100 Porto, Portugal.
⁴ University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK.

PMID: 38002431
PMCID: PMC10669818
DOI: 10.3390/bioengineering10111307

Tim Dong et al. Bioengineering (Basel). 2023.

Abstract

Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data makes them difficult to analyse. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract data from and analyse because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.

Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.
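
The conversion described in the objectives can be illustrated with a minimal sketch. It assumes a simple regular-expression approach; the abstract does not state which rules the actual system uses, and the label list, severity grades, function names and sample report text below are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical label and grade lists; the real system's vocabulary is not given in the abstract.
CONTINUOUS_LABELS = ["LVOT VTI", "AV VTI", "TR Vmax"]
SEVERITY_GRADES = ["trivial", "mild", "moderate", "severe"]


def extract_continuous(text: str, label: str) -> Optional[float]:
    """Return the first number that follows `label`, or None if the label is absent."""
    pattern = re.escape(label) + r"\s*[:=]?\s*(\d+(?:\.\d+)?)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None


def extract_severity(text: str, finding: str) -> Optional[str]:
    """Return the severity grade written directly before `finding`, or None."""
    grades = "|".join(SEVERITY_GRADES)
    pattern = r"\b(" + grades + r")\s+" + re.escape(finding)
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


# Made-up report fragment, for illustration only.
report = "LVOT VTI: 21.5 cm. AV VTI 42 cm. Mild mitral regurgitation."

structured = {label: extract_continuous(report, label) for label in CONTINUOUS_LABELS}
structured["mitral regurgitation"] = extract_severity(report, "mitral regurgitation")
print(structured)
```

Missing measurements are returned as `None` rather than a default number, so that absent values stay distinguishable from real ones in the structured output.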

Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and was iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.

Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability, as measured by intra-class correlation coefficients (ICC = 1.00, p < 0.05), alongside high R² values, demonstrating ideal alignment between the NLP system and clinicians. A good level of inter-rater reliability (ICC = 0.75–0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
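
As a rough illustration of how an accuracy figure is read off a confusion matrix, the sketch below counts (clinician, system) label pairs and divides the matching pairs by the total. The function names and the five example labels are invented for illustration; they are not the study's data and do not reproduce the reported 91.38%.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(actual: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


def accuracy(counts: Dict[Tuple[str, str], int]) -> float:
    """Share of all cases that lie on the diagonal (actual == predicted)."""
    total = sum(counts.values())
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return correct / total if total else 0.0


# Invented toy labels: clinician annotation (actual) vs. system output (predicted).
clinician = ["mild", "mild", "moderate", "severe", "none"]
system = ["mild", "moderate", "moderate", "severe", "none"]

counts = confusion_counts(clinician, system)
print(f"accuracy = {accuracy(counts):.2%}")
```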

Conclusions: The NLP-based technique performed well in extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.

Keywords: Big Data; data extraction; echo report; echocardiography analysis; electronic health records (EHR); natural language processing (NLP); unstructured data; validation.

Conflict of interest statement

All authors declare that there are no competing interests.

Figures

**Figure 1**
Design of the NLP Echo extraction system.

**Figure 2**
Graphical user interface of the system, illustrating the explainability of extraction for the continuous variable EF. varValue1 shows the lower range value, if one exists; varValue2 shows the upper range value; varValue shows the average of the lower and upper range values, if these exist.

**Figure 3**
In-depth analysis of clinician and system extractions for AV max PG; the green and pink regions indicate the regions of features involved in the annotation process.

**Figure 4**
Bubble plot analysis of magnitude and calibration for continuous variable sets. The x-axis and y-axis show the total magnitude of extracted outcome measures for the clinicians and the NLP system, respectively. The size of each circle represents the frequency of the corresponding variable extracted by the NLP system.

**Figure 5**
Confusion matrix analysis of classification performance for discrete variable sets. The system (Predicted) is shown on the y-axis, while the clinicians (Actual) are shown on the x-axis. Correct classifications are shown in green and errors in red. % represents percentage of total.
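
The three fields named in the Figure 2 caption (varValue1, varValue2, varValue) suggest that ranges such as "55-60%" are stored as a lower bound, an upper bound and their average. The sketch below shows one way this could work. The regular expressions, the `parse_value` function and the treatment of single values (stored in varValue only) are assumptions, not details taken from the paper.

```python
import re
from typing import Dict, Optional

# Assumed patterns: a range written as "55-60", "55–60" or "55 to 60", or a single number.
RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:-|–|to)\s*(\d+(?:\.\d+)?)")
SINGLE = re.compile(r"\d+(?:\.\d+)?")


def parse_value(text: str) -> Dict[str, Optional[float]]:
    """Split a measurement into lower bound, upper bound and a single summary value."""
    range_match = RANGE.search(text)
    if range_match:
        low, high = float(range_match.group(1)), float(range_match.group(2))
        # varValue is the average of the two bounds, as described in the caption.
        return {"varValue1": low, "varValue2": high, "varValue": (low + high) / 2}
    single_match = SINGLE.search(text)
    # Assumption: with no range, only varValue is filled (or nothing at all).
    value = float(single_match.group(0)) if single_match else None
    return {"varValue1": None, "varValue2": None, "varValue": value}


print(parse_value("EF 55-60%"))
print(parse_value("EF 45%"))
```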

Cited by

Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification.
Arends B, Vessies M, van Osch D, Teske A, van der Harst P, van Es R, van Es B. BMC Med Inform Decis Mak. 2025 Mar 7;25(1):115. doi: 10.1186/s12911-025-02897-w. PMID: 40050820. Free PMC article.
Ontology-guided machine learning outperforms zero-shot foundation models for cardiac ultrasound text reports.
Subramaniam S, Rizvi S, Ramesh R, Sehgal V, Gurusamy B, Arif H, Tran J, Thamman R, Anyanwu EC, Mastouri R, Mackensen GB, Arnaout R. Sci Rep. 2025 Feb 14;15(1):5456. doi: 10.1038/s41598-024-83540-y. PMID: 39953053. Free PMC article.
Triglyceride index as a predictor of mortality after cardiac surgery.
Li H, Xiao F, Ren H, Xu F, Che H, Zhu H, Zhou C, Wang S. iScience. 2024 Oct 5;27(11):111107. doi: 10.1016/j.isci.2024.111107. eCollection 2024 Nov 15. PMID: 39620137. Free PMC article.
Determinants of artificial intelligence electrocardiogram-derived age and its association with cardiovascular events and mortality: a systematic review and meta-analysis.
Mossavarali S, Vaezi A, Gholami Z, Molaei A, Yekaninejad MS, Asselbergs FW, Shafiee A. NPJ Digit Med. 2025 May 29;8(1):322. doi: 10.1038/s41746-025-01727-7. PMID: 40442323. Free PMC article.
Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach.
Xie F, Lee MS, Allahwerdy S, Getahun D, Wessler B, Chen W. JMIR Cardio. 2024 Sep 30;8:e60503. doi: 10.2196/60503. PMID: 39348175. Free PMC article.


Grants and funding

CH/17/1/32804/BHF_/British Heart Foundation/United Kingdom
