2025 Mar 31;25(1):154. doi: 10.1186/s12911-025-02871-6.

Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-English languages

Izzet Turkalp Akbasli et al. BMC Med Inform Decis Mak.

Abstract

Background: The integration of big data and artificial intelligence (AI) in healthcare, particularly through the analysis of electronic health records (EHR), presents significant opportunities for improving diagnostic accuracy and patient outcomes. However, processing and accurately labeling vast amounts of unstructured data remains a critical bottleneck that requires efficient and reliable solutions. This study investigates the ability of domain-specific, fine-tuned large language models (LLMs) to classify unstructured EHR texts containing typographical errors through named entity recognition tasks, aiming to improve the efficiency and reliability of supervised learning AI models in healthcare.

Methods: Turkish clinical notes from pediatric emergency room admissions at Hacettepe University İhsan Doğramacı Children's Hospital from 2018 to 2023 were analyzed. The data were preprocessed with open-source Python libraries and categorized using a pretrained GPT-3 model, "text-davinci-003," before and after fine-tuning with domain-specific data on respiratory tract infections (RTI). The model's predictions were compared against ground-truth labels established by pediatric specialists.
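As a rough illustration of the labeling step described above (not the authors' code), the following Python sketch sends a single clinical note to the legacy "text-davinci-003" completions endpoint and reads back a binary RTI label; the prompt wording, label set, and function name are assumptions made for this example.

# Minimal sketch of LLM-based labeling of one clinical note, assuming the
# legacy OpenAI Python SDK (< 1.0) and the "text-davinci-003" completions
# endpoint mentioned in the Methods. The prompt text and label set are
# illustrative assumptions, not the authors' actual prompt.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def label_note(note_text: str, model: str = "text-davinci-003") -> str:
    """Ask the model whether a clinical note describes an RTI."""
    prompt = (
        "You are labeling pediatric emergency department notes.\n"
        "Answer with exactly one label: RTI or NOT_RTI.\n\n"
        f"Note: {note_text}\n"
        "Label:"
    )
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=3,      # only the short label is needed
        temperature=0.0,   # deterministic labeling
    )
    return response["choices"][0]["text"].strip()

# Hypothetical Turkish note: "3-year-old patient presenting with cough and runny nose"
print(label_note("Öksürük ve burun akıntısı ile başvuran 3 yaşında hasta."))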

Results: Of the 24,229 patient records initially classified as poorly labeled, 18,879 were found to be free of typographical errors and were confirmed as RTI cases through filtering methods. On the remaining records, the fine-tuned model achieved 99.88% accuracy in identifying RTI cases, significantly outperforming the pretrained model's 78.54%. The fine-tuned model demonstrated superior performance across all evaluated metrics compared to the pretrained model.
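A comparison like the one reported here can be reproduced in outline by scoring each model's labels against the specialists' ground truth. The sketch below uses scikit-learn and assumes a binary 0/1 label encoding and variable names chosen only for this example.

# Sketch of scoring pretrained vs. fine-tuned predictions against the
# specialists' ground-truth labels; the 0/1 encoding and variable names
# are assumptions for illustration.
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

def evaluate(y_true, y_pred, name):
    """Print accuracy, ROC-AUC, and per-class metrics for one model."""
    print(f"== {name} ==")
    print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
    # ROC-AUC is normally computed from predicted probabilities; hard
    # 0/1 labels give a coarser estimate and are used here only for brevity.
    print(f"ROC-AUC:  {roc_auc_score(y_true, y_pred):.4f}")
    print(classification_report(y_true, y_pred, target_names=["NOT_RTI", "RTI"]))

# y_true: specialist labels; *_pred: model outputs, all encoded as 0/1
# evaluate(y_true, pretrained_pred, "pretrained text-davinci-003")
# evaluate(y_true, finetuned_pred, "fine-tuned model")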

Conclusions: Fine-tuned LLMs can categorize unstructured EHR data with high accuracy, closely approximating the performance of domain experts. This approach significantly reduces the time and costs associated with manual data labeling, demonstrating the potential to streamline the processing of large-scale healthcare data for AI applications.

Keywords: Artificial intelligence; Electronic healthcare records; Large language models; Respiratory tract infections.

Conflict of interest statement

Ethics approval and consent to participate: The Hacettepe University Clinical Research Ethics Committee approved our study's design and procedures under protocol number GO-23/508, ensuring adherence to ethical standards in clinical research. The data sourced from Hacettepe University İhsan Doğramacı Children's Hospital, which underwent de-identification through the redaction of protected health information, received approval from the hospital for use in a quality improvement project. In this context, the Hacettepe University Research Ethics Board waived the requirement for its approval and for obtaining informed consent for this study. Furthermore, all procedures complied with the relevant guidelines and standards outlined in the Declaration of Helsinki.

Consent for publication: Not applicable.

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Data Processing and Model Performance for RTI Identification: This figure shows the filtering of URTI symptoms from the dataset after processing all cases. It focuses on the analysis of 5,350 poorly labeled cases, comparing the ROC-AUC performance of the pretrained and fine-tuned GPT-3 models. The fine-tuned model demonstrates a significant improvement in identifying RTI cases.
