Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study

doi:10.2196/76433

Comparative Study

J Med Internet Res. 2025 Jul 17;27:e76433.

Jacinto Mata¹, Victoria Pachón¹, Ana Manovel², Manuel J Maña¹, Manuel de la Villa¹

Affiliations

¹ I²C Research Group, Universidad de Huelva, Huelva, 21007, Spain, +34 687862089.
² Cardiology Department, Juan Ramón Jiménez University Hospital, Multidisciplinary Amyloidosis Unit Huelva, Hospital Juan Ramón Jiménez, Huelva, Spain.

PMID: 40674251
PMCID: PMC12288768
DOI: 10.2196/76433

Jacinto Mata et al. J Med Internet Res. 2025.

Abstract

Background: Heart failure with preserved ejection fraction (HFpEF) is a major clinical manifestation of cardiac amyloidosis, a condition frequently underdiagnosed due to its nonspecific symptomatology. Electronic health records (EHRs) offer a promising avenue for supporting early symptom detection through natural language processing. However, identifying relevant clinical cues within unstructured narratives, particularly in Spanish, remains a significant challenge due to the scarcity of annotated corpora and domain-specific models. This study proposes and evaluates a Transformer-based natural language processing framework for automated detection of HFpEF-related symptoms in Spanish EHRs.

Objective: The aim of this study is to assess the feasibility of leveraging unstructured clinical narratives to support early identification of heart failure phenotypes indicative of cardiac amyloidosis. It also examines how domain-specific language models and clinically guided optimization strategies can improve the reliability, sensitivity, and generalizability of symptom detection in real-world EHRs.

Methods: A novel corpus of 15,304 Spanish clinical documents was manually annotated and validated by cardiology experts. The corpus was derived from the records of 262 patients (173 with suspected cardiac amyloidosis and 89 without). In total, 8 Transformer-based language models were evaluated, including general-purpose models, biomedical-specialized variants, and Longformers. Three clinically motivated optimization strategies were implemented to align models' behavior with different diagnostic priorities: maximizing area under the curve (AUC) to enhance overall discrimination, optimizing F1-score to balance sensitivity and precision, and prioritizing sensitivity to minimize false negatives. These strategies were independently applied during the fine-tuning of the models to assess their impact on performance under different clinical constraints. To ensure robust evaluation, testing was conducted on a dataset composed exclusively of previously unseen patients, allowing performance to be assessed under realistic and generalizable conditions.
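The patient-level hold-out and the three selection criteria described in the Methods can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the study: the `patient_id` field, the 20% test fraction, the toy documents, and the trial scores are assumptions made only to show the idea of splitting by patient and then choosing the best fine-tuning run under each criterion.

```python
import random


def split_by_patient(docs, test_fraction=0.2, seed=0):
    """Split documents so that no patient contributes to both train and test."""
    patient_ids = sorted({d["patient_id"] for d in docs})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_patients = set(patient_ids[:n_test])
    train = [d for d in docs if d["patient_id"] not in test_patients]
    test = [d for d in docs if d["patient_id"] in test_patients]
    return train, test


def select_best_trial(trials, criterion):
    """Return the trial with the highest validation score for one criterion."""
    return max(trials, key=lambda t: t["metrics"][criterion])


# Toy corpus (hypothetical): 50 notes spread over 10 patients.
docs = [
    {"patient_id": f"p{i % 10}", "text": f"note {i}", "label": i % 2}
    for i in range(50)
]
train, test = split_by_patient(docs)
train_patients = {d["patient_id"] for d in train}
test_patients = {d["patient_id"] for d in test}

# Toy validation scores (hypothetical) for three hyperparameter trials.
trials = [
    {"name": "trial_a", "metrics": {"auc": 0.97, "f1": 0.90, "sensitivity": 0.88}},
    {"name": "trial_b", "metrics": {"auc": 0.95, "f1": 0.93, "sensitivity": 0.91}},
    {"name": "trial_c", "metrics": {"auc": 0.94, "f1": 0.89, "sensitivity": 0.96}},
]

# One winner per strategy: the same search can yield different models
# depending on which clinical priority drives the selection.
best = {c: select_best_trial(trials, c)["name"] for c in ("auc", "f1", "sensitivity")}
print(best)
```

Splitting on patient identifiers rather than on individual documents is what keeps the test patients unseen; a document-level shuffle could place notes from the same patient on both sides of the split.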

Results: All models achieved high performance, with AUC values above 0.940. The best-performing model, Longformer Biomedical-clinical, reached an AUC of 0.987, F1-score of 0.985, sensitivity of 0.987, and specificity of 0.987 on the test dataset. Models optimized for sensitivity reduced the false-negative rate to under 3%, a key threshold for clinical safety. Comparative analyses confirmed that domain-adapted, long-sequence models are better suited for the semantic and structural complexity of Spanish clinical texts than general-purpose models.
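For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, F1-score, and the false-negative rate follow from confusion-matrix counts. The counts are invented for illustration and are not the study's results. Because the false-negative rate equals 1 minus sensitivity, a sensitivity of 0.987 corresponds to a false-negative rate of 0.013, which is consistent with the "under 3%" figure reported above.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true positives detected
    specificity = tn / (tn + fp)   # share of true negatives correctly rejected
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    false_negative_rate = fn / (tp + fn)  # equals 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "fnr": false_negative_rate,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = confusion_metrics(tp=97, fp=5, tn=95, fn=3)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```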

Conclusions: Transformer-based models can reliably detect HFpEF-related symptoms from Spanish EHRs, even in the presence of class imbalance and substantial linguistic complexity. The results show that combining domain-specific pretraining with long-context modeling architectures and clinically aligned optimization strategies leads to substantial gains in classification performance, particularly in sensitivity. These models not only achieve high accuracy and generalization on unseen patients but also demonstrate robustness in handling the semantic nuances and narrative structure of real-world clinical documentation. These findings support the potential deployment of Transformer-based systems as effective screening tools to prioritize patients at risk for cardiac amyloidosis in Spanish-speaking health care settings.

Keywords: clinical language models; early diagnosis support; manual corpus annotation; natural language processing; symptom extraction; transformer.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Prodigy interface for dataset annotation.

Figure 2. Confusion matrix showing interannotator agreement. Values represent the number of documents assigned to each label by annotator A and annotator B.

Figure 3. ROC curves for the test dataset, illustrating the performance of models optimized using 3 different hyperparameter optimization strategies. (A) Optimization based on AUC using the *Longformer Biomedical-clinical* model; (B) Optimization based on F₁-score using the same model; (C) Optimization based on sensitivity using the *bsc-bio-ehr* model. For each strategy, the model shown corresponds to the one that achieved the highest AUC on the test dataset. AUC: area under the curve; ROC: receiver operating characteristic.
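As a reminder of what the AUC in these curves measures, the following sketch computes it as the probability that a randomly chosen positive document receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up values, and this pairwise formulation is a generic textbook definition, not necessarily the implementation used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six documents (1 = symptom present).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy case 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.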

Figure 4. Distribution of document lengths (in tokens) across the full dataset, computed using each model’s tokenizer. The x-axis represents token count ranges, and the y-axis lists the tokenizers associated with the evaluated language models.

Figure 5. Confusion matrices showing classification results on the test dataset using 3 different hyperparameter optimization strategies: (A) AUC-based optimization using the *Longformer Biomedical-clinical* model; (B) F₁-score–based optimization using the same model; and (C) sensitivity-based optimization using the *Longformer RoBERTa* model. Each matrix shows the number of true positives, true negatives, false positives, and false negatives predicted by the respective model. These results illustrate the trade-offs introduced by each optimization criterion in terms of sensitivity and specificity. AUC: area under the curve.
