RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records
- PMID: 37464410
- PMCID: PMC10353087
- DOI: 10.1186/s12911-023-02239-8
Abstract
Background: The ovarian reserve is a reservoir for reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline characterized by abnormal ovarian reserve tests is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital's electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload.
Methods: We presented RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automate screening for premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as a text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable disease diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation.
Results: The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515), with no significant difference (paired t test p > 0.05) from machine-generated REs, whose performance could be affected by training set sizes and annotation portions. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518-0.9884) and ultrasonographic measures (F-score 0.9472-0.9822). When the extracted information was applied to the proposed diagnostic model, the program obtained an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For each specific disease, the automatic diagnosis in 76% of patients was consistent with the clinical diagnosis, and the kappa coefficient was 0.63.
Conclusion: A Chinese NLP system named RegEMR was developed to automatically identify high risk of early ovarian aging and diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers.
Keywords: Diminished ovarian reserve; Electronic medical records; Natural language processing; Ovarian reserve; Premature ovarian failure; Premature ovarian insufficiency.
© 2023. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
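The rule-based extraction and F-score evaluation described in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the actual regular expressions, so the patterns, the field names (FSH, LH, AMH, AFC), the sample note, and the gold annotations below are all hypothetical and chosen only to show the general technique; they are not RegEMR's rules or data.

```python
import re

# Hypothetical patterns for illustration only; the REs used by RegEMR
# are not given in the abstract.
PATTERNS = {
    "FSH": re.compile(r"FSH\s*[:：]?\s*(\d+(?:\.\d+)?)"),
    # The lookbehind avoids matching "LH" inside a longer Latin token.
    "LH": re.compile(r"(?<![A-Za-z])LH\s*[:：]?\s*(\d+(?:\.\d+)?)"),
    "AMH": re.compile(r"AMH\s*[:：]?\s*(\d+(?:\.\d+)?)"),
    # Antral follicle count, written in Chinese in the made-up note.
    "AFC": re.compile(r"窦卵泡数?\s*[:：]?\s*(\d+)"),
}


def extract(note):
    """Return {field: [values]} for every field whose pattern matches."""
    found = {}
    for name, pattern in PATTERNS.items():
        values = [float(v) for v in pattern.findall(note)]
        if values:
            found[name] = values
    return found


def f_score(predicted, gold):
    """F1 over sets of (field, value) pairs; 0.0 when nothing overlaps."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up note text, not taken from any real medical record.
note = "查性激素：FSH 12.35 IU/L，LH 4.2 IU/L；AMH 0.82 ng/ml。B超：双侧窦卵泡数 5 个。"
result = extract(note)

predicted = {(name, v) for name, values in result.items() for v in values}
# Hypothetical manual annotation with one item the patterns do not cover.
gold = {("FSH", 12.35), ("LH", 4.2), ("AMH", 0.82), ("AFC", 5.0), ("E2", 35.0)}
score = f_score(predicted, gold)
```

In this toy case the four patterns recover four of the five annotated items, so precision is 1.0 and recall is 0.8. A real evaluation would aggregate such counts over held-out folds, as in the fivefold cross-validation the authors report.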