2023 Jul 18;23(1):126.
doi: 10.1186/s12911-023-02239-8.

RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records

Jie Cai et al. BMC Med Inform Decis Mak.

Abstract

Background: The ovarian reserve is a reservoir for reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline characterized by abnormal ovarian reserve tests is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital's electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload.

Methods: We present RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automate screening for premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as the text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation.
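To make the rule-based extraction step concrete, the following is a minimal Python sketch of how a regular expression might pull hormone measurements out of free text. The pattern, the sample sentence, and the function name `extract_hormones` are illustrative assumptions; they are not the REs used in RegEMR, which are not given in this abstract.

```python
import re

# Hypothetical pattern (an assumption, not one of the RegEMR REs):
# a hormone name, an optional ASCII or full-width colon, then a number.
HORMONE_RE = re.compile(r"(FSH|LH|AMH|E2)\s*[:：]?\s*(\d+(?:\.\d+)?)")


def extract_hormones(text: str) -> list[tuple[str, float]]:
    """Return (hormone, value) pairs in the order they appear in the text."""
    return [(name, float(value)) for name, value in HORMONE_RE.findall(text)]


# Made-up sentence in the style of a Chinese outpatient note.
sample = "月经第3天查性激素：FSH 12.5 IU/L，LH：4.2 IU/L，AMH 0.8 ng/ml。"
print(extract_hormones(sample))
# [('FSH', 12.5), ('LH', 4.2), ('AMH', 0.8)]
```

A real extractor would also need to capture units, dates, and negations, which is where hand-written and machine-generated REs are likely to differ.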

Results: The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515), with no significant difference (paired t-test, p > 0.05) from machine-generated REs, whose performance could be affected by training set size and annotation portion. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518 to 0.9884) and ultrasonographic measures (F-score 0.9472 to 0.9822). Applying the extracted information to the proposed diagnostic model, the program obtained an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For the diagnosis of specific diseases, the automatic diagnosis agreed with the clinical diagnosis in 76% of patients, with a kappa coefficient of 0.63.
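For readers unfamiliar with these metrics, here is a minimal Python sketch of how an F-score, screening accuracy and sensitivity, and Cohen's kappa can be computed. The function names and the toy labels are assumptions made for illustration and are not the study's data or code; only the grouping of DOR, POI, and POF as high risk comes from the paper (Fig. 4 caption).

```python
# Grouping stated in the paper: DOR, POI, POF are high risk; Healthy is low risk.
HIGH_RISK = {"DOR", "POI", "POF"}


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from extraction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def risk_screening(machine: list[str], reference: list[str]) -> tuple[float, float]:
    """Accuracy and sensitivity after collapsing diagnoses to high/low risk."""
    pairs = [(m in HIGH_RISK, r in HIGH_RISK) for m, r in zip(machine, reference)]
    tp = sum(m and r for m, r in pairs)
    tn = sum(not m and not r for m, r in pairs)
    fn = sum(not m and r for m, r in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn)


def cohen_kappa(machine: list[str], reference: list[str]) -> float:
    """Chance-corrected agreement; undefined if expected agreement is 1."""
    n = len(machine)
    observed = sum(m == r for m, r in zip(machine, reference)) / n
    labels = set(machine) | set(reference)
    expected = sum((machine.count(c) / n) * (reference.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)


# Toy labels, invented for this example only.
machine = ["DOR", "DOR", "POI", "POF", "Healthy"]
clinical = ["DOR", "POI", "POI", "POF", "Healthy"]
print(f_score(tp=8, fp=2, fn=2))
print(risk_screening(machine, clinical))
print(cohen_kappa(machine, clinical))
```

In the toy labels, one DOR/POI disagreement lowers kappa but leaves screening accuracy untouched, because both labels fall in the high-risk group. This is one way risk screening can score higher than disease-specific agreement.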

Conclusion: A Chinese NLP system named RegEMR was developed to automatically identify high risk of early ovarian aging and diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers.

Keywords: Diminished ovarian reserve; Electronic medical records; Natural language processing; Ovarian reserve; Premature ovarian failure; Premature ovarian insufficiency.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Framework of the study design. EMRs: electronic medical records. RE: regular expression
Fig. 2. Example of extracting information from a Chinese reproductive clinical record. The color of the font, underlining and table shading are consistent; e.g., a keyword in blue font in the free text has the rule in blue underlining applied to it, yielding the structured table with blue shading
Fig. 3. Flowchart of automated regular expression generation using the tree-based genetic programming in [30]
Fig. 4. Flowchart for the automatic diagnosis and risk stratification model. The DOR, POI, and POF scores were calculated as in Table 2. We classified the machine diagnosis of “DOR”, “POI” and “POF” as the high-risk group, and “Healthy” as the low-risk group
Fig. 5. The performance of each target concept using manually created versus machine-generated regular expressions (REs)
Fig. 6. The performance of each target concept with different training set sizes
Fig. 7. The performance of each target concept with different annotation portions
Fig. 8. Comparison of machine diagnosis and human verification for 380 patients

References

    1. Sun H, Gong TT, Jiang YT, Zhang S, Zhao YH, Wu QJ. Global, regional, and national prevalence and disability-adjusted life-years for infertility in 195 countries and territories, 1990–2017: results from a global burden of disease study, 2017. Aging (Albany NY). 2019;11:10952–10991. doi: 10.18632/aging.102497.
    2. Gerrits T, Van Rooij F, Esho T, Ndegwa W, Goossens J, Bilajbegovic A, Jansen A, Kioko B, Koppen L, Kemunto Migiro S, et al. Infertility in the global south: raising awareness and generating insights for policy and practice. Facts Views Vis Obgyn. 2017;9:39–44.
    3. Barratt CLR, Björndahl L, De Jonge CJ, Lamb DJ, Osorio Martini F, McLachlan R, Oates RD, van der Poel S, St John B, Sigman M, et al. The diagnosis of male infertility: an analysis of the evidence to support the development of global WHO guidance-challenges and future research opportunities. Hum Reprod Update. 2017;23:660–680. doi: 10.1093/humupd/dmx021.
    4. Grisendi V, Mastellari E, La Marca A. Ovarian reserve markers to identify poor responders in the context of Poseidon classification. Front Endocrinol (Lausanne). 2019;10:281. doi: 10.3389/fendo.2019.00281.
    5. Nguyen HH, Milat F, Vincent A. Premature ovarian insufficiency in general practice: Meeting the needs of women. Aust Fam Physician. 2017;46:360–366.