Digit Health. 2024 Sep 27;10:20552076241286260.
doi: 10.1177/20552076241286260. eCollection 2024 Jan-Dec.

Construction, evaluation, and application of an electronic medical record corpus for cerebral palsy rehabilitation

Meirong Xiao et al. Digit Health.

Abstract

Objective: The electronic medical record (EMR) corpus for cerebral palsy rehabilitation and its application in downstream tasks, such as named entity recognition (NER), require further revision and testing to improve their effectiveness and reliability.

Methods: We devised an annotation principle and developed an EMR corpus for cerebral palsy rehabilitation. Test-retest reliability was introduced for the first time to check the consistency of each annotator. Additionally, we established a baseline NER model using the proposed EMR corpus. The NER model used Chinese clinical BERT with adversarial training as the embedding layer, and incorporated a multi-head attention mechanism and rotary position embedding in the encoder layer. For multi-label decoding, we employed the span matrix of global pointer along with softmax and cross-entropy.
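To make the span-matrix idea concrete, the following minimal sketch decodes entities from a per-type score matrix, using the example string and the entity-type abbreviations (bod, che, sym) from Figure 3. It is an illustration under stated assumptions, not the authors' implementation: the scores are invented by hand, the function names are hypothetical, and the decision threshold of 0 is an assumption about how a global-pointer-style decoder is commonly read out.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index), end inclusive


def decode_span_matrix(
    scores: Dict[str, List[List[float]]], threshold: float = 0.0
) -> List[Span]:
    """Return every (type, start, end) whose score exceeds the threshold.

    scores[t][i][j] is the score that characters i..j form an entity of type t.
    Only the upper triangle (i <= j) is read, because a span cannot end
    before it starts. Each type has its own matrix, so overlapping and
    nested spans can be returned side by side.
    """
    spans = []
    for etype, matrix in scores.items():
        n = len(matrix)
        for i in range(n):
            for j in range(i, n):
                if matrix[i][j] > threshold:
                    spans.append((etype, i, j))
    return sorted(spans, key=lambda s: (s[1], s[2], s[0]))


def blank_matrix(n: int, fill: float = -1.0) -> List[List[float]]:
    """Build an n x n matrix in which every span scores below the threshold."""
    return [[fill] * n for _ in range(n)]


text = "双下肢肌张力下降"  # example string from Figure 3
n = len(text)

# Hand-made scores (assumption): a trained model would produce these.
scores = {etype: blank_matrix(n) for etype in ("bod", "che", "sym")}
scores["bod"][0][2] = 3.1  # 双下肢 (both lower limbs)
scores["che"][3][5] = 2.4  # 肌张力 (muscle tone)
scores["sym"][0][7] = 1.8  # whole string, which nests the two spans above

entities = decode_span_matrix(scores)
for etype, start, end in entities:
    print(etype, text[start:end + 1])
```

Because every candidate span is scored independently, the nested sym entity and the two shorter entities inside it are all recovered, which a single tag-per-character sequence could not express.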

Results: The corpus consisted of 1405 EMRs, containing a total of 127,523 entities across six entity types, with 24,424 unique entities after de-duplication. The inter-annotator agreement between the two annotators was 97.57%, and the intra-annotator agreement of each annotator exceeded 98%. Our proposed baseline NER model achieved an F1-score of 93.59% for flat entities and 90.15% for nested entities in this corpus.
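For readers unfamiliar with the metric, the sketch below shows how an entity-level F1-score is typically computed, with F1 = 2PR / (P + R). It assumes exact matching, where a predicted entity counts as correct only if its start, end, and type all match a gold entity; the abstract does not state the matching rule, and the toy gold and predicted sets here are invented for illustration.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end inclusive, entity type)


def precision_recall_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two sets of entities."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (assumption): the prediction gets the symptom boundary wrong.
gold = {(0, 2, "bod"), (3, 5, "che"), (0, 7, "sym")}
pred = {(0, 2, "bod"), (3, 5, "che"), (3, 7, "sym")}

p, r, f1 = precision_recall_f1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Under exact matching, a boundary error such as the shifted symptom span counts as both a false positive and a false negative, so it lowers precision and recall together.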

Conclusions: We believe that the proposed annotation principle, corpus, and baseline model are highly effective and hold great potential as tools for cerebral palsy rehabilitation scenarios.

Keywords: Electronic medical record; cerebral palsy; information extraction; medical entity corpus; named entity recognition.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Overall flow chart of corpus construction. (Step 1) Under the guidance of medical experts and with reference to existing annotation guidelines, an initial draft of entity annotation principles for CP EMRs was developed. After the annotators (A1 and A2) became familiar with the annotation principles, a portion of the EMRs was chosen for training on the annotation platform. (Step 2) Three rounds of pre-annotation were then conducted. In each round, the two annotators independently labeled the same 10 × 5 EMRs. The annotation principles were refined based on evaluations of inter-annotator agreement (IAA (2,2)) and analyses of discrepancies between the annotators. (Step 3) The fourth round focused on intra-annotator agreement (IAA (2,1)) and on annotating the remaining EMRs. First, 10 × 5 EMRs were selected to assess the final IAA (2,2). Then, [1 × 5 EMRs] were randomly assigned to each annotator, and each annotator relabeled these [1 × 5 EMRs] five times without being aware of the repetition. The IAA (2,1) was calculated across these five annotations to evaluate the consistency of each annotator's entities over time.
Figure 2.
Flowchart for selection criteria of patients with cerebral palsy (CP).
Figure 3.
Network structure of the NER model. The left side of the figure gives the names of the model modules, and the right side shows the corresponding details for an example. Taking the character string “双下肢肌张力下降” (“Decreased muscle tone in the lower limbs”) as input, the model outputs the entity recognition result: [双下肢]bod ([both lower limbs]bod), [肌张力]che ([muscle tone]che), and [双下肢肌张力下降]sym ([decreased muscle tone in the lower limbs]sym).
Figure 4.
Label-level inter-annotator agreement (IAA (2,2)) across the four rounds of annotation for the six entity types. The label-level IAA (2,2) of all six entity types showed a rising trend, and the B-label, I-label, and E-label agreement for each entity type eventually became high and consistent.
Figure 5.
Entity-level intra-annotator agreement (IAA (2,1)). The horizontal and vertical axes of each confusion matrix show the results for the same [1 × 5 EMRs] labeled in the n-th repetition. Results for A1 are presented on the left and results for A2 on the right.
Figure 6.
Demographic information of 281 cerebral palsy patients in this corpus.
Figure 7.
Performance of our NER model on six types of entities (P: precision; R: recall; F1: F1-score).
Figure 8.
Performance of our NER model across five categories of EMRs (P: precision; R: recall; F1: F1-score).
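The test-retest check described for Figures 1 and 5, in which one annotator labels the same records five times, can be sketched as a pairwise agreement matrix between repetitions. The abstract does not say which agreement statistic was used, so this example assumes pairwise F1 over exact-match entities, computed as 2|A ∩ B| / (|A| + |B|); the function names and the five toy annotation rounds are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end inclusive, entity type)


def pairwise_f1(a: Set[Entity], b: Set[Entity]) -> float:
    """Symmetric exact-match agreement between two annotation passes."""
    if not a and not b:
        return 1.0  # two empty passes agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def agreement_matrix(rounds: List[Set[Entity]]) -> List[List[float]]:
    """Agreement of every repetition with every other repetition."""
    return [[pairwise_f1(a, b) for b in rounds] for a in rounds]


def mean_off_diagonal(matrix: List[List[float]]) -> float:
    """Average agreement over all pairs of distinct repetitions."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Toy data (assumption): the third repetition shifts one symptom boundary.
base = {(0, 2, "bod"), (3, 5, "che"), (0, 7, "sym")}
drifted = {(0, 2, "bod"), (3, 5, "che"), (3, 7, "sym")}
rounds = [base, base, drifted, base, base]

matrix = agreement_matrix(rounds)
overall = mean_off_diagonal(matrix)
print(f"mean intra-annotator agreement: {overall:.4f}")
```

The diagonal of the matrix is always 1.0, so only the off-diagonal cells carry information about how stable an annotator's labels are over time.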
