Nat Commun. 2021 Apr 1;12(1):2017. doi: 10.1038/s41467-021-22328-4.

Ontology-driven weak supervision for clinical entity classification in electronic health records


Jason A Fries et al. Nat Commun. 2021.

Abstract

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Trove pipeline for ontology-driven weak supervision for medical entity classification.
Users specify (I) a mapping of an ontology’s class categories to entity classes, (II) a set of label sources (e.g., ontologies, task-specific rules) for weak supervision, and (III) a collection of unlabeled document sentences with which to build a training set. Ontologies instantiate labeling function templates that are applied to sentences to generate a label matrix. This matrix is used to train the label model that learns source accuracies and corrects for label noise to predict a consensus probability per word. Consensus labels are transformed into the probabilistic sequence label dataset that is used as training data for an end model (e.g., BioBERT). Alternatively, the label model can also be used as the final classifier.
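The pipeline above can be sketched in miniature. This is a hypothetical illustration, not Trove's actual API: an ontology's term set instantiates a labeling function that votes on each word in a sentence, and stacking the votes from several sources yields the label matrix described in the caption.

```python
# Sketch of ontology-instantiated labeling functions (hypothetical names,
# not Trove's real interface). Each function votes per word; stacking the
# vote vectors from all sources yields the word-level label matrix.

ABSTAIN, POSITIVE = -1, 1

def make_ontology_lf(term_set, label=POSITIVE):
    """Return a labeling function that votes `label` for any word found
    in the ontology's term set and abstains otherwise."""
    terms = {t.lower() for t in term_set}
    def lf(words):
        return [label if w.lower() in terms else ABSTAIN for w in words]
    return lf

# Two toy "ontologies" acting as independent label sources.
lf_disorders = make_ontology_lf({"diabetes", "hypertension"})
lf_symptoms = make_ontology_lf({"cough", "fever", "diabetes"})

sentence = ["patient", "reports", "fever", "and", "diabetes"]
label_matrix = [lf(sentence) for lf in (lf_disorders, lf_symptoms)]
# One row of votes per labeling function, one column per word.
```

In Trove, a label model is then trained over this matrix to estimate per-source accuracies and emit consensus word-level probabilities.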
Fig. 2. Ablation study of F1 performance by labeling source.
Majority vote (MV) vs. weakly supervised BioBERT (WS) vs. fully supervised (FS) for all labeling source ablations showing the absolute F1 score for all labeling tiers. The colored region of each bar indicates MV performance and the white regions denote performance improvements of WS over MV. The mean performance of FS is indicated by the green lines and square points. WS and FS consist of n = 10 experiment replicates using different random initialization seeds, presented as the mean with error bars ± SD. MV is deterministic and does not include replicates.
Fig. 3. The relationship between the number of UMLS partitions and the learned accuracies of label sources.
a BC5CDR chemical entities. b BC5CDR disease entities. c ShARe/CLEF 2014 disorder entities. d i2b2/n2c2 2009 drug entities. The UMLS is partitioned into s terminologies (x axis, log-scale) ordered by term coverage on the unlabeled training set. Red (MV) and blue (LM) lines are the mean difference in F1 performance (y axis) of n = 5 random weight initializations. Error bars are represented using the solid colored line to denote the mean value of data points and the shaded regions corresponding to ± SD. The gray region indicates performance worse than the best possible MV, discovered via the validation set. Across virtually all partitioning choices, modeling source accuracies outperformed MV, with k = 1–10 performing best overall.
Fig. 4. An example of combining ontology-based labeling functions.
Here four ontology labeling functions (MTH, CHV, LNC, SNOMEDCT) are used to label a sequence of words X_i containing the entity diabetes type 2. Majority vote estimates Y_i as a word-level sum of positive class labels, weighing each source equally (a_MV). The label model learns a latent class-conditional accuracy (a_LM) for each ontology, which is used to reweight labels to generate a more accurate consensus prediction of Y_i.
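The contrast in the caption between equal-weight majority vote and accuracy-reweighted consensus can be made concrete with toy numbers. The per-source accuracies below are invented for illustration; a log-odds weighting stands in for the latent accuracies the label model would actually learn.

```python
# Toy comparison (invented accuracies, not learned values) of majority
# vote vs. accuracy-weighted consensus for one word's votes from four
# ontology label sources. 1 = entity, 0 = not an entity.
import math

votes = {"MTH": 1, "CHV": 1, "LNC": 0, "SNOMEDCT": 1}

# Majority vote: every source counts equally.
mv = sum(votes.values()) / len(votes)  # 3 of 4 sources vote "entity"

# Accuracy-based reweighting: each source's vote is scaled by the
# log-odds of its (assumed) accuracy, then squashed to a probability.
accuracies = {"MTH": 0.9, "CHV": 0.7, "LNC": 0.55, "SNOMEDCT": 0.95}
score = sum(
    (1 if v == 1 else -1) * math.log(accuracies[s] / (1 - accuracies[s]))
    for s, v in votes.items()
)
p_entity = 1 / (1 + math.exp(-score))  # weighted consensus probability
```

Because the dissenting source (LNC) has near-chance assumed accuracy, its vote carries little weight and the consensus probability ends up well above the plain 0.75 majority-vote fraction.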


References

    1. Ravì D, et al. Deep learning for health informatics. IEEE J. Biomed. Health Informat. 2017;21:4–21. doi: 10.1109/JBHI.2016.2636665. - DOI - PubMed
    1. Esteva A, et al. A guide to deep learning in healthcare. Nat. Med. 2019;25:24–29. doi: 10.1038/s41591-018-0316-z. - DOI - PubMed
    1. Wang, L. L. et al. CORD-19: The COVID-19 open research dataset. In Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020 (eds Karin Verspoor, Kevin Bretonnel Cohen, Mark Dredze, Emilio Ferrara, Jonathan May, Robert Munro, Cecile Paris & Byron Wallace) (Association for Computational Linguistics, Online, 2020) https://www.aclweb.org/anthology/2020.nlpcovid19-acl.1.
    1. Kuleshov V, et al. A machine-compiled database of genome-wide association studies. Nat. Commun. 2019;10:3341. doi: 10.1038/s41467-019-11026-x. - DOI - PMC - PubMed
    1. Fries JA, et al. Weakly supervised classification of aortic valve malformations using unlabeled cardiac MRI sequences. Nat. Commun. 2019;10:3111. doi: 10.1038/s41467-019-11012-3. - DOI - PMC - PubMed
