J Biomed Inform. 2021 Jan;113:103626.
doi: 10.1016/j.jbi.2020.103626. Epub 2020 Nov 28.

Pre-training phenotyping classifiers

Dmitriy Dligach et al. J Biomed Inform. 2021 Jan.

Abstract

Recent transformer-based pre-trained language models have become a de facto standard for many text classification tasks. Nevertheless, their utility in the clinical domain, where classification is often performed at the encounter or patient level, remains uncertain because of the limit they place on maximum input length. In this work, we introduce a self-supervised pre-training method that relies on a masked token objective and is free of any maximum input length restriction. We compare the proposed method with supervised pre-training that uses billing codes as its source of supervision. We evaluate both methods on one publicly available and three in-house datasets using standard evaluation metrics such as area under the ROC curve and F1 score. We find, surprisingly, that although self-supervised pre-training performs slightly worse than supervised pre-training, it still preserves most of the gains that pre-training provides.

Keywords: Automatic phenotyping; Natural language processing; Pre-training.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1:
Masked concept unique identifier (CUI) model. A small number of CUIs from a document are masked and used as prediction targets to train a feed-forward fully-connected neural network. After pre-training, the features computed in the hidden layer can be used as a document representation for training a phenotyping classifier.
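Concretely, a network of this shape might look like the following sketch. PyTorch is an assumption (the caption does not tie the description to a framework), and the vocabulary size, hidden width, and activation are illustrative, not the paper's reported hyperparameters:

    import torch
    import torch.nn as nn

    class MaskedCuiModel(nn.Module):
        def __init__(self, cui_vocab_size: int, hidden_dim: int = 1000):
            super().__init__()
            # Hidden layer; its activations later serve as the document representation.
            self.hidden = nn.Sequential(nn.Linear(cui_vocab_size, hidden_dim), nn.ReLU())
            # Scores every CUI in the vocabulary; the masked CUIs are the prediction targets.
            self.out = nn.Linear(hidden_dim, cui_vocab_size)

        def forward(self, bag_of_cuis: torch.Tensor) -> torch.Tensor:
            return self.out(self.hidden(bag_of_cuis))

        def represent(self, bag_of_cuis: torch.Tensor) -> torch.Tensor:
            # Feature-extraction path used after pre-training (see Figure 3).
            return self.hidden(bag_of_cuis)

Treating masked-CUI recovery as multi-label classification, pre-training could use a loss such as nn.BCEWithLogitsLoss over the masked CUIs; the paper does not specify the loss here.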
Figure 2:
Supervised pre-training. Concept unique identifiers (CUIs) from a document are used to train a feed-forward fully-connected neural network to predict ICD codes. After pre-training, the features computed in the hidden layer can be used as a document representation for training a phenotyping classifier.
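Under the same illustrative assumptions as the sketch above, the supervised variant reuses the hidden layer and swaps only the prediction head; num_icd_codes is a hypothetical parameter:

    import torch.nn as nn

    def build_supervised_model(cui_vocab_size: int, num_icd_codes: int,
                               hidden_dim: int = 1000) -> nn.Module:
        # Same hidden layer as the self-supervised sketch; only the targets change.
        return nn.Sequential(
            nn.Linear(cui_vocab_size, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_icd_codes),  # multi-label ICD-code prediction head
        )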
Figure 3:
Pre-trained model as a feature extractor. After self-supervised pre-training completes, we save the model and use it to obtain representations of the target data. We then use these representations to train a phenotyping classifier.
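A minimal sketch of this step, assuming the MaskedCuiModel above; the choice of logistic regression as the downstream phenotyping classifier is illustrative, not taken from the paper:

    import torch
    from sklearn.linear_model import LogisticRegression

    @torch.no_grad()
    def extract_features(model, bags_of_cuis: torch.Tensor) -> torch.Tensor:
        model.eval()  # freeze the pre-trained network
        return model.represent(bags_of_cuis)

    # Hypothetical usage: encode the target data, then fit the phenotyping classifier.
    # features = extract_features(pretrained_model, target_bags)
    # clf = LogisticRegression(max_iter=1000).fit(features.numpy(), phenotype_labels)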
Figure 4:
Data pre-processing. All notes associated with a hospital admission (encounter) for a patient are concatenated into a single document and pre-processed with cTAKES to extract UMLS concept unique identifiers (CUIs). For self-supervised pre-training, a small number of CUIs are masked and used as prediction targets. For supervised pre-training, all CUIs are used to predict the billing codes associated with a hospital admission (not shown here).
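The masking step for self-supervised pre-training might look like the sketch below; the 15% rate is an assumption, since the caption says only that "a small number" of CUIs are masked:

    import random

    def mask_cuis(doc_cuis, mask_rate=0.15):
        # Hide a small fraction of a document's CUIs; the hidden ones become targets.
        if not doc_cuis:
            return [], []
        n_masked = max(1, int(len(doc_cuis) * mask_rate))
        masked_idx = set(random.sample(range(len(doc_cuis)), n_masked))
        inputs = [c for i, c in enumerate(doc_cuis) if i not in masked_idx]
        targets = [doc_cuis[i] for i in sorted(masked_idx)]
        return inputs, targets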

