Expert-augmented machine learning

Affiliations

¹ Department of Radiation Oncology, University of California, San Francisco, CA 94143; gennatas@stanford.edu.
² Department of Statistics, Stanford University, Stanford, CA 94305.
³ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
⁴ Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA 94143.
⁵ Data Institute, University of San Francisco, CA 94105.
⁶ Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA 19104.
⁷ Department of Radiation Oncology, New York Proton Center, New York, NY 10035.
⁸ Division of Hospital Medicine, University of California, San Francisco, CA 94143.
⁹ Innova Montreal, Inc., Montreal, QC J4W 2P2, Canada.
¹⁰ Division of Biostatistics, University of California, Berkeley, CA 94720.
¹¹ Department of Radiation Oncology, University of California, San Francisco, CA 94143.

PMID: 32071251
PMCID: PMC7060733
DOI: 10.1073/pnas.1906831117

Efstathios D Gennatas et al. Proc Natl Acad Sci U S A. 2020 Mar 3;117(9):4571-4577. doi: 10.1073/pnas.1906831117. Epub 2020 Feb 18.

Abstract

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.

Keywords: computational medicine; machine learning; medicine.

Conflict of interest statement

Competing interest statement: The editor, P.J.B., and one of the authors, M.J.v.d.L., are at the same institution (University of California, Berkeley).

Figures

**Fig. 1.**
Overview of the methods. RuleFit involves 1) training a gradient boosting model on the input data, 2) converting boosted trees to rules by concatenating conditions from the root node to each leaf node, and 3) training an L1-regularized (LASSO) logistic regression model. Each rule defines a subpopulation that satisfies all conditions in the rule. Clinician experts assess the mortality risk of the subpopulation defined by each rule compared to the whole sample on a web application. For each rule, delta ranking is calculated as the difference between the subpopulation’s empirical risk as suggested by the data and the clinicians’ estimate. A final model is trained by reducing the influence of those rules with highest delta ranking. This forms an efficient procedure where experts are asked to assess 126 simple rules of 3 to 5 variables each instead of assessing 24,508 cases with 17 variables each.
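The rule-extraction and delta-ranking steps in this caption can be illustrated with a minimal sketch. It is not the authors' implementation. The nested-dict tree layout, the function names, and the toy numbers are hypothetical. Computing delta ranking as a difference of ranks is an assumption, because the caption only describes it as the difference between the empirical risk and the clinicians' estimate.

```python
def tree_to_rules(node, path=()):
    """Return one rule per leaf: the conditions met on the way down from the root."""
    if "feature" not in node:  # a leaf has no split
        return [path]
    left = f"{node['feature']} <= {node['threshold']}"
    right = f"{node['feature']} > {node['threshold']}"
    return (tree_to_rules(node["left"], path + (left,))
            + tree_to_rules(node["right"], path + (right,)))


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def delta_ranking(empirical_risk, clinician_score):
    """Assumed definition: rank by empirical risk minus rank by clinician score."""
    emp = average_ranks(empirical_risk)
    exp = average_ranks(clinician_score)
    return [e - c for e, c in zip(emp, exp)]


# Hypothetical toy tree with two splits, giving three root-to-leaf rules.
toy_tree = {
    "feature": "age", "threshold": 73.65,
    "left": {"feature": "GCS", "threshold": 4, "left": {}, "right": {}},
    "right": {},
}
rules = tree_to_rules(toy_tree)

# Made-up risks and mean clinician scores, one per rule.
empirical_risk = [0.05, 0.20, 0.10]
clinician_score = [1.0, 2.0, 3.0]
delta = delta_ranking(empirical_risk, clinician_score)

# Keep only the rules on which data and clinicians agree (threshold is arbitrary).
kept = [r for r, d in zip(rules, delta) if abs(d) < 1]
```

In this sketch, rules are dropped outright when the disagreement is large. The caption describes reducing their influence, which could also be done with rule-specific penalties in the final regularized regression.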

**Fig. 2.**
Example of a rule presented to clinicians. Age, GCS (1, <6; 2, 6 to 8; 3, 9 to 10; 4, 11 to 13; 5, 14 to 15), ratio of oxygen blood concentration to fractional inspired oxygen concentration (PaO₂/FiO₂), and BUN concentration are the variables selected for this rule. The decision tree rules derived from gradient boosting, e.g., age ≤ 73.65 and GCS ≤ 4, were converted to the form “median (range)”, e.g., age, 56.17 (16.01 to 73.65), for continuous variables and to the form “mode (included levels)” for categorical variables. Rules were presented in a randomized order, one at a time. The top line (blue box) displays the values for the subpopulation defined by the given rule. The bottom line (gray box) displays the values of the whole population. Participants were asked to assess the risk of belonging to the defined subpopulation compared to the whole sample using a five-point system: highly decrease, moderately decrease, no effect, moderately increase, and highly increase.
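The conversion from threshold conditions to the displayed "median (range)" and "mode (included levels)" forms might look like the following sketch. The function names, the two-decimal formatting, and the toy values are assumptions made for illustration, not details taken from the paper.

```python
from statistics import median, mode


def summarize_continuous(values):
    """Format a continuous variable as 'median (min to max)'."""
    return f"{median(values):.2f} ({min(values):.2f} to {max(values):.2f})"


def summarize_categorical(values):
    """Format a categorical variable as 'mode (levels present in the subpopulation)'."""
    levels = ", ".join(str(v) for v in sorted(set(values)))
    return f"{mode(values)} ({levels})"


# Hypothetical patients; the rule 'age <= 73.65 and GCS <= 4' selects a subpopulation.
patients = [
    {"age": 16.01, "gcs": 2},
    {"age": 56.17, "gcs": 4},
    {"age": 73.65, "gcs": 4},
    {"age": 81.30, "gcs": 5},
]
subpopulation = [p for p in patients if p["age"] <= 73.65 and p["gcs"] <= 4]

age_summary = summarize_continuous([p["age"] for p in subpopulation])
gcs_summary = summarize_categorical([p["gcs"] for p in subpopulation])
```

Summarizing the selected patients rather than showing raw thresholds gives the assessor a picture of who is actually in the subpopulation, which is the comparison the five-point scale asks for.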

**Fig. 3.**
Mortality ratio by average clinicians’ risk ranking. Rules were binned into quintiles based on the average clinician assessment, and the mean empirical risk for each quintile was plotted. Error bars indicate ±1.96 × SE.
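The binning behind this figure can be sketched as follows. This is an illustrative reconstruction under assumptions: rules are split into five equal-sized groups by sorted clinician score, and the standard error is the sample standard deviation divided by the square root of the bin size. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def quintile_summary(clinician_score, empirical_risk):
    """Return (mean risk, 1.96 * SE) for each quintile of clinician score."""
    n = len(clinician_score)
    if n < 5:
        raise ValueError("need at least five rules to form quintiles")
    order = sorted(range(n), key=lambda i: clinician_score[i])
    summary = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        risks = [empirical_risk[i] for i in idx]
        # SE is undefined for a single observation; report 0 in that case.
        se = stdev(risks) / sqrt(len(risks)) if len(risks) > 1 else 0.0
        summary.append((mean(risks), 1.96 * se))
    return summary


# Made-up scores and risks for ten rules (two rules per quintile).
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
risks = [0.0, 0.2, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5]
bins = quintile_summary(scores, risks)
```

A roughly increasing mean risk across the five bins would indicate that clinicians' rankings track the empirical risk on average.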

**Fig. 4.**
Variable importance estimated using a Random Forest model predicting mortality (A), and clinicians’ assessments (B). While PaO₂/FiO₂ is the most important variable in both cases, in the former case the model uses it to learn intubation status, whereas in the latter clinicians respond based on its physiological influence on mortality.

**Fig. 5.**
Example of variable shift: heart rate distribution of the same set of patients from MIMIC-II and MIMIC-III (A). Models were trained on MIMIC-II data using different subsets of rules defined by the extent of clinicians’ agreement with the empirical risk (delta ranking cut into five bins, ΔR) (B). Mean AUC of models trained on MIMIC-II and tested on MIMIC-III (C) and models trained and tested on MIMIC-III (D). Subsamples of different sizes were used for each subset of rules defined by ΔR to test the hypothesis that eliminating bad rules helps the algorithm train with less data. Error bars represent 1 SD across 10 stratified subsamples.
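The evaluation summarized in panels C and D (mean AUC with 1 SD across 10 stratified subsamples) can be sketched as below. This is a simplified illustration, not the study's pipeline. The scores are fixed rather than produced by a model retrained on each subsample, the AUC is computed on the subsample itself, and the function names, seed, and toy data are hypothetical.

```python
import random
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_subsample(labels, fraction, rng):
    """Draw the same fraction of indices from each class (at least one per class)."""
    chosen = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Made-up outcomes (1 = died) and predicted risks for ten patients.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]

rng = random.Random(0)  # fixed seed so the subsamples are reproducible
aucs = []
for _ in range(10):
    idx = stratified_subsample(labels, 0.5, rng)
    aucs.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))

mean_auc, sd_auc = mean(aucs), stdev(aucs)
```

Stratifying by outcome keeps the mortality rate the same in every subsample, so differences in AUC across subsample sizes are not driven by a changing class balance.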
