PLoS One. 2022 Feb 23;17(2):e0263248.
doi: 10.1371/journal.pone.0263248. eCollection 2022.

Combining explainable machine learning, demographic and multi-omic data to inform precision medicine strategies for inflammatory bowel disease

Laura-Jayne Gardiner et al. PLoS One. 2022.

Abstract

Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients. We propose an explainable machine learning (ML) approach that combines bioinformatics and domain insight to integrate multi-modal data and predict inter-patient variation in drug response. Using model explanations, we interpret the ML models' predictions to infer unique combinations of important features associated with pharmacological responses obtained during preclinical testing of drug candidates in ex vivo patient-derived fresh tissues. Our inferred multi-modal features that are predictive of drug efficacy include multi-omic data (genomic and transcriptomic), demographic, medicinal and pharmacological data. Our aim is to understand variation in patient responses before a drug candidate moves forward to clinical trials. As a pharmacological measure of drug efficacy, we measured the reduction in the release of the inflammatory cytokine TNFα from the fresh IBD tissues in the presence/absence of test drugs. We initially explored the effects of a mitogen-activated protein kinase (MAPK) inhibitor; however, we later showed our approach can be applied to other targets, test drugs or mechanisms of interest. Our best model predicted TNFα levels from demographic, medicinal and genomic features with an error of only 4.98% on unseen patients. We incorporated transcriptomic data to validate insights from genomic features.
Our results showed variations in drug effectiveness (measured by ex vivo assays) between patients that differed in gender, age or condition and linked new genetic polymorphisms to patient response variation to the anti-inflammatory treatment BIRB796 (Doramapimod). Our approach models IBD drug response while also identifying its most predictive features as part of a transparent ML precision medicine strategy.

Conflict of interest statement

The authors APC, LJG, EPK, MM declare that they have no competing interests. REPROCELL Europe Ltd is a commercial provider of laboratory-based tests for preclinical research. GM, KB and DCB are all paid employees of REPROCELL Europe. These commercial affiliations do not alter adherence of the authors to the journal policies on sharing data and materials.

Figures

Fig 1. Schematic representation of the study.
Detailing (to the left) the steps undertaken to generate datasets, process these datasets, build and train ML models to make predictions and the interpretation of those predictions. Detailing (to the right) the different approaches, in order of usage where possible, which we used for dimensionality reduction of the medicinal, demographic, genomic and transcriptomic feature sets that were used to train our models and provide explanations for the predictions. Ultimately models were trained to predict the TNFα level or inflammatory response after compound treatment of fresh tissues.
Fig 2. Spearman correlation of demographic information with TNFα response per drug.
To compute the Spearman correlation for binary features we used the following encoding: condition Crohn’s/ulcerative colitis = 0/1, gender female/male = 0/1, resection area colon/ileum = 0/1. See the section "Demographic feature preparation for ML" in Methods for more details.
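The encoding described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the donor values, the `ENCODING` table layout and the helper names are hypothetical, and Spearman's coefficient is computed here as the Pearson correlation of tie-averaged ranks using only the Python standard library.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Binary encoding as stated in the caption (first label = 0, second = 1).
ENCODING = {
    "condition": {"Crohn's": 0, "ulcerative colitis": 1},
    "gender": {"female": 0, "male": 1},
    "resection_area": {"colon": 0, "ileum": 1},
}

def encode(feature, labels):
    return [ENCODING[feature][label] for label in labels]

# Hypothetical donors and normalized TNF-alpha responses, for illustration only.
gender = ["female", "male", "male", "female", "male", "female"]
tnf_response = [0.20, 0.55, 0.60, 0.25, 0.70, 0.30]

rs_gender = spearman(encode("gender", gender), tnf_response)
print(f"Spearman r_s (gender vs. response): {rs_gender:.3f}")
```

With a 0/1 encoding, the sign of the coefficient indicates which group tends to show the higher response: a positive value here would mean higher values in the group encoded as 1.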
Fig 3. Comparison of ML model error rates for the prediction of BIRB796 (10nM) drug response for different combinations of demographic and medicinal features.
Here we show box plots (left) of mean absolute error (MAE) values (as percentages) computed during 10-fold cross validation. The horizontal line in each boxplot is the median of the MAE over 10 folds, where each of the test folds has 3 randomly chosen patients. Note all target drug responses have been normalized on a scale of 0–1 and here we show percentages of MAE values. On the right we report median, average and standard deviation MAE as percentages for each ML method. We computed the predictive error for demographic features age, gender, condition plus (a) all 53 medicinal features or (b) only those medicinal features correlated to BIRB796 at |rs| > 0.3.
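The evaluation scheme in this caption (10 folds, 3 held-out patients per fold, targets normalized to 0–1, MAE reported as a percentage) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 30 synthetic donors, the feature layout and the small k-nearest-neighbours regressor are all hypothetical stand-ins, written with the standard library only.

```python
import random
from statistics import mean, median, stdev

def min_max_normalize(values):
    """Scale values linearly onto the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def knn_predict(train_X, train_y, x, k=3):
    """Predict the mean target of the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return mean(target for _, target in dists[:k])

def cross_validated_mae(X, y, n_folds=10, k=3, seed=0):
    """Return one MAE per fold, expressed as a percentage of the 0-1 target scale."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # random assignment of patients to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    fold_maes = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = [abs(knn_predict(train_X, train_y, X[i], k) - y[i]) for i in test_idx]
        fold_maes.append(100 * mean(errors))
    return fold_maes

# Hypothetical data: 30 donors, so each of the 10 test folds holds 3 donors.
rng = random.Random(1)
X = [[rng.random(), rng.randint(0, 1), rng.randint(0, 1)] for _ in range(30)]
raw = [0.5 * row[0] + 0.3 * row[1] + rng.gauss(0, 0.05) for row in X]
y = min_max_normalize(raw)

maes = cross_validated_mae(X, y)
summary = {"median": median(maes), "mean": mean(maes), "std": stdev(maes)}
print({name: round(value, 2) for name, value in summary.items()})
```

The per-fold values in `maes` are what a box plot like the one described would display, and `summary` mirrors the median, average and standard deviation reported alongside it.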
Fig 4. Comparison of ML model error rates for the prediction of BIRB796 (10nM) drug response for different combinations of demographic, medicinal and genomic features.
Here we show box plots (left) of mean absolute error values (as percentages) computed during 10-fold cross validation. The horizontal line in each boxplot is the median of the MAE over 10 folds, where each of the test folds has 3 randomly chosen patients. Note that all target drug responses have been normalized on a scale of 0–1 and here we show percentages of MAE values. On the right, we report median, average and standard deviation MAE as percentages for each ML method. We computed the predictive error using demographic features age, gender and condition and correlated medicinal features plus (a) all SNPs (33,577) or (b) the 71 curated known and associated SNPs.
Fig 5. Comparison of ML model explanations for the prediction of BIRB796 (10nM) drug response for demographic, medicinal and genomic features.
Here we show SHAP plots that contain explanations for the predictions generated by our best model, KNN, using (a, c) our best “demographic+medicinal” feature set and (b, d) our best “demographic+medicinal+SNP” feature set. The SHAP bar plots (a, b) show the top 20 features ranked by their impact on the model prediction. The SHAP dot plots (c, d) show the same top 20 ranked features together with the weight of each feature (row) for the prediction of TNFα level for each donor (a donor is a blue or red dot). The figure legend (top right) details the colour and corresponding value that each feature has for each donor (coloured dot). For example, donors that are older are red dots in the plot, while younger donors are blue dots. Similarly, donors that have a SNP allele are shown as red dots, while donors with the reference allele are represented as blue dots.
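To make the quantities behind these plots concrete, the sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical model. It is not the SHAP library and not the authors' model: the toy `predict` function, the three feature names, the donors and the all-zero baseline are assumptions chosen so the result can be checked by hand. Per-donor values correspond to the dots in a dot plot, and the mean absolute value per feature corresponds to the ranking in a bar plot.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

FEATURES = ["age", "gender", "snp_allele"]  # hypothetical feature names

def predict(features):
    """Toy response model with an age x SNP interaction (illustration only)."""
    age, gender, snp = features
    return 0.2 + 0.4 * age + 0.1 * gender + 0.3 * age * snp

baseline = [0.0, 0, 0]
donors = [[1.0, 1, 1], [0.5, 0, 1], [0.2, 1, 0]]

per_donor = [shapley_values(predict, d, baseline) for d in donors]

# Bar-plot style ranking: mean absolute Shapley value per feature.
mean_abs = [sum(abs(p[i]) for p in per_donor) / len(per_donor) for i in range(len(FEATURES))]
ranking = [name for _, name in sorted(zip(mean_abs, FEATURES), reverse=True)]
print(ranking)
```

For each donor the values sum to the difference between that donor's prediction and the baseline prediction, which is the property that lets each dot be read as a feature's additive contribution to the predicted response.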
Fig 6. Comparison of ML model explanations for the prediction of response to three drugs (x2 doses) for demographic, medicinal and genomic features.
Here we show SHAP plots that contain explanations for the predictions as generated by our best ML models (as defined in the table to the right) using “demographic+medicinal+SNP” features as appropriate per drug for (a) BIRB796 100 nM, (b) Pred 1 µM, (c) Pred 100 nM and (d) 5-ASA. Plots (a-d) show the top 20 ranked features together with the weight of each feature (row) for the prediction of TNFα level for each donor (a donor is a blue or red dot). The figure legend details the colour and corresponding value that each feature has for each donor (coloured dot). The table to the right shows, for each drug, the best ML model and the median, average and standard deviation of the mean absolute error (MAE) values computed during 10-fold cross validation. Note all target drug responses have been normalized on a scale of 0–1 and here we show percentages of MAE values.
