PLoS Comput Biol. 2020 Nov 2;16(11):e1008399.

doi: 10.1371/journal.pcbi.1008399. eCollection 2020 Nov.

Transfer learning enables prediction of CYP2D6 haplotype function

Gregory McInnes¹, Rachel Dalton²,³, Katrin Sangkuhl⁴, Michelle Whirl-Carrillo⁴, Seung-Been Lee⁵, Philip S Tsao⁶,⁷, Andrea Gaedigk⁸,⁹, Russ B Altman⁴,¹⁰, Erica L Woodahl²

Affiliations

¹ Biomedical Informatics Training Program, Stanford University, Stanford, California, United States of America.
² Department of Biomedical and Pharmaceutical Sciences, University of Montana, Missoula, Montana, United States of America.
³ Department of Biomedical and Translational Research, University of Florida, Gainesville, Florida, United States of America.
⁴ Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America.
⁵ Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America.
⁶ VA Palo Alto Epidemiology Research and Information Center for Genomics, VAPAHCS, Palo Alto, California, United States of America.
⁷ Department of Medicine, Stanford University School of Medicine, Stanford, California, United States of America.
⁸ Division of Clinical Pharmacology, Toxicology, and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri, United States of America.
⁹ School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America.
¹⁰ Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, California, United States of America.

PMID: 33137098
PMCID: PMC7660895
DOI: 10.1371/journal.pcbi.1008399

Gregory McInnes et al. PLoS Comput Biol. 2020.


Abstract

Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene whose protein product metabolizes more than 20% of clinically used drugs. Genetic variations in CYP2D6 are responsible for interindividual heterogeneity in drug response that can lead to drug toxicity and ineffective treatment, making CYP2D6 one of the most important pharmacogenes. Prediction of CYP2D6 phenotype relies on curation of literature-derived functional studies to assign a functional status to CYP2D6 haplotypes. As the number of large-scale sequencing efforts grows, new haplotypes continue to be discovered, and assignment of function is challenging to maintain. To address this challenge, we have trained a convolutional neural network, called Hubble.2D6, to predict the functional status of CYP2D6 haplotypes. Hubble.2D6 predicts haplotype function from sequence data and was trained using two pre-training steps with a combination of real and simulated data. We find that Hubble.2D6 predicts CYP2D6 haplotype functional status with 88% accuracy in a held-out test set and explains 47.5% of the variance in in vitro functional data among star alleles with unknown function. Hubble.2D6 may be a useful tool for assigning function to haplotypes with uncurated function and for screening individuals who are at risk of being poor metabolizers.
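The two headline metrics in the abstract, classification accuracy and the share of variance in measured activity explained by the predicted class, can be illustrated with a minimal sketch. The abstract does not state how the variance figure was computed; the version below assumes a simple ratio of between-class to total sum of squares, and the function names are hypothetical.

```python
# Illustrative sketch only: the function names and the variance formula
# (between-group sum of squares / total sum of squares) are assumptions,
# not the procedure reported by the authors.

def accuracy(true_labels, predicted_labels):
    """Fraction of haplotypes whose predicted class matches the curated class."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def variance_explained(activities, predicted_labels):
    """Share of variance in measured activity captured by the predicted class.

    Assumes the activities are not all identical (total sum of squares > 0).
    """
    grand_mean = sum(activities) / len(activities)
    total_ss = sum((a - grand_mean) ** 2 for a in activities)

    # Group the measured activities by predicted functional class.
    groups = {}
    for activity, label in zip(activities, predicted_labels):
        groups.setdefault(label, []).append(activity)

    # Between-group sum of squares: group size times squared offset of the group mean.
    between_ss = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return between_ss / total_ss
```

A value near 1 would mean the predicted classes separate the measured activities almost completely; a value near 0 would mean the classes carry little information about activity.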

Conflict of interest statement

The authors of this manuscript have the following competing interests: RBA is a stockholder in Personalis.com and 23andme.com.

Figures

**Fig 1. Schematic overview of the Hubble.2D6 workflow.**
(A) Sequences and functions for all existing star alleles in PharmVar were collected and divided into training and validation datasets. Star alleles with uncurated function were held from training. (B) Star allele sequences were annotated with functional annotations and one-hot encoded as preparation for input into the deep learning model. (C) One-hot encoded sequence and annotation data was read into a convolutional neural network that output scores for two classes: a score indicating a normal function allele, and a score indicating a no function allele. (D) The two score outputs from the model were transformed into one of the three functional classes using cutoffs that were set to optimize sensitivity and specificity in the training data.
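Steps (B) and (D) of the workflow can be sketched as follows. This is a minimal illustration rather than the published implementation: the functional annotation channels are omitted, the class names and the decision rule are assumptions, and the cutoff values of 0.5 are placeholders (the caption says only that the real cutoffs were chosen to optimize sensitivity and specificity in the training data).

```python
# Illustrative sketch only: the encoding omits the functional annotations,
# and the class names, decision rule, and cutoffs are hypothetical.

NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == nucleotide else 0 for nucleotide in NUCLEOTIDES])
    return rows


def scores_to_class(normal_score, no_function_score,
                    normal_cutoff=0.5, no_function_cutoff=0.5):
    """Map the two model output scores to one of three functional classes."""
    # Assumed rule: a high no-function score takes precedence.
    if no_function_score >= no_function_cutoff:
        return "no function"
    if normal_score >= normal_cutoff:
        return "normal function"
    # Neither score clears its cutoff: treat as the intermediate class.
    return "decreased function"
```

For example, `one_hot_encode("AC")` yields `[[1, 0, 0, 0], [0, 1, 0, 0]]`, and under these placeholder cutoffs `scores_to_class(0.2, 0.1)` falls into the intermediate class.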

**Fig 2. Star allele classification results.**
The figure depicts performance metrics for the prediction of star allele function in the training and validation sets. Confusion matrices for class prediction in training and validation are shown in (a) and (b) for Hubble.2D6 and in (c) and (d) for the baseline model; (e) shows the frequency of predicted function for uncurated star alleles.

**Fig 3. Prediction of star allele function with *in vitro* data.**
The figure summarizes the distribution of metabolic activity measured *in vitro* for star alleles whose function was predicted by Hubble.2D6. The distribution of functional activity is shown in (a) and (b) for star alleles with CPIC-assigned clinical function. (a) Star alleles included in the training process are depicted with a triangle, and those held for testing are depicted with a circle. Error bars depict the standard error of the measured function. The outer edge of each point indicates the true, curator-assigned phenotype, while the inner color represents predicted function. (b) Distribution of values for each predicted functional class for data shown in (a). (c) Star alleles without assigned function status; colors represent the predicted function. (d) Variance in measured activity of the star alleles for each predicted label for data shown in (c).

**Fig 4. Importance scores for core variants in each star allele used for training and test of Hubble.2D6.**
Star alleles are along the y-axis and core variants (both amino acid changes and non-coding changes) are listed along the x-axis. Each dot represents the importance of the core variant to the final prediction as determined by DeepLIFT. The size of the dot represents the value of the importance score, with larger dots indicating variants with larger importance scores, typically associated with a negative impact on function. Star alleles are annotated with the curated function as well as the Hubble.2D6 predicted function. Star alleles are divided along the y-axis between star alleles that were included in the training data (top) and those used as test samples (bottom). Star alleles are sorted by the sum of the importance scores, with those with the largest sums at the bottom. Core variants are divided along the x-axis by those that are uniquely in either the training or test samples (right), and those that are shared between star alleles in train and test (left). Core variants are sorted by their mean importance score across all star alleles. Core variants are annotated with the deleteriousness prediction used in the functional variant representation with red indicating a variant predicted to be deleterious and blue indicating a variant predicted to be benign (described in Methods).

**Fig 5. Evaluation of the contribution of deep learning model components.**
The figure depicts the training and test classification for models trained under various constraints. Under “Component evaluation”, we test the contribution of transfer learning and the inclusion of annotations in the variant encoding by training new models to predict star allele function. Each model is identical in every way to the full Hubble.2D6 model except for the stated difference. We tested the effect of including annotations and transfer learning individually, together, and one model was built with neither component. Under “Annotation evaluation” we depict classification accuracy for models trained with a single added annotation. Each point represents the accuracy of a model trained to predict star allele function using transfer learning with a one-hot encoding of the nucleotide sequence, but only the specified annotation was included in the encoding of the variant. The full model contained all listed annotations together.

Cited by

Targeted haplotyping in pharmacogenomics using Oxford Nanopore Technologies' adaptive sampling.
Deserranno K, Tilleman L, Rubben K, Deforce D, Van Nieuwerburgh F. Front Pharmacol. 2023 Nov 13;14:1286764. doi: 10.3389/fphar.2023.1286764. eCollection 2023. PMID: 38026945. Free PMC article.
Review on Databases and Bioinformatic Approaches on Pharmacogenomics of Adverse Drug Reactions.
Tong H, Phan NVT, Nguyen TT, Nguyen DV, Vo NS, Le L. Pharmgenomics Pers Med. 2021 Jan 13;14:61-75. doi: 10.2147/PGPM.S290781. eCollection 2021. PMID: 33469342. Free PMC article. Review.
Comprehensive Allele Genotyping in Critical Pharmacogenes Reduces Residual Clinical Risk in Diverse Populations.
Luo S, Jiang R, Grzymski JJ, Lee W, Lu JT, Washington NL. Clin Pharmacol Ther. 2021 Sep;110(3):759-767. doi: 10.1002/cpt.2279. Epub 2021 Jun 7. PMID: 33930192. Free PMC article.
Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence.
Chen L, Wang Y, Zhao F. Bioinformatics. 2022 Jun 13;38(12):3164-3172. doi: 10.1093/bioinformatics/btac214. PMID: 35389435. Free PMC article.
From gene to dose: Long-read sequencing and *-allele tools to refine phenotype predictions of CYP2C19.
Graansma LJ, Zhai Q, Busscher L, Menafra R, van den Berg RR, Kloet SL, van der Lee M. Front Pharmacol. 2023 Mar 1;14:1076574. doi: 10.3389/fphar.2023.1076574. eCollection 2023. PMID: 36937863. Free PMC article.

Grants and funding

T32 LM012409/LM/NLM NIH HHS/United States
