PLoS Comput Biol. 2020 Nov 2;16(11):e1008399.
doi: 10.1371/journal.pcbi.1008399. eCollection 2020 Nov.

Transfer learning enables prediction of CYP2D6 haplotype function

Gregory McInnes et al. PLoS Comput Biol. 2020.

Abstract

Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene whose protein product metabolizes more than 20% of clinically used drugs. Genetic variation in CYP2D6 is responsible for interindividual heterogeneity in drug response that can lead to drug toxicity and ineffective treatment, making CYP2D6 one of the most important pharmacogenes. Prediction of CYP2D6 phenotype relies on curation of literature-derived functional studies to assign a functional status to CYP2D6 haplotypes. As the number of large-scale sequencing efforts grows, new haplotypes continue to be discovered, and keeping function assignments up to date is challenging. To address this challenge, we have trained a convolutional neural network, called Hubble.2D6, to predict the functional status of CYP2D6 haplotypes. Hubble.2D6 predicts haplotype function from sequence data and was trained using two pre-training steps with a combination of real and simulated data. We find that Hubble.2D6 predicts CYP2D6 haplotype functional status with 88% accuracy in a held-out test set and explains 47.5% of the variance in in vitro functional data among star alleles with unknown function. Hubble.2D6 may be a useful tool for assigning function to haplotypes with uncurated function and for screening individuals who are at risk of being poor metabolizers.

Conflict of interest statement

The authors of this manuscript have the following competing interests: RBA is a stockholder in Personalis.com and 23andme.com.

Figures

Fig 1. Schematic overview of the Hubble.2D6 workflow.
(A) Sequences and functions for all existing star alleles in PharmVar were collected and divided into training and validation datasets. Star alleles with uncurated function were held out from training. (B) Star allele sequences were annotated with functional annotations and one-hot encoded in preparation for input into the deep learning model. (C) One-hot encoded sequence and annotation data were read into a convolutional neural network that output scores for two classes: one score indicating a normal function allele and one score indicating a no function allele. (D) The two score outputs from the model were transformed into one of the three functional classes using cutoffs that were set to optimize sensitivity and specificity in the training data.
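Steps (B) and (D) of this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the decision rule, and the 0.5 cutoffs are hypothetical, the third class is assumed to be "Decreased function", and the functional annotation channels described in (B) are omitted.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Encode a nucleotide string as one row of four 0/1 values per base.

    Bases outside A/C/G/T (for example N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def scores_to_class(normal_score, no_function_score,
                    normal_cutoff=0.5, no_function_cutoff=0.5):
    """Map two model scores to one of three functional classes.

    Hypothetical rule: the cutoffs and the order of the checks are
    assumptions for illustration, not the published thresholds.
    """
    if no_function_score >= no_function_cutoff:
        return "No function"
    if normal_score >= normal_cutoff:
        return "Normal function"
    return "Decreased function"


if __name__ == "__main__":
    print(one_hot_encode("ACGN"))
    print(scores_to_class(normal_score=0.3, no_function_score=0.2))
```

In practice the cutoffs would be tuned on the training data, as the caption describes, rather than fixed in advance.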
Fig 2. Star allele classification results.
The figure depicts performance metrics for the prediction of star allele function in the training and validation sets. Confusion matrices for class prediction in training and validation are shown in (a) and (b) for Hubble.2D6 and in (c) and (d) for the baseline model. (e) shows the frequency of predicted function for uncurated star alleles.
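For readers unfamiliar with the confusion matrices in panels (a) to (d), the following sketch shows how such a matrix and the overall accuracy could be computed from curated and predicted labels. The class names, their ordering, and the toy labels are assumptions for illustration only; they are not data from the study.

```python
CLASSES = ["No function", "Decreased function", "Normal function"]


def confusion_matrix(curated, predicted, classes=CLASSES):
    """Count label pairs: rows are curated classes, columns are predicted classes."""
    position = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(curated, predicted):
        matrix[position[truth]][position[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of alleles on the diagonal, i.e. predicted class equals curated class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy labels, invented for this example.
    curated = ["No function", "Normal function", "Decreased function", "Normal function"]
    predicted = ["No function", "Normal function", "Normal function", "Normal function"]
    matrix = confusion_matrix(curated, predicted)
    print(matrix)
    print(accuracy(matrix))
```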
Fig 3. Prediction of star allele function with in vitro data.
The figures summarize the distribution of metabolic activity measured in vitro for star alleles whose function was predicted by Hubble.2D6. The distribution of functional activity is shown in (a) and (b) for star alleles with CPIC-assigned clinical function. (a) Star alleles included in the training process are depicted with a triangle, and those held out for testing are depicted with a circle. Error bars depict the standard error of the measured function. The outer edge of each point indicates the true, curator-assigned phenotype, while the inner color represents the predicted function. (b) Distribution of values for each predicted functional class for the data shown in (a). (c) Star alleles without assigned function status; colors represent the predicted function. (d) Variance in measured activity of the star alleles for each predicted label for the data shown in (c).
Fig 4. Importance scores for core variants in each star allele used for training and test of Hubble.2D6.
Star alleles are along the y-axis and core variants (both amino acid changes and non-coding changes) are listed along the x-axis. Each dot represents the importance of the core variant to the final prediction as determined by DeepLIFT. The size of the dot represents the value of the importance score, with larger dots indicating variants with larger importance scores, typically associated with a negative impact on function. Star alleles are annotated with the curated function as well as the Hubble.2D6 predicted function. Star alleles are divided along the y-axis between star alleles that were included in the training data (top) and those used as test samples (bottom). Star alleles are sorted by the sum of the importance scores, with those with the largest sums at the bottom. Core variants are divided along the x-axis by those that are uniquely in either the training or test samples (right), and those that are shared between star alleles in train and test (left). Core variants are sorted by their mean importance score across all star alleles. Core variants are annotated with the deleteriousness prediction used in the functional variant representation with red indicating a variant predicted to be deleterious and blue indicating a variant predicted to be benign (described in Methods).
Fig 5. Evaluation of the contribution of deep learning model components.
The figure depicts the training and test classification for models trained under various constraints. Under “Component evaluation”, we test the contribution of transfer learning and the inclusion of annotations in the variant encoding by training new models to predict star allele function. Each model is identical to the full Hubble.2D6 model except for the stated difference. We tested four configurations: annotations only, transfer learning only, both components together, and neither component. Under “Annotation evaluation”, we depict classification accuracy for models trained with a single added annotation. Each point represents the accuracy of a model trained to predict star allele function using transfer learning and a one-hot encoding of the nucleotide sequence, with only the specified annotation included in the encoding of the variant. The full model contained all listed annotations together.
