Pharmacogenomics J. 2021 Apr;21(2):251-261.
doi: 10.1038/s41397-020-00205-5. Epub 2021 Jan 18.

Cyrius: accurate CYP2D6 genotyping using whole-genome sequencing data

Xiao Chen et al. Pharmacogenomics J. 2021 Apr.

Erratum in

Abstract

Responsible for the metabolism of ~21% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging because of its sequence similarity with its pseudogene paralog CYP2D7 and the large number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84–86.8%). Implementing the improvements identified through this comparison against the truth data has since raised Cyrius's accuracy to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimated that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool for incorporating pharmacogenomics into WGS-based precision medicine initiatives.
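The concordance figures above compare called genotypes with truth genotypes. The following is a minimal sketch of such a comparison, not the authors' evaluation code. It assumes that a diplotype call matches the truth when its two haplotypes agree regardless of order, and that a sample with no call counts as discordant; the abstract does not define the metric, so both rules are assumptions. The function names and the sample IDs S1 to S3 are hypothetical; only the NA12878 genotype is taken from Fig. 2.

```python
def parse_diplotype(call: str) -> tuple[str, str]:
    """Split a diplotype such as '*3/*68 + *4' into an order-independent pair.

    Spaces are removed and the two haplotypes are sorted, so '*1/*2' and
    '*2/*1' compare equal. The order of alleles within one haplotype
    (e.g. '*68+*4', a tandem arrangement) is left unchanged.
    """
    parts = [h.replace(" ", "") for h in call.split("/")]
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '/' in {call!r}")
    first, second = sorted(parts)
    return (first, second)


def concordance(calls: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of truth samples whose called diplotype matches the truth.

    Assumption: samples missing from `calls` (no-calls) count as discordant.
    """
    if not truth:
        raise ValueError("truth set is empty")
    matches = sum(
        1
        for sample, expected in truth.items()
        if sample in calls
        and parse_diplotype(calls[sample]) == parse_diplotype(expected)
    )
    return matches / len(truth)


# Toy data: S1, S2 and S3 are hypothetical samples, not results from the paper.
truth = {"NA12878": "*3/*68+*4", "S1": "*2/*1", "S2": "*1/*4", "S3": "*1/*1"}
calls = {"NA12878": "*68 + *4/*3", "S1": "*1/*2", "S2": "*1/*1"}
print(concordance(calls, truth))  # 0.5: NA12878 and S1 match, S2 differs, S3 has no call
```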

Conflict of interest statement

XC, FS, NG, AM, CR, RJT, DRB, and MAE are employees of Illumina Inc.

Figures

Fig. 1. WGS data quality in the CYP2D6/CYP2D7 region.
Mean mapping quality (red line), averaged across the 2504 samples of the 1000 Genomes Project (1kGP), is plotted for each position in the CYP2D6/CYP2D7 region (GRCh38). A median filter is applied in a 200 bp window. The nine exons of CYP2D6/CYP2D7 are shown as orange (CYP2D6) and green (CYP2D7) boxes. Two 2.8 kb repeat regions downstream of CYP2D6 (REP6, chr22:42123192-42125972) and CYP2D7 (REP7, chr22:42135344-42138124) are near-identical and essentially unalignable. The purple dashed box denotes the unique spacer region (chr22:42138124-42139676) between CYP2D7 and REP7. Two major homology regions within the genes are shaded in pink and highlight areas of low mapping accuracy.
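The smoothing step in Fig. 1 can be illustrated with a running median over a per-position track. This is a minimal sketch, assuming one mean mapping-quality value per base and a window centered on each position; the caption does not say how the window is aligned or how the ends of the region are handled, so truncating the window at the edges is an assumption, and `median_filter` is a hypothetical helper name.

```python
from statistics import median


def median_filter(values: list[float], window: int = 200) -> list[float]:
    """Running median over `window` consecutive positions centered on each position.

    Near either end of the series the window is truncated rather than padded
    (an assumption; the figure caption does not specify edge handling).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)               # first position in the window
        hi = min(n, i - half + window)      # one past the last position
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Toy per-base mapping-quality track: a 3 bp median filter removes the single dip to 0.
print(median_filter([60, 60, 0, 60, 60], window=3))
```

A median, unlike a mean, ignores isolated outlier positions, so short dips do not blur into their neighbors while broad low-quality stretches, such as the homology regions, remain visible.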
Fig. 2. Cyrius workflow, using NA12878 (*3/*68 + *4) as an example.
A CN(CYP2D6 + CYP2D7) is derived by counting and modeling all reads that align to either CYP2D6 or CYP2D7. The histogram shows the distribution of normalized CYP2D6 + CYP2D7 depth in 2504 1kGP samples, with peaks at CN2, 3, 4, 5, 6, and 7. The red vertical line marks the value for NA12878, which corresponds to CN5 and indicates an additional copy (either CYP2D6 or a hybrid). B SVs are called by examining the CNs at CYP2D6/CYP2D7 differentiating bases. Exons are denoted by yellow boxes. Blue dots denote raw CYP2D6 CNs, calculated as CN(CYP2D6 + CYP2D7) multiplied by the fraction of CYP2D6-supporting reads among all reads supporting either CYP2D6 or CYP2D7. The red diamond denotes the CN of genes that are CYP2D6-derived at the 3′ end (either complete CYP2D6 or a CYP2D7–CYP2D6 hybrid), calculated as CN(CYP2D6 + CYP2D7) minus CN(spacer). The CYP2D6 CN is called at each CYP2D6/CYP2D7 differentiating site, and a change in CYP2D6 CN within the gene indicates the presence of a hybrid. In NA12878, the CYP2D6 CN changes from 2 to 3 between Exon 2 and Exon 1, indicating a CYP2D6–CYP2D7 hybrid (*68). C Supporting read counts at the star-allele-defining, protein-changing small variants are used to call the CN of each variant. The y axis shows the read counts at all queried small-variant positions. Six variants are called in NA12878; one of them, g.100C>T, is called at two copies (one belongs to *4 and the other to *68). Finally, star alleles are called from the detected SVs and small variants.
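The CN arithmetic described in panels A to C can be sketched as follows. The raw CYP2D6 CN and the 3′ CYP2D6-derived CN follow the formulas stated in the caption. Everything else is an assumption made for illustration: the depth normalization (region depth divided by the depth expected for two copies), calling integer CNs by simple rounding where the caption describes a modeling step, and calling a small variant's CN by scaling the site's CYP2D6 CN by the alternate-read fraction. All function and parameter names are hypothetical and are not Cyrius's API.

```python
def call_total_cn(region_depth: float, diploid_depth: float) -> int:
    """Integer CN(CYP2D6 + CYP2D7) from read depth (panel A).

    Hypothetical normalization: region depth relative to the depth expected
    for two copies, times 2, rounded (Python's round sends ties to even).
    The caption describes modeling the counts; this sketch simply rounds.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    return round(2 * region_depth / diploid_depth)


def raw_cyp2d6_cn(total_cn: int, d6_reads: int, d7_reads: int) -> float:
    """Raw CYP2D6 CN at one differentiating site (blue dots, panel B):
    CN(CYP2D6 + CYP2D7) times the fraction of reads carrying the CYP2D6 base."""
    if d6_reads + d7_reads == 0:
        raise ValueError("no informative reads at this site")
    return total_cn * d6_reads / (d6_reads + d7_reads)


def three_prime_cyp2d6_cn(total_cn: int, spacer_cn: int) -> int:
    """CN of genes with a CYP2D6-derived 3' end (red diamond, panel B):
    CN(CYP2D6 + CYP2D7) minus CN(spacer)."""
    return total_cn - spacer_cn


def small_variant_cn(site_cyp2d6_cn: int, alt_reads: int, ref_reads: int) -> int:
    """Assumed rule for panel C: scale the CYP2D6 CN at the variant site by the
    fraction of reads supporting the alternate allele, then round."""
    if alt_reads + ref_reads == 0:
        raise ValueError("no reads at this site")
    return round(site_cyp2d6_cn * alt_reads / (alt_reads + ref_reads))


# Toy numbers chosen to mimic the NA12878 pattern in the caption; not measured values.
total = call_total_cn(region_depth=75.0, diploid_depth=30.0)
print(total)                                          # 5
print(raw_cyp2d6_cn(total, d6_reads=24, d7_reads=16))  # 3.0, as at exon 1
print(three_prime_cyp2d6_cn(total, spacer_cn=3))      # 2
print(small_variant_cn(3, alt_reads=20, ref_reads=10))  # 2 copies, like g.100C>T
```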
Fig. 3. Depth patterns in samples with different types of SVs.
Depth plots as described in Fig. 2B. CYP2D6 CN is called at each CYP2D6/CYP2D7 differentiating site, and a change in CYP2D6 CN within the gene indicates the presence of a hybrid. The depth profiles for different SV patterns are shown in NA19239 (no SV), HG02465 (deletion, *5), HG01624 (duplication), HG01161 (CYP2D7–CYP2D6 hybrid, *13), NA24631 (CYP2D6–CYP2D7 hybrid, *36), NA12878 (CYP2D6–CYP2D7 hybrid, *68), HG00290 (tandem arrangement *13 + *2), and NA19982 (two different SVs, *13 and *68, one on each haplotype). The hybrids in NA24631 and NA12878 are confirmed with PacBio reads in Fig. 4.
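The single-SV patterns in Fig. 3 can be read off a per-site CYP2D6 CN profile with a few simple rules. This is a toy sketch, not Cyrius's calling logic. It assumes integer CN calls are already available at each differentiating site and have been ordered from the 5′ end (exon 1) to the 3′ end (exon 9) of the gene; it also assumes a diploid baseline of two copies and uses the convention that a hybrid is named 5′ gene first. `classify_cn_profile` is a hypothetical name.

```python
def classify_cn_profile(cn_5_to_3: list[int], baseline: int = 2) -> str:
    """Toy interpretation of integer CYP2D6 CN calls at successive CYP2D6/CYP2D7
    differentiating sites, ordered from the 5' end (exon 1) to the 3' end (exon 9).

    Hybrids are named 5' gene first: a CYP2D6 CN that is higher at the 5' end
    than at the 3' end suggests a CYP2D6-CYP2D7 hybrid (e.g. *36, *68), and a
    CN that is higher at the 3' end suggests a CYP2D7-CYP2D6 hybrid (e.g. *13).
    """
    if not cn_5_to_3:
        raise ValueError("empty CN profile")
    first, last = cn_5_to_3[0], cn_5_to_3[-1]
    if len(set(cn_5_to_3)) == 1:  # flat profile: no CN change within the gene
        if first == baseline:
            return "no SV"
        return "deletion" if first < baseline else "duplication"
    if first > last:
        return "CYP2D6-CYP2D7 hybrid"
    if first < last:
        return "CYP2D7-CYP2D6 hybrid"
    return "complex"  # CN changes inside the gene but ends where it started


# A profile like NA12878's: CN 3 at exon 1 and 2 further downstream.
print(classify_cn_profile([3, 2, 2, 2, 2]))
```

These rules only cover a single SV on a flat background. Tandem arrangements such as *13 + *2, or two different SVs on the two haplotypes as in NA19982, would require combining the profile with other evidence, such as the total and 3′-end CNs described for Fig. 2.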
Fig. 4. Structural variants validated by PacBio HiFi reads.
PacBio reads supporting the CYP2D6–CYP2D7 hybrids *36 and *68, confirming the SVs called in NA24631 and NA12878 (third row, Fig. 3). PacBio reads were realigned against modified sequence contigs representing the hybrids and plotted using svviz2 [42]. The black vertical lines mark the boundaries of the duplicated sequences, represented by the gray region. The red and blue regions represent flanking sequences.
Fig. 5. CYP2D6 allele frequencies across five ethnic populations.
A The ten most common haplotypes with altered CYP2D6 function. Haplotypes with increased function are labeled in red, those with no function in black, and those with decreased function in blue. B Comparison between 1kGP and PharmGKB frequencies. Each dot represents a haplotype with a frequency ≥ 0.5% in either 1kGP or PharmGKB. SV-related haplotypes are marked in red, including the two haplotypes with the largest deviations (*36 + *10 in East Asians and *68 + *4 in Europeans). Other haplotypes with deviating values are annotated in blue. A diagonal line is drawn in each panel, and correlation coefficients are listed for each population.

References

    1. Evans WE, Relling MV. Moving towards individualized medicine with pharmacogenomics. Nature. 2004;429:464–8. doi: 10.1038/nature02626.
    2. Zhou S-F. Polymorphism of human cytochrome P450 2D6 and its clinical significance: Part I. Clin Pharmacokinet. 2009;48:689–723. doi: 10.2165/11318030-000000000-00000.
    3. Gaedigk A, Ingelman-Sundberg M, Miller NA, Leeder JS, Whirl-Carrillo M, Klein TE, et al. The Pharmacogene Variation (PharmVar) Consortium: incorporation of the human cytochrome P450 (CYP) allele nomenclature database. Clin Pharmacol Ther. 2018;103:399–401. doi: 10.1002/cpt.910.
    4. Nofziger C, Turner AJ, Sangkuhl K, Whirl-Carrillo M, Agúndez JAG, Black JL, et al. PharmVar GeneFocus: CYP2D6. Clin Pharmacol Ther. 2020;107:154–70. doi: 10.1002/cpt.1643.
    5. Gaedigk A, Simon SD, Pearce RE, Bradford LD, Kennedy MJ, Leeder JS. The CYP2D6 activity score: translating genotype information into a qualitative measure of phenotype. Clin Pharmacol Ther. 2008;83:234–42. doi: 10.1038/sj.clpt.6100406.
