Pharmacogenet Genomics. 2022 Jun 1;32(4):159-172.
doi: 10.1097/FPC.0000000000000466. Epub 2022 Feb 21.

Accuracy and applications of sequencing and genotyping approaches for CYP2A6 and homologous genes

Alec W R Langlois et al. Pharmacogenet Genomics. 2022.

Abstract

Objectives: We evaluated multiple genotyping/sequencing approaches in a homologous region of chromosome 19, and investigated associations of two common 3'-UTR CYP2A6 variants with activity in vivo.

Methods: Individuals (n = 1704) of European and African ancestry were phenotyped for the nicotine metabolite ratio (NMR), an index of CYP2A6 activity, and genotyped/sequenced using deep amplicon exon sequencing, SNP array, genotype imputation and targeted capture sequencing. Amplicon exon sequencing was the gold standard to which other methods were compared within-individual for CYP2A6, CYP2A7, CYP2A13, and CYP2B6 exons to identify highly discordant positions. Linear regression models evaluated the association of CYP2A6*1B and rs8192733 genotypes (coded additively) with logNMR.

Results: All approaches were ≤2.6% discordant with the gold standard; discordant calls were concentrated at few positions. Fifteen positions were discordant in >10% of individuals, with 12 appearing in regions of high identity between homologous genes (e.g. CYP2A6 and CYP2A7). For six, allele frequencies in our study and online databases were discrepant, suggesting errors in online sources. In the European-ancestry group (n = 935), CYP2A6*1B and rs8192733 were associated with logNMR (P < 0.001). A combined model found main effects of both variants on increasing logNMR. Similar trends were found in those of African ancestry (n = 506).
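The within-individual comparison against the gold standard can be illustrated with a minimal sketch. It assumes genotype calls are stored as dictionaries keyed by (individual, position); that layout, the function names, and the toy calls are hypothetical and are not taken from the study. Only the >10% cutoff for a highly discordant position comes from the Results.

```python
from collections import defaultdict


def normalize(call):
    """Treat 'G/A' and 'A/G' as the same unphased genotype."""
    return "/".join(sorted(call.split("/")))


def discordance_by_position(gold, test):
    """Return {position: fraction of individuals whose test call differs
    from the gold-standard call}. Both inputs map (individual_id, position)
    to a genotype string; only calls present in both are compared."""
    discordant = defaultdict(int)
    compared = defaultdict(int)
    for key, gold_call in gold.items():
        if key not in test:
            continue  # no call from the test method: not counted
        _, position = key
        compared[position] += 1
        if normalize(test[key]) != normalize(gold_call):
            discordant[position] += 1
    return {pos: discordant[pos] / compared[pos] for pos in compared}


def highly_discordant(rates, threshold=0.10):
    """Positions discordant in more than `threshold` of individuals."""
    return sorted(pos for pos, rate in rates.items() if rate > threshold)


# Toy calls (hypothetical individuals and positions, not study data).
gold = {("s1", 100): "A/A", ("s2", 100): "A/G",
        ("s1", 200): "C/C", ("s2", 200): "C/T"}
test = {("s1", 100): "A/A", ("s2", 100): "A/A",
        ("s1", 200): "C/C", ("s2", 200): "T/C"}

rates = discordance_by_position(gold, test)
flagged = highly_discordant(rates)
print(rates, flagged)
```

In this toy data, position 100 is discordant in one of two individuals and is flagged, while position 200 is concordant once allele order is normalized.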

Conclusion: Multiple genotyping/sequencing approaches used in this chromosome 19 region contain genotyping/sequencing errors, as do online databases. Gene-specific primers and SNP array probes must consider gene homology; short-read sequencing of related genes in a single reaction should be avoided. Using improved sequencing approaches, we characterized two gain-of-function 3'-UTR variants, including the relatively understudied rs8192733.

Conflict of interest statement

Conflicts of Interest: RF Tyndale has consulted for Quinn Emanuel and Ethismos Research Inc; all other authors declared no conflict of interest.

Figures

Figure 1. CYP2A6*1B’s identity with CYP2A7 leads to spurious read alignments which can be resolved by masking CYP2A7.
a. Multiple sequence alignment showing the 3’-UTR of CYP2A6*1A (NG_008377.1: 11700–11758; GRCh37: 41349653–41349595), CYP2A6*1B, WT CYP2A7 (NG_007960.1: 12108–12165; GRCh37: 41381550–41381493), and a newly discovered allele of CYP2A6*1B (CYP2A6*1B (novel)). Vertical dashes between alignments indicate identity, while dots indicate non-identical sequence. b. Alignment of CYP2A6*1B reads with and without masking of CYP2A7. Without masking of CYP2A7, CYP2A6*1B FASTQ reads are interpreted by the read aligner as CYP2A7, and are incorrectly aligned to the 3’-UTR of CYP2A7 during .bam file generation. Masking of CYP2A7 forces alignment of CYP2A6*1B reads to CYP2A6, allowing for accurate genotype calling.
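The caption does not say how CYP2A7 was masked. One common way to mask a reference region is to overwrite it with N bases before alignment, so the aligner has no homologous target there. The sketch below assumes that approach; the function name, 1-based inclusive coordinates, and toy sequence are illustrative only and do not describe the authors' pipeline.

```python
def mask_region(sequence, start, end, mask_char="N"):
    """Return `sequence` with the 1-based inclusive interval [start, end]
    replaced by `mask_char`, leaving the overall length unchanged."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval must lie within the sequence")
    return sequence[:start - 1] + mask_char * (end - start + 1) + sequence[end:]


# Toy reference (hypothetical, not the real chromosome 19 sequence).
reference = "ACGTACGTACGT"
masked = mask_region(reference, 5, 8)
print(masked)  # ACGTNNNNACGT
```

Keeping the length unchanged preserves downstream coordinates, so positions in CYP2A6 would still map to the same reference positions after masking.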
Figure 2. Coverage of approaches A1–6 through CYP2A6, CYP2A7, CYP2B6, and CYP2A13.
The four genes analyzed in this study are indicated by name; arrows above the gene name indicate direction of transcription (genomic position according to GRCh37 is shown increasing from left-to-right). DNA sequence within genes is shown as a solid black line, while intergenic sequence is shown as a dotted line (not to scale). Exons are shown as black rectangles with exon number indicated above, while the 5’- and 3’-UTRs are shown as grey speckled rectangles attached to exons 1 and 9, respectively. Gene exons, introns (except CYP2B6 intron 1), and UTRs are displayed to scale; double diagonal bars indicate shortened sequence (not to scale). The 109 kb gap between CYP2A7 and CYP2B6 contains the pseudogenes CYP2B7 and CYP2G1P (not shown), while the 70 kb gap between CYP2B6 and CYP2A13 contains CYP2A7P1 and CYP2G2P (not shown). A1 coverage, indicated by black boxes with grey outlines, is limited to the exons in addition to partial coverage of the CYP2A6 3’-UTR (used for genotyping of CYP2A6*1B and rs8192733 in Experiment 2). A2-A5, indicated by a continuous box with a white/grey hatched pattern, covers a limited number of positions for the entire region. A6, indicated by a continuous black box with a grey outline, continuously covers ~300 kb which encompasses the entire region presented.
Figure 3. Discordant calls between A6 targeted capture sequencing and A1 amplicon exon sequencing in CYP2A6 are concentrated in specific exons.
The y-axis represents total discordant calls (i.e. the sum of all discordant calls within each exon across the group) within each exon for a. EUR (n=209), and b. AFR (n=166). Discordant calls in exons 2, 3, 5, and 9 make up ~90% of overall CYP2A6 discordant calls in EUR and AFR.
Figure 4. CYP2A6*5’s identity with CYP2A7 leads to spurious read alignments which can be resolved by masking CYP2A7.
a. Multiple sequence alignment showing exon 9 of CYP2A6*1 (NG_008377.1: 11574–11652; GRCh37: 41349779–41349701), CYP2A6*5, and WT CYP2A7 (NG_007960.1: 11982–12060; GRCh37: 41381676–41381598). Vertical dashes between alignments indicate identity, while dots indicate non-identical sequence. b. Alignment of CYP2A6*5 reads with and without masking of CYP2A7. Without masking of CYP2A7, CYP2A6*5 reads are interpreted by the read aligner as CYP2A7, and are incorrectly aligned to exon 9 of CYP2A7 during .bam file generation. Masking of CYP2A7 forces alignment of CYP2A6*5 reads to CYP2A6, allowing for accurate genotype calling.
Figure 5. CYP2A6*1B and rs8192733 genotypes are significantly associated with logNMR in European-ancestry individuals.
A. Plot showing mean NMR (horizontal black bars) and individual NMR values (points) within CYP2A6*1B diplotype groups in EUR (n=597). Individuals with known CYP2A6 star variants, structural variants, or other non-synonymous variants were excluded. A linear regression model (“*1B Model”) with CYP2A6*1B genotype, coded additively, and known NMR covariates (age, sex, BMI) included in the model found a significant association of CYP2A6*1B genotype with logNMR (p<0.001, r2=0.10). B. Plot showing mean NMR (horizontal black bars) and individual NMR values (points) within rs8192733 diplotype groups in EUR (n=597). Individuals with known CYP2A6 star variants, structural variants, or other non-synonymous variants were excluded. A linear regression model (“rs819 Model”) with rs8192733 genotype, coded additively, and known NMR covariates (age, sex, BMI) included in the model found a significant association of rs8192733 genotype with logNMR (p<0.001, r2=0.11). C. 3-dimensional bar graph of mean NMR by CYP2A6*1B and rs8192733 genotype in EUR (n=597). Columns with n<5 were not shown. D. Summary table of multiple linear regression of CYP2A6*1B and rs8192733 genotype on logNMR in EUR (n=597). Sex, age, and BMI were included as covariates; all were significantly associated with logNMR. Significant main effects of CYP2A6*1B (p=0.045) and rs8192733 (p=0.001) genotypes were found.
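The combined model in panel D regresses logNMR on both genotypes, coded additively (0, 1, or 2 copies of the variant allele), with sex, age, and BMI as covariates. The sketch below shows what such a fit looks like using ordinary least squares solved through the normal equations. Everything numeric in it is an assumption for illustration: the eight individuals are synthetic, the outcome is built without noise from made-up coefficients, and no p-values or r2 are computed. It does not reproduce the study's estimates or software.

```python
def fit_ols(rows, outcomes):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcomes))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


NAMES = ["intercept", "CYP2A6*1B", "rs8192733", "age", "sex", "BMI"]
# Made-up coefficients used only to generate the toy outcome.
TRUE_BETA = [-1.2, 0.15, 0.10, 0.004, 0.2, -0.01]

# Synthetic individuals: (*1B copies, rs8192733 copies, age, sex, BMI).
people = [
    (0, 0, 30, 0, 22), (1, 0, 45, 1, 27), (2, 1, 52, 0, 31), (0, 1, 38, 1, 24),
    (1, 2, 61, 0, 29), (2, 2, 27, 1, 21), (0, 2, 49, 0, 26), (1, 1, 33, 1, 35),
]
design = [[1.0, g1, g2, age, sex, bmi] for g1, g2, age, sex, bmi in people]
log_nmr = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in design]

beta = fit_ols(design, log_nmr)
for name, value in zip(NAMES, beta):
    print(f"{name:>10}: {value: .4f}")
```

Because the toy outcome is noise-free, the fit recovers the generating coefficients up to floating-point error. With real data, a statistics package that reports standard errors and p-values would be the natural choice.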
Figure 6. Heatmap of exonic identity between CYP2A6 and CYP2A7 with highly discordant positions indicated.
The number of non-identical nucleotides within a 40 bp window (+/−20 bp) was calculated for each position. White areas indicate 100% identity within the 40 bp window, while black areas indicate the maximum number of non-identical bases within a 40 bp window (in this analysis, 10 was the maximum); increasing grey intensity indicates greater non-identity. The 13 highly discordant positions in CYP2A6 or CYP2A7 were indicated at their equivalent exonic positions (the other two positions were in CYP2B6).
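The sliding-window count behind the heatmap can be sketched as follows. The sketch assumes the two exonic sequences are already aligned to equal length and that the window spans `flank` positions on each side of the centre, inclusive of the centre. The caption's "40 bp window (+/−20 bp)" does not settle whether the centre base is counted, so that detail, the function name, and the toy sequences are assumptions.

```python
def windowed_nonidentity(seq_a, seq_b, flank=20):
    """For each aligned position, count non-identical bases within
    `flank` positions on either side (windows are truncated at the ends)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatch = [int(a != b) for a, b in zip(seq_a, seq_b)]
    # Prefix sums make each window count an O(1) lookup.
    prefix = [0]
    for m in mismatch:
        prefix.append(prefix[-1] + m)
    n = len(mismatch)
    counts = []
    for i in range(n):
        lo = max(0, i - flank)
        hi = min(n, i + flank + 1)
        counts.append(prefix[hi] - prefix[lo])
    return counts


# Toy aligned sequences (hypothetical), differing at indices 3 and 8.
gene_a = "ACGTACGTAC"
gene_b = "ACGAACGTTC"
counts = windowed_nonidentity(gene_a, gene_b, flank=1)
print(counts)  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
```

A count of 0 corresponds to the white (100% identity) areas of the heatmap, and larger counts to darker cells.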
