Genome Res. 2023 Jan;33(1):61-70.
doi: 10.1101/gr.277075.122. Epub 2023 Jan 19.

An efficient genotyper and star-allele caller for pharmacogenomics

Ananth Hari et al. Genome Res. 2023 Jan.

Abstract

High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation identification, variant calling, and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that uses combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and uses a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10x Genomics, and Pacific Biosciences (PacBio) HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts, and hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies.

Figures

Figure 1.
An example of an incorrect long read alignment to the reference genome and its correction. If a donor genome (above) contains two copies of CYP2D6 pharmacogene, any long read (gray rectangle) that spans both copies will get aligned to the reference genome (below) that contains only a single CYP2D6 copy. However, this read will get its second half (containing CYP2D6 sequence) incorrectly aligned to the CYP2D7 pseudogene owing to the high sequence similarity between these genes. The final result is the overabundance of coverage in the pseudogene region compared with the CYP2D6 region (an Integrative Genomics Viewer [IGV; Robinson et al. 2011] coverage plot is shown above the reference genome).
Figure 2.
A sample decomposition of aggregate coverage into individual structural configurations. (A) An example database of CYP2D6 structural configurations containing three such configurations (vCYP2D6, vCYP2D7, and vCYP2D6*13). Regions on top of the configurations that were defined (i.e., e1, i1, etc.) are shaded with lighter color. In this example, vCYP2D6 corresponds to the g1. (B) Sample decomposition of the aggregate coverage vector cn, observed after aligning the reads originating from the donor genome (above) to the reference genome (below). As can be seen, cn can be expressed as the sum of four structural configuration vectors from the database.
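
The decomposition described in Figure 2 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the six-region layout, the 0/1 entries of each configuration vector, the `decompose` helper, and the exhaustive search. The abstract states that Aldy 4 uses combinatorial optimization but does not specify the model, so this brute-force enumeration is a stand-in for the idea and not the tool's method.

```python
from itertools import product

# Hypothetical configuration vectors: one entry per region, giving the number
# of copies that configuration contributes to that region's coverage.
# Assumed region order: CYP2D6 (e1, i1, e2), then CYP2D7 (e1, i1, e2).
# The values are toy numbers, not taken from any real database.
CONFIGS = {
    "vCYP2D6":    (1, 1, 1, 0, 0, 0),
    "vCYP2D7":    (0, 0, 0, 1, 1, 1),
    "vCYP2D6*13": (0, 1, 1, 1, 0, 0),  # toy hybrid of the two genes
}


def decompose(cn, configs=CONFIGS, max_copies=4):
    """Find per-configuration copy counts whose summed vectors best match cn.

    Tries every combination of 0..max_copies copies per configuration and
    keeps the one with the smallest L1 error, breaking ties in favour of
    fewer total copies.
    """
    names = list(configs)
    best_key, best_solution = None, None
    for counts in product(range(max_copies + 1), repeat=len(names)):
        # Coverage predicted by this combination, region by region.
        predicted = [
            sum(k * configs[name][i] for k, name in zip(counts, names))
            for i in range(len(cn))
        ]
        error = sum(abs(p - o) for p, o in zip(predicted, cn))
        key = (error, sum(counts))
        if best_key is None or key < best_key:
            best_key, best_solution = key, dict(zip(names, counts))
    return best_solution, best_key[0]


if __name__ == "__main__":
    # Toy aggregate coverage built from four configuration vectors:
    # one vCYP2D6, two vCYP2D7, and one vCYP2D6*13.
    observed = [1, 2, 2, 3, 2, 2]
    solution, error = decompose(observed)
    print(solution, error)
```

Exhaustive search is workable only because the toy database holds three configurations; with a realistic database the search space grows exponentially, which is why an optimization formulation is the natural fit.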

References

    Berger E, Yorukoglu D, Zhang L, Nyquist SK, Shalek AK, Kellis M, Numanagić I, Berger B. 2020. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets. Nat Commun 11: 4662. doi:10.1038/s41467-020-18320-z
    Browning BL, Tian X, Zhou Y, Browning SR. 2021. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet 108: 1880–1890. doi:10.1016/j.ajhg.2021.08.005
    Caspar SM, Schneider T, Meienberg J, Matyas G. 2020. Added value of clinical sequencing: WGS-based profiling of pharmacogenes. Int J Mol Sci 21: 2308. doi:10.3390/ijms21072308
    Chen X, Shen F, Gonzaludo N, Malhotra A, Rogert C, Taft RJ, Bentley DR, Eberle MA. 2021. Cyrius: accurate CYP2D6 genotyping using whole-genome sequencing data. Pharmacogenomics J 21: 251–261. doi:10.1038/s41397-020-00205-5
    Crews KR, Gaedigk A, Dunnenberger HM, Steve Leeder J, Klein TE, Caudle KE, Haidar CE, Shen DD, Callaghan JT, Sadhasivam S, et al. 2014. Clinical pharmacogenetics implementation consortium guidelines for cytochrome P450 2D6 genotype and codeine therapy: 2014 update. Clin Pharmacol Ther 95: 376–382. doi:10.1038/clpt.2013.254