[Preprint]. 2023 Dec 7:2023.02.22.529427.
doi: 10.1101/2023.02.22.529427.

Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries

Jochen Weile et al. bioRxiv.

Abstract

Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or a few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false-positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.
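
To make the two core ideas of the abstract concrete (grouping reads whose error-prone barcodes are similar, and flagging barcodes linked to more than one genotype), the following is a minimal illustrative sketch. It is not Pacybara's implementation: the greedy seed-based clustering, the Hamming distance metric, the function names (`hamming`, `cluster_reads`, `summarize`), the thresholds `max_dist` and `min_fraction`, and the example barcodes and genotype labels are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatching positions; unequal lengths are treated as maximally distant."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1):
    """Greedily cluster (barcode, genotype) reads by barcode similarity.

    Barcodes are visited from most to least abundant. Each barcode joins the
    first existing seed within max_dist mismatches, otherwise it becomes a
    new seed. Returns {seed_barcode: [genotype, ...]}.
    """
    counts = Counter(bc for bc, _ in reads)
    seeds = []
    assignment = {}
    for bc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for seed in seeds:
            if hamming(bc, seed) <= max_dist:
                assignment[bc] = seed
                break
        else:
            seeds.append(bc)
            assignment[bc] = bc
    clusters = defaultdict(list)
    for bc, genotype in reads:
        clusters[assignment[bc]].append(genotype)
    return dict(clusters)


def summarize(clusters, min_fraction=0.8):
    """Return {seed: (most common genotype, is_ambiguous)}.

    A cluster is flagged ambiguous when its most common genotype accounts for
    less than min_fraction of its reads, i.e. the barcode appears to be
    associated with multiple genotypes.
    """
    summary = {}
    for seed, genotypes in clusters.items():
        top, n = Counter(genotypes).most_common(1)[0]
        summary[seed] = (top, n / len(genotypes) < min_fraction)
    return summary


# Hypothetical reads: "ACGA" is a one-mismatch error of "ACGT";
# "TTTT" is shared by two different genotypes.
reads = [
    ("ACGT", "A12V"), ("ACGT", "A12V"), ("ACGA", "A12V"),
    ("TTTT", "G5D"), ("TTTT", "L9P"), ("TTTT", "G5D"), ("TTTT", "L9P"),
]
clusters = cluster_reads(reads, max_dist=1)
summary = summarize(clusters)
```

In this toy input the erroneous barcode `ACGA` is merged into the `ACGT` cluster, whose reads agree on a single genotype, while the `TTTT` cluster is flagged because its reads split evenly between two genotypes. A real pipeline would also need to handle indels in barcodes (edit distance rather than Hamming distance) and scale beyond pairwise comparison against every seed.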