Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries

Jochen Weile^{1,2,3,4}, Gabrielle Ferra⁵, Gabriel Boyle⁵, Sriram Pendyala⁵, Clara Amorosi⁵, Chiann-Ling Yeh⁵, Atina G Cote^{1,2,3,4}, Nishka Kishore^{1,2,3,4}, Daniel Tabet^{1,2,3,4}, Warren van Loggerenberg^{1,2,3,4,6}, Ashyad Rayhan^{1,2,3,4}, Douglas M Fowler^{5,7,8}, Maitreya J Dunham⁵, Frederick P Roth^{1,2,3,4,6}

Affiliations

¹ Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada.
² Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada.
³ Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada.
⁴ Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada.
⁵ Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States.
⁶ Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States.
⁷ Department of Bioengineering, University of Washington, Seattle, WA 98195, United States.
⁸ Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, United States.

PMID: 38569896
PMCID: PMC11021806
DOI: 10.1093/bioinformatics/btae182

Bioinformatics. 2024 Mar 29;40(4):btae182.

Abstract

Motivation: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries in which multiple distinct clones differ by one or a few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library.

Results: Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.
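The two ideas named above, grouping reads whose barcodes differ only by likely sequencing errors and flagging barcodes linked to more than one genotype, can be illustrated with a minimal sketch. This is not Pacybara's implementation: the greedy seeding strategy, the Hamming-distance criterion, the function names, and the thresholds `max_dist` and `min_frac` are all assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; unequal lengths are treated as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1, min_frac=0.8):
    """Greedily cluster (barcode, genotype) reads by barcode similarity.

    Hypothetical parameters (not taken from the paper):
      max_dist  largest barcode distance still merged into a cluster
      min_frac  minimum share of reads that must support the top genotype;
                below this the barcode is flagged as ambiguous
    """
    # Frequent barcodes are more likely error-free, so they seed clusters first.
    counts = Counter(bc for bc, _ in reads)
    seeds = []
    assignment = {}
    for bc, _ in counts.most_common():
        match = next((s for s in seeds if hamming(bc, s) <= max_dist), None)
        if match is None:
            seeds.append(bc)
            match = bc
        assignment[bc] = match

    # Tally the genotypes observed within each barcode cluster.
    genotypes = defaultdict(Counter)
    for bc, gt in reads:
        genotypes[assignment[bc]][gt] += 1

    result = {}
    for seed, gts in genotypes.items():
        top_gt, top_n = gts.most_common(1)[0]
        total = sum(gts.values())
        result[seed] = {
            "genotype": top_gt,
            "reads": total,
            # A split vote suggests one barcode tagging several clones.
            "ambiguous": top_n / total < min_frac,
        }
    return result


# Toy data: one barcode with a single-base read error, and one barcode
# that appears with two different genotypes.
example_reads = (
    [("ACGTACGT", "A10V")] * 3
    + [("ACGTACGA", "A10V")]
    + [("TTTTGGGG", "L5P")] * 2
    + [("TTTTGGGG", "R7K")] * 2
)
clusters = cluster_reads(example_reads)
```

In this toy input the error-containing barcode is absorbed into its nearest seed, while the barcode seen with two genotypes is flagged rather than silently assigned to one of them. A real pipeline would also need to handle indels in barcodes, which Hamming distance does not.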

Availability and implementation: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.

Conflict of interest statement

F.P.R. is a shareholder and advisor for SeqWell, Constantiam, BioSymetrics, and a shareholder of Ranomics.

Update of

Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries.
Weile J, Ferra G, Boyle G, Pendyala S, Amorosi C, Yeh CL, Cote AG, Kishore N, Tabet D, van Loggerenberg W, Rayhan A, Fowler DM, Dunham MJ, Roth FP. bioRxiv [Preprint]. 2023 Dec 7:2023.02.22.529427. doi: 10.1101/2023.02.22.529427. Update in: Bioinformatics. 2024 Mar 29;40(4):btae182. doi: 10.1093/bioinformatics/btae182. PMID: 36865234.
