Bioinformatics. 2024 Oct 1;40(10):btae591.
doi: 10.1093/bioinformatics/btae591.

Castanet: a pipeline for rapid analysis of targeted multi-pathogen genomic data

Richard Mayne et al. Bioinformatics.

Abstract

Motivation: Target enrichment strategies generate genomic data from multiple pathogens in a single process, greatly improving sensitivity over metagenomic sequencing and enabling cost-effective, high-throughput surveillance and clinical applications. However, uptake by research and clinical laboratories is constrained by a lack of computational tools designed specifically for analysing multi-pathogen enrichment sequence data. Here we present Castanet, an analysis pipeline for multi-pathogen enrichment sequencing data. Castanet is designed to work with short-read data produced by existing targeted enrichment strategies, but can readily be deployed on any BAM file generated by another methodology. An optional graphical interface and an installer script are also included.

Results: In addition to genome reconstruction, Castanet reports method-specific metrics that enable quantification of capture efficiency, estimation of pathogen load, differentiation of low-level positives from contamination, and assessment of sequencing quality. Castanet can be used as a traditional end-to-end pipeline for consensus generation, but its strength lies in the ability to process a flexible, pre-defined set of pathogens of interest directly from multi-pathogen enrichment experiments. In our tests, Castanet consensus sequences were accurate reconstructions of reference sequences, including in instances where multiple strains of the same pathogen were present. Castanet performs effectively on standard computers and can process the entire output of a 96-sample enrichment sequencing run (50M reads) using a single batch process command in under 2 h.
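Several of these metrics, and the read depth plots in the figures, distinguish total reads from deduplicated (unique) reads. The sketch below shows one common way to derive those counts per target; it is not Castanet's implementation. It assumes reads have already been parsed from the BAM file into simple records and treats reads sharing target, start, end and strand as PCR duplicates. The `Read` type and `summarise_targets` function are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record: target name plus mapped coordinates."""
    target: str
    start: int
    end: int
    reverse: bool


def summarise_targets(reads: list[Read]) -> dict[str, dict[str, float]]:
    """Count total and deduplicated reads per target.

    Assumption: reads sharing (target, start, end, strand) are PCR duplicates.
    This is a common heuristic, not necessarily the rule Castanet applies.
    """
    total: dict[str, int] = defaultdict(int)
    unique: dict[str, set] = defaultdict(set)
    for r in reads:
        total[r.target] += 1
        unique[r.target].add((r.start, r.end, r.reverse))
    summary = {}
    for target, n_total in total.items():
        n_unique = len(unique[target])
        summary[target] = {
            "total": n_total,
            "unique": n_unique,
            # Average number of reads per unique molecule (duplication ratio).
            "amplification": n_total / n_unique,
        }
    return summary


reads = [
    Read("HPeV", 100, 250, False),
    Read("HPeV", 100, 250, False),  # duplicate of the first read
    Read("HPeV", 300, 450, True),
    Read("HHV4", 10, 160, False),
]
print(summarise_targets(reads))
```

Under this heuristic, a target whose reads are nearly all duplicates was plausibly captured from very few starting molecules, which is why the total/unique split is informative for low-level positives.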

Availability and implementation: Source code is freely available under the GPL-3 license at https://github.com/MultipathogenGenomics/castanet, implemented in Python 3.10 and supported on Ubuntu Linux 22.04. The data underlying this article are available in the European Nucleotide Archive at https://www.ebi.ac.uk/ena/browser/view/PRJEB77004.

Conflict of interest statement

None declared.

Figures

Figure 1.
Castanet process flow. (Left) Castanet pipeline. Blue boxes are optional stages. (Right) Consensus generator algorithm.
Figure 2.
Castanet output for set A.1, at several dilutions. (a) Read depth graph showing total and deduplicated (unique) reads for HPeV, undiluted, viral load 9.6 × 10⁴ copies per ml. (b) Read depth, HPeV, 1:500 dilution, 5.3 × 10² copies per ml. (c) log10 deduplicated reads versus log10 copies per ml for HPeV, HHV4, and HHV5 at four different dilutions (1, 1:10, 1:100, and 1:500), with linear regression line.
Figure 3.
Appearance of presumptive low positives, false positives, and contamination in read depth plots from set A.2. (a) TTV reads covering only part of the genome, with a high amplification rate (4.0) across the covered region. (b) Trypanosoma spp. reads at the 18S ribosomal locus, with no amplification and incomplete coverage. (c) HBV reads with incomplete coverage and significant amplification (MAR 5.82, SD 3.37).
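The patterns in Figure 3 can be summarised per target by two numbers: breadth of coverage and the distribution of per-position amplification. A minimal sketch follows. It assumes the amplification rate at a position is total depth divided by unique depth, and reads "MAR" as mean amplification rate; both are assumptions, as is the `coverage_profile` helper, and Castanet's own definitions and thresholds may differ.

```python
from statistics import mean, stdev


def coverage_profile(total_depth: list[int], unique_depth: list[int]) -> dict[str, float]:
    """Summarise per-position depth for one target.

    Assumptions (hypothetical, for illustration): the amplification rate at a
    position is total depth / unique depth, computed only where unique depth > 0;
    breadth is the fraction of positions with at least one unique read.
    """
    if len(total_depth) != len(unique_depth):
        raise ValueError("depth arrays must be the same length")
    covered = [(t, u) for t, u in zip(total_depth, unique_depth) if u > 0]
    rates = [t / u for t, u in covered]
    return {
        "breadth": len(covered) / len(unique_depth),
        "mean_amplification": mean(rates) if rates else 0.0,
        "sd_amplification": stdev(rates) if len(rates) > 1 else 0.0,
    }


# Toy target of 8 positions: reads cover only half of it, each molecule seen ~4 times.
total = [0, 0, 0, 0, 12, 16, 8, 20]
unique = [0, 0, 0, 0, 3, 4, 2, 5]
print(coverage_profile(total, unique))
# {'breadth': 0.5, 'mean_amplification': 4.0, 'sd_amplification': 0.0}
```

Low breadth combined with a high mean amplification rate corresponds to the partial, heavily duplicated coverage shown in panels (a) and (c).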
Figure 4.
Comparison of read depth between untargeted and targeted sequences, in a screened pooled human plasma sample (set A.2). (a) Whole human mitochondrial gene, median depth 16 (excluding COX-1 region). (b) Targeted sequence, mitochondrial COX-1, median depth 2109.
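The contrast in Figure 4 amounts to a ratio of median depths: the medians in the caption correspond to roughly 2109 / 16 ≈ 132-fold higher depth over the targeted COX-1 region. The sketch below computes that kind of ratio from a per-position depth array; `enrichment_fold` is a hypothetical helper and the toy depths are illustrative.

```python
from statistics import median


def enrichment_fold(depth: list[int], start: int, end: int) -> float:
    """Median depth inside [start, end) divided by median depth outside it."""
    inside = depth[start:end]
    outside = depth[:start] + depth[end:]
    if not inside or not outside:
        raise ValueError("region must be a proper, non-empty sub-range")
    off_target = median(outside)
    if off_target == 0:
        raise ZeroDivisionError("median off-target depth is zero")
    return median(inside) / off_target


# Toy profile: 6 untargeted positions at depth 10, 4 targeted positions at depth 1000.
depth = [10] * 6 + [1000] * 4
print(enrichment_fold(depth, 6, 10))  # 100.0

# Same ratio from the medians reported in Figure 4 (COX-1 vs the rest of the sequence).
print(round(2109 / 16, 1))  # 131.8
```

Medians are used rather than means so that a few pile-up positions do not dominate either side of the comparison, matching the summary statistic in the caption.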
