BMC Bioinformatics. 2024 Mar 26;25(1):130.
doi: 10.1186/s12859-024-05756-z.

COSAP: Comparative Sequencing Analysis Platform

Mehmet Arif Ergun et al. BMC Bioinformatics. 2024.

Abstract

Background: Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. These operations must be performed computationally because of the size and complexity of the data. The need for analysis software has resulted in many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations, without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes it risky to rely on a single algorithm. Given the growth of sequencing technologies, user-friendly open-source software tools that offer comparative analysis are an important need.

Results: Here, we propose the Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis, along with their annotations. COSAP is packaged with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. COSAP is developed as a workflow management system and is designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ , respectively. All services are also packaged as Docker containers. Thanks to the modular structure, pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding.

Conclusions: COSAP simplifies and speeds up DNA sequencing analyses by providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis, as well as their annotations. COSAP is packaged with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial for establishing the reproducibility of sequencing analyses.

Keywords: Copy number variation; Microsatellite instability; NGS Analysis; Variant annotation; Variant classification.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Predefined pipeline steps have multiple algorithm/tool choices. The input and output of each step must be a list of files. The file names are for human readability; each step learns which files to read from a config file.
Fig. 2. COSAP Docker container acting as a Celery worker that consumes pipeline messages from the backend application.
Fig. 3. Performance of parallelized versions of the tools at different disk speed settings, compared with the baselines.
Fig. 4. UpSet plot depicting the intersections between the variant sets of several variant callers and the variant allele frequency distribution of each intersection.
Fig. 5. (a) Double Venn diagram of chosen variant callers. (b) Triple Venn diagram of chosen variant callers. (c) Jaccard similarities of the variant callers. (d) Precision and recall plot when a ground truth set is available.
Fig. 6. Main page of the web application, where users choose the analysis they want to perform and see their recent activity.
Fig. 7. Project creation interface for creating projects with input files and desired algorithms.
Fig. 8. Interface for tracking the status of projects and managing them.
Fig. 9. Results page, where basic statistics of the run are displayed alongside the detailed variant descriptions and classifications.
Fig. 10. Variant filtering example.
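
The Fig. 5 caption mentions Jaccard similarities between variant callers and precision/recall against a ground truth set. The following is a minimal sketch of how such metrics could be computed over sets of variant calls; it is not COSAP's implementation. The function names, the caller names and the toy variants are all hypothetical, and it assumes a variant can be identified by its chromosome, position, reference allele and alternate allele.

```python
from itertools import combinations

# Assumption: a variant is keyed by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precision_recall(called: set, truth: set) -> tuple[float, float]:
    """Precision = TP / |called| and recall = TP / |truth| (0.0 if the denominator is empty)."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def pairwise_jaccard(call_sets: dict) -> dict:
    """Jaccard similarity for every unordered pair of callers (names sorted alphabetically)."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


# Hypothetical toy data, for illustration only.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "G")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr2", 50, "T", "G"), ("chr3", 10, "C", "A")},
}
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}

similarities = pairwise_jaccard(calls)
metrics = {name: precision_recall(variants, truth) for name, variants in calls.items()}

for pair, score in similarities.items():
    print(pair, round(score, 3))
for name, (precision, recall) in metrics.items():
    print(name, round(precision, 3), round(recall, 3))
```

Exact matching on the four-field key is a simplification: real comparisons usually normalize variant representations first, since the same indel can be written in more than one way.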
