NAR Genom Bioinform. 2022 Jul 25;4(3):lqac053.
doi: 10.1093/nargab/lqac053. eCollection 2022 Sep.

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data


Anjana Anilkumar Sithara et al. NAR Genom Bioinform.

Abstract

Despite the tremendous increase in omics data generated by modern sequencing technologies, analyzing these data can be difficult and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline for analyzing (cancer) genomic data that takes raw sequencing data (FASTQ format) as input and outputs insightful statistics. Our iCOMIC toolkit, which features many independent workflows, is embedded in the popular Snakemake workflow management system. It can analyze whole-genome and transcriptome data, and its user-friendly GUI offers several advantages, including minimal execution steps and no need for complex command-line arguments. Notably, we have integrated algorithms developed in-house to predict pathogenicity among cancer-causing mutations and to differentiate between tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against the Genome in a Bottle benchmark dataset (NA12878) and obtained the highest F1 scores, 0.971 for indels and 0.988 for SNPs, using the BWA MEM-GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r = 0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analysis of omics datasets and substantially simplifies complex data analysis pipelines.
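For readers unfamiliar with the F1 score used in the variant-calling benchmark, the following is a minimal sketch of how it is conventionally computed from true positive, false positive and false negative counts. The function name and the example counts are hypothetical and are not taken from the paper. The paper's own benchmarking procedure is not described in this abstract.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp: calls that match the truth set
    fp: calls that are absent from the truth set
    fn: truth-set variants that were not called
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example = f1_score(tp=950, fp=30, fn=20)
print(f"F1 = {example:.3f}")
```

Because F1 combines precision and recall, a caller cannot score well by being only conservative (few false positives) or only permissive (few false negatives).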

Figures

Figure 1.
Schema for the iCOMIC pipeline. Multiple workflows are embedded in iCOMIC, giving users complete freedom to choose among the integrated tools. Both the DNA-Seq and RNA-Seq pipelines take raw FASTQ files as input, and quality control and alignment are common to both. FastQC and Cutadapt are the quality control tools, and MultiQC generates a consolidated report of quality statistics. RNA-Seq analysis comprises mapping sequencing reads to a reference genome with an aligner, quantifying expression levels with an expression modeller, and differential expression analysis. DNA-Seq analysis comprises alignment followed by variant identification and annotation. The tools incorporated in iCOMIC are listed in Table 1.
Figure 2.
Schematic diagram of the DNA-Seq pipeline. The figure shows the input, followed by quality control, alignment to the reference genome, variant calling, filtering and annotation.
Figure 3.
Schematic diagram of the RNA-Seq pipeline. The figure details the input, followed by quality control, alignment to the reference genome, counting of mapped reads, normalization, and differential expression analysis, which ultimately generates the TXT/PDF output.
Figure 4.
Snakemake workflow management system. Input and output files shown in blue belong to the DNA-Seq analysis, those in green belong to the RNA-Seq analysis, and those in red are common to both. ‘Rule’ files, which specify the input, output and shell/wrapper script, form the basic units of Snakemake. Each rule corresponds to an individual tool. Additional parameters for the tools are set in the ‘config’ file. Rules are integrated into the Snakefile according to the user's choice of tools, and the workflow is then executed.
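The arrangement described in the Figure 4 caption, where per-tool rule files are pulled into a Snakefile according to the user's selection, can be sketched in plain Python as below. This is an illustration under stated assumptions, not iCOMIC's actual code: the `RULE_FILES` mapping, the rule file paths, the step names and the `build_snakefile` helper are all hypothetical. Only the `configfile:` and `include:` directives are standard Snakemake syntax.

```python
# Hypothetical mapping from (pipeline step, tool) to a rule file path.
# The tool names appear in the paper; the paths are invented for illustration.
RULE_FILES = {
    ("aligner", "BWA-MEM"): "rules/bwa_mem.smk",
    ("aligner", "HISAT2"): "rules/hisat2.smk",
    ("variant_caller", "GATK-HC"): "rules/gatk_hc.smk",
    ("expression_modeller", "StringTie"): "rules/stringtie.smk",
}


def build_snakefile(choices: dict, config_path: str = "config.yaml") -> str:
    """Assemble Snakefile text that includes one rule file per chosen tool."""
    lines = [f'configfile: "{config_path}"', ""]
    for step, tool in choices.items():
        key = (step, tool)
        if key not in RULE_FILES:
            raise ValueError(f"no rule file registered for {step}={tool}")
        lines.append(f"# {step}: {tool}")
        lines.append(f'include: "{RULE_FILES[key]}"')
    return "\n".join(lines) + "\n"


# Example: a DNA-Seq selection using tools named in the abstract.
print(build_snakefile({"aligner": "BWA-MEM", "variant_caller": "GATK-HC"}))
```

Keeping one rule file per tool means that adding or swapping a tool changes only the mapping, while tool parameters stay in the config file.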
Figure 5.
Fold change correlation between iCOMIC and the reference dataset for the four workflows. The Pearson correlation coefficient was used to compare fold changes.
Figure 6.
Fold change correlation between Galaxy and the reference dataset for the STAR-HTSeq-DESeq2 workflow. The Pearson correlation coefficient was used to compare fold changes.
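The fold change correlation reported in Figures 5 and 6 relies on the Pearson correlation coefficient. The sketch below shows the standard calculation for two equal-length lists of fold changes, one from a pipeline and one from a reference. The function name and the sample values are hypothetical. The paper's actual data and code are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical log2 fold changes for the same genes from two sources.
pipeline_lfc = [1.2, -0.8, 2.5, 0.1, -1.9]
reference_lfc = [1.0, -0.5, 2.2, 0.4, -2.1]
print(f"r = {pearson_r(pipeline_lfc, reference_lfc):.2f}")
```

A value of r close to 1 indicates that the two sources rank and scale gene-level changes similarly. It does not by itself show that the absolute fold changes agree.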
