J Microbiol Biotechnol. 2022 Feb 28;32(2):149-159.
doi: 10.4014/jmb.2110.10017.

Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

Atul Ajaykumar et al. J Microbiol Biotechnol.

Abstract

Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although a large amount of messenger RNA (mRNA) sequencing data from these cancer cells is available owing to advances in next-generation sequencing (NGS) technologies, optimized data processing methods need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which performs pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data from normal cells and lung/liver cancer tissues. Before using cancer data, simulated mRNA sequencing reads were generated, and the transcripts with high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data from lung/liver cancer cells were also extracted and quantified. Whereas Kallisto reports TPM values directly, Bowtie2 provided read counts. TPM values were therefore calculated by processing the Sequence Alignment Map (SAM) file in R with the Rsubread package and subsequently in Python. Analysis of the simulated sequencing data revealed that Kallisto detected more transcripts and showed a higher overlap than Bowtie2. The evaluation of these two data processing methods using known lung cancer biomarkers indicates that, in standard settings without any dedicated quality control, Kallisto produces results faster and more accurately than Bowtie2. The same conclusion was drawn and confirmed with known biomarkers specific to liver cancer.
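
The conversion from read counts to TPM mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Rsubread/Python code. It assumes per-transcript read counts and transcript lengths in base pairs are already available, uses plain transcript length rather than effective length, and the transcript names and numbers are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-transcript read counts to Transcripts Per Million.

    counts: dict mapping transcript name -> read count
    lengths_bp: dict mapping transcript name -> length in base pairs
    """
    # Step 1: normalize each count by transcript length in kilobases.
    reads_per_kb = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    # Step 2: rescale so that all values sum to one million.
    total = sum(reads_per_kb.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: value / total * 1e6 for t, value in reads_per_kb.items()}


# Hypothetical counts and lengths, for illustration only.
example_counts = {"tx_a": 100, "tx_b": 300, "tx_c": 0}
example_lengths = {"tx_a": 1000, "tx_b": 3000, "tx_c": 500}
example_tpm = counts_to_tpm(example_counts, example_lengths)
```

In this example tx_b has three times the reads of tx_a but is also three times as long, so both receive the same TPM value.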

Keywords: Kallisto; bowtie2; cancer-specific biomarker; lung/liver cancer; mRNA sequencing data; mapping comparison.

Conflict of interest statement

The authors have no financial conflicts of interest to declare.

Figures

Fig. 1. A de Bruijn graph is a directed graph whose nodes are k-mers and whose edges connect overlapping k-mers.
As an example, three different transcripts are shown, distinguished by color.
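
The structure described in this caption can be sketched in Python as follows. This is an illustrative toy, not Kallisto's implementation: the function name, the transcript names, and the sequences are hypothetical, and each node simply records the set of transcripts (its "colors") that contain the k-mer.

```python
from collections import defaultdict


def build_colored_dbg(transcripts, k):
    """Build a toy colored de Bruijn graph from named transcript sequences."""
    node_colors = defaultdict(set)  # k-mer -> names of transcripts containing it
    edges = defaultdict(set)        # k-mer -> k-mers that follow it in a transcript
    for name, seq in transcripts.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            node_colors[kmer].add(name)
        # Consecutive k-mers overlap by k-1 bases and are joined by a directed edge.
        for left, right in zip(kmers, kmers[1:]):
            edges[left].add(right)
    return node_colors, edges


# Three hypothetical transcripts, mirroring the three colors in the figure.
toy_transcripts = {"t1": "ACGTAC", "t2": "ACGTTT", "t3": "GGGTAC"}
toy_colors, toy_edges = build_colored_dbg(toy_transcripts, k=3)
```

Here the node `ACG` is shared by t1 and t2, and the node `CGT` has two outgoing edges because t1 and t2 diverge after it.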
Fig. 2. In pseudoalignment, matching skips over nodes that share the same k-compatibility class.
Nodes with the same k-compatibility class come from sequence shared by the same transcripts, so the algorithm skips them for efficiency.
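
A simplified Python sketch of this idea intersects the k-compatibility classes of a read's k-mers. It is an assumption-laden illustration rather than Kallisto's algorithm: the helper names and sequences are hypothetical, every k-mer is still looked up (real skipping avoids the lookup itself), and a read containing an unindexed k-mer is simply treated as unaligned.

```python
def kmer_index(transcripts, k):
    """Map each k-mer to its k-compatibility class (set of transcript names)."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every k-mer of the read."""
    compatible = None
    previous = None
    for i in range(len(read) - k + 1):
        cls = index.get(read[i:i + k])
        if cls is None:
            return set()  # simplification: an unknown k-mer makes the read unaligned
        if cls == previous:
            continue  # same class as the last k-mer, so the intersection cannot change
        compatible = set(cls) if compatible is None else compatible & cls
        previous = cls
    return compatible or set()


# Hypothetical transcripts and reads, for illustration only.
pa_index = kmer_index({"t1": "ACGTAC", "t2": "ACGTTT", "t3": "GGGTAC"}, k=3)
hits_unique = pseudoalign("ACGTA", pa_index, k=3)
hits_shared = pseudoalign("GTAC", pa_index, k=3)
```

The first read is compatible only with t1, while the second lies in sequence shared by t1 and t3 and therefore stays ambiguous between them.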
Fig. 3. Cosine similarity between two items is the cosine of the angle θ between them.
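
For reference, cosine similarity can be computed as the dot product of two vectors divided by the product of their lengths. The Python sketch below assumes, purely for illustration, that the two items are equal-length numeric vectors such as TPM profiles; the page does not state what the items are.

```python
import math


def cosine_similarity(u, v):
    """Return cos(theta) for the angle theta between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical vectors: parallel ones score about 1, orthogonal ones score 0.
sim_parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because only the angle matters, two expression profiles that differ by a constant scale factor are treated as identical.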
Fig. 4. Process architecture for Kallisto and Bowtie2 for the alignment of transcripts and calculation of TPM values.
The pipeline on top shows the input, process, and output for Kallisto. Since we carried out no additional steps, the pipeline is direct. Bowtie2 does not directly output transcript TPM values, so we added an R process that handles the transcript read counts and converts the expression levels to TPM.
Fig. 5. Comparison of the top 100 TPM values of tumor cells, taken in ascending order, from each tool, showing whether Bowtie2 can align the top 100 from Kallisto and vice versa.
Fig. 6. Comparison of the top 100 TPM values of non-tumor cells, taken in ascending order, from each tool, showing whether Bowtie2 can align the top 100 from Kallisto and vice versa.
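
One way to picture the comparison in Figs. 5 and 6 is to take the highest-TPM transcripts from each tool and check how many appear in both lists. The Python sketch below is a hedged illustration only: the function names and TPM values are hypothetical, and the authors' exact comparison procedure is not given on this page.

```python
def top_n(tpm, n):
    """Return the names of the n transcripts with the highest TPM."""
    return set(sorted(tpm, key=tpm.get, reverse=True)[:n])


def top_overlap(tpm_a, tpm_b, n=100):
    """Return the transcripts found in the top n of both quantifications."""
    return top_n(tpm_a, n) & top_n(tpm_b, n)


# Hypothetical TPM values from two tools, for illustration only.
kallisto_tpm = {"g1": 50.0, "g2": 40.0, "g3": 5.0, "g4": 1.0}
bowtie2_tpm = {"g1": 45.0, "g3": 30.0, "g2": 2.0, "g4": 0.5}
shared_top2 = top_overlap(kallisto_tpm, bowtie2_tpm, n=2)
```

With n=2, only g1 is ranked in the top two by both tools; the figures use n=100.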
