Review
Genes (Basel). 2024 Aug 6;15(8):1036.
doi: 10.3390/genes15081036.

A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology

Alexis J Clark et al. Genes (Basel). 2024.

Abstract

The rapid advancement of high-throughput technologies, particularly next-generation sequencing (NGS), has revolutionized cancer research by enabling the investigation of genetic variations, such as SNPs and copy number variations, alongside gene expression and protein levels. These technologies have elevated the significance of precision oncology, creating a demand for biomarker identification and validation. This review explores the complex interplay of oncology, cancer biology, and bioinformatics tools, highlighting the challenges in statistical learning, experimental validation, data processing, and quality control that underpin this transformative field. It outlines the methodologies and applications of bioinformatics tools in cancer genomics research, covering tools for data structuring, pathway analysis, network analysis, biomarker signature analysis, somatic variant interpretation, genomic data analysis, and visualization. Open-source tools and repositories such as The Cancer Genome Atlas (TCGA), Genomic Data Commons (GDC), cBioPortal, UCSC Genome Browser, ArrayExpress, and Gene Expression Omnibus (GEO) have emerged to streamline cancer omics data analysis. Bioinformatics has significantly impacted cancer research, uncovering novel biomarkers, driver mutations, oncogenic pathways, and therapeutic targets. Integrating multi-omics data, network analysis, and advanced machine learning (ML) will be pivotal in future biomarker discovery and patient prognosis prediction.

Keywords: RNA-Seq; bioinformatics; biomarker discovery; oncology; predictive algorithms.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Overview of sequencing technologies, 2004–2022. Post-Sanger sequencing technologies, beginning with Roche 454 pyrosequencing in 2004 and followed by Illumina HiSeq and MiSeq (2007), SOLiD (2008), PacBio SMRT (2009), DNBS: Helicos (2010), Ion Torrent (2011), in situ RNA sequencing (2013), ONT nanopore (2015), spatial transcriptomics (2016), GeoSeq (2017), Slide-Seq (2019), and Revio (2022), illustrate the continuous improvements contributing to genomics. The timeline covers the NGS techniques that have expanded upon chain termination and the polymerase chain reaction.
Figure 2
Overview of an RNA-Seq data workflow. In this example RNA-Seq analysis workflow, raw sequence reads serve as the input. The raw sequences undergo pre-processing and quality control. Next, the FASTQ files are aligned to a reference genome (producing SAM/BAM files), and the aligned reads are quantified to generate count matrices (text files). Downstream analyses, such as differential gene expression, use the count matrices as input. The results can be visualized and integrated using R packages, Python libraries, and other tools.
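
The downstream step in this workflow, going from a count matrix to a differential expression signal, can be illustrated with a minimal Python sketch. It is not the method used in the review: the gene names, sample layout, and the helper functions `counts_per_million` and `log2_fold_change` are hypothetical, and the sketch only scales counts by library size and compares group means. Real analyses typically rely on dedicated packages that also model dispersion and test for significance.

```python
import math


def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million.

    Assumes every sample has a nonzero total count.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] / totals[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }


def log2_fold_change(cpm, group_a, group_b, pseudocount=1.0):
    """Per-gene log2 ratio of mean CPM in group_b over mean CPM in group_a.

    The pseudocount avoids taking the log of zero for unexpressed genes.
    """
    result = {}
    for gene, row in cpm.items():
        mean_a = sum(row[j] for j in group_a) / len(group_a)
        mean_b = sum(row[j] for j in group_b) / len(group_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


# Hypothetical toy count matrix (illustration only, not real data):
# gene -> read counts in [normal_1, normal_2, tumor_1, tumor_2]
toy_counts = {
    "GENE_A": [100, 100, 400, 400],
    "GENE_B": [500, 500, 500, 500],
    "GENE_C": [400, 400, 100, 100],
}

cpm = counts_per_million(toy_counts)
lfc = log2_fold_change(cpm, group_a=[0, 1], group_b=[2, 3])

for gene, value in lfc.items():
    print(f"{gene}: log2 fold change (tumor vs normal) = {value:.2f}")
```

In this toy matrix, GENE_A is higher in the tumor columns, GENE_C is lower, and GENE_B is unchanged, so the signs of the fold changes follow directly from the inputs.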
Figure 3
Overview of a classification algorithm workflow. Classification algorithms use features to identify patterns in the input data. During the training stage, the model is fitted to the training data so that it can generate predictions. In the testing stage, the trained model's performance is evaluated using precision, accuracy, recall, and F1 score. The final step is to validate the model on external datasets and apply it to various scientific questions.
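
The four evaluation metrics named in this caption can be computed from the counts of true and false positives and negatives. The following is a minimal sketch for the binary case, assuming label 1 marks the positive class; the function name `classification_metrics` and the toy labels are hypothetical and serve only as an illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical held-out test labels and model predictions (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy alone can mislead when one class is rare, which is one reason precision, recall, and F1 are reported alongside it.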
