Review
J Thorac Oncol. 2023 Feb;18(2):143-157.
doi: 10.1016/j.jtho.2022.11.006. Epub 2022 Nov 12.

A Clinician's Guide to Bioinformatics for Next-Generation Sequencing


Nicholas Bradley Larson et al. J Thorac Oncol. 2023 Feb.

Abstract

Next-generation sequencing (NGS) technologies are high-throughput methods for DNA sequencing and have become a widely adopted tool in cancer research. The sheer amount and variety of data generated by NGS assays require sophisticated computational methods and bioinformatics expertise. In this review, we provide background on NGS technology and basic bioinformatics concepts for the clinician-investigator interested in cancer research applications, with a focus on DNA-based approaches. We introduce the general principles of presequencing library preparation, postsequencing alignment, and variant calling. We also highlight common variant annotations and NGS applications for other molecular data types. Finally, we briefly discuss the demonstrated utility of NGS methods in NSCLC research and design considerations for studies that aim to leverage NGS technologies for clinical care.

Keywords: Bioinformatics; DNA; Next-generation sequencing; Review.


Figures

Figure 1:
Comparison of traditional Sanger sequencing (left) versus next-generation sequencing (right). Both methods leverage fluorescently labeled dideoxynucleotides (ddNTPs) for chain termination. However, whereas Sanger sequencing uses subsequent size selection to characterize the sequence of a single template, NGS leverages reversible chain termination to characterize millions of templates in parallel, one base at a time. This image was reproduced from Figure 1 in Muzzey, Evans, and Lieber (2015), licensed under the Creative Commons Attribution 4.0 International License.
Figure 2:
Example entry for a sequencing read stored in a FASTQ file from Platinum Genome sample NA12878, illustrating the components of the format. The FASTQ file was retrieved from the NCBI Sequence Read Archive (SRX000194).
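A FASTQ entry spans four lines: an identifier line beginning with `@`, the called bases, a separator line beginning with `+`, and a quality string with one ASCII character per base. The sketch below parses one such entry and decodes its quality characters into Phred scores. It assumes unwrapped four-line records and Phred+33 encoding (the convention in current Illumina output); the names `FastqRecord`, `parse_fastq_record`, and `error_probability` are hypothetical, and the example record is invented for illustration rather than taken from the NA12878 read shown in the figure.

```python
from dataclasses import dataclass


@dataclass
class FastqRecord:
    read_id: str   # identifier from line 1, without the leading '@'
    sequence: str  # line 2: called bases
    quality: str   # line 4: one ASCII-encoded quality character per base

    def phred_scores(self, offset: int = 33) -> list[int]:
        # Assumes Phred+33 encoding: Q = ord(character) - 33
        return [ord(c) - offset for c in self.quality]


def parse_fastq_record(lines: list[str]) -> FastqRecord:
    """Parse a single four-line FASTQ entry (no line wrapping assumed)."""
    if len(lines) != 4:
        raise ValueError("expected exactly four lines per FASTQ record")
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("line 1 must start with '@' and line 3 with '+'")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return FastqRecord(read_id=header[1:].split()[0], sequence=seq, quality=qual)


def error_probability(q: int) -> float:
    # Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)
    return 10 ** (-q / 10)


# Hypothetical record for illustration (not the NA12878 read in the figure)
record = parse_fastq_record(["@read_001 example", "GATTACA", "+", "II5+#II"])
print(record.read_id, record.sequence)  # read_001 GATTACA
print(record.phred_scores())            # [40, 40, 20, 10, 2, 40, 40]
print(error_probability(20))            # 0.01
```

A Phred score of 20 therefore corresponds to an estimated 1-in-100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.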
Figure 3:
Illustration of the relationship between sequencing depth and variant call confidence as a function of variant allele frequency (VAF). In this simplified representation, a variant is considered detected if at least five unique reads support the variant allele, with the number of supporting reads modeled as a binomial random variable whose number of trials equals the sequencing depth and whose success probability equals the VAF.
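The detection criterion in this figure can be computed directly: if X ~ Binomial(depth, VAF) counts variant-supporting reads, the probability of detection is P(X ≥ 5) = 1 − P(X ≤ 4). The sketch below follows that simplified model only (it ignores sequencing error, duplicate reads, and caller-specific filters); the function name `detection_probability` and the depth/VAF grid in the usage loop are illustrative choices, not values from the review.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 5) -> float:
    """P(X >= min_alt_reads) where X ~ Binomial(depth, vaf) counts variant reads."""
    if depth < min_alt_reads:
        return 0.0  # too few reads to ever meet the threshold
    # P(X < k) = sum_{j=0}^{k-1} C(n, j) * vaf^j * (1 - vaf)^(n - j)
    p_below = sum(
        comb(depth, j) * vaf**j * (1 - vaf) ** (depth - j)
        for j in range(min_alt_reads)
    )
    # Clamp guards against tiny negative values from floating-point rounding
    return max(0.0, 1.0 - p_below)


# Illustrative grid: detection probability by VAF (rows) and depth (columns)
depths = (20, 50, 100, 250, 500)
for vaf in (0.05, 0.10, 0.25, 0.50):
    row = [f"{detection_probability(d, vaf):.3f}" for d in depths]
    print(f"VAF={vaf:.2f}:", " ".join(row))
```

Under this model, a heterozygous germline variant (VAF near 0.5) is detected with high probability at modest depth, whereas a subclonal somatic variant at 5% VAF needs several hundred reads before detection becomes likely.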
Figure 4:
(a) Example of a valid variant call format (VCF) file with a header and a few variant site records. The header includes multiple pieces of information relevant to the dataset, including the file format, reference data, and details on format and annotation fields. In the body, each row is an individual variant record. (b-e) Sequence alignments and the corresponding VCF entries for various variant types. This figure is adapted from Figure 1 of Danecek et al. (2011) under the Creative Commons Attribution Non-Commercial License.
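Each VCF body row starts with eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and the variant type can often be read from the REF and ALT alleles. The sketch below parses those columns and assigns a rough variant label. It is a simplified illustration: the names `VcfRecord`, `parse_vcf_line`, and `classify` are hypothetical, sample genotype columns are ignored, and the length-based labels assume the usual VCF convention that indels carry one shared padding base. Dedicated libraries should be preferred in practice. The example record mirrors the sample line in the VCF specification.

```python
from dataclasses import dataclass, field


@dataclass
class VcfRecord:
    chrom: str
    pos: int            # 1-based position of the first REF base
    var_id: str
    ref: str
    alts: list[str]     # ALT may list several comma-separated alleles
    qual: str
    filter: str
    info: dict = field(default_factory=dict)


def parse_info(text: str) -> dict:
    """Split INFO into key=value pairs; keys without '=' are flags (True)."""
    if text == ".":
        return {}
    info = {}
    for entry in text.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a VCF body line (header lines start with '#')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF body line has at least eight tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = cols[:8]
    return VcfRecord(chrom, int(pos), var_id, ref, alt.split(","), qual, filt, parse_info(info))


def classify(ref: str, alt: str) -> str:
    """Simplified variant-type label from the REF and ALT alleles."""
    if alt in (".", "*") or alt.startswith("<"):
        return "other"  # missing, spanning-deletion, or symbolic allele
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    # Indels share a leading padding base between REF and ALT
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2")
print(record.pos, record.info["DP"], [classify(record.ref, a) for a in record.alts])
# 14370 14 ['SNV']
```

For instance, `classify("GTC", "G")` labels a two-base deletion and `classify("A", "AGT")` a two-base insertion, because the first base in each pair is the shared padding base.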
