Review
2019 Nov 17:18:9-19.
doi: 10.1016/j.csbj.2019.11.002. eCollection 2020.

Long walk to genomics: History and current approaches to genome sequencing and assembly

Alice Maria Giani et al. Comput Struct Biotechnol J.

Abstract

Genomes represent the starting point of genetic studies. Since the discovery of DNA structure, scientists have devoted great effort to determining genome sequences exactly. In this review we provide a comprehensive historical background of the improvements in DNA sequencing technologies that have accompanied the major milestones in genome sequencing and assembly, ranging from early sequencing methods to Next-Generation Sequencing platforms. We then focus on the advantages and challenges of the current technologies and approaches, collectively known as Third Generation Sequencing. As these technical advancements have been accompanied by progress in analytical methods, we also review the bioinformatic tools currently employed in de novo genome assembly, as well as some applications of Third Generation Sequencing technologies and high-quality reference genomes.

Keywords: BAC, Bacterial Artificial Chromosome; Bioinformatics; Genome assembly; HGP, Human Genome Project; HMW, high molecular weight; HapMap, haplotype map; NGS, Next Generation Sequencing; Next-generation; OLC, Overlap-Layout-Consensus; QV, Quality Value; Reference; SBS, Sequencing by Synthesis; SMRT, Single Molecule Real-Time; SNPs, Single Nucleotide Polymorphisms; SRA, Short Read Archive; SV, Structural Variant; Sequencing; TGS, Third Generation Sequencing; Third-generation; WGS, Whole Genome Sequencing; ZMW, Zero-Mode Waveguide; bp, base pair; dNTPs, deoxynucleoside triphosphates; ddNTP, 2′,3′-dideoxynucleoside triphosphate.

Conflict of interest statement

G.F. had one trip sponsored by Bionano Genomics. A.M.G., G.R.G. and L.G. declare no conflicts of interest.

Figures

Fig. 1
Milestones in genome assembly. Timeline illustrating many of the major genome assembly achievements ranging from the beginning of the sequencing era to the large-scale genome projects currently ongoing. Each genome or genome project (GP) is placed under a color-coded background according to the sequencing approach adopted. Light red: early sequencing methods, Yellow: Sanger-based shotgun sequencing, Green: NGS, Light blue: TGS. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Structural variations. Schematic representation of the five major types of SVs.
Fig. 3
Chromosome-level scaffolding of de novo genome assemblies. Schematic illustration of a hybrid de novo genome assembly approach in which linkage information obtained from optical and Hi-C maps is used to properly position contigs along the chromosomes. First, sequencing reads are assembled to form contigs. Then, contigs are aligned to consensus BioNano optical maps, a process that makes it possible to accurately order and orient the contigs with respect to each other and to assign each contig to a specific chromosome. Here, BioNano optical maps are represented by black lines and the green dots indicate the labeled sequence motifs. Further ordering of the resulting scaffolds is made possible by Hi-C maps, which take into account the interactions between contiguous genomic loci. In the resulting chromosome-level scaffold, colored segments represent genomic regions of known sequence, while the remaining gaps of unknown sequence between the scaffolds are depicted as gray lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
