Review
Genome Biol. 2022 Aug 29;23(1):182.
doi: 10.1186/s13059-022-02735-6.

Multiple genome alignment in the telomere-to-telomere assembly era

Bryce Kille et al. Genome Biol.

Abstract

With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.

Keywords: Comparative genomics; Homology; Multiple genome alignment; Synteny.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
An example of evolution resulting in different classes of homology. In (a), deletion and speciation events are depicted, resulting in both orthologs and paralogs. The resulting relationships between all pairs of X segments as well as between all pairs of Y segments are depicted in (b). Notably, segments Yb3 and Yc2 participate in three important homology relationship types. As the most recent common ancestor of Ya3 and Yb3 is at a speciation event, they are orthologous to each other. Segments Yc2 and Ya1, on the other hand, are paralogs, as their homology is a result of duplication. Finally, Yb3 and Yc2 are special types of paralogs known as “false orthologs”, i.e., paralogous segments all of whose other copies in their respective genomes have since been deleted, resulting in two segments which appear never to have been duplicated. The presence of such false orthologies particularly complicates the problem of core-genome alignment, but also the problem of further categorizing homologies identified through genome alignment. It is worth noting that homology relationships within a single species’ genome do exist, but are not depicted here. For example, Ya1, Ya2, and Ya3 are all paralogous to each other.
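The rule stated in the caption (two segments are orthologs when their most recent common ancestor is a speciation event, and paralogs when it is a duplication) can be sketched in a few lines of Python. The tree below is a hypothetical toy, chosen only so that the three relationships named in the caption come out as stated; it is not the tree drawn in the figure, and the node names `dup1`, `dup2`, and `spec1` are invented for illustration.

```python
# Hypothetical toy tree (child -> parent), NOT the tree from Fig. 1.
# Leaves are segments; internal nodes are evolutionary events.
PARENT = {
    "Ya1": "dup1",
    "dup2": "dup1",
    "Yc2": "dup2",
    "spec1": "dup2",
    "Ya3": "spec1",
    "Yb3": "spec1",
}

# Event type at each internal node (assumed labels for illustration).
EVENT = {"dup1": "duplication", "dup2": "duplication", "spec1": "speciation"}


def ancestors(node):
    """Return the path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(a, b):
    """Return the most recent common ancestor of two nodes."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes share no common ancestor")


def classify(a, b):
    """Classify two distinct leaf segments by the event at their MRCA."""
    return "orthologs" if EVENT[mrca(a, b)] == "speciation" else "paralogs"


for pair in [("Ya3", "Yb3"), ("Yc2", "Ya1"), ("Yb3", "Yc2")]:
    print(pair, classify(*pair))
```

Note that this rule alone cannot flag Yb3 and Yc2 as "false orthologs": it correctly calls them paralogs only because the toy tree records the duplication. A method that sees just one surviving copy per genome has no such record, which is the difficulty the caption describes.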
Fig. 2
Timeline of MGA tools. The original sequencing of human and mouse genomes spurred the development of a number of multiple genome alignment tools. Following this initial spurt, the next generation of genome aligners (starting with Enredo-Pecan and ending with Cactus) were developed, followed by a 6-year period of silence, with Parsnp being one of the few tools released between 2012 and 2019.
Fig. 3
a–c The number of assemblies available for different groups of eukaryotic species in NCBI. d–f The number of eukaryotic species with available genome assemblies in NCBI. The emergence of third-generation sequencing technologies and their accompanying assembly algorithms in the early 2010s is largely responsible for the increasing rate of novel eukaryotic genome deposits.
Fig. 4
The pipeline of MGA. a An example of large-scale genomic rearrangement, insertion, and deletion occurring across a set of genomes. The lines denote bounds of homologous segments, and inversions are denoted by crossing lines. b Anchors from the section of 3 genomes surrounded by the dotted box in a. Once again, the lines between genomes represent homology. The labeled blocks on each genome correspond to anchors, where two blocks with the same label are inferred to be potentially homologous sites. c The alignment graph obtained by merging the 3 linear genome graphs in b. At this step, the aim of MGA is to find colinear paths in the graph, i.e., sequences of anchors which are traversed by a group of genomes in the same order. d Often, the initial set of anchors will be too noisy, containing spurious alignments which prevent the formation of longer, more reliable colinear paths. By removing anchor F, the alignment graph becomes much simpler and yields longer colinear paths, with each set of colinear paths shown in its own color. e For each colinear path, MGA tools perform an MSA, yielding a set of sequence alignments which together make up the genome alignment.
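Steps c and d of the caption can be illustrated with a deliberately simplified Python sketch. It is not how any particular MGA tool works: it assumes each anchor label occurs at most once per genome, ignores strand and inversions, and treats a colinear path as a maximal run of anchors that are adjacent in the same order in every genome. The anchor orders and the function names `colinear_blocks` and `drop_anchor` are hypothetical.

```python
def adjacencies(order):
    """Return the set of (anchor, next_anchor) pairs in one genome."""
    return set(zip(order, order[1:]))


def colinear_blocks(genomes):
    """Split the first genome into maximal runs of anchors whose
    adjacencies are shared, in the same order, by every genome.

    Assumes at least one genome and unique anchor labels per genome.
    """
    orders = list(genomes.values())
    shared = set.intersection(*(adjacencies(o) for o in orders))
    blocks, current = [], []
    for anchor in orders[0]:
        if current and (current[-1], anchor) in shared:
            current.append(anchor)  # extend the current colinear run
        else:
            if current:
                blocks.append(current)
            current = [anchor]  # start a new run
    if current:
        blocks.append(current)
    return blocks


def drop_anchor(genomes, label):
    """Return a copy of the anchor orders with one anchor label removed."""
    return {name: [a for a in order if a != label]
            for name, order in genomes.items()}


# Hypothetical anchor orders; F plays the role of a spurious anchor.
genomes = {
    "g1": ["A", "B", "F", "C", "D", "E"],
    "g2": ["A", "B", "C", "F", "D", "E"],
    "g3": ["A", "B", "C", "D", "E"],
}

print(colinear_blocks(genomes))
# [['A', 'B'], ['F'], ['C'], ['D', 'E']]
print(colinear_blocks(drop_anchor(genomes, "F")))
# [['A', 'B', 'C', 'D', 'E']]
```

In this toy input, the single misplaced anchor F fragments the genomes into four short blocks, and removing it yields one long colinear path, mirroring the effect described in panel d. Each resulting block would then be handed to an MSA step, as in panel e.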
