Review

Annu Rev Anim Biosci. 2019 Feb 15:7:41-64.

doi: 10.1146/annurev-animal-020518-115005. Epub 2018 Oct 31.

Whole-Genome Alignment and Comparative Annotation

Joel Armstrong¹, Ian T Fiddes¹ ², Mark Diekhans¹, Benedict Paten¹

Affiliations

¹ UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA; email: bpaten@ucsc.edu.
² 10x Genomics, Pleasanton, California 94566, USA.

PMID: 30379572
PMCID: PMC6450745
DOI: 10.1146/annurev-animal-020518-115005

Joel Armstrong et al. Annu Rev Anim Biosci. 2019.

Abstract

Rapidly improving sequencing technology coupled with computational developments in sequence assembly is making reference-quality genome assembly economical. Hundreds of vertebrate genome assemblies are now publicly available, and projects are being proposed to sequence thousands of additional species in the next few years. Such dense sampling of the tree of life should give an unprecedented new understanding of evolution and allow a detailed determination of the events that led to the wealth of biodiversity around us. To gain this knowledge, these new genomes must be compared through genome alignment (at the sequence level) and comparative annotation (at the gene level). However, different alignment and annotation methods have different characteristics; before starting a comparative genomics analysis, it is important to understand the nature of, and biases and limitations inherent in, the chosen methods. This review is intended to act as a technical but high-level overview of the field that should provide this understanding. We briefly survey the state of the genome alignment and comparative annotation fields and potential future directions for these fields in a new, large-scale era of comparative genomics.

Keywords: comparative genomics; genome alignment; genome annotation.

Figures

**Figure 1**
An example of how different heuristics affect a genome alignment. All panels are dotplots: A line with positive slope indicates an alignment from the positive strand of sequence 1 to the positive strand of sequence 2, and a negative slope indicates an alignment from the positive strand of sequence 1 to the negative strand of sequence 2. Solid blue lines represent alignments, and red dashed lines represent where alignments have been missed. (a) The true alignment between the two sequences. (b) The same alignment if a single-copy aligner perfectly recovered the true alignment, except for the ignored duplication. (c) The same alignment according to a global or approximately global aligner: No edit operations except insertions, deletions, and substitutions are allowed, so substantial alignment is missing.
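
The dotplot convention in this caption can be illustrated with a minimal sketch. The `Block` type, its field names, and the half-open positive-strand coordinates are assumptions made for this example; they are not taken from the review or from any particular aligner's output format.

```python
from typing import NamedTuple, Tuple

Point = Tuple[int, int]


class Block(NamedTuple):
    """Hypothetical gapless alignment block.

    Coordinates are assumed to be half-open intervals on the positive
    strand of each sequence; strand2 is the strand of sequence 2 that
    the positive strand of sequence 1 aligns to.
    """
    start1: int
    end1: int
    start2: int
    end2: int
    strand2: str  # '+' or '-'


def dotplot_segment(block: Block) -> Tuple[Point, Point]:
    """Return the two endpoints ((x0, y0), (x1, y1)) of the dotplot line."""
    if block.strand2 == "+":
        # Positive-to-positive: y increases as x increases.
        return (block.start1, block.start2), (block.end1, block.end2)
    # Positive-to-negative: y decreases as x increases.
    return (block.start1, block.end2), (block.end1, block.start2)


def slope_sign(block: Block) -> int:
    """Return +1 for a positive-slope line and -1 for a negative-slope line."""
    (x0, y0), (x1, y1) = dotplot_segment(block)
    return 1 if (x1 - x0) * (y1 - y0) > 0 else -1
```

Under these assumptions, an inversion appears as a segment whose endpoints run from the top-left to the bottom-right of the plot, which is the negative slope the caption describes.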

**Figure 2**
A diagram showing the difference between a reference-biased and a reference-free multiple alignment. In a human-biased multiple alignment, any large regions that are deleted in human, or inserted somewhere else in the tree, cannot be aligned.

**Figure 3**
An example of how progressive genome alignment works, focused on aligners like VISTA-LAGAN (SuperMap) (36) and progressiveCactus (40), which reconstruct ancestral genomes as input for further alignment steps. (a) A large guide tree (usually the species tree), which may include many species, is divided up into smaller local alignment problems of a few genomes each. (b) A diagram of what occurs within each subproblem. Each subproblem is focused on reconstructing a single ancestral genome, which is then used as input for subproblems further up the tree. Ingroup genomes (children of the ancestor in question) and, optionally, outgroup genomes (nondescendants of the ancestor) are aligned together. A plausible ancestral reconstruction is generated for use in later subproblems.
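
The decomposition described in panel a can be sketched as a post-order walk over the guide tree. This is only an illustration under stated assumptions: the dictionary tree encoding, the `subproblems` function, the toy species names, and the choice of an ancestor's siblings as its outgroups are all hypothetical, and this is not how VISTA-LAGAN or progressiveCactus actually select outgroups or represent trees.

```python
from typing import Dict, List

# Hypothetical guide tree: each ancestral (internal) node maps to its children.
TREE: Dict[str, List[str]] = {
    "root": ["anc_primates", "mouse"],
    "anc_primates": ["human", "chimp"],
}


def subproblems(tree: Dict[str, List[str]], root: str) -> List[dict]:
    """List one subproblem per ancestor, ordered so that children come first.

    Each subproblem names the ancestor to reconstruct, its ingroups (its
    children), and its outgroups. As a simplifying assumption, outgroups
    are the ancestor's siblings, which are nondescendants by construction.
    """
    parent = {child: node for node, children in tree.items() for child in children}

    order: List[str] = []

    def visit(node: str) -> None:
        # Post-order: handle all descendants before the ancestor itself.
        for child in tree.get(node, []):
            visit(child)
        if node in tree:  # only internal nodes define a subproblem
            order.append(node)

    visit(root)

    result = []
    for ancestor in order:
        par = parent.get(ancestor)
        outgroups = [s for s in tree[par] if s != ancestor] if par else []
        result.append(
            {"ancestor": ancestor, "ingroups": list(tree[ancestor]), "outgroups": outgroups}
        )
    return result
```

For the toy tree, the first subproblem reconstructs `anc_primates` from `human` and `chimp` with `mouse` as an outgroup, and the second reconstructs `root` from `anc_primates` and `mouse`. Ordering children first mirrors the caption's point that each reconstructed ancestor becomes input for subproblems further up the tree.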

**Figure 4**
Comparing RNA sequencing (RNA-seq) expression quantification across different species with Comparative Annotation Toolkit (CAT). Kallisto (109) protein-coding gene-level expression for chimpanzee induced pluripotent stem cell (iPSC) RNA-seq is compared with human across all of the chimpanzee annotation and assembly combinations as well as when mapped directly to human. In all cases, the x-axis is the transcripts per million of human iPSC data mapped to GRCh38 annotated with GENCODE V27. The highest correlation (Pearson r = 0.96) is seen when comparing Clint (panTro6) annotated with CAT to GRCh38. The value p is the p-value of observing the Pearson correlation.
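
The Pearson correlation reported in this caption can be computed as in the sketch below. The `pearson_r` function name is an assumption made for illustration, and the caption does not say whether the transcripts-per-million values were transformed before the correlation was taken, so no transformation is applied here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

In the setting of the figure, `x` would hold per-gene human expression values and `y` the values for the same genes in one chimpanzee assembly and annotation combination, with genes paired by orthology.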
