Parameters for accurate genome alignment
- PMID: 20144198
- PMCID: PMC2829014
- DOI: 10.1186/1471-2105-11-80
Abstract
Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that gamma-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
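For readers unfamiliar with the X-drop parameter mentioned in the Results, the following is a minimal sketch of gapless X-drop extension, the general heuristic in which an alignment is extended until its running score falls more than X below the best score seen so far. It is an illustration only and not LAST's implementation: the function name `xdrop_extend`, the match/mismatch scores of +1/-1, and the example sequences are all assumptions chosen for clarity, and real aligners also handle gaps and use full score matrices.

```python
def xdrop_extend(a, b, i, j, match=1, mismatch=-1, x_drop=20):
    """Gapless rightward extension from a[i], b[j] with an X-drop cutoff.

    Returns (best_score, best_length): the highest running score reached
    and the number of aligned columns at which it was reached.
    The +1/-1 scores are illustrative assumptions, not values from the paper.
    """
    score = 0        # running score of the extension so far
    best = 0         # best running score seen
    best_len = 0     # extension length at which `best` was reached
    k = 0            # number of columns examined
    while i + k < len(a) and j + k < len(b):
        score += match if a[i + k] == b[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            # Score has dropped more than X below the best: give up.
            break
    return best, best_len


# Two hypothetical sequences: similar flanks separated by two mismatches.
seq_a = "AAAAACCAAAAA"
seq_b = "AAAAAGGAAAAA"

# A small X stops at the mismatches; a larger X bridges them.
short = xdrop_extend(seq_a, seq_b, 0, 0, x_drop=1)
longer = xdrop_extend(seq_a, seq_b, 0, 0, x_drop=3)
```

With these toy inputs the smaller X-drop ends the extension at the first flank, while the larger value carries it across the mismatched region. Whether bridging such regions helps or hurts accuracy on real genomes is the empirical question the abstract addresses.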
Similar articles
- Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score. BMC Bioinformatics. 2008 Dec 12;9:531. doi: 10.1186/1471-2105-9-531. PMID: 19077267. Free PMC article.
- CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score. Bioinformatics. 2009 Dec 15;25(24):3236-43. doi: 10.1093/bioinformatics/btp580. Epub 2009 Oct 6. PMID: 19808876.
- How accurately is ncRNA aligned within whole-genome multiple alignments? BMC Bioinformatics. 2007 Oct 26;8:417. doi: 10.1186/1471-2105-8-417. PMID: 17963514. Free PMC article.
- Computation and analysis of genomic multi-sequence alignments. Annu Rev Genomics Hum Genet. 2007;8:193-213. doi: 10.1146/annurev.genom.8.080706.092300. PMID: 17489682. Review.
- Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics. 2009 Oct 1;25(19):2455-65. doi: 10.1093/bioinformatics/btp452. Epub 2009 Jul 30. PMID: 19648142. Free PMC article. Review.
Cited by
- Gentle masking of low-complexity sequences improves homology search. PLoS One. 2011;6(12):e28819. doi: 10.1371/journal.pone.0028819. Epub 2011 Dec 19. PMID: 22205972. Free PMC article.
- Improved search heuristics find 20,000 new alignments between human and mouse genomes. Nucleic Acids Res. 2014 Apr;42(7):e59. doi: 10.1093/nar/gku104. Epub 2014 Feb 3. PMID: 24493737. Free PMC article.
- Divergent copies of the large inverted repeat in the chloroplast genomes of ulvophycean green algae. Sci Rep. 2017 Apr 20;7(1):994. doi: 10.1038/s41598-017-01144-1. PMID: 28428552. Free PMC article.
- Genome-wide signatures of complex introgression and adaptive evolution in the big cats. Sci Adv. 2017 Jul 19;3(7):e1700299. doi: 10.1126/sciadv.1700299. eCollection 2017 Jul. PMID: 28776029. Free PMC article.
- Species-level resolution of 16S rRNA gene amplicons sequenced through the MinION™ portable nanopore sequencer. Gigascience. 2016 Jan 28;5:4. doi: 10.1186/s13742-016-0111-z. eCollection 2016. PMID: 26823973. Free PMC article.