Back-translation for discovering distant protein homologies in the presence of frameshift mutations
- PMID: 20047662
- PMCID: PMC2821327
- DOI: 10.1186/1748-7188-5-6
Abstract
Background: Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology detection becomes difficult even at the DNA level.
Results: We developed a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We designed a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences that have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at [http://bioinfo.lifl.fr/path/].
Conclusions: Our approach uncovers evolutionary information that is not captured by traditional alignment methods, as confirmed by biologically significant examples.
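The two ideas the abstract relies on, back-translation and the effect of a frameshift, can be illustrated with a minimal Python sketch. It assumes the standard genetic code and uses hypothetical helper names (`translate`, `back_translate`, `count_putative_dna`, `putative_dnas`) that do not come from the paper. It enumerates putative DNA sequences naively and is not the authors' memory-efficient graph representation, alignment algorithm, or scoring system.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring a trailing incomplete codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def back_translate(protein):
    """For each residue, list every codon that encodes it."""
    return [
        sorted(codon for codon, aa in CODON_TABLE.items() if aa == residue)
        for residue in protein
    ]


def count_putative_dna(protein):
    """Number of distinct DNA sequences that translate to the protein."""
    return prod(len(codons) for codons in back_translate(protein))


def putative_dnas(protein):
    """Naively enumerate all putative DNA sequences (short proteins only)."""
    for combo in product(*back_translate(protein)):
        yield "".join(combo)


if __name__ == "__main__":
    dna = "ATGGCCAAATGG"
    shifted = dna[:3] + dna[4:]        # delete one nucleotide after the first codon
    print(translate(dna))              # MAKW
    print(translate(shifted))          # MPN
    print(count_putative_dna("MAKW"))  # 1 * 4 * 2 * 1 = 8
```

In this toy case a single deleted nucleotide changes every residue after the first codon, which is why protein-level alignment loses the signal. The count of putative DNA sequences grows as a product over residues, so explicit enumeration quickly becomes infeasible for real proteins, which is the motivation for a compact representation.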