Bioinformatics. 2011 Feb 1;27(3):317-25.

doi: 10.1093/bioinformatics/btq651. Epub 2010 Dec 1.

maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences

Peter Menzel¹, Peter F Stadler, Jan Gorodkin

PMID: 21123221
PMCID: PMC3031029
DOI: 10.1093/bioinformatics/btq651

Peter Menzel et al. Bioinformatics. 2011.

Affiliation

¹ Center for non-coding RNA in Technology and Health, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark.

Abstract

Motivation: Reconstructing a genomic sequence for a particular species is becoming increasingly important given the rapid development of high-throughput sequencing technologies and their limitations. Applications include compensating for missing data in unsequenced genomic regions, designing oligonucleotide primers for target genes in species that lack sequence information, and preparing customized queries for homology searches.

Results: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon from sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed at a chosen confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase in reconstruction accuracy compared with both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (by up to 10%) in all 44 species. The improved sequence reconstruction also improves PCR primer design for as yet unsequenced genes: the difference between the expected and actual T_m of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees: it drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset.
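To make the idea in the Results concrete, the following is a minimal sketch of how per-position nucleotide probabilities for an unobserved target leaf could be computed on a tree and then turned into a thresholded consensus. It is not the maxAlike implementation. The Jukes-Cantor (JC69) substitution model, the nested-list tree encoding, the function names and the use of `N` for low-confidence positions are all assumptions made for illustration; the abstract does not specify them.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """4x4 JC69 transition matrix for branch length t (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node):
    """Felsenstein partial likelihoods: P(leaves below node | state at node).

    A leaf is a one-character string; an internal node is a list of
    (child, branch_length) pairs (hypothetical encoding).
    """
    if isinstance(node, str):
        if node in NUCS:
            return [1.0 if n == node else 0.0 for n in NUCS]
        return [1.0] * 4  # gap or unknown character: uninformative
    like = [1.0] * 4
    for child, t in node:
        p, sub = jc69(t), partial(child)
        for i in range(4):
            like[i] *= sum(p[i][j] * sub[j] for j in range(4))
    return like

def target_posterior(rest, t_target):
    """Posterior over the target's nucleotide at one alignment column.

    `rest` is the tree of observed species, rooted at the node where the
    target's branch (length t_target) attaches. Rerooting there is valid
    because JC69 is time-reversible with uniform base frequencies.
    """
    p, sub = jc69(t_target), partial(rest)
    w = [0.25 * sum(p[a][j] * sub[j] for j in range(4)) for a in range(4)]
    total = sum(w)
    return [x / total for x in w]

def consensus(pssm, threshold=0.5):
    """Call the most probable base per column, or 'N' below the threshold."""
    out = []
    for col in pssm:
        best = max(range(4), key=col.__getitem__)
        out.append(NUCS[best] if col[best] >= threshold else "N")
    return "".join(out)

# One alignment column: the observed species carry A, A and G.
rest = [("A", 0.1), ([("A", 0.05), ("G", 0.05)], 0.2)]
post = target_posterior(rest, 0.1)
print(consensus([post], threshold=0.5))
```

Applying `target_posterior` to every alignment column yields a position-specific matrix of the kind the figure captions call a PSSM; the threshold of 0.5 mirrors the value quoted in the captions, while any masking convention for positions below it is a choice of this sketch.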

Availability: maxAlike is available for download and as a web server at http://rth.dk/resources/maxAlike.

Figures

**Fig. 1.**
The steps of the *maxAlike* algorithm. From the input, consisting of a multiple alignment and a phylogenetic tree, the algorithm computes PSSMs and reconstructed sequences for the target species. The output can readily be applied to primer design and homology search.

**Fig. 2.**
Dataset *MZ44-2*: median MATCH scores for *maxAlike* (ML) and nucleotide frequency (*Freq*) PSSMs for each species compared with the average distance to its phylogenetically closest neighbor.

**Fig. 3.**
Dataset *MZ44-1*: recovery rates in percent for sequences reconstructed by *maxAlike* (ML), frequency-based consensus (*Freq*) and nearest neighbor (NN). Each point is one species plotted as its average distance to the phylogenetically nearest neighbor. (a) threshold 0.5. (b) no threshold.

**Fig. 4.**
Dataset *MZ44-1*: (a) Average change of total recovery rates across all species for different sets of input trees: gene tree (F); reference species tree (S); (1–10) bins with trees estimated from other genomic loci; increasing bin number corresponds to higher topological distance to reference tree. (b) Change in the T_m difference due to increased number of mismatches in the primer sequence.

**Fig. 5.**
Average change of total recovery rates across all species for different sets of input trees: gene tree (F); reference species tree (S); bins with trees having distorted branch lengths using the specified relative normal errors. (a) MZ44-1. (b) MZ44-2.

**Fig. 6.**
Dataset *MZ44-2*: average differences of the expected and actual melting temperature T_m of the primer–template duplex for primers derived from *maxAlike* (threshold 0.5) and *Freq* (threshold 0.5) reconstructed sequences and nearest neighbor (NN) sequence for each species, sorted by average distance to its phylogenetically nearest neighbor.

Cited by

phoD Alkaline Phosphatase Gene Diversity in Soil.
Ragot SA, Kertesz MA, Bünemann EK. Appl Environ Microbiol. 2015 Oct;81(20):7281-9. doi: 10.1128/AEM.01823-15. Epub 2015 Aug 7. PMID: 26253682. Free PMC article.

Evolution and Phylogeny of MicroRNAs - Protocols, Pitfalls, and Problems.
Velandia-Huerto CA, Yazbeck AM, Schor J, Stadler PF. Methods Mol Biol. 2022;2257:211-233. doi: 10.1007/978-1-0716-1170-8_11. PMID: 34432281.

One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees.
Alderson RG, Barker D, Mitchell JB. J Mol Evol. 2014 Oct;79(3-4):117-29. doi: 10.1007/s00239-014-9639-7. Epub 2014 Sep 4. PMID: 25185655. Free PMC article.

Comparative RNA Genomics.
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Methods Mol Biol. 2024;2802:347-393. doi: 10.1007/978-1-0716-3838-5_12. PMID: 38819565.

FastML: a web server for probabilistic reconstruction of ancestral sequences.
Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, Pupko T. Nucleic Acids Res. 2012 Jul;40(Web Server issue):W580-4. doi: 10.1093/nar/gks498. Epub 2012 May 31. PMID: 22661579. Free PMC article.
