AMAS: a fast tool for alignment manipulation and computing of summary statistics
- PMID: 26835189
- PMCID: PMC4734057
- DOI: 10.7717/peerj.1660
Abstract
The amount of data used in phylogenetics has grown explosively in recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
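Concatenation and several of the listed statistics are simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not AMAS's own code or API: the functions `summarize` and `concatenate` are hypothetical, only DNA is handled, `-`, `?` and `N` are assumed to be the undetermined symbols, and taxa absent from a locus are padded with `?`. A site is counted as variable when its determined characters show at least two states, and as parsimony informative when at least two of those states each occur in at least two taxa (the conventional definition; AMAS may differ in details). Like AMAS, it uses only the Python standard library.

```python
from collections import Counter

# Assumption: symbols treated as undetermined in DNA alignments.
MISSING = set("-?N")


def summarize(alignment):
    """Basic statistics for an aligned DNA dict {taxon: sequence} (hypothetical helper)."""
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        raise ValueError("empty alignment")
    n_taxa, length = len(seqs), len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; input is not aligned")
    cells = n_taxa * length  # total count of matrix cells
    counts = Counter("".join(seqs))  # count of every character in the matrix
    undetermined = sum(counts[c] for c in MISSING)
    acgt = sum(counts[c] for c in "ACGT")
    variable = informative = 0
    for column in zip(*seqs):  # iterate over alignment sites (columns)
        # Only determined characters count when classifying a site.
        states = Counter(c for c in column if c not in MISSING)
        if len(states) > 1:
            variable += 1
        # Parsimony informative: at least two states, each seen in >= 2 taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return {
        "taxa": n_taxa,
        "length": length,
        "cells": cells,
        "undetermined": undetermined,
        "missing_pct": 100 * undetermined / cells if cells else 0.0,
        # GC content as a fraction of determined A/C/G/T characters.
        "gc": (counts["G"] + counts["C"]) / acgt if acgt else 0.0,
        "variable_sites": variable,
        "parsimony_informative": informative,
        "char_counts": dict(counts),
    }


def concatenate(alignments):
    """Join {locus: {taxon: seq}} into one matrix plus 1-based partition ranges.

    Taxa missing from a locus are padded with '?' (an assumption of this sketch).
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        length = len(next(iter(aln.values())))
        for t in taxa:
            pieces[t].append(aln.get(t, "?" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions


if __name__ == "__main__":
    loci = {
        "locus1": {"a": "ACGT", "b": "ACGA", "c": "TCGA", "d": "TCGT"},
        "locus2": {"a": "GG-", "b": "GGA"},
    }
    matrix, partitions = concatenate(loci)
    print(partitions)  # [('locus1', 1, 4), ('locus2', 5, 7)]
    print(summarize(matrix)["parsimony_informative"])  # 2
```

The partition list records each locus's first and last column in the concatenated matrix, which is the information a partitioning scheme needs; splitting a matrix back into loci is the inverse operation, slicing each sequence over those ranges.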
Keywords: Alignment properties; Bioinformatics; Concatenation; Phylogenetics; Phylogenomics.
Conflict of interest statement
The author declares there are no competing interests.
Similar articles
- FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies. Front Zool. 2014 Nov 18;11(1):81. doi: 10.1186/s12983-014-0081-x. PMID: 25426157.
- BAD2matrix: Phylogenomic matrix concatenation, indel coding, and more. Appl Plant Sci. 2024 Sep 24;12(6):e11604. doi: 10.1002/aps3.11604. PMID: 39628543.
- GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas. Front Microbiol. 2018 May 1;9:771. doi: 10.3389/fmicb.2018.00771. PMID: 29765358.
- Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses? Syst Biol. 2021 Apr 15;70(3):440-462. doi: 10.1093/sysbio/syaa064. PMID: 32797207.
- BIR Pipeline for Preparation of Phylogenomic Data. Evol Bioinform Online. 2015 Apr 27;11:79-83. doi: 10.4137/EBO.S10189. PMID: 25987827. Review.
Cited by
- A global phylogenomic and metabolic reconstruction of the large intestine bacterial community of domesticated cattle. Microbiome. 2022 Sep 26;10(1):155. doi: 10.1186/s40168-022-01357-1. PMID: 36155629.
- Resolving species boundaries in a recent radiation with the Angiosperms353 probe set: the Lomatium packardiae/L. anomalum clade of the L. triternatum (Apiaceae) complex. Am J Bot. 2021 Jul;108(7):1217-1233. doi: 10.1002/ajb2.1676. PMID: 34105148.
- Diversity and evolution of the vertebrate chemoreceptor gene repertoire. Nat Commun. 2024 Feb 15;15(1):1421. doi: 10.1038/s41467-024-45500-y. PMID: 38360851.
- Phylogeny and evolution of hemipteran insects based on expanded genomic and transcriptomic data. BMC Biol. 2024 Sep 2;22(1):190. doi: 10.1186/s12915-024-01991-1. PMID: 39218865.
- Orthologous nuclear markers and new transcriptomes that broadly cover the phylogenetic diversity of Acanthaceae. Appl Plant Sci. 2019 Sep 25;7(9):e11290. doi: 10.1002/aps3.11290. PMID: 31572631.