Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2016 Jan 28:4:e1660.
doi: 10.7717/peerj.1660. eCollection 2016.

AMAS: a fast tool for alignment manipulation and computing of summary statistics

Affiliations

AMAS: a fast tool for alignment manipulation and computing of summary statistics

Marek L Borowiec. PeerJ. .

Abstract

The amount of data used in phylogenetics has grown explosively in the recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.

Keywords: Alignment properties; Bioinformatics; Concatenation; Phylogenetics; Phylogenomics.

PubMed Disclaimer

Conflict of interest statement

The author declares there is no competing interests.

Figures

Figure 1
Figure 1. AMAS functionality.
(A) Concatenation of two FASTA files. (B) File format conversion from FASTA to PHYLIP. (C) Splitting of a concatenated alignment according to pre-defined partitions. (D) Removal of sequences by name. (E) Creation of randomized replicates from input alignments. All input files are preserved, here indicated in blue. Orange files represent the output. Command line examples are given along with each action.
Figure 2
Figure 2. Performance.
(A) Computing times for concatenation of the Johnson et al. (2013) data set composed of 5,214 separate alignments. FASconCAT-G was run in two modes: with and without simultaneous computation of alignment summaries (FASconCAT-G v1.02.pl -s and FASconCAT-G v1.02.pl -s -i , respectively). Phyutility was ran with java -jar phyutility.jar -concat -in fas -out phyut j2013-test. (B) Computing times for AMAS writing alignment summaries on the four benchmark data sets (AMAS.py summary command).

Similar articles

Cited by

References

    1. Borowiec ML, Lee EK, Chiu JC, Plachetzki DC. Extracting phylogenetic signal and accounting for bias in whole genome data sets supports the Ctenophora as sister to remaining Metazoa. BMC Genomics. 2015;16:987. doi: 10.1186/s12864-015-2146-4. - DOI - PMC - PubMed
    1. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, De Hoon MJL. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. - DOI - PMC - PubMed
    1. Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B, Howard JT, Suh A, Weber CC, Da Fonseca RR, Alfaro-Núñez A, Narula N, Liu L, Burt D, Ellegren H, Edwards SV, Stamatakis A, Mindell DP, Cracraft J, Braun EL, Warnow T, Jun W, Gilbert MTP, Zhang G, The Avian Phylogenomics Consortium Phylogenomic analyses data of the avian phylogenomics project. GigaScience. 2014;4:4. doi: 10.1186/s13742-014-0038-1. - DOI - PMC - PubMed
    1. Johnson BR, Borowiec ML, Chiu JC, Lee EK, Atallah J, Ward PS. Phylogenomics resolves evolutionary relationships among ants, bees, and wasps. Current Biology. 2013;23:2058–2062. doi: 10.1016/j.cub.2013.08.050. - DOI - PubMed
    1. Kück P, Longo GC. FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies. Frontiers in Zoology. 2014;11:81. doi: 10.1186/s12983-014-0081-x. - DOI - PMC - PubMed