BMC Bioinformatics. 2015 Mar 25;16(1):98.
doi: 10.1186/s12859-015-0515-2.

aTRAM - automated target restricted assembly method: a fast method for assembling loci across divergent taxa from next-generation sequencing data

Julie M Allen et al. BMC Bioinformatics. 2015.

Abstract

Background: Assembling genes from next-generation sequencing data is both time-consuming and computationally difficult, particularly for taxa without a closely related reference genome. Assembling even a draft genome using de novo approaches can take days, even on a powerful computer, and these assemblies typically require data from a variety of genomic libraries. Here we describe software that alleviates these issues by rapidly assembling genes from distantly related taxa using a single library of paired-end reads: aTRAM, automated Target Restricted Assembly Method. The aTRAM pipeline uses a reference sequence, BLAST, and an iterative approach to target and locally assemble the genes of interest.

Results: Our results demonstrate that aTRAM rapidly assembles genes across distantly related taxa. In comparative tests with a closely related taxon, aTRAM assembled the same sequence as reference-based and de novo approaches, taking on average less than 1 min per gene. As a test case with divergent sequences, we assembled >1,000 genes from six taxa ranging from 25 to 110 million years divergent from the reference taxon. Gene recovery was between 97% and 99% for each taxon.

Conclusions: aTRAM can quickly assemble genes across distantly related taxa, obviating the need for draft genome assembly of all taxa of interest. Because aTRAM uses a targeted approach, loci can be assembled in minutes, depending on the size of the target. Our results suggest that this software will be useful in rapidly assembling genes for phylogenomic projects covering a wide taxonomic range, as well as for other applications. The software is freely available at http://www.github.com/juliema/aTRAM.


Figures

Figure 1
Graphic of the aTRAM method. A) Formation of the aTRAM database: DNA is sequenced into a paired-end short read dataset (SRD). aTRAM splits the SRD into shards, creates a BLAST-formatted database of the first read of each pair, and indexes the corresponding paired ends for the sequences in each shard. B) In iteration 0, a query sequence in either amino acid or DNA format is searched against the aTRAM-formatted database using BLAST. The top hits and their paired ends are selected and assembled de novo. In each following iteration, the contigs from the previous iteration are searched against the same database using BLAST, and the top hits and their paired ends are selected and assembled de novo, until the full locus is assembled.
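
The search-and-assemble loop in the Figure 1 caption can be sketched in Python. This is a toy illustration under stated assumptions, not aTRAM's implementation: exact k-mer sharing stands in for BLAST, a greedy overlap merge stands in for the de novo assembler, all reads lie on the forward strand, and the query is an exact fragment of the target, so divergence between query and target is not modelled. The "genome" is a de Bruijn sequence, chosen only so that no 5-mer repeats and the toy stays deterministic. All function names and parameters are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Shard = Dict[str, Dict[int, str]]

def de_bruijn(alphabet: str, n: int) -> str:
    """Toy genome: a de Bruijn sequence, so every n-mer occurs at most once."""
    k = len(alphabet)
    a = [0] * (k * n)
    out: List[int] = []
    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return "".join(alphabet[i] for i in out)

def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def make_pairs(locus: str, read_len: int = 30, step: int = 10) -> List[Tuple[str, str]]:
    """Simulated pairs; the mate starts where the first read ends (assumption)."""
    return [(locus[i:i + read_len], locus[i + read_len:i + 2 * read_len])
            for i in range(0, len(locus) - 2 * read_len + 1, step)]

def build_database(pairs: List[Tuple[str, str]], n_shards: int) -> List[Shard]:
    """Panel A: split pairs into shards; first reads are searchable, mates are indexed."""
    shards: List[Shard] = [{"first": {}, "mate": {}} for _ in range(n_shards)]
    for read_id, (first, mate) in enumerate(pairs):
        shard = shards[read_id % n_shards]
        shard["first"][read_id] = first
        shard["mate"][read_id] = mate
    return shards

def search(shards: List[Shard], queries: List[str], k: int) -> Set[str]:
    """BLAST stand-in: a first read is a hit if it shares a k-mer with any query."""
    wanted = set().union(*(kmers(q, k) for q in queries))
    selected: Set[str] = set()
    for shard in shards:
        for read_id, first in shard["first"].items():
            if kmers(first, k) & wanted:
                selected.add(first)
                selected.add(shard["mate"][read_id])  # pull in the paired end
    return selected

def overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0

def assemble(reads: Set[str], min_overlap: int) -> str:
    """Assembler stand-in: extend one seed read greedily in both directions."""
    ordered = sorted(reads)
    contig = ordered[0]
    grew = True
    while grew:
        grew = False
        for read in ordered:
            if read in contig:
                continue
            right = overlap(contig, read, min_overlap)
            left = overlap(read, contig, min_overlap)
            if right:
                contig, grew = contig + read[right:], True
            elif left:
                contig, grew = read + contig[left:], True
    return contig

def iterative_assembly(shards: List[Shard], query: str, k: int = 5,
                       min_overlap: int = 5, max_iterations: int = 10) -> str:
    """Panel B: search, assemble, then reuse the contig as the next query."""
    contig, queries = "", [query]
    for _ in range(max_iterations):
        reads = search(shards, queries, k)
        if not reads:
            break
        new_contig = assemble(reads, min_overlap)
        if len(new_contig) <= len(contig):
            break  # no further extension, stop iterating
        contig, queries = new_contig, [new_contig]
    return contig

genome = de_bruijn("ACGT", 5)  # 1024 bases, all 5-mers distinct
locus_a, locus_b = genome[0:150], genome[500:650]
shards = build_database(make_pairs(locus_a) + make_pairs(locus_b), n_shards=4)
contig = iterative_assembly(shards, query=locus_a[60:80])
print(len(contig), contig == locus_a)
```

In this toy run, a 20-base query grows into the full 150-base target locus over a few iterations, while reads from the unrelated second locus are never selected. That is the sense in which the assembly is target restricted.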
Figure 2
The y-axis is the ratio of the length of the contig assembled with aTRAM to the length of the contig assembled with the reference-based approach. Points below the line at 1 are longer with the reference-based approach, and those above the line are longer with aTRAM. The x-axis indicates the uncorrected p-distance between the aTRAM contigs and the reference DNA sequence. The graph illustrates that aTRAM assemblies tended to be longer and that the longer genes tended to be the more divergent ones, suggesting that aTRAM can assemble more divergent sections than a reference-based approach.
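
The two quantities plotted in Figure 2 are simple to compute. The sketch below assumes the two sequences are already aligned to equal length and that sites with a gap or an ambiguous base (`-` or `N`) in either sequence are skipped. That gap handling is an assumption, not something the caption specifies. The function names are hypothetical.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Uncorrected p-distance: differing sites divided by compared sites."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in "-N" or y in "-N":
            continue  # assumption: skip gapped or ambiguous sites
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def length_ratio(atram_contig: str, reference_contig: str) -> float:
    """Y-axis of Figure 2: aTRAM contig length over reference-based contig length."""
    return len(atram_contig) / len(reference_contig)

# One mismatch over nine comparable sites (the gapped site is skipped).
distance = p_distance("ACGTACGTAC", "ACGTTCGT-C")
ratio = length_ratio("A" * 120, "A" * 100)
print(distance, ratio)
```

A ratio above 1 corresponds to a point above the line in the figure, meaning the aTRAM contig is the longer of the two.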
