BMC Genomics. 2009 Aug 26;10:400.
doi: 10.1186/1471-2164-10-400.

PAVE: program for assembling and viewing ESTs

Carol Soderlund et al. BMC Genomics. 2009.

Abstract

Background: New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Sciences sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information; hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs.

Results: The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs, including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs.
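The "burying" idea can be illustrated with a minimal Python sketch. It is not PAVE's implementation: the abstract does not state the similarity rule, so the sketch assumes a toy criterion (an EST is buried when its sequence is an exact substring of an EST that is already kept). The names `Est`, `bury_contained`, and `expression_count` are hypothetical. The point is only that buried reads drop out of the assembly input while their count is retained.

```python
from dataclasses import dataclass, field


@dataclass
class Est:
    name: str
    seq: str
    buried: list = field(default_factory=list)  # names of ESTs buried in this one


def bury_contained(ests):
    """Keep one representative per group of contained reads.

    Toy criterion (an assumption, not PAVE's rule): an EST is buried when its
    sequence is an exact substring of an EST that has already been kept.
    """
    kept = []
    # Longest first, so a potential parent is always seen before its children.
    for est in sorted(ests, key=lambda e: len(e.seq), reverse=True):
        parent = next((k for k in kept if est.seq in k.seq), None)
        if parent is None:
            kept.append(est)
        else:
            parent.buried.append(est.name)
    return kept


def expression_count(est):
    """Reads represented by a kept EST: itself plus everything buried in it."""
    return 1 + len(est.buried)
```

Only the kept ESTs would be passed on to an assembler, and summing `expression_count` over them recovers the original number of reads.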

Conclusion: The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.

Figures

Figure 1
A schema of the PAVE assembly algorithm. The TC (transitive closure) loop is generally executed multiple times in order to merge contigs that have similar CCSs (contig consensus sequences). The user defines how many times the loop is executed, and a different set of parameters can be used for each iteration. If the algorithm is being executed on a multi-processor machine, the user can request that the TC step use multiple processors.
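A single transitive closure pass can be sketched as follows. This is a generic union-find illustration, not PAVE's code: it assumes the similar pairs have already been computed (for example from alignment hits), and the function name `transitive_closure` and the contig names used with it are hypothetical. The multi-round loop with per-round parameters described in the caption is not modeled.

```python
def transitive_closure(names, similar_pairs):
    """Group names into clusters: if A~B and B~C, then A, B and C share a cluster."""
    parent = {n: n for n in names}

    def find(x):
        # Walk to the root of x's cluster, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Sorted output makes the result independent of merge order.
    return sorted(sorted(c) for c in clusters.values())
```

Each resulting cluster would then be handed to the assembly step as one unit.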
Figure 2
A PAVE contig joined by n's. No ESTs in the left and right sub-contigs overlap, so the two sub-contigs are joined by 50 n's, which is indicated by the green box in the consensus sequence under position 1900. The drawing of each EST indicates quality (blue is low quality), mismatches (red) and gaps (green). What appear to be thick blue lines are ESTs with no quality values, so all of their bases are drawn as low quality.
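The join shown in this figure amounts to placing a fixed run of n's between two consensus sequences. The sketch below is a minimal illustration under stated assumptions: both consensus sequences are taken to be in the same orientation already, the gap length of 50 comes from the caption, and the names `join_subcontigs` and `gap_span` are hypothetical.

```python
GAP_LENGTH = 50  # number of n's in the Figure 2 example


def join_subcontigs(left_consensus, right_consensus, gap_length=GAP_LENGTH):
    """Concatenate two non-overlapping sub-contig consensus sequences with n's."""
    return left_consensus + "n" * gap_length + right_consensus


def gap_span(left_consensus, gap_length=GAP_LENGTH):
    """Return the 0-based, half-open [start, end) interval covered by the n's."""
    start = len(left_consensus)
    return start, start + gap_length
```

A viewer could use the interval from `gap_span` to highlight the gap region in the consensus, as the green box does in the figure.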
Figure 3
The jPAVE interface. Within the contig display, the numbers in parentheses are the number of ESTs buried in the corresponding EST. From the stand-alone version of jPAVE shown here, a set of ESTs can be selected, and CAP3 or Phrap can be executed on the ESTs. Also, ESTs from multiple contigs can be selected and assembled. The 'Contig Pairs' link lists all pairs of contigs that are similar; selecting a pair shows the nucleotide and amino acid alignment.
Figure 4
10 largest 454 trichome contigs. The jPAVE listing of the 454 contigs from the trichome assembly sorted on number of ESTs.
Figure 5
An incorrect contig in CAP3 and TGICL. This example shows where using mate-pairs prevents an incorrect join. (A) A contig found in both the CAP3 and TGICL assemblies where the first two ESTs are probably incorrectly joined, as their 3' mates do not align. (B) The contig is split in PAVE, since the OSJNEb07I06 and J01B3149O10 mate-pairs must stay together and both the 5' and 3' ESTs from these two clones align.
