Genome Res. 2003 Sep;13(9):2164-70.
doi: 10.1101/gr.1390403.

PCAP: a whole-genome assembly program

Xiaoqiu Huang et al. Genome Res. 2003 Sep.

Abstract

We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
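The abstract states that every consensus base is determined from both base quality values and coverage, but does not give the rule. The sketch below is one simple way such a decision could be made for a single alignment column: a quality-weighted vote with coverage as the tie-breaker. It is an illustrative assumption, not PCAP's actual method; the function name `consensus_base` and the `(base, quality)` input layout are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def consensus_base(column: List[Tuple[str, int]]) -> str:
    """Pick a consensus base for one alignment column.

    `column` holds one (base, quality value) pair per read covering the
    position. This is a hypothetical scoring scheme, not PCAP's own rule.
    """
    if not column:
        return "N"  # no read covers this position

    score = defaultdict(int)  # summed quality values per base
    count = defaultdict(int)  # number of reads per base (coverage)
    for base, qual in column:
        score[base] += qual
        count[base] += 1

    # Highest summed quality wins; ties fall back to coverage, then to the
    # base letter so the result is deterministic.
    return max(score, key=lambda b: (score[b], count[b], b))
```

With this scheme a single high-quality base can outvote several low-quality ones, while coverage decides only when the quality totals are equal.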

Figures

Figure 1
Three cases in computation of the 5′ clipping position of a read f. A vertical line shows the start position of an overlap between two reads. The thick line indicates the 5′ clipping range of f. The dot marks the start position of a high-quality region of f. The arrow points to the 5′ clipping position of f. Assume that cdep, the maximum number of overlaps that can be used for computing any clipping position, is set to 3. (A) The maximum depth of coverage by overlaps in the 5′ range of f, denoted by mdep5(f), is 0. (B) We have mdep5(f) = 2 < 3 = cdep. (C) We have mdep5(f) = 4 > 3 = cdep.
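The quantity mdep5(f) in the caption is the maximum depth of coverage by overlaps within the 5′ clipping range of f. A minimal sketch of how such a depth could be computed with a sweep over interval endpoints is shown below. The half-open coordinate convention, the function name `max_overlap_depth`, and the representation of each overlap as a `(start, end)` interval on f are assumptions for illustration; the caption does not specify them, and the rule that turns the depth into a clipping position is not reproduced here.

```python
from typing import List, Tuple


def max_overlap_depth(overlaps: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Maximum number of overlaps covering any position in [lo, hi).

    Each overlap is a half-open interval (start, end) on read f; [lo, hi)
    stands in for the 5' clipping range. Both conventions are assumed.
    """
    events = []
    for start, end in overlaps:
        # Clip each overlap to the range of interest.
        s, e = max(start, lo), min(end, hi)
        if s < e:
            events.append((s, 1))   # overlap begins
            events.append((e, -1))  # overlap ends

    # Tuples sort by position, then delta, so an end (-1) at a position is
    # processed before a start (+1) there; touching intervals do not stack.
    events.sort()

    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


cdep = 3  # cap on the number of overlaps used, as in the caption

# Made-up overlaps reproducing the depths of the three cases: 0, 2 and 4.
case_a = max_overlap_depth([], 0, 40)
case_b = max_overlap_depth([(5, 50), (10, 60)], 0, 40)
case_c = max_overlap_depth([(2, 50), (5, 50), (10, 60), (15, 70)], 0, 40)
```

Comparing each result with `cdep` then distinguishes the three situations in the figure: no overlaps in the range, fewer overlaps than the cap, and more overlaps than the cap.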

WEB SITE REFERENCES

    1. ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_20; human Chromosome 20 sequences.
    2. http://seq.cs.iastate.edu; PCAP mouse assembly and PCAP program.
    3. http://www.ncbi.nlm.nih.gov/Traces; mouse raw data set.
