PCAP: a whole-genome assembly program
- PMID: 12952883
- PMCID: PMC403719
- DOI: 10.1101/gr.1390403
Abstract
We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform the most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads rather than many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. The consensus sequence for a contig is generated from an alignment of the reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
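The abstract says PCAP flags repetitive regions of a read by counting overlaps with other reads rather than short word matches. A minimal sketch of that idea follows, assuming each overlap has already been projected onto the read as a half-open interval and assuming a hypothetical fixed cutoff `max_depth` (PCAP's actual criterion is not stated in the abstract). Positions covered by more overlaps than the cutoff are reported as repetitive.

```python
from typing import List, Optional, Tuple


def repetitive_regions(read_length: int,
                       overlaps: List[Tuple[int, int]],
                       max_depth: int) -> List[Tuple[int, int]]:
    """Return half-open intervals [start, end) of a read that are covered by
    more than `max_depth` overlaps with other reads.

    `overlaps` holds the half-open interval each overlapping read covers on
    this read. `max_depth` is a hypothetical cutoff, e.g. a small multiple of
    the expected sequencing depth.
    """
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = [0] * (read_length + 1)
    for start, end in overlaps:
        start, end = max(0, start), min(read_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    regions: List[Tuple[int, int]] = []
    depth = 0
    region_start: Optional[int] = None
    for pos in range(read_length):
        depth += diff[pos]  # number of overlaps covering this position
        if depth > max_depth and region_start is None:
            region_start = pos  # entering an over-covered stretch
        elif depth <= max_depth and region_start is not None:
            regions.append((region_start, pos))  # leaving it
            region_start = None
    if region_start is not None:
        regions.append((region_start, read_length))
    return regions


# One overlap spans the whole read; five more pile up on positions 40-59.
overlaps = [(0, 100)] + [(40, 60)] * 5
print(repetitive_regions(100, overlaps, max_depth=3))  # [(40, 60)]
```

The difference array keeps the cost linear in read length plus the number of overlaps, so the check stays cheap even when a repeat-rich read has many overlapping partners.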
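The abstract says every consensus base is determined from the alignment of reads in the contig using both base quality values and coverage information. The exact rule is not given there, so the sketch below substitutes a simple quality-weighted vote (an assumption, not PCAP's documented method): the candidate bases in each alignment column are scored by summing their quality values, and columns covered by fewer reads than a hypothetical `min_coverage` are called 'N'.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Column = List[Tuple[str, int]]  # (base, quality value) for each read in one column


def consensus_base(column: Column, min_coverage: int = 2) -> str:
    """Choose one consensus base for an alignment column.

    '-' marks a gap in a read. Columns covered by fewer than `min_coverage`
    reads get 'N' (a hypothetical low-coverage rule).
    """
    if len(column) < min_coverage:
        return "N"
    score: Dict[str, int] = defaultdict(int)
    for base, quality in column:
        score[base] += quality  # quality-weighted vote
    # Highest summed quality wins; sorting first makes ties deterministic.
    return max(sorted(score), key=lambda b: score[b])


def consensus_sequence(columns: List[Column], min_coverage: int = 2) -> str:
    """Concatenate per-column calls, dropping columns whose winner is a gap."""
    calls = (consensus_base(col, min_coverage) for col in columns)
    return "".join(call for call in calls if call != "-")


columns = [
    [("A", 30), ("A", 25), ("C", 10)],  # A scores 55, C scores 10
    [("G", 40), ("T", 20), ("T", 15)],  # one high-quality G beats two weak Ts
    [("-", 20), ("C", 35)],             # C outweighs the gap
    [("T", 30)],                        # single read: below min_coverage
    [("-", 30), ("-", 30), ("A", 10)],  # gap wins, column is dropped
]
print(consensus_sequence(columns))  # AGCN
```

The second column shows how weighting by quality differs from a plain majority vote: two low-quality T calls are outvoted by one high-quality G.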
Similar articles
- CAP3: A DNA sequence assembly program. Genome Res. 1999 Sep;9(9):868-77. doi: 10.1101/gr.9.9.868. PMID: 10508846.
- Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework. BMC Genomics. 2015;16 Suppl 12(Suppl 12):S9. doi: 10.1186/1471-2164-16-S12-S9. Epub 2015 Dec 9. PMID: 26678408.
- ARACHNE: a whole-genome shotgun assembler. Genome Res. 2002 Jan;12(1):177-89. doi: 10.1101/gr.208902. PMID: 11779843.
- Sequence assembly using next generation sequencing data--challenges and solutions. Sci China Life Sci. 2014 Nov;57(11):1140-8. doi: 10.1007/s11427-014-4752-9. Epub 2014 Oct 17. PMID: 25326069. Review.
- A comprehensive review of scaffolding methods in genome assembly. Brief Bioinform. 2021 Sep 2;22(5):bbab033. doi: 10.1093/bib/bbab033. PMID: 33634311. Review.
Cited by
- The effects of contig length and depth on the estimation of SNP frequencies, and the relative abundance of SNPs in protein-coding and non-coding transcripts of tiger salamanders (Ambystoma tigrinum). BMC Genomics. 2012 Jun 20;13:259. doi: 10.1186/1471-2164-13-259. PMID: 22716167.
- Lightweight Pattern Matching Method for DNA Sequencing in Internet of Medical Things. Comput Intell Neurosci. 2022 Sep 8;2022:6980335. doi: 10.1155/2022/6980335. eCollection 2022. PMID: 36120669.
- Comparative plant genomics resources at PlantGDB. Plant Physiol. 2005 Oct;139(2):610-8. doi: 10.1104/pp.104.059212. PMID: 16219921.
- Genome assembly forensics: finding the elusive mis-assembly. Genome Biol. 2008;9(3):R55. doi: 10.1186/gb-2008-9-3-r55. Epub 2008 Mar 14. PMID: 18341692.
- Genome assembly quality: assessment and improvement using the neutral indel model. Genome Res. 2010 May;20(5):675-84. doi: 10.1101/gr.096966.109. Epub 2010 Mar 19. PMID: 20305016.
Web site references
- ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_20; human Chromosome 20 sequences.
- http://seq.cs.iastate.edu; PCAP mouse assembly and PCAP program.
- http://www.ncbi.nlm.nih.gov/Traces; mouse raw data set.