Assessing protein coding region integrity in cDNA sequencing projects
- PMID: 9682051
- DOI: 10.1093/bioinformatics/14.5.384
Abstract
Motivation: In cDNA sequencing projects, it is vital to know whether the protein coding region of a sequence is complete, or whether errors have occurred during library construction. Here we present a linear discriminant approach that predicts this completeness by estimating the probability of each ATG being the initiation codon.
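The idea of scoring every ATG with a linear function and keeping a single best candidate per sequence can be sketched as follows. This is a minimal illustration, not the ATGpr implementation: the features (a purine at position -3, a G at position +4, ORF length, number of upstream ATGs), the weights, the bias, and the logistic mapping to a probability-like value are all assumptions made for the example.

```python
import math

STOPS = {"TAA", "TAG", "TGA"}

# Hypothetical weights and bias, chosen for illustration only (not from ATGpr).
WEIGHTS = {"purine_m3": 1.0, "g_p4": 0.5, "log_orf": 1.0, "upstream_atgs": -0.5}
BIAS = -1.0


def find_atgs(seq):
    """Return the 0-based position of every ATG in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def orf_codons(seq, start):
    """Count codons from the ATG at `start` up to the first in-frame stop or sequence end."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            break
        n += 1
    return n


def features(seq, start):
    """Assumed feature set for one candidate ATG (expects an upper-case sequence)."""
    upstream = sum(1 for p in find_atgs(seq) if p < start)
    return {
        "purine_m3": 1.0 if start >= 3 and seq[start - 3] in "AG" else 0.0,
        "g_p4": 1.0 if start + 3 < len(seq) and seq[start + 3] == "G" else 0.0,
        "log_orf": math.log(orf_codons(seq, start)),
        "upstream_atgs": float(upstream),
    }


def score(seq, start):
    """Linear discriminant score: bias plus weighted sum of the features."""
    f = features(seq, start)
    return BIAS + sum(WEIGHTS[k] * f[k] for k in WEIGHTS)


def probability(seq, start):
    """Map the linear score to (0, 1) with a logistic function (an assumption)."""
    return 1.0 / (1.0 + math.exp(-score(seq, start)))


def best_atg(seq):
    """Return the position of the highest-scoring ATG, or None if there is none."""
    seq = seq.upper()
    atgs = find_atgs(seq)
    if not atgs:
        return None
    return max(atgs, key=lambda p: score(seq, p))
```

Calling `best_atg(seq)` yields one prediction per sequence, and `probability(seq, pos)` can then be compared against a cut-off to decide whether the sequence is reported as containing its initiation codon.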
Results: Because full-length cDNA data on which to base this work are currently scarce, tests were performed on a non-redundant set of 660 initiation codon-containing DNA sequences that had been conceptually spliced into mRNA/cDNA. As a negative control, we also used an edited set of the same sequences that contained only the region following the initiation codon. Under the criterion that only a single prediction is allowed per sequence, a cut-off was selected at which the positive and negative sets were discriminated equally well. At this cut-off, 67% of each set was correctly classified, and for the positive set this included identifying the correct ATG codon. Reliability could be increased further by raising the cut-off or by including homologues; the relative merits of both are discussed.
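Selecting a cut-off at which both sets are discriminated equally can be illustrated with a short sketch. It assumes each sequence has already been reduced to the score of its single best ATG; the function name and the toy scores are hypothetical and are not data from the study. For simplicity, a positive sequence counts as correct whenever its score reaches the cut-off, whereas the study additionally required the correct ATG to be identified.

```python
def balanced_cutoff(pos_scores, neg_scores):
    """Return (cutoff, pos_fraction, neg_fraction) where the two fractions are closest.

    pos_fraction: share of positive sequences scoring at or above the cut-off.
    neg_fraction: share of negative sequences scoring below the cut-off.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best = None
    for c in candidates:
        pos_ok = sum(s >= c for s in pos_scores) / len(pos_scores)
        neg_ok = sum(s < c for s in neg_scores) / len(neg_scores)
        gap = abs(pos_ok - neg_ok)
        # Keep the first cut-off with the smallest gap between the two fractions.
        if best is None or gap < best[0]:
            best = (gap, c, pos_ok, neg_ok)
    return best[1], best[2], best[3]


# Toy scores for illustration only.
positive = [0.9, 0.8, 0.6, 0.3]
negative = [0.1, 0.2, 0.5, 0.7]
cutoff, pos_ok, neg_ok = balanced_cutoff(positive, negative)
```

Raising the cut-off above this balance point trades coverage of the positive set for fewer false calls on the negative set, which is the reliability trade-off the abstract refers to.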
Availability: The prediction program, called ATGpr, and other data are available at http://www.hri.co.jp/atgpr
Contact: swintech@hri.co.jp
