Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data
- PMID: 8889550
- DOI: 10.1101/gr.6.9.829
Abstract
A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
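The comparison step described in the abstract (Smith-Waterman alignment of each EST against a known transcript, followed by an identity-based match criterion) can be illustrated with a minimal sketch. The abstract does not give the scoring scheme or the thresholds the authors used, so the match, mismatch, and gap scores below, along with the hypothetical `is_match` helper and its `min_identity` and `min_length` cutoffs, are assumptions chosen only for illustration. The sketch uses a simple linear gap penalty rather than whatever gap model the study applied.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Scores are illustrative assumptions, not values from the paper.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical, non-gap residues."""
    if not aligned_a:
        return 0.0
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * same / len(aligned_a)


def is_match(est, transcript, min_identity=95.0, min_length=20):
    """Hypothetical match criterion: long enough and similar enough.

    Both thresholds are assumed values for this sketch.
    """
    _, aligned_est, aligned_tx = smith_waterman(est, transcript)
    return (len(aligned_est) >= min_length
            and percent_identity(aligned_est, aligned_tx) >= min_identity)
```

A tolerant identity threshold of this kind lets an EST with a few sequencing errors still be assigned to its source gene, which is the sense in which a match criterion can be "flexible"; the exact rule used in the study would need to be taken from the full text.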
Similar articles
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. Chinese.
- A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 1997 Apr 15;25(8):1626-32. doi: 10.1093/nar/25.8.1626. PMID: 9092672. Free PMC article.
- Analysis of EST-driven gene annotation in human genomic sequence. Genome Res. 1998 Apr;8(4):362-76. doi: 10.1101/gr.8.4.362. PMID: 9548972.
- It's the genes! EST access to human genome content. Bioessays. 1996 Dec;18(12):973-81. doi: 10.1002/bies.950181207. PMID: 8976154. Review.
- [Anatomy of EST data]. Tanpakushitsu Kakusan Koso. 1997 Dec;42(17 Suppl):2814-21. PMID: 9455198. Review. Japanese. No abstract available.
Cited by
- In silico cloning of novel endothelial-specific genes. Genome Res. 2000 Nov;10(11):1796-806. doi: 10.1101/gr.150700. PMID: 11076864. Free PMC article.
- Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc Natl Acad Sci U S A. 2002 Apr 30;99(9):6152-6. doi: 10.1073/pnas.092140899. Epub 2002 Apr 23. PMID: 11972056. Free PMC article.
- GBuilder--an application for the visualization and integration of EST cluster data. Genome Res. 2001 Jan;11(1):179-84. doi: 10.1101/gr.157501. PMID: 11156627. Free PMC article.
- Overview of DNA microarrays: types, applications, and their future. Curr Protoc Mol Biol. 2013 Jan;Chapter 22:Unit 22.1. doi: 10.1002/0471142727.mb2201s101. PMID: 23288464. Free PMC article.
- Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 1999 Oct;9(10):950-9. doi: 10.1101/gr.9.10.950. PMID: 10523523. Free PMC article.