Analysis of EST-driven gene annotation in human genomic sequence
- PMID: 9548972
- DOI: 10.1101/gr.8.4.362
Abstract
We have performed a systematic analysis of gene identification in genomic sequence by similarity search against expressed sequence tags (ESTs) to assess the suitability of this method for automated annotation of the human genome. A BLAST-based strategy was constructed to examine the potential of this approach, and was applied to test sets containing all human genomic sequences longer than 5 kb in public databases, plus 300 kb of exhaustively characterized benchmark sequence. At high stringency, 70%-90% of all annotated genes are detected by near-identity to EST sequence; >95% of ESTs aligning with well-annotated sequences overlap a gene. These ESTs provide immediate access to the corresponding cDNA clones for follow-up laboratory verification and subsequent biologic analysis. At lower stringency, up to 97% of annotated genes were identified by similarity to ESTs. The apparent false-positive rate rose to 55% of ESTs among all sequences and 20% among benchmark sequences at the lowest stringency, indicating that many genes in public database entries are unannotated. Approximately half of the alignments span multiple exons, and thus aid in the construction of gene predictions and elucidation of alternative splicing. In addition, ESTs from multiple cDNA libraries frequently cluster over genes, providing a starting point for crude expression profiles. Clone IDs may be used to form EST pairs, and particularly to extend models by associating alignments of lower stringency with high-quality alignments. These results demonstrate that EST similarity search is a practical general-purpose annotation technique that complements pattern recognition methods as a tool for gene characterization.
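The two headline measures in the abstract, the fraction of annotated genes detected and the apparent false-positive rate among ESTs, can be illustrated with a small sketch. This is not the authors' pipeline: the `Alignment` and `Gene` records, the `evaluate` function, the toy coordinates, and the percent-identity cutoffs (98 and 80) are all hypothetical, chosen only to show how both measures shift as stringency is relaxed. The abstract does not state the actual BLAST parameters or thresholds used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one EST-to-genomic alignment."""
    est_id: str
    start: int        # genomic start, inclusive
    end: int          # genomic end, inclusive
    identity: float   # percent identity of the alignment


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one annotated gene on the same sequence."""
    name: str
    start: int
    end: int


def overlaps(a: Alignment, g: Gene) -> bool:
    # Two closed intervals overlap if each starts before the other ends.
    return a.start <= g.end and g.start <= a.end


def evaluate(alignments, genes, min_identity):
    """Return (fraction of genes detected, apparent false-positive rate).

    The apparent false-positive rate is the fraction of retained ESTs that
    overlap no annotated gene. As the abstract notes, such ESTs may mark
    unannotated genes rather than true errors.
    """
    kept = [a for a in alignments if a.identity >= min_identity]

    detected = {g.name for g in genes if any(overlaps(a, g) for a in kept)}

    # An EST counts as on-gene if any of its retained alignments hits a gene.
    est_on_gene = {}
    for a in kept:
        hit = any(overlaps(a, g) for g in genes)
        est_on_gene[a.est_id] = est_on_gene.get(a.est_id, False) or hit

    sensitivity = len(detected) / len(genes) if genes else 0.0
    off_gene = sum(1 for hit in est_on_gene.values() if not hit)
    apparent_fp = off_gene / len(est_on_gene) if est_on_gene else 0.0
    return sensitivity, apparent_fp


# Toy data (invented for illustration only).
GENES = [Gene("A", 100, 500), Gene("B", 1000, 1500), Gene("C", 3000, 3500)]
ALIGNMENTS = [
    Alignment("est1", 120, 300, 99.0),
    Alignment("est2", 1100, 1200, 98.5),
    Alignment("est3", 2000, 2100, 99.5),
    Alignment("est4", 3100, 3200, 85.0),
    Alignment("est5", 4000, 4100, 80.0),
]

high = evaluate(ALIGNMENTS, GENES, min_identity=98.0)
low = evaluate(ALIGNMENTS, GENES, min_identity=80.0)
```

On this toy data, lowering the cutoff recovers the third gene but also admits another EST that overlaps no annotation, the same trade-off the abstract reports at genome scale.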