The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species

J Quackenbush¹, J Cho, D Lee, F Liang, I Holt, S Karamycheva, B Parvizi, G Pertea, R Sultana, J White

PMID: 11125077
PMCID: PMC29813
DOI: 10.1093/nar/29.1.159

J Quackenbush et al. Nucleic Acids Res. 2001.

Nucleic Acids Res. 2001 Jan 1;29(1):159-64.

Affiliation

¹ The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. johnq@tigr.org

Abstract

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remain a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by those data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes, and as a resource for comparative sequence analysis.
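The cluster-then-assemble idea described in the abstract can be illustrated with a toy sketch. This is not the TIGR protocol: the function names (`overlap`, `cluster`, `assemble`), the exact-match suffix/prefix overlap rule, and the `min_len` threshold are all assumptions made for illustration, and real EST data would need error-tolerant alignment, both strands, and contained reads handled.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` exists.
    Exact matching is a simplifying assumption for this sketch.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def cluster(seqs, min_len):
    """Step 1: group sequences linked by pairwise overlaps (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in permutations(range(len(seqs)), 2):
        if overlap(seqs[i], seqs[j], min_len):
            parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    return list(groups.values())


def assemble(group, min_len):
    """Step 2: greedily merge the best-overlapping pair within one cluster.

    Returns the remaining contigs; a fully connected cluster collapses
    to a single consensus-like sequence.
    """
    contigs = list(group)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, j in permutations(range(len(contigs)), 2):
            n = overlap(contigs[i], contigs[j], min_len)
            if n > best_n:
                best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up reads, not real ESTs.
reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCAAAT", "CCCGGGTTTA", "GTTTAACCGG"]
consensus = [assemble(g, 5) for g in cluster(reads, 5)]
```

Clustering first keeps the expensive merge step confined to small groups of related sequences, which is the general motivation for separating the two stages; the specific tools and stringency settings used for the Gene Indices are not given here.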

Figures

**Figure 1**
An example THC from the Human Gene Index. The consensus sequence is presented in FASTA format; below it, the gene sequences (red) and ESTs that comprise the assembly are shown at their respective locations within the assembly. Links are provided to GenBank records, internal data for all ESTs sequenced at TIGR and to clones available through the ATCC. This THC has been assigned a putative ID of ‘insulin receptor inhibitor, muscle’ as it contains a HT853 (as well as gene sequences from GenBank).

**Figure 2**
An example TOG from the TOGA database. The human, mouse and rat TCs all contain annotated genes; those in mouse and rat have been identified as ‘bithoraxoid-like protein’ while the human gene is simply annotated as ‘HSPC162’ and the cattle TC consists only of ESTs. The stringent overlap criteria used to construct the TOGs make it unlikely that these matches are spurious and provide putative functional annotation for the previously unclassified human and bovine gene and EST sequences.

**Figure 3**
Alignment of TCs from the TIGR Plant Gene Indices with the sequence of *Arabidopsis thaliana* Chromosome II. The coding sequence of a putative casein kinase II catalytic subunit shows significant homology to the same gene in other plants, as is evident from an alignment between the *Arabidopsis* genomic sequence and the various plant TCs. This gene is well conserved across both monocots and dicots. The multiple hits seen in some species may represent paralogs, gene families, alternative splice forms or partial TC assemblies.
