Comparative Study
BMC Bioinformatics. 2004 Oct 6;5:143.
doi: 10.1186/1471-2105-5-143.

Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries

Céline Keime et al. BMC Bioinformatics. 2004.

Abstract

Background: Serial Analysis of Gene Expression (SAGE) is a method for large-scale gene expression analysis that has the potential to generate the full list of mRNAs present within a cell population at a given time, together with their frequencies. An essential step in SAGE library analysis is the unambiguous assignment of each 14 bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, remains a step in need of improvement. In particular, the existing web sites that provide correspondence between tags and transcripts do not cover all species for which numerous ESTs and cDNAs have already been sequenced.

Results: We therefore designed and implemented Identitag, a freely available tool for tag identification that can be used in any species for which transcript sequences are available. Identitag is based on a relational database structure, which allows rapid and easy storage and updating of data and, most importantly, allows identification parameters to be defined precisely. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third holds the annotation of the transcript sequences used for virtual tag extraction. It therefore connects an observed tag to a virtual tag, to the sequence the virtual tag comes from, and then to the functional annotation of that sequence when available. Databases built for different species can be connected through orthology relationships, thus allowing the comparison of SAGE libraries between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and to compare chicken and human SAGE tags. The Identitag sources are freely available at http://pbil.univ-lyon1.fr/software/identitag/.
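
The virtual tag extraction described above can be illustrated with a minimal Python sketch. It is not the Identitag implementation. It assumes the common SAGE convention that a tag starts at the 3'-most CATG site (the NlaIII anchoring-enzyme site) of a transcript, and the names extract_virtual_tag and index_virtual_tags are hypothetical. Only the 14 bp tag length comes from the abstract.

```python
from typing import Dict, List, Optional

ANCHOR_SITE = "CATG"  # assumption: NlaIII anchoring-enzyme site
TAG_LENGTH = 14       # from the abstract: 14 bp tags (site + 10 bases)


def extract_virtual_tag(transcript: str) -> Optional[str]:
    """Return the tag starting at the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)
    if pos == -1:
        return None  # no anchor site: this transcript yields no tag
    tag = seq[pos:pos + TAG_LENGTH]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == TAG_LENGTH else None


def index_virtual_tags(transcripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each virtual tag to the IDs of the transcripts that produce it."""
    index: Dict[str, List[str]] = {}
    for transcript_id, sequence in transcripts.items():
        tag = extract_virtual_tag(sequence)
        if tag is not None:
            index.setdefault(tag, []).append(transcript_id)
    return index
```

An observed tag can then be looked up in the index. A tag that maps to several transcript IDs is ambiguous, and a tag that is absent from the index has no virtual counterpart in the transcript set.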

Conclusions: Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.

Figures

Figure 1
Identitag relational schema. This figure provides a schematic view of the Identitag tables and their relationships. Identitag can be depicted as the three interconnected modules represented in this figure. For a more precise description, see the data dictionary (available on the Identitag web site). The term "Species" can be replaced by any specific species for which transcript sequences are available. The different sources of information needed to populate Identitag are also shown. The minimum set of files consists of: one file containing tag sequences (extracted from ditag concatemers with software such as SAGE 2000), a FASTA file containing transcript sequences from the species considered, and a file containing the results of their comparison with protein databanks (using BLASTX).
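
To make the three-module layout concrete, here is a small sketch using Python's built-in sqlite3 module. It is an illustration only: the table and column names (transcript, virtual_tag, observed_tag, annotation, tag_count, blastx_hit) are hypothetical and do not reproduce the actual Identitag data dictionary, and the demo rows are invented placeholders.

```python
import sqlite3

# Hypothetical schema: one table per module, plus the transcript sequences.
SCHEMA = """
CREATE TABLE transcript (
    transcript_id TEXT PRIMARY KEY,
    sequence      TEXT NOT NULL
);
CREATE TABLE virtual_tag (
    tag           TEXT NOT NULL,
    transcript_id TEXT NOT NULL REFERENCES transcript(transcript_id)
);
CREATE TABLE observed_tag (
    tag       TEXT NOT NULL,
    library   TEXT NOT NULL,
    tag_count INTEGER NOT NULL,
    PRIMARY KEY (tag, library)
);
CREATE TABLE annotation (
    transcript_id TEXT PRIMARY KEY REFERENCES transcript(transcript_id),
    blastx_hit    TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO transcript VALUES ('T1', 'AACATGACGTACGTACGG')")
    conn.execute("INSERT INTO virtual_tag VALUES ('CATGACGTACGTAC', 'T1')")
    conn.executemany(
        "INSERT INTO observed_tag VALUES (?, ?, ?)",
        [("CATGACGTACGTAC", "lib1", 12), ("CATGAAAAAAAAAA", "lib1", 3)],
    )
    conn.execute("INSERT INTO annotation VALUES ('T1', 'placeholder protein')")
    conn.commit()
    return conn


def identify(conn: sqlite3.Connection, library: str) -> list:
    """Link each observed tag to a virtual tag, transcript and annotation."""
    query = """
        SELECT o.tag, o.tag_count, v.transcript_id, a.blastx_hit
        FROM observed_tag AS o
        LEFT JOIN virtual_tag AS v ON v.tag = o.tag
        LEFT JOIN annotation AS a ON a.transcript_id = v.transcript_id
        WHERE o.library = ?
        ORDER BY o.tag, v.transcript_id
    """
    return conn.execute(query, (library,)).fetchall()
```

The LEFT JOINs keep observed tags that match no virtual tag, so unidentified tags remain visible in the result with empty transcript and annotation fields.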
Figure 2
Identitag for interspecies comparison of SAGE libraries. A: General structure behind the process of interspecies comparison of SAGE libraries. B: Detail of the connection between two Identitag databases used to build a tool for interspecies comparison of SAGE libraries (example provided for a chicken to human comparison).
Figure 3
The orthology relationship. A: Design of the orthology relationship. Step 1: Two reciprocal TBLASTX searches compare the transcript sequences of species A and species B. Step 2: We retain only the pairs of transcript sequences for which the TBLASTX(1) and TBLASTX(2) results are consistent. Step 3: We re-examine the pairs obtained in step 2 in order to limit the erroneous assignment of paralogous pairs as orthologous pairs. B: The best reciprocal TBLASTX hits might correspond to paralogs. This figure provides an example of a phylogenetic tree in which the best reciprocal TBLASTX hits correspond to paralogs because several transcript sequences are unknown (represented with dotted lines). To avoid such erroneous assignments, we followed the reciprocal best BLAST with another step (figure 3A, step 3), considering that even if the transcript sequences A1 and B2 are unknown, one of their corresponding proteins might be in a protein databank.
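
Steps 1 and 2 of this design amount to a reciprocal best hit filter, which can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes the TBLASTX results have already been parsed into (query ID, subject ID, score) tuples, the function names are hypothetical, and step 3 (the check against a protein databank) is not covered because its exact criterion is not given here.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query ID, subject ID, score), assumed format


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # On a tie, the first hit encountered is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs that are each other's best hit."""
    best_a = best_hits(a_vs_b)  # species A queries against species B
    best_b = best_hits(b_vs_a)  # species B queries against species A
    return {(a, b) for a, b in best_a.items() if best_b.get(b) == a}
```

As panel B points out, a pair that passes this filter can still be paralogous when the true orthologs are missing from the transcript sets, which is why a further check is applied afterwards.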
Figure 4
Identitag as a tag identifier. A: An example of the identification process using Identitag. This process was used to identify SAGE tags from four chicken libraries ([13]; S. Dazy et al., in preparation).
Figure 5
Distribution of the different identification situations. Distribution of the situations exemplified in table 1 among the 6440 different tags obtained from the four chicken SAGE libraries combined ([13]; S. Dazy et al., in preparation).

References

    1. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995;270:484–487.
    2. SAGEmap http://www.ncbi.nlm.nih.gov/SAGE/index.cgi
    3. SAGE Genie http://cgap.nci.nih.gov/SAGE
    4. Melbourne Brain Genome Project http://www.mbgproject.org/
    5. Mouse SAGE site http://mouse.biomed.cas.cz/sage/