BMC Bioinformatics. 2008 Dec 15:9:533.
doi: 10.1186/1471-2105-9-533.

GPAT: retrieval of genomic annotation from large genomic position datasets

Arnaud Krebs et al. BMC Bioinformatics.

Abstract

Background: Recent genome-wide techniques for mapping transcription factor binding sites or chromatin modifications, such as chromatin immunoprecipitation (ChIP) linked to DNA microarray analysis (ChIP on chip) or ChIP coupled to high-throughput sequencing (ChIP-seq), generate large amounts of genomic location data in the form of one-dimensional series of signals. After pre-analysis of these data (signal pre-clearing, detection of relevant binding sites), biologists need to assess the biological relevance of the detected genomic positions, which represent transcription regulation or chromatin modification events.

Results: To address this problem, we have developed a Genomic Position Annotation Tool (GPAT) with a simple web interface that allows the rapid and systematic labelling of thousands of genomic positions with several types of annotations. GPAT automatically extracts gene annotation information around the submitted positions from public databases (RefSeq or Ensembl). In addition, GPAT provides access to the expression status of the corresponding genes, taken either from existing transcriptomic databases or from user-generated expression data sets. Furthermore, GPAT locates the genomic coordinates relative to the chromosome bands and the well-characterised ENCODE regions. We successfully used GPAT to analyse ChIP on chip data and to identify genes functionally regulated by the TATA binding protein (TBP).

Conclusion: GPAT provides a quick, convenient and flexible way to annotate large sets of genomic positions obtained after pre-analysis of ChIP-chip, ChIP-seq or other high-throughput sequencing-based techniques. Through the different annotation data it displays, GPAT facilitates the interpretation of genome-wide datasets for molecular biologists.

Figures

Figure 1
GPAT application flow chart: (A) Information flow of an annotation search in GPAT. (B) The three gene annotation search modes implemented in GPAT. The panel represents two transcription units oriented in opposite directions (orange boxes). The transcription start site (TSS) is symbolised by an arrow. User-submitted positions are represented by vertical bars and the search window by open boxes. The colour of the vertical bar symbolises the result of the GPAT search (green: annotation matched, red: not matched). The "direct search" mode finds the positions located inside a transcription unit. The "window search" mode detects transcription units located within a defined distance from the genomic positions. The "promoter search" mode identifies transcription units whose TSS lies within a defined distance from the genomic positions. (C) Results table containing the annotated positions; links to the UCSC genome browser and gene source information; global distribution profile of the matched genomic positions relative to the TSSs of the corresponding genes; and statistical values for the expression data of the corresponding genes (represented using a spreadsheet application).
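The three search modes described in panel (B) can be illustrated with a minimal Python sketch. This is not GPAT's implementation: the `Transcript` class, the function names and the coordinates are all hypothetical, and the sketch assumes each transcription unit is described by a chromosome, a start, an end and a strand, with the TSS taken as the start on the "+" strand and the end on the "-" strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcription unit; start <= end regardless of strand."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Assumption: the TSS is the 5' end of the unit on its own strand.
        return self.start if self.strand == "+" else self.end


def direct_search(chrom, pos, transcripts):
    """Units that contain the position."""
    return [t for t in transcripts
            if t.chrom == chrom and t.start <= pos <= t.end]


def window_search(chrom, pos, transcripts, window):
    """Units with any part lying within `window` bases of the position."""
    return [t for t in transcripts
            if t.chrom == chrom and t.start - window <= pos <= t.end + window]


def promoter_search(chrom, pos, transcripts, window):
    """Units whose TSS lies within `window` bases of the position."""
    return [t for t in transcripts
            if t.chrom == chrom and abs(pos - t.tss) <= window]


# Two made-up units in opposite orientations, as in the panel.
units = [
    Transcript("geneA", "chr1", 1000, 5000, "+"),
    Transcript("geneB", "chr1", 8000, 12000, "-"),
]
inside_gene = direct_search("chr1", 3000, units)
near_genes = window_search("chr1", 6000, units, window=2000)
near_tss = promoter_search("chr1", 12500, units, window=1000)
```

In this sketch a position between the two units is missed by the direct search but matched by the window search, while the promoter search matches only units whose 5' end is close to the position. A linear scan is used for clarity; for thousands of positions an interval index would be the more usual choice.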
Figure 2
Example of the use of GPAT results: (A) Venn diagram showing the genes occupied by Pol II alone (red), by TBP alone (green), or by both (yellow). (B) Distribution of Pol II (blue) and TBP (red) binding sites relative to the 5' end of the matched transcript. The distribution patterns of both Pol II and TBP, but not GST, cluster within +/- 1 kb around the 5' end of the matched transcripts. (C) Distribution of the expression level in each gene category. The highest expression level is observed for genes where both Pol II and TBP were detected at the promoter. Furthermore, genes bound by Pol II but not by TBP show a high level of expression, suggesting that some genes may be TBP-independent.
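The quantities shown in panels (A) and (B) can be sketched in a few lines of Python. The function names and gene identifiers are hypothetical, and the sign convention (negative upstream, positive downstream of the 5' end, measured along the gene's own strand) is an assumption rather than something stated in the caption.

```python
def signed_distance(pos, tss, strand):
    """Offset of a binding site from the 5' end, oriented along the gene.

    Assumed convention: negative values are upstream, positive downstream.
    """
    offset = pos - tss
    return offset if strand == "+" else -offset


def occupancy_classes(pol2_genes, tbp_genes):
    """Split genes into the three regions of a two-set Venn diagram."""
    pol2, tbp = set(pol2_genes), set(tbp_genes)
    return {
        "pol2_only": pol2 - tbp,
        "tbp_only": tbp - pol2,
        "both": pol2 & tbp,
    }


# Made-up gene identifiers, for illustration only.
classes = occupancy_classes({"geneA", "geneB"}, {"geneB", "geneC"})
upstream_on_minus = signed_distance(12500, 12000, "-")
within_1kb = abs(upstream_on_minus) <= 1000
```

Binning the signed distances of all matched positions would give a profile of the kind described in panel (B), and the three sets correspond to the gene categories compared in panel (C).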
