BMC Bioinformatics. 2004 Mar 12:5:27.
doi: 10.1186/1471-2105-5-27.

CisOrtho: a program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting

Henry R Bigelow et al. BMC Bioinformatics.

Abstract

Background: All known genomes code for a large number of transcription factors. It is important to develop methods that reveal how these transcription factors act at a genome-wide level, that is, through which target genes they exert their function.

Results: We describe here a program pipeline aimed at identifying transcription factor target genes in whole genomes. Starting from a consensus binding site represented as a weight matrix, the pipeline identifies potential sites in a pre-filtered genome and then filters them further by assessing conservation of the putative site in the genome of a related species, a process called phylogenetic footprinting. CisOrtho has been successfully used to identify targets for two homeodomain transcription factors in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae.
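The weight-matrix search described above can be illustrated with a minimal sketch. It is not the CisOrtho implementation: the log-odds scoring, the pseudocount, the uniform background frequency and all function names are assumptions made for illustration, and the aligned sites in the usage note are toy sequences rather than a real binding site alignment.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_weight_matrix(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for column in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in aligned_sites:
            counts[site[column]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return matrix


def score_window(matrix, window):
    """Sum the per-column weights of one candidate site."""
    return sum(column[base] for column, base in zip(matrix, window))


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


def scan(matrix, sequence, top_n):
    """Return the top_n hits as (score, position, strand, site) tuples.

    Strand is "P" (positive) or "N" (negative); for "N" hits the site is
    reported as read on the negative strand.
    """
    width = len(matrix)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        hits.append((score_window(matrix, window), pos, "P", window))
        rc = reverse_complement(window)
        hits.append((score_window(matrix, rc), pos, "N", rc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_n]
```

For example, `scan(build_weight_matrix(["TGATTA", "TGATTA", "TGATTG"]), "CCCCTGATTACCCC", 1)` returns a single positive-strand hit at position 4. In the pipeline, a scan of this kind would be applied to the pre-filtered genome, and the resulting hits would then pass through the conservation filter.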

Conclusions: CisOrtho will identify targets of other nematode transcription factors whose DNA binding specificity is known and can be easily adapted to search other genomes for transcription factor targets.

Figures

Figure 1
Flow chart of program pipeline. Information is shown as rectangles, procedures as ovals. The only user defined inputs are the Transcription Factor Binding Site Alignment file and the number of hits to retrieve. All other input files are downloaded from sources mentioned in the text.
Figure 2
Screenshot of the Web Interface. The address is: . The program will eventually be run by WormBase at .
Figure 3
Classification of non-exonic regions. A hypothetical gene arrangement is shown. "5' intergenic": between the first exons of two separate genes; "3' intergenic": between the last exons of two separate genes; "5'/3' intergenic": between the first exon of one gene and the last exon of the other gene; "intronic#": between any two exons of one gene; "other": all other possible combinations. Where a gene flanking a segment is known to exhibit alternative splicing, the segment label was prefixed with 'alt_', e.g. 'alt_intronic#' or 'alt_3'intergenic'. Two further categories, BEGIN and END, denote regions at the beginning or end of the chromosome in the case of C. elegans, or of the sequencing reads in the case of C. briggsae. There were two exceptions to the procedure. First, the C. briggsae genome we used was an unassembled collection of 578 individual sequence reads; 112 of these reads had no exon annotations and were ignored in this study. These 112 reads had an average length of 3679.3 nucleotides, and only two were longer than 10,000 bases. Second, there were 16 C. elegans and 35 C. briggsae exon annotations one nucleotide long. By visual inspection, we determined that for C. elegans these exons were in fact longer than one nucleotide, but noncoding: in all cases the single nucleotide is 'A' and when spliced forms a TGA stop codon. They were treated as non-existent for this study, which has very little effect on the procedure except that the last true intron of the gene is considered its 3' region. For C. briggsae, they appear to be errors in the gene annotations and fall within introns; they were therefore treated as part of the intron in which they occur.
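The classification rules in this caption can be sketched as a small function over the two exons that flank a segment. This is an illustration only: the `Exon` record, the function name, the exact label spellings, the intron numbering (lower of the two exon numbers) and the handling of single-exon genes are assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Exon:
    """Hypothetical record for an exon flanking a non-exonic segment."""
    gene: str
    index: int                 # 1-based exon number within its gene
    is_last: bool              # True if this is the gene's last exon
    alt_spliced: bool = False  # True if the gene is alternatively spliced


def classify_segment(left: Optional[Exon], right: Optional[Exon]) -> str:
    """Label the segment between two flanking exons.

    A missing flank (None) means the segment touches the start or end of
    the chromosome or sequence read.
    """
    if left is None and right is None:
        raise ValueError("no flanking exon: unannotated reads are ignored")
    if left is None:
        return "BEGIN"
    if right is None:
        return "END"

    if left.gene == right.gene:
        # Assumption: '#' is the lower of the two flanking exon numbers.
        label = f"intronic{min(left.index, right.index)}"
    else:
        left_first, right_first = left.index == 1, right.index == 1
        # Assumption: for single-exon genes (first and last at once),
        # the checks below are simply applied in this order.
        if left_first and right_first:
            label = "5'intergenic"
        elif left.is_last and right.is_last:
            label = "3'intergenic"
        elif (left_first and right.is_last) or (left.is_last and right_first):
            label = "5'/3'intergenic"
        else:
            label = "other"

    # Assumption: the prefix applies if either flanking gene is
    # alternatively spliced.
    if left.alt_spliced or right.alt_spliced:
        label = "alt_" + label
    return label
```

For instance, `classify_segment(Exon("geneA", 1, False), Exon("geneB", 1, False))` yields `"5'intergenic"`, where `geneA` and `geneB` are placeholder gene names. A fuller version would also need strand information to resolve the single-exon case properly.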
Figure 4
Output of the program pipeline. Hits of a search with the TTX-3 consensus binding site are shown. num: number in list; mis: number of base mismatches between the first C. elegans and first C. briggsae hits; segtype: type of non-exonic region (see Figure 3); str1/2: negative (N) or positive (P) strand on which the first/second of the two genes that flank the identified target site is located; offset1/2: distance of the target site to the flanking gene(s) (in relation to the start codon if the target site is 5' or located in an intron; in relation to the stop codon if the site is 3' to the gene; in the latter two cases, the number has a positive value); ID: cosmid name of the flanking genes; name: flanking gene names (if available). Gene IDs/names are linked to the WormBase gene model at , which contains further information about the gene. If multiple target sites are located in a defined inter/intragenic region, there is an option to report the n highest-scoring hits for each ortholog. If this option is used, the top-scoring C. elegans or C. briggsae hit in each hit-pair is highlighted, and the next n-1 hits are shown in gray. Color coding: orthologous C. elegans/C. briggsae genes ("hit-pairs") are color coded in blue (Y39A3B.5 and CBG15122 are orthologs) and green (M01E10.2 and CBG15118 are orthologs).
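The "mis" column and the n-highest-scoring-hits option can be sketched as follows. This is a hypothetical illustration rather than the pipeline's code: hits are assumed to be `(score, site)` tuples, the function and key names are invented, and dropping a pair when either ortholog has no hit is one simple way to model the conservation filter.

```python
def count_mismatches(site_a, site_b):
    """Number of positions at which two equal-length sites differ."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must have the same length")
    return sum(1 for a, b in zip(site_a, site_b) if a != b)


def make_hit_pair(elegans_hits, briggsae_hits, n=1):
    """Keep the n best-scoring hits per ortholog and compute 'mis'.

    Each hit is a (score, site) tuple. Returns None if either ortholog
    has no hit, i.e. the site is not conserved (an assumed rule).
    """
    top_ce = sorted(elegans_hits, key=lambda hit: hit[0], reverse=True)[:n]
    top_cb = sorted(briggsae_hits, key=lambda hit: hit[0], reverse=True)[:n]
    if not top_ce or not top_cb:
        return None
    return {
        # 'mis' compares only the first (top-scoring) hit of each species.
        "mis": count_mismatches(top_ce[0][1], top_cb[0][1]),
        "elegans": top_ce,
        "briggsae": top_cb,
    }
```

Calling `make_hit_pair([(9.1, "TGATTA"), (7.5, "TGATTG")], [(8.8, "TGATTG")], n=2)` returns a pair with `mis` equal to 1, two C. elegans hits and one C. briggsae hit. The scores and sites here are made up for the example.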
