Review
Genome Biol. 2003;5(1):201.
doi: 10.1186/gb-2003-5-1-201. Epub 2003 Dec 23.

Computational prediction of transcription-factor binding site locations

Martha L Bulyk. Genome Biol. 2003.

Abstract

Identifying genomic locations of transcription-factor binding sites, particularly in higher eukaryotic genomes, has been an enormous challenge. Various experimental and computational approaches have been used to detect these sites; methods involving computational comparisons of related genomes have been particularly successful.

Figures

Figure 1
Representation of transcription-factor binding sites. (a) An example of six sequences and the consensus sequence that can be derived from them. The consensus simply gives the nucleotide that is found most often in each position; the alternate (or degenerate) consensus sequence gives the possible nucleotides in each position; R represents A or G; N represents any nucleotide. (b) A position weight matrix for the -10 region of E. coli promoters, as an example of a well-studied regulatory element. The boxed elements correspond to the consensus sequence (TATAAT). The score for each nucleotide at each position is derived from the observed frequency of that nucleotide at the corresponding position in the input set of promoters. The score for any particular site is the sum of the individual matrix values for that site's sequence; for example, the score for TATAAT is 85. Note that the matrix values in (b) do not come from the example shown in (a) but rather are derived from a much larger collection of -10 promoter regions. Adapted, with permission, from [3].
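The two representations described in this caption can be sketched in a few lines of Python. This is only an illustration: the six sequences and the matrix values below are invented, they are not the sequences of panel (a) or the matrix of panel (b), so the score shown does not reproduce the value of 85 quoted above. The function names are hypothetical as well.

```python
from collections import Counter

# Hypothetical aligned sites (NOT the sequences from Figure 1a).
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATACT", "TATAAT"]

# IUPAC codes for sets of nucleotides, e.g. R = A or G, N = any nucleotide.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Toy position weight matrix with made-up scores (NOT the matrix from Figure 1b).
# TOY_MATRIX[nucleotide][position] is the score contribution at that position.
TOY_MATRIX = {
    "A": [-2, 5, -1, 3, 4, -3],
    "C": [-3, -4, 0, -1, 1, -2],
    "G": [0, -3, -2, 1, -2, -4],
    "T": [4, -2, 3, -2, -3, 5],
}


def consensus(sites):
    """Most frequent nucleotide at each position (ties broken alphabetically)."""
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        result.append(max(sorted(counts), key=lambda nt: counts[nt]))
    return "".join(result)


def degenerate_consensus(sites):
    """IUPAC code covering every nucleotide observed at each position."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


def score_site(matrix, site):
    """Sum of the matrix values selected by the site's nucleotide at each position."""
    return sum(matrix[nt][pos] for pos, nt in enumerate(site))


if __name__ == "__main__":
    print(consensus(SITES))
    print(degenerate_consensus(SITES))
    print(score_site(TOY_MATRIX, "TATAAT"))
```

The additive scoring in `score_site` mirrors the caption's statement that a site's score is the sum of the individual matrix values for its sequence; how the matrix values themselves are derived from observed frequencies is not shown here.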
Figure 2
Sequence comparison of the GAL1-GAL10 intergenic region across four yeast species. Scer, S. cerevisiae; Spar, S. paradoxus; Smik, S. mikatae; Sbay, S. bayanus. Arrows indicate the start and transcriptional orientation of the GAL1 and GAL10 open reading frames; dashes in the alignment indicate gaps; nucleotide positions conserved across all four species are denoted by asterisks. Stretches of conserved nucleotides are underlined, and experimentally validated transcription-factor binding-site footprints are boxed and labeled with the name of the footprinted transcription factor. Underlined regions that are not boxed correspond to potential, previously unknown, transcription-factor binding sites. Note that not all nucleotide positions of a footprinted binding site are necessarily conserved across all four species in this comparison (note the Mig1 sites, for example). The nucleotides matching the published Gal4 binding-site motif are in gray; for the fourth Gal4 site, non-standard consensus motif nucleotides are shown in boldface. Reproduced with permission from [99].
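The asterisk and underline annotations described in this caption can be approximated as follows. This is a minimal sketch on an invented toy alignment, not the GAL1-GAL10 sequences; the function names and the minimum stretch length of 4 are assumptions made for illustration, since the caption does not state how conserved stretches were defined.

```python
# Toy four-species alignment (invented; NOT the GAL1-GAL10 intergenic region).
ALIGNMENT = {
    "Scer": "ACGGATTACCGT-A",
    "Spar": "ACGGATTACTGT-A",
    "Smik": "TCGGATTACCGTCA",
    "Sbay": "ACGGATTAGCGT-A",
}


def conservation_line(aligned):
    """Return '*' for columns identical in all sequences (gaps excluded), ' ' otherwise."""
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*aligned)
    )


def conserved_stretches(marks, min_len=4):
    """Return half-open (start, end) intervals of '*' runs at least min_len long.

    min_len is an assumed threshold chosen for this example.
    """
    stretches = []
    start = None
    # A trailing space acts as a sentinel so a run ending at the last column is closed.
    for i, mark in enumerate(marks + " "):
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


if __name__ == "__main__":
    marks = conservation_line(list(ALIGNMENT.values()))
    for name, seq in ALIGNMENT.items():
        print(f"{name} {seq}")
    print(f"     {marks}")
    print(conserved_stretches(marks))
```

Conserved stretches found this way are only candidates; as the caption notes, footprinted binding sites are not necessarily conserved at every position across all four species.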

References

    1. Collins F, Green E, Guttmacher A, Guyer M, US National Human Genome Institute. A vision for the future of genomics research. Nature. 2003;422:835–847.
    2. Lockhart D, Winzeler E. Genomics, gene expression and DNA arrays. Nature. 2000;405:827–836.
    3. Stormo G. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23.
    4. Cliften P, Hillier L, Fulton L, Graves T, Miner T, Gish W, Waterston R, Johnston M. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 2001;11:1175–1186.
    5. Oliphant A, Brandl C, Struhl K. Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol Cell Biol. 1989;9:2944–2949.
