Computational prediction of transcription-factor binding site locations

Martha L Bulyk¹

Affiliation

¹ Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, New Research Building, 77 Avenue Louis Pasteur, Boston, MA 02115, USA. mlbulyk@rascal.med.harvard.edu

PMID: 14709165
PMCID: PMC395725
DOI: 10.1186/gb-2003-5-1-201

Review

Martha L Bulyk. Genome Biol. 2003;5(1):201. doi: 10.1186/gb-2003-5-1-201. Epub 2003 Dec 23.

Abstract

Identifying genomic locations of transcription-factor binding sites, particularly in higher eukaryotic genomes, has been an enormous challenge. Various experimental and computational approaches have been used to detect these sites; methods involving computational comparisons of related genomes have been particularly successful.

Figures

**Figure 1**
Representation of transcription-factor binding sites. **(a)** An example of six sequences and the consensus sequence that can be derived from them. The consensus simply gives the nucleotide that is found most often in each position; the alternate (or degenerate) consensus sequence gives the possible nucleotides in each position; R represents A or G; N represents any nucleotide. **(b)** A position weight matrix for the -10 region of *E. coli* promoters, as an example of a well-studied regulatory element. The boxed elements correspond to the consensus sequence (TATAAT). The score for each nucleotide at each position is derived from the observed frequency of that nucleotide at the corresponding position in the input set of promoters. The score for any particular site is the sum of the individual matrix values for that site's sequence; for example, the score for TATAAT is 85. Note that the matrix values in (b) do not come from the example shown in (a) but rather are derived from a much larger collection of -10 promoter regions. Adapted, with permission, from [3].
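
The three representations described in this caption can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the six toy sites, the 4-position weight matrix, and the function names are made up for illustration and are not the sequences or matrix values shown in the figure. The sketch derives a simple consensus (most frequent nucleotide per position), a degenerate consensus using standard IUPAC codes (R for A or G, N for any nucleotide), and a site score computed as the sum of the matrix values for that site's sequence.

```python
from collections import Counter

# Standard IUPAC codes for sets of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(sites):
    """Most frequent nucleotide in each column (ties broken alphabetically)."""
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        result.append(max(sorted(counts), key=counts.get))
    return "".join(result)


def degenerate_consensus(sites):
    """IUPAC code covering every nucleotide observed in each column."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


def pwm_score(matrix, site):
    """Sum of the matrix values selected by the site's sequence."""
    return sum(matrix[base][pos] for pos, base in enumerate(site))


# Hypothetical aligned binding sites (not the sequences from the figure).
TOY_SITES = ["ACGATC", "ACGGTC", "ACGATA", "ACGGTG", "ACGATT", "ACGATC"]

# Hypothetical weight matrix: one row per nucleotide, one column per position.
TOY_MATRIX = {
    "A": [5, -2, -3, 0],
    "C": [-1, 6, -4, -2],
    "G": [-3, -5, 7, -1],
    "T": [0, -1, -2, 4],
}

if __name__ == "__main__":
    print(consensus(TOY_SITES))             # ACGATC
    print(degenerate_consensus(TOY_SITES))  # ACGRTN
    print(pwm_score(TOY_MATRIX, "ACGT"))    # 5 + 6 + 7 + 4 = 22
```

In practice, matrix values are derived from observed nucleotide frequencies in a large collection of sites, as the caption notes; the hand-written integers above only demonstrate the additive scoring step.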

**Figure 2**
Sequence comparison of the *GAL1-GAL10* intergenic region across four yeast species. Scer, *S. cerevisiae*; Spar, *S. paradoxus*; Smik, *S. mikatae*; Sbay, *S. bayanus*. Arrows indicate the start and transcriptional orientation of the *GAL1* and *GAL10* open reading frames; dashes in the alignment indicate gaps; nucleotide positions conserved across all four species are denoted by asterisks. Stretches of conserved nucleotides are underlined, and experimentally validated transcription-factor binding-site footprints are boxed and labeled with the name of the footprinted transcription factor. Underlined regions that are not boxed correspond to potential, previously unknown, transcription-factor binding sites. Note that not all nucleotide positions of a footprinted binding site are necessarily conserved across all four species in this comparison (note the Mig1 sites, for example). The nucleotides matching the published Gal4 binding-site motif are in gray; for the fourth Gal4 site, non-standard consensus motif nucleotides are shown in boldface. Reproduced with permission from [99].
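
The asterisk and underline annotations described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only: the four aligned toy sequences, the minimum run length, and the function names are assumptions, not data or methods from the figure. It marks alignment columns that are identical in every sequence (gaps never count as conserved) and then reports stretches of consecutive conserved columns, which is the kind of region the caption flags as a candidate binding site.

```python
def conservation_line(aligned):
    """Return '*' for columns identical in all sequences, ' ' otherwise.

    Columns containing a gap ('-') are never marked as conserved.
    """
    marks = []
    for column in zip(*aligned):
        conserved = len(set(column)) == 1 and "-" not in column
        marks.append("*" if conserved else " ")
    return "".join(marks)


def conserved_runs(marks, min_len):
    """Half-open (start, end) intervals of at least min_len consecutive '*'."""
    runs = []
    start = None
    # A trailing space acts as a sentinel that closes a run at the end.
    for i, mark in enumerate(marks + " "):
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical four-sequence alignment (equal lengths, '-' marks a gap).
TOY_ALIGNMENT = [
    "ACGGTTCA-GGA",
    "ACGGATCAAGGA",
    "ACGGTTCA-GCA",
    "TCGGTTCATGGA",
]

if __name__ == "__main__":
    marks = conservation_line(TOY_ALIGNMENT)
    for seq in TOY_ALIGNMENT:
        print(seq)
    print(marks)
    print(conserved_runs(marks, min_len=3))  # [(1, 4), (5, 8)]
```

Requiring perfect identity is a deliberately strict choice for the sketch; as the caption points out for the Mig1 sites, real footprinted sites are not always conserved at every position, so a practical scan would likely tolerate some mismatches.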

References

1. Collins F, Green E, Guttmacher A, Guyer M; US National Human Genome Research Institute. A vision for the future of genomics research. Nature. 2003;422:835–847.
2. Lockhart D, Winzeler E. Genomics, gene expression and DNA arrays. Nature. 2000;405:827–836.
3. Stormo G. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23.
4. Cliften P, Hillier L, Fulton L, Graves T, Miner T, Gish W, Waterston R, Johnston M. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 2001;11:1175–1186.
5. Oliphant A, Brandl C, Struhl K. Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol Cell Biol. 1989;9:2944–2949.
