Comparative Study
Bioinformatics. 2010 Apr 1;26(7):860-6.
doi: 10.1093/bioinformatics/btq049. Epub 2010 Feb 10.

Assigning roles to DNA regulatory motifs using comparative genomics

Fabian A Buske et al. Bioinformatics. 2010.

Abstract

Motivation: Transcription factors (TFs) are crucial during the lifetime of the cell. Their functional roles are defined by the genes they regulate. Uncovering these roles not only sheds light on the TF at hand but puts it into the context of the complete regulatory network.

Results: Here, we present an alignment- and threshold-free comparative genomics approach for assigning functional roles to DNA regulatory motifs. We incorporate our approach into the GOMO algorithm, a computational tool for detecting associations between a user-specified DNA regulatory motif [expressed as a position weight matrix (PWM)] and Gene Ontology (GO) terms. Incorporating multiple species into the analysis significantly improves GOMO's ability to identify GO terms associated with the regulatory targets of TFs. Including three comparative species in the process of predicting TF roles in Saccharomyces cerevisiae and Homo sapiens increases the number of significant predictions by 75% and 200%, respectively. The predicted GO terms are also more specific, yielding deeper biological insight into the role of the TF. Adjusting motif (binding) affinity scores for individual sequence composition proves to be essential for avoiding false positive associations. We describe a novel DNA sequence-scoring algorithm that adjusts a thermodynamic measure of DNA-binding affinity for the base composition of each individual sequence. GOMO's prediction accuracy proves to be relatively insensitive to how promoters are defined. Because GOMO uses a threshold-free form of gene set analysis, there are no free parameters to tune. Biologists can investigate the potential roles of DNA regulatory motifs of interest using GOMO via the web (http://meme.nbcr.net).
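
To illustrate what a threshold-free form of gene set analysis can look like, the following is a minimal sketch, not the GOMO implementation. It assumes each gene's promoter already has a motif affinity score, and it asks whether genes annotated with a given GO term tend to rank higher than the rest, using a rank-sum (Mann-Whitney) statistic with a normal approximation. The choice of statistic, the function names `rank_sum_z` and `upper_tail_p`, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def rank_sum_z(scores, in_set):
    """Rank-sum z-score for genes inside a GO term versus all other genes.

    scores: one motif affinity score per gene (hypothetical values).
    in_set: one boolean per gene, True if the gene carries the GO term.
    A positive z means annotated genes tend to score higher. No score
    cutoff is applied: only the ranking of the genes matters.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n1 = sum(1 for m in in_set if m)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one gene inside and outside the set")

    r1 = sum(r for r, m in zip(ranks, in_set) if m)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Normal approximation; the tie correction to the variance is omitted.
    sd_u = math.sqrt(n1 * n2 * (n + 1) / 12)
    return (u1 - mean_u) / sd_u

def upper_tail_p(z):
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Toy example: the three annotated genes have the three highest scores.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
toy_in_set = [True, True, True, False, False, False]
toy_z = rank_sum_z(toy_scores, toy_in_set)
toy_p = upper_tail_p(toy_z)
```

Because the statistic depends only on ranks, there is no score threshold to choose, which is the property the abstract highlights. A real analysis would also need multiple-testing correction across GO terms, which this sketch leaves out.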

Figures

Fig. 1. Single-species GOMO prediction accuracy using transferred GO maps. Each point shows the average AUC50 of TF–GO term associations predicted by GOMO using the E. coli (a) or S. cerevisiae (b) GO map and TFs, and promoter sequences from the single given species. The AUC50 is computed using a single TF, then averaged over TFs. The X-axis shows the maximum upstream extent of promoter sequences, which are truncated at the first ORF. The inset shows the phylogenetic tree of the corresponding species. Branch lengths denote average substitutions per site.
Fig. 2. Multiple-species GOMO prediction accuracy. Each point shows the average AUC50 of TF–GO term association predictions made by GOMO in the key species E. coli (a) or S. cerevisiae (b). Points labeled ‘multiple-species’ use promoter sequences from the key species and three related species; Monkey results use Monkey (Moses et al., 2004) minimum P-value scores instead of AMA scores (Supplementary Material 1). Points labeled ‘single-species’ use promoter sequences from the key species only, and are shown for comparison. The AUC50 is computed using a single TF, then averaged over TFs. The X-axis shows the upstream extent of promoter sequences (‘full’), or the maximum upstream extent when they are truncated at the first ORF (‘intergenic’). For clarity, standard error bars are shown for the ‘full’ promoter sequence set only; standard error bars for the ‘intergenic’ promoter set are similar.
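
The captions above report accuracy as AUC50 without defining it. A common reading, assumed here, is the area under the ROC curve up to the 50th false positive, normalized to lie between 0 and 1. The sketch below follows that reading; the function name `auc_n`, the handling of lists with fewer than 50 negatives, and the toy data are assumptions for illustration, not details taken from the paper.

```python
def auc_n(scores, labels, max_fp=50):
    """Area under the ROC curve up to the first max_fp false positives.

    scores: prediction scores, higher meaning a stronger prediction.
    labels: True for a correct association, False otherwise.
    Returns a value in [0, 1]; 1.0 means every true association ranks
    above the first max_fp false ones. If there are fewer than max_fp
    negatives, the cap is lowered to the number of negatives (assumption).
    Tied scores keep their input order, which is an arbitrary choice.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, is_pos in ranked if is_pos)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    cap = min(max_fp, n_neg)
    true_pos = 0
    false_pos = 0
    area = 0
    for _, is_pos in ranked:
        if is_pos:
            true_pos += 1
        else:
            # Each false positive adds a column whose height is the
            # number of true positives ranked above it.
            false_pos += 1
            area += true_pos
            if false_pos == cap:
                break
    return area / (cap * n_pos)

# Toy example with four predictions for one TF.
perfect = auc_n([4, 3, 2, 1], [True, True, False, False])
mixed = auc_n([4, 3, 2, 1], [True, False, True, False])
```

Under this reading, the per-TF values would then be averaged over TFs to give each plotted point, as the captions describe.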
