Genome Biol. 2012;13(3):R24.

doi: 10.1186/gb-2012-13-3-r24.

The transcription factor encyclopedia

Dimas Yusuf¹, Stefanie L Butland, Magdalena I Swanson, Eugene Bolotin, Amy Ticoll, Warren A Cheung, Xiao Yu Cindy Zhang, Christopher T D Dickman, Debra L Fulton, Jonathan S Lim, Jake M Schnabl, Oscar H P Ramos, Mireille Vasseur-Cognet, Charles N de Leeuw, Elizabeth M Simpson, Gerhart U Ryffel, Eric W-F Lam, Ralf Kist, Miranda S C Wilson, Raquel Marco-Ferreres, Jan J Brosens, Leonardo L Beccari, Paola Bovolenta, Bérénice A Benayoun, Lara J Monteiro, Helma D C Schwenen, Lars Grontved, Elizabeth Wederell, Susanne Mandrup, Reiner A Veitia, Harini Chakravarthy, Pamela A Hoodless, M Michela Mancarelli, Bruce E Torbett, Alison H Banham, Sekhar P Reddy, Rebecca L Cullum, Michaela Liedtke, Mario P Tschan, Michelle Vaz, Angie Rizzino, Mariastella Zannini, Seth Frietze, Peggy J Farnham, Astrid Eijkelenboom, Philip J Brown, David Laperrière, Dominique Leprince, Tiziana de Cristofaro, Kelly L Prince, Marrit Putker, Luis del Peso, Gieri Camenisch, Roland H Wenger, Michal Mikula, Marieke Rozendaal, Sylvie Mader, Jerzy Ostrowski, Simon J Rhodes, Capucine Van Rechem, Gaylor Boulay, Sam W Z Olechnowicz, Mary B Breslin, Michael S Lan, Kyster K Nanan, Michael Wegner, Juan Hou, Rachel D Mullen, Stephanie C Colvin, Peter John Noy, Carol F Webb, Matthew E Witek, Scott Ferrell, Juliet M Daniel, Jason Park, Scott A Waldman, Daniel J Peet, Michael Taggart, Padma-Sheela Jayaraman, Julien J Karrich, Bianca Blom, Farhad Vesuna, Henriette O'Geen, Yunfu Sun, Richard M Gronostajski, Mark W Woodcroft, Margaret R Hough, Edwin Chen, G Nicholas Europe-Finner, Magdalena Karolczak-Bayatti, Jarrod Bailey, Oliver Hankinson, Venu Raman, David P LeBrun, Shyam Biswal, Christopher J Harvey, Jason P DeBruyne, John B Hogenesch, Robert F Hevner, Christophe Héligon, Xin M Luo, Marissa Cathleen Blank, Kathleen Joyce Millen, David S Sharlin, Douglas Forrest, Karin Dahlman-Wright, Chunyan Zhao, Yuriko Mishima, Satrajit Sinha, Rumela Chakrabarti, Elodie Portales-Casamar, Frances M Sladek, Philip H Bradley, Wyeth W Wasserman

Affiliation

¹ Department of Medical Genetics, Faculty of Medicine, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, University of British Columbia, Vancouver, Canada.

PMID: 22458515
PMCID: PMC3439975
DOI: 10.1186/gb-2012-13-3-r24

Dimas Yusuf et al. Genome Biol. 2012.

Abstract

Here we present the Transcription Factor Encyclopedia (TFe), a new web-based compendium of mini review articles on transcription factors (TFs) that is founded on the principles of open access and collaboration. Our consortium of over 100 researchers has collectively contributed over 130 mini review articles on pertinent human, mouse and rat TFs. Notable features of the TFe website include a high-quality PDF generator and a web API for programmatic data retrieval. TFe aims to rapidly educate scientists about the TFs they encounter through the delivery of succinct summaries written and vetted by experts in the field. TFe is available at http://www.cisreg.ca/tfe.

Figures

**Figure 1**
**New journal articles associated with human or mouse TFs**. Over the past five years, 216,421 journal articles associated with human or mouse genes have been published and indexed in NCBI PubMed. This represents 5.59% of all articles published and indexed during the same period (3,871,190 articles). Of these 216,421 gene-associated articles, at least 34,943 (16.15%) are associated with human or mouse TFs. This is striking given that known TFs represent only 5% of the genome. The proportion of journal articles associated with TFs has also risen steadily over the past five years, from 15.47% in 2005 to 16.81% in 2009. These figures were determined with a conservative set of approximately 3,200 human and mouse TF genes derived from the works of Fulton *et al*. [4] and Vaquerizas *et al*. [5] and the publicly available 'gene2pubmed' annotation from NCBI.
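
The percentages in the Figure 1 caption follow directly from the quoted counts. The short check below recomputes them; note that the two shares use different denominators (all indexed articles versus gene-associated articles). The final line compares the TF share with the 5% genomic share quoted in the caption, which is our own illustrative ratio rather than a figure reported by the authors.

```python
# Counts quoted in the Figure 1 caption (PubMed articles indexed over five years).
ALL_ARTICLES = 3_871_190
GENE_ARTICLES = 216_421  # associated with human or mouse genes
TF_ARTICLES = 34_943     # lower bound: associated with human or mouse TFs
TF_GENOME_SHARE = 5      # percent of the genome that is TFs, as stated in the caption


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


gene_share = pct(GENE_ARTICLES, ALL_ARTICLES)  # share of all indexed articles
tf_share = pct(TF_ARTICLES, GENE_ARTICLES)     # share of gene-associated articles
# Illustrative over-representation: TF share of articles vs TF share of the genome.
fold = round(tf_share / TF_GENOME_SHARE, 1)
print(gene_share, tf_share, fold)
```

Running it prints `5.59 16.15 3.2`, consistent with the caption: TF-associated articles appear at roughly three times the rate their genomic share alone would suggest.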

**Figure 2**
**Released mini review articles**. These mini review articles, listed in alphabetical order, are those that their respective authors have sufficiently completed and released. These articles can be accessed at [43].

**Figure 3**
**Worldwide distribution of authors by country**. The TFe consortium comprises 114 authors from 13 countries. The exact distribution is as follows: Australia, 2; Canada, 25; Denmark, 2; France, 6; Germany, 2; Italy, 2; Poland, 2; Spain, 4; Sweden, 2; Switzerland, 4; the Netherlands, 4; the United Kingdom, 14; and the United States, 45.
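
The per-country counts in the Figure 3 caption can be checked against the stated totals. This small tally uses only the numbers given in the caption and additionally reports the largest national share.

```python
# Author counts per country, as listed in the Figure 3 caption.
AUTHORS_BY_COUNTRY = {
    "Australia": 2, "Canada": 25, "Denmark": 2, "France": 6, "Germany": 2,
    "Italy": 2, "Poland": 2, "Spain": 4, "Sweden": 2, "Switzerland": 4,
    "the Netherlands": 4, "the United Kingdom": 14, "the United States": 45,
}

total = sum(AUTHORS_BY_COUNTRY.values())
n_countries = len(AUTHORS_BY_COUNTRY)
# Percentage of the consortium from each country, largest first.
shares = sorted(
    ((country, round(100 * n / total, 1)) for country, n in AUTHORS_BY_COUNTRY.items()),
    key=lambda item: -item[1],
)
print(total, n_countries, shares[0])
```

The counts sum to 114 authors across 13 countries, matching the caption, with the United States contributing about 39.5% of the consortium.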

**Figure 4**
**TFe user interface**. Shown in this figure are three screenshots from the TFe web-based user interface. Built on HTML, JavaScript and CSS standards, the TFe user interface provides a fast and flexible way to view, download, and edit TFe data. Pages shown in this figure are: (a) the home page; (b) the article page; and (c) the classification page.

**Figure 5**
**Tour of the user interface**. (A) The project logo links back to the home page. (B) The 'quick search' and 'sign in' widgets are placed near the top of the page. (C) The vertical site navigation bar offers fast access to all available pages in the site. (D) The official symbol, name, and authors are placed prominently to catch the user's attention immediately. Beneath the authors' names is the date of the most recent revision. (E) When available, a thumbnail of the structural prediction rendering is displayed in the header area. (F) Two drop-down menus provide easy access to the ten most recently visited and the ten most recently updated articles. (G) Key information on the TF, such as its classification, homologs, genomic links, and synonyms, occupies the top right corner of each page. (H) An article completion score bar gives authors immediate feedback on the progress of their articles. (I) Articles in TFe are organized into ten tabs. Immediately underneath the tabs are links to data downloads in PDF and Excel file formats. A 'view content, comments' toggle lets the user view comments that have been attached to the article; by default, comments are hidden. (J) Most tabbed sections start with an author-contributed 'summary' paragraph of 150 to 500 words.

**Figure 6**
**Content available in TFe**. This diagram shows the diverse range of TF-related content available in TFe. Articles in TFe are organized into ten tabs, represented in this diagram by the ten columns, arranged side by side, labeled 'Summary', 'Structure', 'TFBS', and so forth. Under each tab there are one or more relevant subheadings, represented by beige or grey boxes that contain partial screenshots of the actual content, whether text, figures, or tables. Beige boxes represent content composed by TFe authors, while grey boxes represent content that has been largely populated automatically. Below each screenshot box are the name of the subheading and a brief description of it. Below the description is a series of blue, red, green, and yellow icons labeled 'WEB', 'PDF', 'XLS', and 'API', which indicate the formats in which the content of that subheading is available. All subheadings are available in web format on the TFe website, which we therefore consider the most comprehensive format. Select content is available in redacted form in the PDF format. Content in the form of 'data' can be downloaded from the TFe website as an Excel spreadsheet ('XLS') or retrieved using the TFe web API ('API').

**Figure 7**
**Structural predictions of TF DNA binding domains**. To date, we have created 212 structural predictions of the active sites of select TFs in TFe, focusing on TFs for which a structural prediction is most feasible and whose articles are nearing completion. These predictions were generated with a custom in-house pipeline that finds the most similar experimentally determined protein structure for each unsolved TF and uses that structure as a 'template' to guide prediction of the unknown structure.
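
The caption does not say how the pipeline measures similarity when choosing a template. As a minimal sketch, the code below assumes that similarity means percent identity over a global pairwise alignment, computed with a textbook Needleman-Wunsch alignment. The scoring values (match +1, mismatch -1, linear gap -2), the identity convention (identical columns divided by alignment length), and the names `global_align`, `percent_identity` and `pick_template` are all hypothetical. This is not the TFe pipeline; real template searches usually rely on dedicated database-search tools rather than exhaustive pairwise alignment.

```python
from typing import Dict, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of alignment length."""
    aln_a, aln_b = global_align(a, b)
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def pick_template(query: str, templates: Dict[str, str]) -> Tuple[str, float]:
    """Return the (hypothetical) template name with the highest identity to the query."""
    best = max(templates, key=lambda name: percent_identity(query, templates[name]))
    return best, percent_identity(query, templates[best])
```

For example, with toy sequences, `pick_template("MKVLA", {"t_far": "WWWWW", "t_near": "MKVLS"})` selects `t_near` at 80% identity. The selected template's coordinates would then guide model building, a step this sketch does not cover.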

**Figure 8**
**Format of the PDF article**. The PDF mini summaries are four pages long. The first page features basic information such as the TF name, gene identifiers and classification; the names and affiliations of the authors; an overview of the TF; an image of its active site protein structure accompanied by a brief commentary; and a featured TF binding profile selected by the author. The second and third pages contain a mixture of figures, paragraph text, and tables of genomic targets and of protein and ligand interactors. The last page contains two brief paragraphs, a MeSH cloud, and selected references. Shown here are the first two pages of a four-page PDF mini summary generated by the TFe system software. Our PDF creation tool, based on in-house code and the dompdf 0.5.1 open source module, can format a TFe article of any length and annotation depth as a standardized four-page PDF article. A fuzzy logic algorithm performs all of the modifications needed for the conversion. These modifications may include resizing figures, truncating excess text, reformatting the references, and weighing the trade-off between larger figures and data tables with less text, or more text with fewer figures and smaller data tables.
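
The caption names a fuzzy logic algorithm but not its membership functions or rules. To show the general shape of such a trade-off, here is a hypothetical zero-order Sugeno-style inference that maps "how much the content overflows the page budget" to a figure scale factor. The overflow breakpoints (0, 0.25, 0.5), the three rules, their consequents (1.0, 0.8, 0.6), and the name `figure_scale` are all invented for illustration and do not describe the TFe PDF generator.

```python
def _clip(x: float) -> float:
    """Clamp a membership degree to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def figure_scale(overflow: float) -> float:
    """Hypothetical fuzzy rule base: shrink figures as content overflow grows.

    overflow: fraction by which content exceeds the page budget
    (0.25 means 25% more content than fits; values <= 0 mean it fits).
    """
    # Membership degrees for the linguistic terms low / medium / high overflow.
    low = _clip((0.25 - overflow) / 0.25)
    medium = _clip(1 - abs(overflow - 0.25) / 0.25)
    high = _clip((overflow - 0.25) / 0.25)
    # Each rule pairs a firing strength with a crisp output (figure scale).
    rules = [(low, 1.0), (medium, 0.8), (high, 0.6)]
    # Defuzzify with the firing-strength-weighted average of rule outputs.
    total = sum(weight for weight, _ in rules)
    return sum(weight * scale for weight, scale in rules) / total
```

With these assumed rules, content that fits keeps full-size figures (`figure_scale(0.0)` gives 1.0), a 25% overflow gives 0.8, and intermediate values blend smoothly (0.125 gives 0.9). That smooth blending, rather than hard thresholds, is the main reason to use fuzzy rules for this kind of layout trade-off.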

**Figure 9**
**Using the TFe web API**. Bioinformatics work often involves more data retrieval and computation than can practically be done by hand. Software is therefore written to automate these tasks, which may include processing large quantities of data or retrieving large amounts of information from online resources such as NCBI. To let researchers retrieve data from TFe in an automated fashion, we have implemented a simple yet powerful web API. This figure summarizes what a data transaction may look like when using the TFe web API. In this case, the goal of the data retrieval is to obtain all MeSH disease terms associated with the transcription factor 'ATF3'.
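
The caption describes the goal of the transaction but not the request syntax. As a minimal sketch of scripted retrieval, the code assumes a plain HTTP GET interface that returns one record per line. The base URL (an `example.org` placeholder), the `tf` and `type` parameter names, the `mesh_diseases` data type, and the function names are hypothetical stand-ins, not the documented TFe API. Consult the TFe site for the real endpoint.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint and parameter names; not the real TFe API syntax.
API_BASE = "https://example.org/tfe/api"


def build_query(tf_symbol: str, data_type: str) -> str:
    """Compose a GET URL requesting one kind of data about one TF."""
    return API_BASE + "?" + urlencode({"tf": tf_symbol, "type": data_type})


def parse_records(body: str) -> List[str]:
    """Assumes a plain-text reply with one record per line; skips blanks and '#' comments."""
    records = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            records.append(line)
    return records


def fetch(tf_symbol: str, data_type: str, timeout: float = 10.0) -> List[str]:
    """Perform the request (requires network access) and parse the reply."""
    with urlopen(build_query(tf_symbol, data_type), timeout=timeout) as resp:
        return parse_records(resp.read().decode("utf-8"))
```

Against a real endpoint, a call such as `fetch("ATF3", "mesh_diseases")` would return the MeSH disease terms as a list of strings. Keeping URL construction and parsing separate from the network call makes both parts easy to test offline.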

**Figure 10**
**The completion scores of authored articles in TFe**. The y-axis of this graph is the article completion score (ACS), while the bars along the x-axis represent the 176 authored TF articles in TFe (some of which are still works in progress), ordered so that higher-scoring articles are on the right. The completion scores of the 176 articles at three time points (Q2 2009, Q4 2009, and Q2 2010) are superimposed to show that the scores have increased over time. Within six months of the introduction of the ACS system in Q2 2009, the completion scores of authored TFe articles increased from 40.6% to 60.2% (Q2 2009 versus Q4 2009), attesting to the effectiveness of this feedback mechanism.
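
Because the ACS is itself a percentage, the reported rise can be read in two ways: as an absolute gain in percentage points or as a relative gain. This short calculation separates the two, using only the values quoted in the caption.

```python
# Article completion scores (ACS) quoted in the Figure 10 caption, in percent.
ACS_Q2_2009 = 40.6
ACS_Q4_2009 = 60.2

# Absolute change, in percentage points.
gain_points = round(ACS_Q4_2009 - ACS_Q2_2009, 1)
# Relative change, as a percentage of the Q2 2009 score.
gain_relative = round(100 * (ACS_Q4_2009 - ACS_Q2_2009) / ACS_Q2_2009, 1)
print(gain_points, gain_relative)
```

This prints `19.6 48.3`: the score rose by 19.6 percentage points, which is an increase of roughly 48% over its Q2 2009 level.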

**Figure 11**
**Software architecture**. This schematic shows the conceptual structure of the TFe software. Written mainly in Perl, the software is essentially a collection of Perl scripts that run on an Apache web server in a UNIX-compatible environment. It relies on MySQL for data storage and on a number of third-party modules. Over 40 'front line' scripts (red rectangle) generate individual pages such as the home page and the article page. These front line scripts are backed by a cluster of three TFe Perl modules (green circles): (1) the 'database updater', which is invoked as needed (*pro re nata*) whenever the TFe database must be maintained or updated with new content from external sources such as NCBI; (2) the 'main module', which contains shared subroutines such as those that generate page headers; and (3) the 'database handler', which forms the gateway between all components of the TFe software and the TFe database. The database (yellow cylinder) is stored on a separate database server and communicates with the rest of the TFe software over a fiber-optic link. It contains cached copies of third-party resources so that the TFe software does not have to retrieve data from the 'cloud' constantly, which improves performance. The web API (purple rectangle) connects directly to the small, efficient database handler module. Because it bypasses the large main module and the database updater, the web API runs faster than the web-based interface. GO, Gene Ontology; MGI, Mouse Genome Informatics.
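
The caching arrangement described above, where every read passes through a single database gateway that serves local copies of third-party data and leaves re-fetching to a separate updater, can be sketched in a few lines. The sketch below is a hypothetical Python analogue of that pattern, not TFe's Perl code. It uses an in-memory SQLite table in place of the MySQL server, and the class name `CachingHandler` and the injected `fetch_external` callable are our own illustrative choices.

```python
import sqlite3
from typing import Callable


class CachingHandler:
    """Hypothetical gateway: reads go through this object, which serves cached
    copies of external records and only calls the fetcher on a cache miss."""

    def __init__(self, fetch_external: Callable[[str], str]) -> None:
        self._fetch = fetch_external
        # In-memory SQLite stands in for the separate database server.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

    def get(self, key: str) -> str:
        """Return the cached record, fetching and storing it on first use."""
        row = self._db.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        if row is not None:
            return row[0]
        value = self._fetch(key)
        self._db.execute("INSERT INTO cache (key, value) VALUES (?, ?)", (key, value))
        self._db.commit()
        return value

    def refresh(self, key: str) -> None:
        """Updater role: re-fetch a record from the external source and overwrite the cache."""
        value = self._fetch(key)
        self._db.execute("INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value))
        self._db.commit()
```

Repeated `get` calls for the same key hit the external source only once, and only `refresh` forces a new fetch. This mirrors the split between the lightweight handler used on every request and the updater that runs only when content must change.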

References

1. Brand-Saberi B. Genetic and epigenetic control of skeletal muscle development. Ann Anat. 2005;187:199–207. doi: 10.1016/j.aanat.2004.12.018.
2. Balsamo A, Cicognani A, Gennari M, Sippell WG, Menabo S, Baronio F, Riepe FG. Functional characterization of naturally occurring NR3C2 gene mutations in Italian patients suffering from pseudohypoaldosteronism type 1. Eur J Endocrinol. 2007;156:249–256. doi: 10.1530/eje.1.02330.
3. Field JK, Spandidos DA. The role of ras and myc oncogenes in human solid tumours and their relevance in diagnosis and prognosis (review). Anticancer Res. 1990;10:1–22.
4. Fulton DL, Sundararajan S, Badis G, Hughes TR, Wasserman WW, Roach JC, Sladek R. TFCat: The Curated Catalog of Mouse and Human Transcription Factors. Genome Biol. 2009;10:R29. doi: 10.1186/gb-2009-10-3-r29.
5. Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. A census of human transcription factors: function, expression and evolution. Nat Rev Genet. 2009;10:252–263. doi: 10.1038/nrg2538.
