OrthoList: a compendium of C. elegans genes with human orthologs

Daniel D Shaye¹, Iva Greenwald

PMID: 21647448
PMCID: PMC3102077
DOI: 10.1371/journal.pone.0020085

PLoS One. 2011;6(5):e20085. doi: 10.1371/journal.pone.0020085. Epub 2011 May 25.

Affiliation

¹ Howard Hughes Medical Institute, Columbia University, College of Physicians and Surgeons, New York, New York, United States of America. ds451@columbia.edu

Erratum in

PLoS One. 2014;9(1). doi:10.1371/annotation/f5ffb738-a176-4a43-b0e0-249cdea45fe0

Abstract

Background: C. elegans is an important model for genetic studies relevant to human biology and disease. We sought to assess the orthology between C. elegans and human genes to understand better the relationship between their genomes and to generate a compelling list of candidates to streamline RNAi-based screens in this model.

Results: We performed a meta-analysis of results from four orthology prediction programs and generated a compendium, "OrthoList", containing 7,663 C. elegans protein-coding genes. Various assessments indicate that OrthoList has extensive coverage with low false-positive and false-negative rates. Part of this evaluation examined the conservation of components of the receptor tyrosine kinase, Notch, Wnt, TGF-β and insulin signaling pathways, and led us to update compendia of conserved C. elegans kinases, nuclear hormone receptors, F-box proteins, and transcription factors. Comparison with two published genome-wide RNAi screens indicated that virtually all of the conserved hits would have been obtained had just the OrthoList set (∼38% of the genome) been targeted. We compiled OrthoList by InterPro domains and Gene Ontology annotation, making it easy to identify C. elegans orthologs of human disease genes for potential functional analysis.

Conclusions: We anticipate that OrthoList will be of considerable utility to C. elegans researchers for streamlining RNAi screens, by focusing on genes with apparent human orthologs, thus reducing screening effort by ∼60%. Moreover, we find that OrthoList provides a useful basis for annotating orthology and reveals more C. elegans orthologs of human genes in various functional groups, such as transcription factors, than previously described.
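The relationship between the two percentages quoted in the abstract can be checked with a few lines of arithmetic. This is only a sketch based on the rounded figures stated above (7,663 genes, ∼38% of the genome); the implied genome size is derived from those rounded values and is not a number reported in the abstract.

```python
# Figures taken from the abstract (both are rounded in the source).
ORTHOLIST_GENES = 7663          # genes in OrthoList
ORTHOLIST_FRACTION = 0.38       # OrthoList as a fraction of the genome (~38%)

# Fraction of an RNAi screen that is skipped if only OrthoList genes are targeted.
effort_saved = 1.0 - ORTHOLIST_FRACTION

# Genome size implied by the two rounded figures (an estimate, not a reported value).
implied_genome_size = round(ORTHOLIST_GENES / ORTHOLIST_FRACTION)

print(f"Screening effort saved: {effort_saved:.0%}")
print(f"Implied number of protein-coding genes: {implied_genome_size}")
```

The saving computed from the rounded 38% figure is consistent with the "∼60%" reduction stated in the Conclusions.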

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Comparison of four orthology prediction programs queried for *C. elegans* orthologs of human proteins.**
This diagram is modified from VENNY (see Materials and Methods). Each program is named above the oval representing its results, with the number of *C. elegans* orthologs and in-paralogs found by the program shown. The table gives an overall measure of how many genes were found by one or more programs (regardless of which one(s) found them). The numbers in the overlapping and non-overlapping areas of the Venn diagram indicate how many genes were found by overlapping or unique sets of programs. The font size used for these numbers indicates how many programs found that number of genes: numbers corresponding to genes found by a single program are shown smallest, whereas the largest font denotes the number of genes found by all programs. The data underlying this diagram can be seen in Table S1. A measure of the similarity and divergence between programs can be found in Table S2.
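The kind of tally summarized in Figure 1 (how many genes are supported by one, two, three, or all four programs) can be sketched with plain set operations. The program names and gene identifiers below are hypothetical placeholders, and this is an illustration of the counting idea rather than the authors' actual pipeline.

```python
from collections import Counter


def tally_support(predictions):
    """Return (gene -> set of supporting programs, Counter of genes per support level)."""
    support = {}
    for program, genes in predictions.items():
        for gene in genes:
            support.setdefault(gene, set()).add(program)
    # Count how many genes were found by exactly n programs.
    by_count = Counter(len(programs) for programs in support.values())
    return support, by_count


# Hypothetical toy input: program name -> set of predicted C. elegans genes.
predictions = {
    "program_A": {"gene1", "gene2", "gene3"},
    "program_B": {"gene2", "gene3", "gene4"},
    "program_C": {"gene3", "gene5"},
    "program_D": {"gene2", "gene3"},
}

support, by_count = tally_support(predictions)

# A compendium built as the union contains every gene found by at least one program.
compendium = set(support)

for n_programs in sorted(by_count):
    print(f"found by {n_programs} program(s): {by_count[n_programs]} gene(s)")
```

Keeping the per-gene set of supporting programs, rather than only the counts, makes it possible to filter the compendium later by a stricter support threshold.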

**Figure 2. Examining OrthoList specificity and sensitivity.**
Venn diagrams comparing *C. elegans* gene families, and their previously defined conserved subsets, to members of the same families found in OrthoList (see Materials and Methods). For each family the overlap between OrthoList and the previously defined conserved subset is shown (the percentage refers to how well covered by OrthoList the conserved subset is). The possible homologs missing from OrthoList (putative false-negatives, shown above each overlap) and those found in OrthoList not previously defined as homologs (putative false-positives, shown below each overlap) are based on homology assignments of the original compendia for each of these families (see Materials and Methods, and –[28], [56]). As discussed in the main text, the number of false-negatives and false-positives may actually be lower. A) Kinases (see Table S3C for source data). B) NHRs (see Table S4A for source data). C) F-box proteins (see Table S4B for source data).
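The three quantities shown for each family in Figure 2 (coverage of the previously defined conserved subset, putative false-negatives, and putative false-positives) follow directly from set comparisons. The sketch below assumes two inputs per family: the previously defined conserved subset and the family members present in OrthoList. The function name and the toy identifiers are hypothetical.

```python
def coverage_report(conserved_subset, family_in_ortholist):
    """Compare a previously defined conserved subset with family members found in OrthoList."""
    overlap = conserved_subset & family_in_ortholist
    # Percentage of the conserved subset that OrthoList recovers.
    coverage_pct = 100.0 * len(overlap) / len(conserved_subset) if conserved_subset else 0.0
    return {
        "overlap": overlap,
        "coverage_pct": coverage_pct,
        # Previously called conserved, but absent from OrthoList.
        "putative_false_negatives": conserved_subset - family_in_ortholist,
        # Present in OrthoList, but not previously called conserved.
        "putative_false_positives": family_in_ortholist - conserved_subset,
    }


# Hypothetical toy family.
conserved = {"k1", "k2", "k3", "k4"}
in_ortholist = {"k2", "k3", "k4", "k5"}

report = coverage_report(conserved, in_ortholist)
print(f"coverage: {report['coverage_pct']:.1f}%")
print("putative false-negatives:", sorted(report["putative_false_negatives"]))
print("putative false-positives:", sorted(report["putative_false_positives"]))
```

As the caption notes, both "putative" sets depend on the homology assignments of the original compendia, so they are upper bounds on the true error counts.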

**Figure 3. OrthoList coverage of conserved signaling pathways.**
Genes in bold are found by at least one orthology-predicting program, and thus included in OrthoList. The source data for this figure can be found in Table S5. A) RTK/Ras/MAPK pathway (reviewed in [34]). Note that *ras-1* and *ras-2* have not been defined functionally, although they are highly conserved. B) Notch pathway (reviewed in [32]). C) TGF-β pathway (reviewed in [33]). We note that *tag-68* was defined only by conservation, and no phenotype has been associated with its loss. D) Wnt pathway (reviewed in [30]). Note that our analysis was restricted to the conserved, canonical Wnt pathway. E) Insulin pathway. We specifically highlight the six insulins (*daf-28*, *ins-1*, *ins-4*, *ins-6*, *ins-7* and *ins-8*), out of forty, that have been found (by overexpression, biochemical methods, RNAi or by existence of a semi-dominant allele) to be functional (reviewed in [31]).

**Figure 4. OrthoList coverage of hits from genome-wide RNAi screens.**
Venn diagrams analyzing hits obtained from RNAi screens examining (A) cell division and (B) endocytic/secretory trafficking. For each screen the overlap between OrthoList and the orthologous subset of hits is shown (percentage refers to how well covered by OrthoList this conserved subset is). Orthology assignments for hits missing from OrthoList (shown above overlap) and those found in OrthoList that were not called homologs in the original publications (shown below overlap) were confirmed by TreeFam and/or RBH (see Materials and Methods). Source data for these diagrams can be found in Tables S6 and S8.

**Figure 5. OrthoList coverage of a transcription factor compendium.**
A) Venn diagram comparing the wTF2.0 compendium to OrthoList. Source data for this diagram is found in Tables S9A, B. We find that OrthoList contains ∼98% of TFs previously scored as having human orthologs (overlap). In addition, OrthoList contains 182 TFs not scored in wTF2.0 as having orthologs. B) Distribution of TFs found to have orthologs by OrthoList, but not by wTF2.0. Source data for this table is found in Tables S9C–E.

References

1. Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, et al. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 2010
2. Fitch WM. Distinguishing homologous from analogous proteins. Syst Zool. 1970;19:99–113.
3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
4. Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001;314:1041–1052.
5. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005;39:309–338.
