Using context to improve protein domain identification
- PMID: 21453511
- PMCID: PMC3090354
- DOI: 10.1186/1471-2105-12-90
Abstract
Background: Identifying domains in protein sequences is an important step in protein structural and functional annotation. Existing domain recognition methods typically evaluate each domain prediction independently of the rest. However, the majority of proteins are multidomain, and pairwise domain co-occurrences are highly specific and non-transitive.
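To make "non-transitive" concrete: seeing domain A with B, and B with C, says nothing about whether A and C ever occur together. The sketch below collects observed pairs from a list of domain architectures. The labels "A", "B", "C" and the toy architectures are hypothetical illustrations, not data from the study.

```python
from itertools import combinations

def cooccurring_pairs(architectures):
    """Collect unordered domain pairs that appear together in at least one protein."""
    pairs = set()
    for domains in architectures:
        # Deduplicate so a repeated domain does not produce a self-pair here.
        for a, b in combinations(sorted(set(domains)), 2):
            pairs.add((a, b))
    return pairs

def observed(pairs, a, b):
    """True if domains a and b were seen together (order-independent)."""
    return tuple(sorted((a, b))) in pairs

# Toy architectures with hypothetical domain labels (not real data).
toy = [["A", "B"], ["B", "C"], ["B"]]
pairs = cooccurring_pairs(toy)
print(observed(pairs, "A", "B"), observed(pairs, "B", "C"), observed(pairs, "A", "C"))
# True True False
```

A-B and B-C are both observed, yet A-C is not, so co-occurrence cannot be inferred by chaining pairs.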
Results: Here, we demonstrate how to exploit domain co-occurrence to boost weak domain predictions that appear in previously observed combinations, while penalizing higher-confidence domains if such combinations have never been observed. Our framework, Domain Prediction Using Context (dPUC), incorporates pairwise "context" scores between domains, along with traditional domain scores and thresholds, and improves domain prediction across a variety of organisms from bacteria to protozoa and metazoa. Among the genomes we tested, dPUC is most successful at improving predictions for the poorly annotated malaria parasite Plasmodium falciparum, for which over 38% of the genome is currently unannotated. Our approach enables high-confidence annotations in this organism and the identification of orthologs of many core machinery proteins conserved in all eukaryotes, including those involved in ribosomal assembly and other RNA processing events, which, surprisingly, had not previously been identified.
Conclusions: Overall, our results demonstrate that this new context-based approach will provide significant improvements in domain and function prediction, especially for poorly understood genomes for which the need for additional annotations is greatest. Source code for the algorithm is available under a GPL open source license at http://compbio.cs.princeton.edu/dpuc/. Pre-computed results for our test organisms and a web server are also available at that location.
Similar articles
- A multi-objective optimization approach accurately resolves protein domain architectures. Bioinformatics. 2016 Feb 1;32(3):345-53. doi: 10.1093/bioinformatics/btv582. Epub 2015 Oct 12. PMID: 26458889.
- A domain-centric solution to functional genomics via dcGO Predictor. BMC Bioinformatics. 2013;14(Suppl 3):S9. doi: 10.1186/1471-2105-14-S3-S9. Epub 2013 Feb 28. PMID: 23514627.
- Fitting hidden Markov models of protein domains to a target species: application to Plasmodium falciparum. BMC Bioinformatics. 2012 May 1;13:67. doi: 10.1186/1471-2105-13-67. PMID: 22548871.
- Protein function annotation using protein domain family resources. Methods. 2016 Jan 15;93:24-34. doi: 10.1016/j.ymeth.2015.09.029. Epub 2015 Oct 3. PMID: 26434392. Review.
- Prediction of protein function from protein sequence and structure. Q Rev Biophys. 2003 Aug;36(3):307-40. doi: 10.1017/s0033583503003901. PMID: 15029827. Review.
Cited by
- Identification of divergent protein domains by combining HMM-HMM comparisons and co-occurrence detection. PLoS One. 2014 Jun 5;9(6):e95275. doi: 10.1371/journal.pone.0095275. PMID: 24901648.
- A multi-objective optimization approach accurately resolves protein domain architectures. Bioinformatics. 2016 Feb 1;32(3):345-53. doi: 10.1093/bioinformatics/btv582. Epub 2015 Oct 12. PMID: 26458889.
- Protein domain identification methods and online resources. Comput Struct Biotechnol J. 2021 Feb 2;19:1145-1153. doi: 10.1016/j.csbj.2021.01.041. PMID: 33680357. Review.
- Supergenomic network compression and the discovery of EXP1 as a glutathione transferase inhibited by artesunate. Cell. 2014 Aug 14;158(4):916-928. doi: 10.1016/j.cell.2014.07.011. PMID: 25126794.
- An insight into the lignin peroxidase of Macrophomina phaseolina. Bioinformation. 2013 Aug 7;9(14):730-5. doi: 10.6026/97320630009730. PMID: 23976830.