Exploring Mouse Protein Function via Multiple Approaches
- PMID: 27846315
- PMCID: PMC5112993
- DOI: 10.1371/journal.pone.0166580
Abstract
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Accurate identification of protein functions therefore remains one of the major challenges in molecular biology. In this study, we present a novel approach for predicting mouse protein functions. The approach is a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in both the leave-one-out and ten-fold cross-validations. In the leave-one-out cross-validation, the similarity-based approach alone achieved an accuracy of 0.8756, but it was unable to predict the functions of proteins with no homologues. By comparison, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although this accuracy was lower than that of the similarity-based approach, the composition-based approach could predict the functions of almost all proteins, even those with no homologues. The combined method therefore balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results of the ten-fold cross-validation indicate that the combined method remains effective and stable when no close homologues are available. However, the accuracy of the predicted functions can only be assessed against protein functions that are known at present, and many protein functions remain unknown. An examination of proteins whose 1st-order predicted functions were wrong but whose 2nd-order predicted functions were correct showed that the wrongly predicted 1st-order functions were closely associated with the genes encoding those proteins. These so-called wrongly predicted functions could therefore prove correct upon future experimental verification, and the accuracy of the presented method may be much higher in reality.
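The sequential combination described in the abstract can be illustrated with a minimal sketch. It assumes that "sequential" means each approach is tried in turn and the next one is used only when the previous one cannot make a prediction, and that the n-th order accuracy is the fraction of proteins whose n-th ranked function is among their known functions. The abstract does not spell out either detail, so both are assumptions. The names `predict_sequentially` and `nth_order_accuracy`, the three stand-in predictors, and all toy data are hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# A predictor returns function labels ranked from most to least likely,
# or None when it cannot make a prediction for the protein.
Predictor = Callable[[str], Optional[List[str]]]


def predict_sequentially(protein: str, predictors: Sequence[Predictor]) -> List[str]:
    """Return the ranking from the first predictor that can handle the protein."""
    for predictor in predictors:
        ranking = predictor(protein)
        if ranking:  # None or an empty list means "no prediction"
            return ranking
    return []


def nth_order_accuracy(rankings: Dict[str, List[str]],
                       known: Dict[str, Set[str]],
                       order: int = 1) -> float:
    """Fraction of proteins whose order-th ranked function is a known function."""
    hits = sum(
        1 for protein, ranking in rankings.items()
        if len(ranking) >= order and ranking[order - 1] in known[protein]
    )
    return hits / len(rankings)


# Toy stand-ins for the three approaches (hypothetical data, not from the study).
homolog_hits = {"P1": ["metabolism", "transport"]}
interaction_hits = {"P2": ["signaling", "metabolism"]}


def similarity_based(protein: str) -> Optional[List[str]]:
    return homolog_hits.get(protein)  # None when no homologue is found


def interaction_based(protein: str) -> Optional[List[str]]:
    return interaction_hits.get(protein)  # None when no interaction partner is known


def composition_based(protein: str) -> Optional[List[str]]:
    return ["transport", "signaling"]  # always answers, as a last resort


cascade = [similarity_based, interaction_based, composition_based]
proteins = ["P1", "P2", "P3"]
rankings = {p: predict_sequentially(p, cascade) for p in proteins}
known = {"P1": {"metabolism"}, "P2": {"signaling"}, "P3": {"signaling"}}

first_order = nth_order_accuracy(rankings, known, order=1)
second_order = nth_order_accuracy(rankings, known, order=2)
```

In this toy run, protein P3 has neither a homologue nor an interaction partner, so only the composition-based stand-in can answer for it. Its 1st-order prediction is wrong while its 2nd-order prediction is correct, which is the kind of case the abstract singles out for closer examination.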
Conflict of interest statement
The authors have declared that no competing interests exist.
Cited by
- A network-based method using a random walk with restart algorithm and screening tests to identify novel genes associated with Menière's disease. PLoS One. 2017 Aug 7;12(8):e0182592. doi: 10.1371/journal.pone.0182592. PMID: 28787010.
- Identifying novel fruit-related genes in Arabidopsis thaliana based on the random walk with restart algorithm. PLoS One. 2017 May 4;12(5):e0177017. doi: 10.1371/journal.pone.0177017. PMID: 28472169.
- Inferring novel genes related to colorectal cancer via random walk with restart algorithm. Gene Ther. 2019 Sep;26(9):373-385. doi: 10.1038/s41434-019-0090-7. PMID: 31308477.
- Identifying Functions of Proteins in Mice With Functional Embedding Features. Front Genet. 2022 May 16;13:909040. doi: 10.3389/fgene.2022.909040. PMID: 35651937.
- Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms. Genes (Basel). 2018 Mar 12;9(3):155. doi: 10.3390/genes9030155. PMID: 29534550.