Comparative Study

BMC Genomics. 2007 Jul 9:8:222.

doi: 10.1186/1471-2164-8-222.

Quantitative assessment of relationship between sequence similarity and function similarity

Trupti Joshi¹, Dong Xu

Affiliation

¹ Digital Biology Laboratory, Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA. joshitr@missouri.edu

PMID: 17620139
PMCID: PMC1949826
DOI: 10.1186/1471-2164-8-222

Trupti Joshi et al. BMC Genomics. 2007.

Abstract

Background: Comparative sequence analysis is considered the first step towards annotating new proteins in genome annotation. However, sequence comparison may lead to the creation and propagation of function assignment errors. Thus, it is important to analyze the quality of sequence-based function assignment thoroughly and systematically using large-scale data.

Results: We present an analysis of the relationship between sequence similarity and function similarity for the proteins in four model organisms: *Arabidopsis thaliana*, *Saccharomyces cerevisiae*, *Caenorhabditis elegans*, and *Drosophila melanogaster*. Using a measure of functional similarity based on the three categories of Gene Ontology (GO) classifications (biological process, molecular function, and cellular component), we quantified the correlation between functional similarity and sequence similarity, measured by sequence identity or by the statistical significance of the alignment, and compared this correlation against that of randomly chosen protein pairs.

Conclusion: We identified distinct sequence-function relationships when comparing BLAST with PSI-BLAST, sequence identity with expectation value (E-value), GO indices with semantic similarity approaches, and within-genome with between-genome comparisons, across the three GO categories. Our study provides a benchmark for estimating the confidence of function assignments based purely on sequence similarity.
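The comparison described in the abstract can be illustrated with a minimal sketch: protein pairs are grouped into intervals of a sequence score (percent identity, or the negative base-10 logarithm of the E-value), and the mean functional similarity in each interval is compared with a baseline from randomly chosen pairs. The function names, the bin width, the E-value floor, and the toy numbers below are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import defaultdict


def neg_log10_evalue(evalue, floor=1e-200):
    """Return -log10(E), clamping E to a floor so that E = 0 stays finite.

    The floor value is an illustrative assumption.
    """
    return -math.log10(max(evalue, floor))


def mean_similarity_by_bin(pairs, bin_width=10.0):
    """Group (score, functional_similarity) pairs into fixed-width score bins.

    Returns {bin_start: mean functional similarity}, ordered by bin_start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for score, func_sim in pairs:
        start = math.floor(score / bin_width) * bin_width
        sums[start] += func_sim
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def ratio_to_random(binned, random_baseline):
    """Divide each bin mean by the mean similarity of randomly chosen pairs."""
    if random_baseline <= 0:
        raise ValueError("random_baseline must be positive")
    return {start: mean / random_baseline for start, mean in binned.items()}


# Hypothetical (percent identity, functional similarity) pairs.
toy_pairs = [(25.0, 0.2), (28.0, 0.4), (55.0, 0.6), (95.0, 1.0)]
binned = mean_similarity_by_bin(toy_pairs)
print(ratio_to_random(binned, random_baseline=0.1))
```

A ratio well above 1 in a given interval would suggest that pairs at that level of sequence similarity share function more often than chance; bins with few pairs should be read with caution.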

Figures

**Figure 1**
Distribution of yeast and *Arabidopsis* unique orthologous pairs from COGs against sequence identity and expectation value intervals.

**Figure 2**
Relation between functional similarity in terms of the GO Indices and the negative logarithmic (base 10) E-value of sequence similarity within the same genomes using FASTA for the GO Biological Process Annotations.

**Figure 3**
Relation between percentage of sequence similarity and functional similarity for GO Biological Process Annotations within the same genomes using BLAST.

**Figure 4**
Relation between percentage of sequence similarity and functional similarity for GO Molecular Function Annotations within the same genomes using BLAST.

**Figure 5**
Relation between percentage of sequence similarity and functional similarity for GO Cellular Component Annotations within the same genomes using BLAST.

**Figure 6**
Functional conservation patterns for GO Biological Process annotations (A) based on evidence from experimental validation and (B) based on computational techniques such as electronic annotations, against percentage of sequence similarity.

**Figure 7**
A. Relation between E-value intervals (negative logarithmic with base 10) of sequence similarity and similarity in SubLoc predicted localization of proteins within the same genomes using FASTA. B. Relation between percentage of sequence similarity and similarity in SubLoc predicted localization of proteins within the same genomes using BLAST.

**Figure 8**
Relation between Semantic similarity and sequence identity for GO Annotations for combined inter and intra genome comparisons using BLAST.

**Figure 9**
Relation between functional similarity for GO Biological Process, Molecular Function and Cellular Component Annotations vs. E-value intervals (negative logarithmic with base 10) within the same genomes using PSI-BLAST.

**Figure 10**
Relation between percentage of sequence similarity and functional similarity for GO (A) Biological Process, (B) Molecular Function and (C) Cellular Component Annotations within the same genomes using BLAST and (D) for all annotations using PSI-BLAST respectively, in the form of normalized ratio of pms(t1, t2), which is the probability of the minimum subsumer for terms t1 and t2 (see section 4.3).

**Figure 11**
Relation between percentage of sequence similarity and functional similarity for the GO Biological Process Annotations for inter-genome comparison of yeast ORFs against others using BLAST, in the form of normalized ratio.

**Figure 12**
Relation between percentage of sequence similarity and functional similarity for the GO Molecular Function Annotations for inter-genome comparison of yeast ORFs against others using BLAST, in the form of normalized ratio. Data points with a sample size of fewer than 10 gene pairs are not reliable, as the statistics are not significant.

**Figure 13**
GO Biological Process sub-graph with probabilities and minimum subsumer. The numbers in parentheses denote the occurrence of the GO term and any of its descendants in the GO.
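
The minimum-subsumer probability pms(t1, t2) named in the captions for Figures 10 and 13 can be sketched as follows. The toy graph, term names, and counts are hypothetical. Two further points are assumptions: estimating a term's probability as its count (including descendants) divided by the root count, and converting pms to a similarity score with a negative natural logarithm, which is a common information-content formulation that this record does not spell out.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "protein metabolism": {"metabolism"},
    "lipid metabolism": {"metabolism"},
}

# Hypothetical occurrence counts of each term together with its descendants.
COUNTS = {
    "root": 100,
    "metabolism": 40,
    "transport": 60,
    "protein metabolism": 10,
    "lipid metabolism": 30,
}


def ancestors(term):
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def probability(term):
    """Estimate p(term) as its count divided by the root count (assumption)."""
    return COUNTS[term] / COUNTS["root"]


def pms(t1, t2):
    """Probability of the minimum subsumer: the least probable shared ancestor."""
    shared = ancestors(t1) & ancestors(t2)
    return min(probability(t) for t in shared)


def semantic_similarity(t1, t2):
    """Score similarity as -ln(pms); rarer shared ancestors score higher."""
    return -math.log(pms(t1, t2))


print(pms("protein metabolism", "lipid metabolism"))
print(semantic_similarity("protein metabolism", "lipid metabolism"))
```

In this toy graph, two terms that share only the root have pms equal to 1 and therefore a similarity of zero, while terms that meet at a more specific ancestor score higher.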

