Validation of predicted anonymous proteins simply using Fisher's exact test
- PMID: 36700095
- PMCID: PMC9710694
- DOI: 10.1093/bioadv/vbab034
Abstract
Motivation: Genome sequencing has become the primary (and often the sole) experimental method for characterizing newly discovered organisms, in particular those from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins whose existence is not guaranteed, particularly among those without homologs in model organisms. As a last resort, the selection pressure computed from pairwise alignments of the corresponding 'Open Reading Frames' (ORFs) can be used to validate their existence. However, this approach is error-prone because it is not usually accompanied by a significance test.
Results: We introduce the straightforward Fisher's exact test as a postprocessing step applied to the results of the popular CODEML sequence comparison software. The respective rates of nucleotide changes at nonsynonymous versus synonymous positions (as determined by CODEML) are turned into the entries of a 2 × 2 contingency table, whose probability is computed under the null hypothesis that the two classes of positions should not behave differently if the ORFs do not encode actual proteins. Using the genome sequences of two recently isolated giant viruses, we show that strong negative selection pressures do not always provide a solid argument in favor of the existence of proteins.
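As an illustration of the idea, the following minimal Python sketch builds a 2 × 2 table from CODEML-style quantities and computes a two-sided Fisher's exact p-value. The abstract does not spell out how the table entries are derived, so the construction used here (changed versus unchanged sites, obtained by multiplying the site counts N and S by dN and dS and rounding) is an assumption, as are the function names and the example numbers, which are not taken from the article.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def table_from_codeml(n_sites, s_sites, dn, ds):
    """Assumed construction: (nonsyn changed, nonsyn unchanged,
    syn changed, syn unchanged), from site counts and substitution rates."""
    n_total, s_total = round(n_sites), round(s_sites)
    # Clamp so that a saturated rate (> 1) cannot exceed the number of sites.
    n_changed = min(n_total, round(n_sites * dn))
    s_changed = min(s_total, round(s_sites * ds))
    return (n_changed, n_total - n_changed, s_changed, s_total - s_changed)


# Hypothetical long ORF pair: dN/dS = 0.1 over many sites.
p_long = fisher_exact_two_sided(*table_from_codeml(600.0, 200.0, 0.05, 0.5))

# Hypothetical short ORF pair: dN/dS is about 0.33, but very few changes.
p_short = fisher_exact_two_sided(*table_from_codeml(60.0, 20.0, 0.05, 0.15))

print(f"long ORF p-value:  {p_long:.3g}")
print(f"short ORF p-value: {p_short:.3g}")
```

Under these assumed counts, the short ORF shows a dN/dS ratio well below 1 yet does not reach significance at the 0.05 level, which is the kind of case the abstract warns about. If SciPy is available, `scipy.stats.fisher_exact` could replace the hand-written function.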
© The Author(s) 2021. Published by Oxford University Press.