Bioinform Adv. 2021 Nov 15;1(1):vbab034.
doi: 10.1093/bioadv/vbab034. eCollection 2021.

Validation of predicted anonymous proteins simply using Fisher's exact test

Jean-Michel Claverie et al. Bioinform Adv. 2021.

Abstract

Motivation: Genome sequencing has become the primary (and often the sole) experimental method to characterize newly discovered organisms, in particular from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins whose existence is unsubstantiated, in particular among those without homologs in model organisms. As a last resort, the selection pressure computed from pairwise alignments of the corresponding 'Open Reading Frames' (ORFs) can be used to validate their existence. However, this approach is error-prone, as it is not usually accompanied by a significance test.

Results: We introduce the use of the straightforward Fisher's exact test as a postprocessing step applied to the results provided by the popular CODEML sequence comparison software. The respective rates of nucleotide changes at the nonsynonymous versus synonymous positions (as determined by CODEML) are turned into the entries of a 2 × 2 contingency table, the probability of which is computed under the null hypothesis that the two rates should not differ if the ORFs do not encode actual proteins. Using the genome sequences of two recently isolated giant viruses, we show that strong negative selection pressures do not always provide a solid argument in favor of the existence of proteins.
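The postprocessing described above can be illustrated with a minimal sketch. It assumes that, for each ORF pair, CODEML-style estimates of the numbers of nonsynonymous and synonymous sites (N, S) and the corresponding rates (dN, dS) are available, and that the 2 × 2 table contrasts changed versus unchanged positions in the two site classes. This table layout, the rounding to integer counts, and the function names `selection_table` and `fisher_exact_two_sided` are assumptions made for illustration, not necessarily the authors' exact procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def selection_table(n_sites, s_sites, dn, ds):
    """Assumed layout: (nonsyn changed, syn changed, nonsyn unchanged, syn unchanged)."""
    n_total, s_total = round(n_sites), round(s_sites)
    # Rates above 1 change per site are capped so counts stay within the totals.
    nonsyn_changed = min(n_total, round(n_sites * dn))
    syn_changed = min(s_total, round(s_sites * ds))
    return (nonsyn_changed, syn_changed,
            n_total - nonsyn_changed, s_total - syn_changed)


# Hypothetical values for one ORF pair (not taken from the paper).
table = selection_table(n_sites=600, s_sites=200, dn=0.05, ds=0.20)
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

A small p-value indicates that the nonsynonymous and synonymous positions do not change at the same rate, which is the signal expected from a genuine coding region; a large p-value means the computed ω cannot be distinguished from 1, however low it appears.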

Figures

Fig. 1.
Selection pressure values (A & B) and other parameters (C & D, E & F, G & H) associated with ORFan versus non-ORFan predicted Mollivirus proteins. Each dot corresponds to a pair of orthologous genes, the relative genomic position of which is indicated by the X-axis, separately for each column. The left and right columns correspond to ORFans and non-ORFans, respectively. ORFs associated with ω values not significantly different from 1 are in red (p-value adjusted to allow for one false positive); the others are in blue.
