Genome Biol. 2001;2(12):RESEARCH0051.
doi: 10.1186/gb-2001-2-12-research0051. Epub 2001 Nov 13.

Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences

L M Iyer et al. Genome Biol. 2001.

Abstract

Background: Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment.

Results: We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the beta-lactamase-like superfamily of metal-dependent hydrolases.

Conclusions: In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.

Figures

Figure 1
Multiple alignment of the polygalactosaminidase family that includes MJ1477, the alleged archaeal CysRS. Proteins are denoted by their gene name, followed by the species abbreviation and GenBank identifier (GI) number. The coloring reflects the 100% consensus. The consensus abbreviations and coloring scheme used in this and subsequent figures are as follows. Hydrophobic (h; LIYFMWACV) and aliphatic (l; LIAV) residues are shaded yellow. Alcohol (o; ST), charged (c; KERDH), basic (+; KRH), acidic (-; DE), and polar (p; STEDRKHNQ) residues are colored magenta. Small (s; SAGDNPVT) residues are colored green and big (b; LIFMWYERKQ) residues are shaded gray. The hydrophobic residues of the signal peptide are highlighted in yellow. In the Secondary Structure line, H indicates a helix and E indicates extended conformation (β strand). Aqa, Aquifex aeolicus; Dr, Deinococcus radiodurans; Mj, Methanococcus jannaschii; Pa, Pseudomonas aeruginosa; Ps, Pseudomonas species; Scoe, Streptomyces coelicolor; Strgi, Streptomyces griseus; Tm, Thermotoga maritima.
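The consensus scheme in the Figure 1 legend can be illustrated with a minimal sketch. The residue classes below are copied from the legend; the function name `column_classes`, the `threshold` parameter, and the treatment of gaps as non-members are assumptions made for illustration and are not taken from the paper, which does not describe its implementation.

```python
# Residue classes exactly as listed in the Figure 1 legend.
RESIDUE_CLASSES = {
    "h": set("LIYFMWACV"),   # hydrophobic
    "l": set("LIAV"),        # aliphatic
    "o": set("ST"),          # alcohol
    "c": set("KERDH"),       # charged
    "+": set("KRH"),         # basic
    "-": set("DE"),          # acidic
    "p": set("STEDRKHNQ"),   # polar
    "s": set("SAGDNPVT"),    # small
    "b": set("LIFMWYERKQ"),  # big
}


def column_classes(column, threshold=1.0):
    """Return the class symbols that cover at least `threshold` of a column.

    `column` is a string holding one alignment column (one character per
    sequence). Gap characters belong to no class, so they count against
    every class (an assumption of this sketch).
    """
    if not column:
        return []
    residues = column.upper()
    hits = []
    for symbol, members in RESIDUE_CLASSES.items():
        fraction = sum(r in members for r in residues) / len(residues)
        if fraction >= threshold:
            hits.append(symbol)
    return hits


if __name__ == "__main__":
    # 100% consensus, as used for Figure 1.
    print(column_classes("LIVA"))
    # 90% consensus, as used for several later figures.
    print(column_classes("KRHD", threshold=0.9))
```

A column can satisfy several classes at once (for example, an all-aliphatic column is also hydrophobic). How the published figures choose a single symbol in that situation is not stated in the legend, so the sketch simply returns every class that qualifies.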
Figure 2
Multiple alignment of predicted archaeal dihydropteroate synthases. The scheme for displaying multiple alignments is as described in the legend to Figure 1. The consensus secondary structure was derived from the crystal structures of the Staphylococcus aureus, Mycobacterium tuberculosis and Escherichia coli DHPS (Protein Data Bank IDs: 1AD1, 1EYE, 1AJ0). Residues are colored at 90% consensus. Af, Archaeoglobus fulgidus; Ape, Aeropyrum pernix; At, Arabidopsis thaliana; Ec, Escherichia coli; Mj, Methanococcus jannaschii; Mt, Mycobacterium tuberculosis; Mth, Methanobacterium thermoautotrophicum; Sa, Staphylococcus aureus; Sc, Saccharomyces cerevisiae; Pab, Pyrococcus abyssi.
Figure 3
Multiple alignment of the archaea-specific family of predicted metallo-β-lactamase superfamily hydrolases that includes the alleged archaeal dihydropteroate synthase, MJ0301. The scheme for displaying multiple alignments is as described in the legend to Figure 1. A consensus secondary structure was derived from the crystal structures of the metallo-β-lactamases from Stenotrophomonas maltophilia (1SML) and Bacteroides fragilis (1A7T). Residues are colored at 90% consensus. Bfr, Bacteroides fragilis; Bsp, Bacillus species 170; Mj, M. jannaschii; Mth, M. thermoautotrophicum; Pab, P. abyssi; Ph, P. horikoshii; Stma, S. maltophilia; Tm, Thermotoga maritima.
Figure 4
Multiple alignment of predicted archaeal thymidylate synthases (TS). The scheme for displaying multiple alignments is as described in the legend to Figure 1. Residues are colored at 90% consensus. A consensus secondary structure was derived using known TS structures from R. norvegicus, E. coli and bacteriophage T4 deoxycytidylate hydroxymethyltransferase (1B5D). The Archaeoglobus fulgidus TS has a duplication of the TS domain, and the amino-terminal domain (N.TS_Af; shaded gray) is predicted to be inactive. Af, Archaeoglobus fulgidus; At, Arabidopsis thaliana; BPSP1, bacteriophage SP1; Bs, B. subtilis; Dm, Drosophila melanogaster; Dr, D. radiodurans; Ec, E. coli; Mj, M. jannaschii; Mt, M. tuberculosis; Mth, M. thermoautotrophicum; Nm, Neisseria meningitidis; Rn, R. norvegicus; T2, bacteriophage T2; Xf, Xylella fastidiosa.
Figure 5
Multiple alignment of the uncharacterized archaeal protein family that includes the alleged archaeal thymidylate synthase, MJ0757. The scheme for displaying multiple alignments is as described in the legend to Figure 1. Residues are colored at 100% consensus. In addition, metal-chelating residues in an inserted module shared by orthologs of MJ0757 are shaded blue. The asterisks denote residues in MJ0757 that were predicted to be conserved between MJ0757 and TS. Also shown are predicted secondary structures for the MJ0757 family that were obtained by using the PHD program, and the TS-like secondary structure predicted for MJ0757 in [25]. Af, A. fulgidus; Mj, M. jannaschii; Mth, M. thermoautotrophicum.
Figure 6
Multiple alignment of a selection of C2 domains including the alleged 'paralog' of plant virus movement proteins, Cmpp16. The scheme for displaying multiple alignments is as described in the legend to Figure 1. Residues are colored at 100% consensus. A consensus secondary structure was derived from known structures of the C2 domains in phospholipase C-δ1 (1QAT), synaptotagmin (1RSY), and protein kinase C (1A25). At: A. thaliana, Cm: Cucurbita maxima, Le: Lycopersicon esculentum, Os: Oryza sativa, Rn: R. norvegicus.
Figure 7
Multiple alignment of the region of the ATF-2 transcription factor and its homologs identified as a GCN5-like acetyltransferase domain. The scheme for displaying multiple alignments is as described in the legend to Figure 1. Residues are colored at 100% consensus. Ce: Caenorhabditis elegans, Hs: Homo sapiens, Sp: Schizosaccharomyces pombe.
Figure 8
A comparison of the multiple alignments of PIF3, its rice ortholog, and PAS domain proteins. The scheme for displaying multiple alignments is as described in the legend to Figure 1. Residues are colored at 90% consensus. A consensus secondary structure was derived from those available for FixL (1EW0) and photoactive yellow protein (3PYP). Aa, A. aeolicus; Af, A. fulgidus; At, A. thaliana; Av, Azotobacter vinelandii; Bs, B. subtilis; Dm, D. melanogaster; Ec, E. coli; Eh, Ectothiorhodospira halophila; Nc, Neurospora crassa; Os, O. sativa; Rm, Rhizobium meliloti.

References

    1. Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y. Predicting function: from genes to genomes and back. J Mol Biol. 1998;283:707–725.
    2. Koonin EV, Aravind L, Kondrashov AS. The impact of comparative genomics on our understanding of evolution. Cell. 2000;101:573–576.
    3. Aravind L, Koonin EV. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol. 1999;287:1023–1040.
    4. Murzin AG. Progress in protein structure prediction. Nat Struct Biol. 2001;8:110–112.
    5. Karlin S, Bucher P, Brendel V, Altschul SF. Statistical methods and insights for protein and DNA sequences. Annu Rev Biophys Biophys Chem. 1991;20:175–203.
