BMC Bioinformatics. 2006 Feb 23;7:89.
doi: 10.1186/1471-2105-7-89.

Integrating protein structures and precomputed genealogies in the Magnum database: examples with cellular retinoid binding proteins

Michael E Bradley et al. BMC Bioinformatics. 2006.

Abstract

Background: When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use.

Results: The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we queried the database for amino acid replacements that require three nucleotide substitutions, are located at internal protein structure sites, and occur on short phylogenetic tree branches. In the cellular retinoid binding protein family, this query identified a site that potentially modulates ligand binding affinity. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures.
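The demonstration query described above (replacements needing three nucleotide substitutions, at internal structure sites, on short branches) can be illustrated with a minimal Python sketch. It assumes the standard genetic code. The `Replacement` record, its `buried` and `branch_length` fields, the 0.1 replacements-per-site cutoff, and the sample records are all hypothetical and chosen for illustration only; they do not reflect Magnum's actual schema, thresholds, or data.

```python
from dataclasses import dataclass
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa):
    """Return every codon that encodes the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def min_nt_substitutions(aa_from, aa_to):
    """Smallest number of nucleotide changes that converts a codon for
    `aa_from` into a codon for `aa_to` (minimum Hamming distance)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


@dataclass
class Replacement:
    # Hypothetical record layout, not Magnum's real schema.
    site: int             # alignment site
    aa_from: str          # residue at the ancestral end of the branch
    aa_to: str            # residue at the descendant end of the branch
    buried: bool          # True if the site is internal to the structure
    branch_length: float  # replacements per site


def select_candidates(replacements, max_branch_length=0.1):
    """Keep three-substitution replacements at buried sites on short branches.
    The default cutoff is an illustrative assumption."""
    return [
        r for r in replacements
        if r.buried
        and r.branch_length <= max_branch_length
        and min_nt_substitutions(r.aa_from, r.aa_to) == 3
    ]


# Made-up records for demonstration only.
records = [
    Replacement(1, "F", "E", True, 0.05),   # passes all three filters
    Replacement(2, "F", "L", True, 0.05),   # only one nucleotide change needed
    Replacement(3, "F", "K", False, 0.02),  # not a buried site
    Replacement(4, "F", "Q", True, 0.80),   # branch too long
]
hits = select_candidates(records)
print([r.site for r in hits])
```

Taking the minimum over all codon pairs gives a lower bound on the number of nucleotide changes, since the actual codons used at the ancestral and descendant nodes are not known from the protein sequences alone.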

Conclusion: We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural bioinformatics resources that are useful for identifying experimentally testable hypotheses about the molecular basis of protein behaviors and functions, as illustrated with the examples from the cellular retinoid binding proteins.

Figures

Figure 1
Alignments. Histograms showing the distribution of protein sequence families (y-axis, number of families) in Magnum versus (A) Number of homologous family members, (B) Number of aligned sites in the multiple sequence alignment of the family, (C) Percentage of characters that are gaps in the multiple sequence alignment, and (D) Number of regions containing gaps.
Figure 2
Phylogenies. Histograms showing the distribution of families, branches, nodes, and sites as a function of the evolutionary feature indicated (x-axis). (A) Width of the evolutionary tree interconnecting the family members, (B) Length of the tree branches, (C) Shortest distance from internal nodes to a leaf node, (D) Substitution rate factor. (A-C) Units are replacements per site.
Figure 3
Structures. (A) Histogram showing the distribution of PDB chains associated with individual families in Magnum. (B) Scatter plot of alignment length versus the number of PDB chains for each family. Glutamate synthase has the longest alignment length; lysozyme c and T4 phage lysozyme have the largest number of non-redundant PDB chains. (C) Frequency histograms for the proportions of alignment sites without structural information due to indels (gray), and unresolved areas of the structure (black). (D) Average PDB chain length plotted against the number of sites aligned to at least one PDB chain residue for families with at least two PDB chains (the coefficient of determination calculated by linear regression is also shown).
Figure 4
Dayhoff matrix comparisons. Elements of leaf-leaf (x-axis) and node-node (y-axis) Dayhoff matrices are plotted together from (A) short, (B) medium and (C) long branches. The coefficient of determination calculated by linear regression is shown. Labeled points involve the amino acids alanine or valine. The all fractional counting method was used in node-node comparisons shown. Similar results were obtained with the other counting methods (data not shown).
Figure 5
F-to-E/Q/K replacement pathways. Start and end residues are black with white codons; intermediate residues are gray with black codons. Codons are written using DNA symbols (A, adenine; T, thymine; C, cytosine; G, guanine; Y, pyrimidine; R, purine). Red pathways cross two different residues; blue pathways cross one residue; dashed pathways cross a stop codon. Note that dashed paths are identical in (A) F-to-E, (B) F-to-Q, and (C) F-to-K.
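The kind of pathway shown in Figure 5 can be enumerated with a short Python sketch. It assumes the standard genetic code and considers only shortest pathways, in which each differing codon position is changed exactly once. The function names `pathways` and `classify`, and the reading of "cross one residue" as "both intermediate codons encode the same residue", are assumptions made for illustration and are not taken from the paper.

```python
from itertools import permutations, product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def pathways(start, end):
    """List every shortest mutational path from codon `start` to codon `end`,
    changing one differing position per step."""
    differing = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(differing):
        codon = start
        path = [codon]
        for i in order:
            codon = codon[:i] + end[i] + codon[i + 1:]
            path.append(codon)
        paths.append(path)
    return paths


def classify(path):
    """Label a path by what its intermediate codons encode:
    'stop' if any intermediate is a stop codon, otherwise 'one' or 'two'
    for the number of distinct intermediate residues."""
    intermediates = [CODON_TABLE[c] for c in path[1:-1]]
    if "*" in intermediates:
        return "stop"
    return "one" if len(set(intermediates)) == 1 else "two"


# Example: one F codon (TTT) to one E codon (GAA).
for path in pathways("TTT", "GAA"):
    residues = "-".join(CODON_TABLE[c] for c in path)
    print(" > ".join(path), residues, classify(path))
```

For TTT to GAA, the two orderings that change the first codon position last pass through TAA, a stop codon. The same TTT-to-TAA steps also precede CAA (Q) and AAA (K), which is consistent with the caption's note that the dashed paths are shared across panels.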
Figure 6
Phylogeny of retinoid binding proteins. Cellular retinoic acid binding protein (CRABP) subfamilies I and II and cellular retinol binding protein (CRBP) subfamilies I – IV are boxed. Extant and ancestral amino acids corresponding to alignment site 7 (PDB:1opbA position 4) are shown at leaf and internal nodes.
Figure 7
Gecko crystallin contact site prediction. Fifteen of the 25 sites experiencing amino acid replacement along the branch leading to the gecko crystallin protein (Figure 6; branch 7) are highlighted in red. The reference structure is PDB:1gglA. (A-D) Successive quarter turns of the molecule about its y-axis.
