Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2004;5(5):R35.
doi: 10.1186/gb-2004-5-5-r35. Epub 2004 Apr 16.

Prolinks: a database of protein functional linkages derived from coevolution

Affiliations
Comparative Study

Prolinks: a database of protein functional linkages derived from coevolution

Peter M Bowers et al. Genome Biol. 2004.

Abstract

The advent of whole-genome sequencing has led to methods that infer protein function and linkages. We have combined four such algorithms (phylogenetic profile, Rosetta Stone, gene neighbor and gene cluster) in a single database--Prolinks--that spans 83 organisms and includes 10 million high-confidence links. The Proteome Navigator tool allows users to browse predicted linkage networks interactively, providing accompanying annotation from public databases. The Prolinks database and the Proteome Navigator tool are available for use online at http://dip.doe-mbi.ucla.edu/pronav.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The general mechanism of inference for each of the four methods used by the Proteome Navigator. (a) The gene neighbor (GN) method identifies protein pairs encoded in close proximity across multiple genomes. We see in this example that genes A and B are gene neighbors while A and C are not. (b) The Rosetta Stone (RS) method searches for gene fusion events. We see that the A and B proteins are expressed as separate proteins in one organism. However, in a second organism a sequence exists that represents the fusion of the two proteins. The fusion protein is termed the Rosetta Stone protein as it allows us to infer that the A and B proteins are functionally linked. (c) The construction of phylogenetic profiles (PP) begins with four sequenced genomes, from which the protein sequences have been predicted. The protein sequence, A, within E. coli is compared to that of the proteins coded by the other genomes and homologs are identified. If the genome contains a homolog of A, a 1 is placed in the corresponding phylogenetic profile position, a 0 otherwise. Genes with similar phylogenetic profiles are likely to participate in the same pathway. (d) The gene cluster (GC) or operon method identifies closely spaced genes, and assigns a probability P of observing a particular gap distance (or smaller), as judged by the collective set of inter-gene distances.
Figure 2
Figure 2
We assess COG category recovery for the four individual methods, the combination of the four methods, and TextLinks. (a) We assign a confidence measure to the likelihood that a pair of proteins is acting within the same COG pathway, reflecting the number of COG-annotated pairs that lie within the same pathway relative to the total number of annotated pairs. The COG confidence metric is used in the network-graphing function of the Proteome Navigator to select inferred protein linkages with uniform confidence. E. coli protein pairs displayed in this figure have a COG pathway confidence recovery (cumulative accuracy) of greater than 0.4, with the exception of the TextLinks pairs. (b) The receiver operator characteristic (ROC) curve shows the performance of the rank-ordered list of all E. coli interactions predicted from genomic inference (solid line) compared with the random selection of protein pairs (dashed line).
Figure 3
Figure 3
The opening page of the Proteome Navigator prompts the user to select a protein by database identifier or protein name or ID, as well as selecting the genome of interest. Pull-down tabs facilitate the selection of protein features and microbial genomes. Here we select the E. coli gene 'fliG'. Clicking the 'Search Proteins' button takes the user to a page displaying all of the proteins that satisfy the search criteria (see Figure 4).
Figure 4
Figure 4
The 'Graphing' function of the Proteome Navigator displays the network of interactions satisfying the input search criterion. (a) Nodes are colored by functional categories explained in the right-hand border. Edges connecting proteins are colored by the method predicting the interaction, also described in the figure border. Associations predicted by multiple methods are shown in black. The double box around fliG indicates that this was the input protein used to generate this network. Clicking on a node brings the user to a protein-annotation page, and the search can be continued using the new protein to generate a new network search. (b) An example of functional discovery using Prolinks. Using kdtA as the initial seed, we speculate that GutQ, an uncharacterized E. coli protein, may be associated with lipopolysaccharide and cell-wall synthesis. Confirmation of these predictions awaits further scientific inquiry.
Figure 5
Figure 5
Assessment of the four methods by recovery of links between members of known E. coli protein complexes. (a) We test to see how often predicted interacting protein pairs are subunits of the same protein complex. E. coli protein complexes were obtained from the EcoCyc database. (b) Again, the ROC curve shows the performance of the rank-ordered list of all E. coli predicted interactions (solid line) compared with the random selection of protein pairs (dashed line), in their ability to recover constituents of known protein complexes.
Figure 6
Figure 6
A comparison of graphs generated by querying the String database and Proteome Navigator to identify proteins in the ATP synthase complex. COG0056, shown in red in the String network (left), contains the E. coli protein AtpA, used to search each database and shown highlighted as a double-lined box in the Proteome Navigator graph (right). The Proteome Navigator network and Prolinks database identify twice the number of functionally linked proteins at the given confidence level.

Similar articles

Cited by

References

    1. Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. Protein function in the post-genomic era. Nature. 2000;405:823–826. doi: 10.1038/35015694. - DOI - PubMed
    1. Marcotte EM. Computational genetics: finding protein function by nonhomology methods. Curr Opin Struct Biol. 2000;10:359–365. doi: 10.1016/S0959-440X(00)00097-X. - DOI - PubMed
    1. Pellegrini M. Computational methods for protein functional analysis. Curr Opin Chem Biol. 2001;5:46–50. doi: 10.1016/S1367-5931(00)00165-4. - DOI - PubMed
    1. Jenssen TK, Laegreid A, Komorowski J, Hovig E. A literature network of human genes for high-throughput analysis of gene expression. Nat Genet. 2001;28:21–28. doi: 10.1038/88213. - DOI - PubMed
    1. Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C. Predictome: a database of putative functional links between proteins. Nucleic Acids Res. 2002;30:306–309. doi: 10.1093/nar/30.1.306. - DOI - PMC - PubMed

Publication types

LinkOut - more resources