Database (Oxford). 2019 Jan 1:2019:baz026.
doi: 10.1093/database/baz026.

PIRSitePredict for protein functional site prediction using position-specific rules

Chuming Chen et al. Database (Oxford). 2019.

Abstract

Methods focused on predicting 'global' annotations for proteins (such as molecular function, biological process, and presence of domains or membership in a family) have reached a relatively mature stage. Methods that provide fine-grained 'local' annotation of functional sites (at the level of individual amino acids) are now coming to the forefront, especially in light of the rapid accumulation of genetic variant data. We have developed a computational method and workflow that predicts functional sites within proteins using position-specific conditional template annotation rules (PIR Site Rules, or PIRSRs for short). These rules are curated by structural biologists through review of known protein structures and other experimental data, and are used to generate high-quality annotations for the unreviewed section of the UniProt Knowledgebase (UniProtKB). To share the PIRSR functional site prediction method with the broader scientific community, we have streamlined our workflow and developed a stand-alone Java software package named PIRSitePredict. We demonstrate the use of PIRSitePredict for functional annotation of de novo assembled genomes and transcriptomes by annotating uncharacterized proteins from Trinity RNA-seq assemblies of the embryonic transcriptomes of three cartilaginous fishes: Leucoraja erinacea (Little Skate), Scyliorhinus canicula (Small-spotted Catshark) and Callorhinchus milii (Elephant Shark). On average, about 1200 lines of annotations were predicted for each species.
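To make the idea of a position-specific conditional rule concrete, here is a minimal Python sketch. It is not the PIRSitePredict implementation (which is a Java package) and does not reproduce the PIRSR rule format. The `SiteCondition` class, the `apply_rule` function, and every position, residue, and feature name are hypothetical. The sketch also assumes that the candidate residues have already been mapped to template positions, and that each site is evaluated independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCondition:
    """One hypothetical position-specific condition of a site rule."""
    template_pos: int    # 1-based position on the template sequence
    allowed: frozenset   # residues accepted at that position
    feature: str         # annotation to propagate if the condition holds


def apply_rule(conditions, candidate_residues):
    """Return the features whose residue condition the candidate satisfies.

    candidate_residues maps a template position to the candidate residue
    aligned to it (missing or None when the position falls in a gap).
    """
    annotations = []
    for cond in conditions:
        residue = candidate_residues.get(cond.template_pos)
        if residue is not None and residue in cond.allowed:
            annotations.append((cond.template_pos, residue, cond.feature))
    return annotations


# Toy rule and toy candidate; positions, residues, and features are made up.
toy_rule = [
    SiteCondition(10, frozenset("H"), "metal-binding site"),
    SiteCondition(25, frozenset("DE"), "active site"),
]
toy_candidate = {10: "H", 25: "K"}

predicted = apply_rule(toy_rule, toy_candidate)
```

In this toy case only the site at template position 10 is propagated, because the candidate carries a lysine where the second condition expects an acidic residue.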


Figures

Figure 1
PIRSitePredict system overview.
Figure 2
An example PIRSR (PIRSR000178-1) in UniRule flat file format. It specifies a set of test conditions that candidate uncharacterized proteins must pass to get corresponding annotations, including features with associated comments and keywords. The test conditions include the following: (a) a whole protein based family HMM (see TR); (b) a site-specific profile HMM (SRHMM); (c) functionally and structurally characterized residues of a manually curated template protein sequence; (d) the candidate protein is from an organism within the defined taxonomic scope.
Figure 3
UniProtKB/TrEMBL protein sequence annotations generated by PIRSitePredict.
Figure 4
The Venn diagrams of overlapping families (left) and rules (right) for embryonic transcriptomes of three cartilaginous fishes.
Figure 5
An application of functional site prediction with PIRSitePredict using PIRSR000178-1 as an example. The template sequence for the site rule PIRSR000178-1 (see Figure 2) is P69054 (UniProtKB Accession), which is E. coli SDH cytochrome b556 subunit. The multiple sequence alignment and phylogenetic tree for eight protein sequences matching the conditions of PIRSR000178-1 were generated with Seqotron (29). The sequences are for corresponding proteins from E. coli, human, bovine, yeast, worm, little skate, small-spotted catshark and elephant shark, respectively. The conserved metal-binding site histidine is marked with a box, and the numbers on the top correspond to the template sequence P69054 (E. coli).
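The Figure 5 caption numbers the conserved histidine by the template sequence, so reading the site off another protein means translating a template coordinate through the alignment. The sketch below shows one way to do that for two rows of a gapped alignment. The function name `map_template_position` and the toy sequences are assumptions made for illustration. They are not taken from PIRSitePredict or from the sequences in the figure.

```python
def map_template_position(template_aln, candidate_aln, template_pos):
    """Map a 1-based template position to the candidate through an alignment.

    Both arguments are rows of the same alignment (equal length, '-' for gaps).
    Returns (candidate_pos, residue), or None if the candidate has a gap there.
    """
    if len(template_aln) != len(candidate_aln):
        raise ValueError("aligned rows must have equal length")
    t_pos = 0  # residues of the template consumed so far
    c_pos = 0  # residues of the candidate consumed so far
    for t_char, c_char in zip(template_aln, candidate_aln):
        if t_char != "-":
            t_pos += 1
        if c_char != "-":
            c_pos += 1
        if t_char != "-" and t_pos == template_pos:
            if c_char == "-":
                return None
            return c_pos, c_char
    raise ValueError("template position lies outside the aligned template")


# Toy alignment rows (made up): the template histidine is at position 4.
template_row = "MK--LHAG"
candidate_row = "-KQQLH-G"

site = map_template_position(template_row, candidate_row, 4)
```

Here `site` is `(5, "H")`: the candidate has a two-residue insertion and lacks the first template residue, so the histidine at template position 4 sits at candidate position 5. Template position 5 maps to `None` because the candidate has a gap in that column.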

