Bioinformatics. 2020 Nov 1;36(17):4643-4648.
doi: 10.1093/bioinformatics/btaa485.

UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase

Alistair MacDougall et al. Bioinformatics. 2020.

Erratum in

  • UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase.
    MacDougall A, Volynkin V, Saidi R, Poggioli D, Zellner H, Hatton-Ellis E, Joshi V, O'Donovan C, Orchard S, Auchincloss AH, Baratin D, Bolleman J, Coudert E, de Castro E, Hulo C, Masson P, Pedruzzi I, Rivoire C, Arighi C, Wang Q, Chen C, Huang H, Garavelli J, Vinayaka CR, Yeh LS, Natale DA, Laiho K, Martin MJ, Renaux A, Pichler K; The UniProt Consortium. Bioinformatics. 2021 Apr 1;36(22-23):5562. doi: 10.1093/bioinformatics/btaa663. PMID: 33821964.

Abstract

Motivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge.

Results: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB.
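
The propagation step described above can be pictured with a minimal Python sketch. It is an illustration only, not the UniFIRE implementation: the `Rule` and `Protein` classes, the identifiers `UR_EXAMPLE` and `IPR_EXAMPLE`, and the accessions are all hypothetical, and a real rule carries richer conditions than one signature plus one taxon. The last lines simply recompute the coverage figure quoted in the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified rule: one signature, one taxon, one annotation."""
    rule_id: str
    signature: str
    taxon: str
    annotation: str


@dataclass
class Protein:
    """Hypothetical, simplified protein record."""
    accession: str
    signatures: set
    lineage: list
    reviewed: bool
    annotations: list = field(default_factory=list)


def matches(rule, protein):
    # A record meets the selection criteria when it carries the signature
    # and the required taxon appears in its lineage.
    return rule.signature in protein.signatures and rule.taxon in protein.lineage


def propagate(rule, proteins):
    """Add the rule's annotation to every unreviewed record that matches."""
    annotated = []
    for protein in proteins:
        if not protein.reviewed and matches(rule, protein):
            # Keep the rule identifier alongside the annotation as its source.
            protein.annotations.append((rule.annotation, rule.rule_id))
            annotated.append(protein.accession)
    return annotated


rule = Rule("UR_EXAMPLE", "IPR_EXAMPLE", "Bacteria", "hypothetical function annotation")
proteins = [
    Protein("P_REVIEWED", {"IPR_EXAMPLE"}, ["Bacteria"], reviewed=True),
    Protein("U_MATCH", {"IPR_EXAMPLE", "IPR_OTHER"}, ["Bacteria", "Proteobacteria"], reviewed=False),
    Protein("U_WRONG_TAXON", {"IPR_EXAMPLE"}, ["Eukaryota", "Fungi"], reviewed=False),
]
annotated = propagate(rule, proteins)
print(annotated)

# Coverage quoted in the abstract: 53 million of 178 million records.
coverage = 53 / 178
print(f"{coverage:.1%}")
```

Only the unreviewed record that satisfies both the signature and the taxonomic constraint receives the annotation; the reviewed record is left untouched because it is a source of annotation, not a target.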

Availability and implementation: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. The rules and the code required to run them are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.

Figures

Fig. 1. An example UniRule rule as displayed on the UniProt website. (A) UniRule display after the common conditions are selected (highlighted in lilac), showing the annotations (highlighted in lilac) that are applied to records meeting the conditions. (B) Part of the UniRule display after the first special condition, ‘taxon = Fungi’, is selected. The condition is highlighted in brown and the annotation that is added when this condition is met is also in brown. (https://www.uniprot.org/unirule/UR001001756)
Fig. 2. The common conditions in a UniRule rule. Rule UR000000256 contains two sets of conditions that make up the common conditions. Within each set, the conditions are connected by the logical operator ‘AND’, while the two sets are connected by the logical operator ‘OR’.
Fig. 3. Special conditions that provide site-specific annotation in protein sequences. (A) The conditions from a HAMAP rule UR000101617. (B) The conditions from a PIRSR rule UR001165955.
Fig. 4. The representation in the UniProt website of evidence codes for annotation propagated by UniRule UR001001756. Above: Web page display; Below: Text version of the protein record.
Fig. 5. Evolution of annotation quality from 2014 to 2019.
Fig. 6. Diagram of the pipeline that applies UniRule annotations to UniProtKB. See the main text for a description of the pipeline.
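
The condition logic described in the captions for Fig. 1 and Fig. 2 (conditions joined by ‘AND’ within a set, sets joined by ‘OR’, and a special condition such as ‘taxon = Fungi’ adding further annotation) can be sketched in Python as follows. This is a rough illustration under stated assumptions: the dictionary layout, the `(kind, value)` tuples, and names such as `SIG_A` are hypothetical and do not reproduce the actual UniRule data model or the content of any real rule.

```python
def common_conditions_met(condition_sets, facts):
    """True if at least one set has all of its conditions satisfied.

    Within a set the conditions are combined with AND; the sets
    themselves are combined with OR.
    """
    return any(all(cond in facts for cond in cond_set) for cond_set in condition_sets)


def annotate(rule, facts):
    """Return the annotations a rule would apply to a record with these facts."""
    if not common_conditions_met(rule["common"], facts):
        return []
    annotations = list(rule["annotations"])
    # Special conditions add annotation on top of the common annotation.
    for condition, extra in rule["special"]:
        if condition in facts:
            annotations.extend(extra)
    return annotations


# Hypothetical rule with two common condition sets and one special condition.
EXAMPLE_RULE = {
    "common": [
        {("signature", "SIG_A"), ("taxon", "Eukaryota")},
        {("signature", "SIG_B"), ("taxon", "Eukaryota")},
    ],
    "annotations": ["hypothetical protein name"],
    "special": [(("taxon", "Fungi"), ["hypothetical extra comment for fungi"])],
}

fungal = {("signature", "SIG_A"), ("taxon", "Eukaryota"), ("taxon", "Fungi")}
plant = {("signature", "SIG_B"), ("taxon", "Eukaryota"), ("taxon", "Viridiplantae")}
bacterial = {("signature", "SIG_A"), ("taxon", "Bacteria")}

fungal_result = annotate(EXAMPLE_RULE, fungal)
plant_result = annotate(EXAMPLE_RULE, plant)
bacterial_result = annotate(EXAMPLE_RULE, bacterial)
print(fungal_result, plant_result, bacterial_result)
```

In this sketch the fungal record satisfies the first condition set and the special condition, the plant record satisfies only the second condition set, and the bacterial record satisfies neither set and therefore receives no annotation.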
