Nucleic Acids Res. 2015 Jan;43(Database issue):D1064-70.
doi: 10.1093/nar/gku1002. Epub 2014 Oct 27.

HAMAP in 2015: updates to the protein family classification and annotation system

Ivo Pedruzzi et al. Nucleic Acids Res. 2015 Jan.

Abstract

HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert-curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720, respectively). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan that simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

Figures

Figure 1.
Maximum likelihood cladogram of the sirtuin superfamily. Maximum likelihood (ML) analyses of selected sirtuin family members resulted in 12 trees with two distinct topologies for the main classes I-IV and U, suggesting either classes II and III or classes II and IV to be sister clades. The tree topology with the highest branch support is shown. Branches are colored according to families: class I = dark yellow, class II = orange, class III = red, class IV = green, class U = cyan. Branches with aLRT SH-like support values of 0.9 or higher are marked by a red dot. Methods: 65 sirtuin protein family members from 33 species were aligned with MAFFT (21) (version 7; parameters: L-INS-i, JTT200). From the alignment, we manually selected homologous regions using the alignment editor Jalview (22); three data models were created with a length of 238, 220 and 193 amino acids, respectively. The best fitting model of protein evolution was determined with ProtTest (23) (version 3.2; parameters: fixed BIONJ tree calculated under the JTT model of amino acid substitution; rate variation; amino acid frequencies) to be the LG model plus gamma distribution. ML phylogenies and ML consensus trees from 100 bootstrap replicates were inferred with PhyML (24) (version 3.0) and RAxML (25) (version 7.2.8). The tree was visualized with Archaeopteryx (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx). Protein sequences and multiple sequence alignments are provided in supplementary file S2.
Figure 2.
HAMAP annotation rule MF_01976 for the mixed-substrate PFK group III family. The right-hand panel shows snippets of the annotation rule MF_01976, including the conditions used to specify site-specific annotations propagated to target sequences. If a protein sequence matches the HAMAP family profile MF_01976, then the annotations that apply to all members of that family (such as family membership) are attached to the sequence. For the annotation of sequence features, the target sequence is aligned to the seed alignment and the active site residue of the template sequence is mapped to the target sequence. The nature of the residue at the equivalent position in the target sequence determines which of the possible conditional annotations is attached to the sequence.
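The residue-mapping step described in this caption can be illustrated with a minimal Python sketch. It is not the HAMAP implementation: the function names, the toy aligned rows, and the residue-to-annotation table are all hypothetical, and the actual conditions of rule MF_01976 are not reproduced here. The sketch only shows the general idea of carrying a template position through an alignment and choosing an annotation based on the residue found in the target.

```python
from typing import Dict, Optional, Tuple


def map_template_position(
    template_row: str, target_row: str, template_pos: int
) -> Optional[Tuple[int, str]]:
    """Map a 1-based, ungapped template position onto the target sequence.

    Both rows come from the same alignment and use '-' for gaps.
    Returns (target position, target residue), or None when the target
    has a gap in that column or the position does not exist.
    """
    if len(template_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")
    template_idx = 0  # ungapped position reached in the template
    target_idx = 0    # ungapped position reached in the target
    for template_char, target_char in zip(template_row, target_row):
        if template_char != "-":
            template_idx += 1
        if target_char != "-":
            target_idx += 1
        if template_char != "-" and template_idx == template_pos:
            if target_char == "-":
                return None
            return target_idx, target_char
    return None


def conditional_annotation(
    template_row: str,
    target_row: str,
    template_pos: int,
    cases: Dict[str, str],
) -> Optional[str]:
    """Pick the annotation whose condition matches the mapped target residue."""
    mapped = map_template_position(template_row, target_row, template_pos)
    if mapped is None:
        return None
    target_pos, residue = mapped
    label = cases.get(residue)
    if label is None:
        return None  # no condition of the (hypothetical) rule applies
    return f"{label} at target position {target_pos}"


# Hypothetical toy data: placeholder labels, not the content of MF_01976.
TEMPLATE_ROW = "MK-TAYDG"
TARGET_ROW = "MKQTA-GG"
CASES = {"D": "variant A site (placeholder)", "G": "variant B site (placeholder)"}

if __name__ == "__main__":
    print(conditional_annotation(TEMPLATE_ROW, TARGET_ROW, 6, CASES))
```

In this toy alignment, template position 6 falls in a column where the target carries a glycine at its own position 6, so the placeholder "variant B" case is selected. A target gap at the mapped column yields no feature annotation.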
Figure 3.
Partial output of a HAMAP-Scan showing the additional information provided next to the actual annotations. The sequence of Candida parapsilosis hypothetical protein CPAR2_210240 (CCE43379.1) was submitted in FASTA format to HAMAP-Scan. The internal section in the output file contains information such as the submitted FASTA header, a trusted match (including the match score and the score difference to the trusted cut-off score) to profile MF_03117 (ENOPH), a weak match to profile MF_01681 (MTNC, the homologous bacterial family), as well as the information that the sequence has consequently been annotated by HAMAP rule MF_03117 associated with profile MF_03117. The full annotation produced for this sequence can be viewed in UniProtKB/TrEMBL record G8BDN2 for C. parapsilosis CPAR2_210240.
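The distinction between a trusted and a weak match in this output can be sketched as a comparison of the match score against cut-off scores. The following Python example is an illustration under stated assumptions, not the HAMAP-Scan code: the `ProfileMatch` class, the second (weak) cut-off, the rule-selection policy, and all numeric scores are hypothetical, since the caption reports only that the output lists the match score and its difference to the trusted cut-off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProfileMatch:
    """One sequence-to-profile match (all field values below are made up)."""

    profile: str
    score: float
    trusted_cutoff: float
    weak_cutoff: float  # assumption: a lower cut-off that defines weak matches

    @property
    def delta_to_trusted(self) -> float:
        """Score difference to the trusted cut-off (positive means above it)."""
        return self.score - self.trusted_cutoff

    @property
    def level(self) -> str:
        if self.score >= self.trusted_cutoff:
            return "trusted"
        if self.score >= self.weak_cutoff:
            return "weak"
        return "none"


def select_rule(matches: List[ProfileMatch]) -> Optional[str]:
    """Return the profile whose rule would annotate the sequence.

    Assumed policy: only trusted matches trigger annotation, and the one
    furthest above its trusted cut-off wins.
    """
    trusted = [m for m in matches if m.level == "trusted"]
    if not trusted:
        return None
    return max(trusted, key=lambda m: m.delta_to_trusted).profile


# Hypothetical scores; only the profile accessions come from the caption.
MATCHES = [
    ProfileMatch("MF_03117", score=30.0, trusted_cutoff=20.0, weak_cutoff=10.0),
    ProfileMatch("MF_01681", score=12.0, trusted_cutoff=25.0, weak_cutoff=10.0),
]

if __name__ == "__main__":
    for match in MATCHES:
        print(match.profile, match.level, match.delta_to_trusted)
    print("annotated by rule for:", select_rule(MATCHES))
```

With these made-up numbers, MF_03117 is a trusted match and MF_01681 a weak one, which mirrors the situation described for CPAR2_210240, where only the trusted match leads to annotation.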

References

    1. Gerlt J.A., Allen K.N., Almo S.C., Armstrong R.N., Babbitt P.C., Cronan J.E., Dunaway-Mariano D., Imker H.J., Jacobson M.P., Minor W. Enzyme Function Initiative. Biochemistry. 2011;50:9950–9962.
    2. Anton B.P., Chang Y.C., Brown P., Choi H.P., Faller L.L., Guleria J., Hu Z., Klitgord N., Levy-Moonshine A., Maksad A., et al. The COMBREX project: design, methodology, and initial results. PLoS Biol. 2013;11:e1001638.
    3. Radivojac P., Clark W.T., Oron T.R., Schnoes A.M., Wittkop T., Sokolov A., Graim K., Funk C., Verspoor K., Ben-Hur A., et al. A large-scale evaluation of computational protein function prediction. Nat. Methods. 2013;10:221–227.
    4. Pedruzzi I., Rivoire C., Auchincloss A.H., Coudert E., Keller G., de Castro E., Baratin D., Cuche B.A., Bougueleret L., Poux S., et al. HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res. 2013;41:D584–D589.
    5. Blake J.A., Dolan M., Drabkin H., Hill D.P., Li N., Sitnikov D., Bridges S., Burgess S., Buza T., McCarthy F., et al. Gene Ontology annotations and resources. Nucleic Acids Res. 2013;41:D530–D535.
