Nucleic Acids Res. 2019 Jan 8;47(D1):D351-D360.
doi: 10.1093/nar/gky1100.

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Alex L Mitchell et al. Nucleic Acids Res. 2019.

Abstract

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Figures

Figure 1.
InterPro coverage of amino acid residues in UniProtKB. (A) Unique residue coverage of UniProtKB by signatures integrated into InterPro, member database signatures awaiting integration, intrinsically disordered regions, and regions predicted to be signal peptides, transmembrane domains or coiled-coils. (B) Residue coverage of InterPro's contributing member databases. Residues matched by signatures integrated into InterPro are shown in green, and residues found only in signatures not yet integrated are shown in blue.
Figure 2.
Example API queries. From top to bottom, the first example returns a count of the total number of entries in InterPro and its member databases. The second retrieves information on all InterPro entries. The third and fourth examples return information specific to InterPro entry IPR023411 and PANTHER entry PTHR10000, respectively. The fifth returns InterPro information for all UniProtKB sequences matching InterPro entry IPR00009. The final request returns details of the match between Pfam entry PF00020 and UniProtKB sequence O00220. Further details about the structure of the API URLs are given in Supplementary Data Table S1.
Figure 3.
Selecting data to download from the Browse page creates a link to an appropriately pre-filled form and API request on the Download page.
Figure 4.
Intersecting (A) and non-intersecting (B) InterPro matches for the purpose of calculating homologous superfamily relationships.
Figure 5.
Reciprocal ‘overlapping homologous superfamilies’ and ‘overlapping entries’ links on the homologous superfamily entry (left) and other InterPro entry (right) pages, which display the relationships between these entry types.
Figure 6.
The homologous superfamilies annotation track on the ProtVista view on the proteins page allows structural information to be placed in context with other annotations.
Figure 7.
(A) Pfam, CATH-Gene3D and SUPERFAMILY domain matches for UniProtKB sequence A0A0Q0BJI4. The segments A1 and A2 form a discontinuous domain and segment B is an independent nested domain. (B) Example InterProScan XML output for the Pfam matches shown in (A).
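
The six API queries described in the Figure 2 caption can be sketched as URL paths. This is a minimal illustration, not the authors' code: the base URL, the helper name `build_url`, and the exact ordering of path segments are assumptions that do not appear in the text above, and Supplementary Data Table S1 remains the reference for the URL structure. Accessions are copied verbatim from the caption. The sketch only builds strings and sends no requests.

```python
# Assumed base URL of the InterPro REST API (not stated in the text above).
BASE_URL = "https://www.ebi.ac.uk/interpro/api/"


def build_url(*segments: str) -> str:
    """Join path segments onto the assumed base URL (hypothetical helper)."""
    path = "/".join(segment.strip("/") for segment in segments)
    if not path:
        return BASE_URL
    return BASE_URL + path + "/"


# One entry per query in the Figure 2 caption, in the same order.
# The segment layout is an assumption made for illustration.
EXAMPLE_QUERIES = {
    "entry counts for InterPro and member databases": build_url("entry"),
    "all InterPro entries": build_url("entry", "interpro"),
    "InterPro entry IPR023411": build_url("entry", "interpro", "IPR023411"),
    "PANTHER entry PTHR10000": build_url("entry", "panther", "PTHR10000"),
    "UniProtKB sequences matching an InterPro entry": build_url(
        "protein", "uniprot", "entry", "interpro", "IPR00009"
    ),
    "match between Pfam PF00020 and UniProtKB O00220": build_url(
        "entry", "pfam", "PF00020", "protein", "uniprot", "O00220"
    ),
}

if __name__ == "__main__":
    for description, url in EXAMPLE_QUERIES.items():
        print(f"{description}: {url}")
```

Chaining an endpoint block (`entry`, `protein`) with a source database and an accession, as in the last two queries, is how this sketch narrows one data type by another.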
