Sci Data. 2020 Oct 12;7(1):337.
doi: 10.1038/s41597-020-00679-9.

Protein ontology on the semantic web for knowledge discovery

Chuming Chen et al. Sci Data. 2020.

Abstract

The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using the Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying, and downloading data in various formats, and provide documentation for both scientists and programmers. PRO Linked Open Data can be browsed via a faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide W3C HCLS specification-compliant metadata descriptions for our data. The PRO Linked Open Data is available at https://lod.proconsortium.org/ .
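
As a rough illustration of the SPARQL access route mentioned in the abstract, the following Python sketch builds a SPARQL 1.1 Protocol GET request that asks for the label of a PRO term. Several things here are assumptions rather than facts from the article: the endpoint path in `SPARQL_ENDPOINT` is a placeholder (the actual endpoint should be looked up at https://lod.proconsortium.org/), the helper names `build_label_query` and `build_request` are hypothetical, and the sketch assumes that PRO terms are identified by standard OBO PURLs and carry an `rdfs:label`. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: placeholder endpoint path, not taken from the article.
# Look up the real SPARQL endpoint at https://lod.proconsortium.org/.
SPARQL_ENDPOINT = "https://lod.proconsortium.org/sparql"


def build_label_query(pro_id: str) -> str:
    """Build a SPARQL query for the rdfs:label of one PRO term.

    ASSUMPTION: the term IRI follows the OBO PURL convention,
    e.g. PR:000046294 -> http://purl.obolibrary.org/obo/PR_000046294.
    """
    local_name = pro_id.replace(":", "_")
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
        f"SELECT ?label WHERE {{ obo:{local_name} rdfs:label ?label }} LIMIT 10"
    )


def build_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Wrap a query in a SPARQL 1.1 Protocol GET request.

    The query text goes in the 'query' URL parameter, and JSON results
    are requested through the standard SPARQL results media type.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_request(build_label_query("PR:000046294"))
    # Print the URL only; pass the request to urllib.request.urlopen
    # to actually run the query once the endpoint is confirmed.
    print(request.full_url)
```

Keeping query construction separate from request construction means the same query string can also be pasted into the YASGUI interface for interactive use.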

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
A PRO RDF data model (PR:000046294). Ellipse and circle shapes are RDF nodes. Rectangle shapes are RDF literals. Directed edges are RDF properties. Circle shapes represent anonymous classes or blank nodes. ‘AKT1’, used here for brevity, is the gene for ‘RAC-alpha serine/threonine-protein kinase’.
Fig. 2
Knowledge graph of exemplary query result of federated SPARQL query 1 (Get all human genes in PRO whose UniProtKB counterpart has variants with loss of function implicated in disease). Ellipse shapes are RDF nodes. Rectangle shapes are RDF literals. Directed edges are RDF properties.
Fig. 3
Knowledge graph of exemplary query result of federated SPARQL query 2 (Find variants in UniProt or DisGeNET for AlzForum PRO terms). Ellipse and circle shapes are RDF nodes. Rectangle shapes are RDF literals. Directed edges are RDF properties. Circle shapes represent anonymous classes or blank nodes.
Fig. 4
Virtuoso faceted browser query interface and result table view.
Fig. 5
PRO LOD SPARQL GUI. It provides users with a portal for querying Protein Ontology Linked Open Data using the SPARQL 1.1 standard, along with a comprehensive set of example queries.
Fig. 6
API documentation for Protein Ontology Linked Open Data. The Swagger™ API generates an interactive webpage where users can ‘try out’ the service with real queries. Results are returned in the ‘Response Body’ in the user-selected response format, either JSON (illustrated) or XML.
