J Biomed Semantics. 2022 Oct 21;13(1):25.
doi: 10.1186/s13326-022-00279-z.

A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology

Yongqun He et al. J Biomed Semantics.

Abstract

Background: The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020.

Results: As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trials) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19-related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP), and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative knowledge of host-coronavirus PPIs and drug targets has also been used to support drug repurposing for COVID-19 treatment.

Conclusion: CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.

Keywords: COVID-19; Coronavirus; Diagnosis; Drug repurposing; Ontology; Phenotype; SARS-CoV-2; Vaccine.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Top level hierarchical structure of class terms represented in CIDO. Abbreviations in parentheses indicate an entity’s source ontology (Supplemental Table 2)
Fig. 2
SARS-CoV-2 proteins and genes. A PR modeling of SARS-CoV-2 proteins. B OGG modeling of SARS-CoV-2 genes. Black lines represent the ‘has gene template’ relation connecting proteins to genes. Red boxes denote proteins translated from ORFs that are internal to or overlap with those of the longer indicated gene (red arrows). The light blue box indicates proteins that are produced by proteolytic processing of either replicase polyprotein 1a or replicase polyprotein 1ab, while green boxes indicate those that derive specifically and uniquely from pp1a or pp1ab
Fig. 3
CIDO modeling of AA variants and mutations. CIDO represents AA variants as material entities if they are substitutions and AA mutations as processes to represent deletions in SARS-CoV-2 microbial variants. Both AA variants utilized analogous axioms due to differences in continuants and occurrents
Fig. 4
Modeling of COVID-19 diagnostic testing using CIDO. *, only two out of six specimen terms are shown in this figure
Fig. 5
Host-coronavirus protein-protein interactions (PPIs) and drugs targeting the viral or host proteins. A The hierarchy of PPIs, including ‘SARS-CoV-2 nsp5 protein binding to human HDAC2’. B The chemical nirmatrelvir (a component of the Pfizer drug Paxlovid) is an inhibitor of the virus protein nsp5 (i.e., 3C-like proteinase), which is critical for viral replication. C A chemical ‘Valproic Acid’ is an inhibitor of the HDAC2 (i.e., histone deacetylase 2). Valproic acid is also a valuable candidate against SARS-CoV-2
Fig. 6
The weighted aggregate taxonomy (WAT) for CIDO (version 1.0.306) with 10,853 concepts (b = 42). A white node inside a colored rectangular box represents a partial-area, which is a group of concepts having the same set of nonhierarchical (lateral) relationships and similar semantics denoted by the concept listed inside the white node. Relationships are listed inside the colored box (inherited ones are not shown). The boxes are color-coded by cardinalities of their sets of lateral relationships. Upward arrows are the hierarchical relationships connecting partial-areas. The weight of a partial-area is defined as the number of descendant concepts. A partial-area with a weight less than b is small and is aggregated into its closest ancestor large partial-area. A large partial-area having no aggregated partial-areas is represented as a rectangle white box with one number indicating the number of summarized concepts. A large partial-area having aggregated partial-areas is represented as a rectangle with rounded corners and with three numbers. The first number inside () is the number of summarized concepts including concepts aggregated from small partial-areas, the second number inside {} is the number of small partial-areas aggregated into it, and the third number inside [] is the number of concepts of the partial-area before the aggregation. See more details in Supplemental File 1
Fig. 7
Query CIDO amino acid (AA) variants for Delta and Omicron strain comparison and basic transmission and virulence mechanism understanding. A DL query for AA variants shared by Delta and Omicron strains. B DL query for amino acid variants that belong to Omicron. C DL query for amino acid variants that belong to Delta. Current AA variants for Omicron and Delta strains are also characteristic AA variants
Fig. 8
Host-SARS-CoV-2 gene-gene interaction network using SciMiner on the litCovid paper abstracts. Color represents the type of genes: pink (viral), green (host gene directly co-cited with pathogen genes at the sentence level), and cyan (host gene co-cited with the green host genes in at least 30 COVID-19 papers). Node size corresponds to the number of connections and edge thickness corresponds to the number of co-citing papers
Fig. 9
SARS-CoV-2 drug screening based on the drug cocktail strategy. A total of 232 drugs were identified whose protein targets are involved in three coronavirus processes (i.e., viral entry, genome replication, and viral release) and/or host anti-coronaviral processes (i.e., cytokine activity). Two drugs (i.e., copper and artenimol) had protein targets involved in all four processes. The drug screening study was performed using the DrugXplore program (http://medcode.link/drugxplore/)
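
The aggregation rule described in the Fig. 6 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-parent hierarchy, assumes that a partial-area's weight counts its own concepts plus those of all descendant partial-areas, and always keeps the root as a large partial-area. The function name, the toy hierarchy, and the concept counts are hypothetical; only the threshold b = 42 comes from the caption.

```python
from collections import defaultdict


def aggregate_partial_areas(parent, size, b):
    """Aggregate small partial-areas into their closest large ancestor.

    parent: dict mapping each partial-area to its parent (None for the root).
    size:   dict mapping each partial-area to its own number of concepts.
    b:      weight threshold; a partial-area with weight < b is "small".
    Assumption: weight = own concepts + concepts of all descendants.
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    weight = {}

    def compute_weight(node):
        weight[node] = size[node] + sum(compute_weight(c) for c in children[node])
        return weight[node]

    for node, par in parent.items():
        if par is None:
            compute_weight(node)

    def is_large(node):
        # Assumption: the root is always kept, whatever its weight.
        return parent[node] is None or weight[node] >= b

    # For each large partial-area, track the three numbers from the caption:
    # "concepts" -> (), "aggregated" -> {}, "own" -> [].
    summary = {
        n: {"concepts": size[n], "aggregated": 0, "own": size[n]}
        for n in parent
        if is_large(n)
    }
    for node in parent:
        if is_large(node):
            continue
        ancestor = parent[node]
        while not is_large(ancestor):  # climb to the closest large ancestor
            ancestor = parent[ancestor]
        summary[ancestor]["concepts"] += size[node]
        summary[ancestor]["aggregated"] += 1
    return summary


# Hypothetical toy hierarchy; names and counts are illustrative only.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
size = {"root": 5, "A": 40, "B": 3, "A1": 10, "A2": 50, "B1": 2}
summary = aggregate_partial_areas(parent, size, b=42)
for name, numbers in summary.items():
    print(name, numbers)
```

In this toy input, B, B1, and A1 fall below the threshold and are folded into their closest large ancestors, while A2 stays a large partial-area with nothing aggregated into it.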
