JCO Clin Cancer Inform. 2020 Mar;4:210-220.
doi: 10.1200/CCI.19.00117.

OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data

Hayley M Dingerdissen et al. JCO Clin Cancer Inform. 2020 Mar.

Abstract

Purpose: The purpose of OncoMX [1] knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types.

Methods: Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database.

Results: OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer. Sentences reporting mutation or differential expression in cancer were extracted from more than 40,000 publications, and healthy gene expression data with samples mapped to organs are available for both human genes and their mouse orthologs.

Conclusion: OncoMX has prioritized user feedback as a means of guiding development priorities. By mapping to and integrating data from several cancer genomics resources, it is hoped that OncoMX will foster a dynamic engagement between bioinformaticians and cancer biomarker researchers. This engagement should culminate in a community resource that substantially improves the ability and efficiency of exploring cancer biomarker data and related multidimensional data.

Conflict of interest statement

Marc Robinson-Rechavi

Employment: Debiopharm Group (I)

Travel, Accommodations, Expenses: Debiopharm Group (I)

Raja Mazumder

Research Funding: Merck, Otsuka

No other potential conflicts of interest were reported.

Figures

FIG 1.
OncoMX overview. External segments: Data for cancer mutation, cancer expression, healthy expression, literature mining for both cancer mutation and expression, cancer biomarkers, relevant ontologies, functional annotations, and pathways were initially harvested from a number of publicly available resources, such as The Cancer Genome Atlas, International Cancer Genome Consortium, ClinVar, COSMIC, UniProt, and others, and further processed or analyzed by The George Washington University for BioMuta and BioXpress; SIB (Swiss Institute of Bioinformatics) for Bgee (RNA sequencing–derived healthy expression calls and ranks for human and mouse); the University of Delaware for literature mining through DiMeX and DEXTER; and the Early Detection Research Network for biomarkers. Outer ring: Data sets were integrated and unified through Cancer Disease Ontology slim terms for disease names and Uberon Anatomical Entity terms for tissue and physiologic location. Middle ring: Feedback was solicited through a multipronged approach involving use case collection at poster sessions and two formal workshops. With user community guidance, a series of data views and graphic visualizations were devised to aid end users in the exploration and interpretation of biomarker evidence. Inner ring: Data sets were refined and documented with provenance details following the BioCompute Object model for data provenance capture and hosted through the data.oncomx.org data site. Inner circle: The product of these efforts is the current OncoMX Web portal, a semantic web of integrated cancer biomarker data that is readily accessible and licensed under a Creative Commons Attribution 4.0 International License. FDA, US Food and Drug Administration; SCTK, Single-Cell Toolkit.
FIG 2.
Biomarker evidence data model. This diagram shows the modular extensibility and current configuration of the biomarker evidence data model developed for OncoMX. The central provenance domain component captures biomarker and related identifiers for keys and core attributes, as well as cross references between various dictionaries. The genomic variation component contains both mutations in cancer and natural polymorphism data. The expression component also has two subcomponents to describe data coming from diseased and healthy samples. The healthy subcomponent has an added layer of organism, and both expression subcomponents can be further broken down on the basis of the experimental strategy from which data were generated (not shown). The literature mining component contains data extracted from abstracts and full articles reporting biomarker activity, and the clinical status component contains data from Early Detection Research Network (EDRN) and US Food and Drug Administration (FDA) describing clinical attributes, including approved and actual indications, status of clinical trials, related publications, and more. Of note, extension to a new evidence type, glycan-protein type biomarkers, is actively underway, which should allow for the future extension to other subtypes of the anticipated post-translational modification domain.
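The modular layout described in this caption can be pictured as a small set of nested record types. The following is a minimal sketch under stated assumptions, not the OncoMX schema: every class name, field name, and the placeholder identifier are hypothetical, chosen only to mirror the components named above (provenance, genomic variation, expression, literature mining, clinical status).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GenomicVariation:
    # Two kinds of variation named in the caption (field names are hypothetical).
    cancer_mutations: List[str] = field(default_factory=list)
    natural_polymorphisms: List[str] = field(default_factory=list)


@dataclass
class Expression:
    # Diseased and healthy subcomponents; healthy data carry an extra organism layer.
    diseased: List[str] = field(default_factory=list)
    healthy_by_organism: Dict[str, List[str]] = field(
        default_factory=lambda: {"human": [], "mouse": []}
    )


@dataclass
class ClinicalStatus:
    source: str  # "EDRN" or "FDA", the two sources named in the caption
    indications: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)


@dataclass
class BiomarkerEvidence:
    # Central provenance component: identifiers and cross references.
    biomarker_id: str  # hypothetical identifier format
    gene_symbol: str
    cross_references: Dict[str, str] = field(default_factory=dict)
    genomic_variation: GenomicVariation = field(default_factory=GenomicVariation)
    expression: Expression = field(default_factory=Expression)
    literature_sentences: List[str] = field(default_factory=list)
    clinical_status: List[ClinicalStatus] = field(default_factory=list)


# Placeholder record; the identifier is illustrative, not a real OncoMX key.
record = BiomarkerEvidence(biomarker_id="EXAMPLE-0001", gene_symbol="PCA3")
record.clinical_status.append(ClinicalStatus(source="EDRN"))
```

Keeping each evidence type in its own component means a new type, such as the glycan-protein biomarkers mentioned in the caption, could be added as one more field without touching the existing ones.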
FIG 3.
Exploring OncoMX for an individual gene biomarker: Findings for PCA3 in prostate cancer. (A) The OncoMX landing page provides quick access to the search bar. Entering “PCA3” will redirect the user to the search results page. (B) The default open tab of the top viewer in the search results page shows mRNA differential expression results for the queried gene across all cancers. PCA3 is shown to be upregulated in 94% of prostate cancer tumor samples compared with the corresponding adjacent normal samples. (C) Tabular and text details are also available in the lower viewer on the search results page. Exploring the quantitative information shows that 49 of 52 samples have a positive log2 fold change (increased expression) in tumor samples compared with the adjacent normal samples, and that the overexpression reported for prostate cancer is statistically significant. (D) Text details, including a list of biomarker aliases, a brief description, and a list of links to publications reported by Early Detection Research Network (EDRN), are available from the Biomarkers tab in the lower viewer. (E) Healthy expression of PCA3 is found to be “MEDIUM” in earlier human adult stages but “HIGH” in the 65- to 79-year-old human stage. (F) Automatic literature mining finds 16 unique sentences from 15 unique publications reporting overexpression of PCA3 in prostate cancer and another sentence reporting overexpression of PCA3 in leiomyosarcoma. FDA, US Food and Drug Administration; PSA, prostate-specific antigen; RNA-seq, RNA sequencing; TCGA, The Cancer Genome Atlas.
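The 94% figure in panel B follows from the sample counts in panel C. As a small worked sketch: the function names are hypothetical, the counts (49 of 52) come from the caption, and the expression values passed to the fold-change helper are illustrative placeholders rather than PCA3 measurements.

```python
import math


def log2_fold_change(tumor: float, normal: float) -> float:
    """Return log2(tumor / normal); a positive value means higher expression in tumor."""
    return math.log2(tumor / normal)


def percent_upregulated(n_up: int, n_total: int) -> int:
    """Percentage of paired samples with increased expression, rounded to an integer."""
    return round(100 * n_up / n_total)


# Counts taken from the caption: 49 of 52 tumor/adjacent-normal pairs are upregulated.
pct = percent_upregulated(49, 52)

# Placeholder expression values (not real data): a fourfold increase is a log2 fold change of 2.
example_lfc = log2_fold_change(8.0, 2.0)
```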
FIG 4.
Exploring OncoMX for a gene panel biomarker: Findings for ERBB2 as part of the Prosigna multigene prediction panel for breast cancer. (A) In addition to the search bar, the OncoMX landing page contains buttons that are quick links to other tables and views. Clicking Biomarkers will redirect the user to the biomarker exploration table viewer. (B) From the data exploration table, the user can click the arrow in the top right of the Biomarker Filters box to access the dropdown menu for filters. Clicking Switch to FDA Biomarkers will reload the table with the US Food and Drug Administration (FDA)–approved biomarker data set. Searching for “ERBB2” in the search bar immediately above the table will reload the table for hits to ERBB2 (10 hits). Accessing the filters again, the user can search for hits belonging only to panels by selecting Y from the Panel filter dropdown. The table will once again reload to display the single hit for ERBB2 in a panel, identifying it as part of the Prosigna multigene prediction panel for breast cancer. (C) Going back to the landing page and performing a gene search for “ERBB2” will redirect the user to the detailed search results page. Navigating to the Expression Literature Mining tab, one can readily see the standout peak indicating multiple literature evidences (n = 262) for upregulation of ERBB2 in breast cancer, including 253 unique sentences from 248 publications. EDRN, Early Detection Research Network.

References

    1. OncoMX. https://www.oncomx.org/
    2. National Cancer Institute. NCI dictionary of cancer terms. https://www.cancer.gov/publications/dictionaries/cancer-terms
    3. Villalobos P, Wistuba II. Lung cancer biomarkers. Hematol Oncol Clin North Am. 2017;31:13–29.
    4. Henry NL, Hayes DF. Cancer biomarkers. Mol Oncol. 2012;6:140–146.
    5. O’Connor JP, Aboagye EO, Adams JE, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2017;14:169–186.
