J Am Med Inform Assoc. 2014 Jan-Feb;21(1):90-6.
doi: 10.1136/amiajnl-2012-001584. Epub 2013 May 18.

Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug-side effect relationships from the literature

Rong Xu et al. J Am Med Inform Assoc. 2014 Jan-Feb.

Abstract

Objective: A comprehensive and machine-understandable cancer drug-side effect (drug-SE) relationship knowledge base is important for in silico cancer drug target discovery, drug repurposing, and toxicity prediction, and for personalized risk-benefit decisions by cancer patients. While US Food and Drug Administration (FDA) drug labels capture well-known cancer drug SE information, much cancer drug SE knowledge remains buried in the published biomedical literature. We present a relationship extraction approach to extract cancer drug-SE pairs from the literature.

Data and methods: We used 21,354,075 MEDLINE records as the text corpus. We extracted drug-SE co-occurrence pairs using a cancer drug lexicon and a clean SE lexicon that we created. We then developed two filtering approaches to remove drug-disease treatment pairs and subsequently a ranking scheme to further prioritize filtered pairs. Finally, we analyzed relationships among SEs, gene targets, and indications.
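
Illustration: the co-occurrence and ranking steps described above can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. The two toy lexicons, the naive sentence splitter, and the function names cooccurrence_pairs and rank_pairs are all assumptions made for illustration. Ranking by raw frequency is also a simplification, and the two filtering approaches are not sketched because the abstract does not describe them.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumption); the paper's lexicons are far larger.
DRUG_LEXICON = {"irinotecan", "cisplatin"}
SE_LEXICON = {"diarrhea", "neutropenia", "nausea"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_pairs(abstracts):
    """Count sentences in which a drug term and an SE term both appear."""
    counts = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            tokens = set(re.findall(r"[a-z]+", sentence.lower()))
            for drug in tokens & DRUG_LEXICON:
                for side_effect in tokens & SE_LEXICON:
                    counts[(drug, side_effect)] += 1
    return counts


def rank_pairs(counts):
    """Order pairs by descending frequency, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Matching single lowercase tokens keeps the sketch short; multi-word drug or SE names would need phrase matching instead.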

Results: We extracted 56,602 cancer drug-SE pairs. The filtering algorithms improved the precision of extracted pairs from 0.252 at baseline to 0.426, representing a 69% improvement in precision with no decrease in recall. The ranking algorithm further prioritized filtered pairs and achieved a precision of 0.778 for top-ranked pairs. We showed that cancer drugs that share SEs tend to have overlapping gene targets and overlapping indications.
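
Illustration: the 69% figure is the relative gain in precision over baseline, which can be checked from the two reported values. The helper name relative_improvement is an assumption introduced for this check.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Precision values reported in the Results paragraph.
gain = relative_improvement(0.252, 0.426)
print(f"{gain:.0%}")  # prints 69%
```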

Conclusions: The relationship extraction approach is effective in extracting many cancer drug-SE pairs from the literature. This unique knowledge base, when combined with existing cancer drug SE knowledge, can facilitate drug target discovery, drug repurposing, and toxicity prediction.

Keywords: Cancer Drug Toxicity; Drug Repurposing; Drug Target Discovery; Information Extraction; Natural Language Processing; Text Mining.

Figures

Figure 1. Flow chart depicting the process of cancer drug–SE relationship extraction, filtering, ranking, and analysis. SE, side effect.
Figure 2. Filtered drug–SE pairs extracted from sentences (‘Filtered_Sentence’) and abstracts (‘Filtered_Abstract’) and ranked by MEDLINE frequency. SE, side effect.
Figure 3. Unfiltered drug–SE pairs ranked by MEDLINE frequency and evaluated using dataset ‘Irinotecan–SE’ pairs (‘Unfiltered_Sentence_Irinotecan_SE’) and drug–disease treatment pairs from ClinicalTrials.gov (‘Unfiltered_Sentence_Drug_Disease_ClinicalTrials’). SE, side effect.
Figure 4. The positive association between drug target genes and drug–SE pairs: cancer specific drug–SE pairs extracted from MEDLINE (‘Cancer_Drug_SE’) and all pairs from the SIDER database (‘SIDER_Drug_SE’). SE, side effect.
Figure 5. The positive association between drug disease indications and drug–SE pairs: cancer drug–SE pairs extracted from MEDLINE (‘Cancer_Drug_SE’) and all pairs from the SIDER database (‘SIDER_Drug_SE’). SE, side effect.
