BMC Bioinformatics. 2009 Sep 22;10:303.
doi: 10.1186/1471-2105-10-303.

The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature

Anna Korhonen et al. BMC Bioinformatics. 2009.

Abstract

Background: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM: Cancer Risk Assessment (CRA). Here we take the first step towards developing TM technology for this task by identifying the scientific evidence required for CRA and organizing it in a taxonomy capable of supporting extensive data gathering from biomedical literature.

Results: The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus into 48 classes which specify core evidence required for CRA. We report promising results from inter-annotator agreement tests and from automatic classification of PubMed abstracts into taxonomy classes. We also report a simple user test in a near real-world CRA scenario which, along with the other evaluations, demonstrates that the resources we have built are well-defined, accurate, and applicable in practice.
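
To illustrate the idea of mapping corpus keywords to taxonomy classes, the following is a minimal sketch using a plain keyword lookup. It is not the automatic classifier evaluated in the paper, whose method is not described in this abstract. The three class names are taken from the figure titles; the keywords, the `TAXONOMY` dictionary, the `classify_abstract` function, and the example sentence are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword-to-class entries for illustration only.
# The real taxonomy organizes 1742 keywords into 48 classes.
TAXONOMY = {
    "tumor": "Carcinogenic activity",
    "carcinoma": "Carcinogenic activity",
    "dna adduct": "Mode of action",
    "mutation": "Mode of action",
    "metabolism": "Toxicokinetics",
    "excretion": "Toxicokinetics",
}


def classify_abstract(text):
    """Return {class_name: sorted list of matched keywords} for one abstract."""
    lowered = text.lower()
    hits = defaultdict(set)
    for keyword, class_name in TAXONOMY.items():
        # Word-boundary match so a keyword does not fire inside a longer token.
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
            hits[class_name].add(keyword)
    return {cls: sorted(kws) for cls, kws in hits.items()}


# Hypothetical abstract sentence, not taken from the corpus.
example = "Exposure increased DNA adduct formation and liver tumor incidence in rats."
result = classify_abstract(example)
print(result)
```

Because an abstract can match keywords from several classes at once, the function returns every matching class rather than a single label, in line with the classification of CRA literature along multiple dimensions.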

Conclusion: We present our annotation guidelines and a tool which we have designed for expert annotation of PubMed abstracts. A corpus annotated for keywords and document relevance is also presented, along with the taxonomy which organizes the keywords into classes defining core evidence for CRA. As demonstrated by the evaluation, the materials we have constructed provide a good basis for classification of CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of an approach based on TM. We discuss extending the taxonomy further via manual and machine learning approaches and the subsequent steps required to develop TM technology for the needs of CRA.

Figures

Figure 1. Annotation tool.
Figure 2. Annotated abstract.
Figure 3. Taxonomy for carcinogenic activity (flow chart).
Figure 4. Taxonomy for mode of action (flow chart).
Figure 5. The toxicokinetics taxonomy (flow chart).
