Stud Health Technol Inform. 2007;129(Pt 1):710-5.

Using discourse analysis to improve text categorization in MEDLINE

Patrick Ruch¹, Antoine Geissbühler, Julien Gobeill, Frederic Lisacek, Imad Tbahriti, Anne-Lise Veuthey, Alan R Aronson

Affiliation

¹ Medical Informatics Service, University and Hospital of Geneva, Geneva, Switzerland. patrick.ruch@sim.hcuge.ch

PMID: 17911809

Abstract

Problem: Automatic keyword assignment has been widely studied in medical informatics in the context of the MEDLINE database, both to support searching in MEDLINE and to provide an indicative "gist" of an article's content. Automatic assignment of Medical Subject Headings (MeSH), which is formally an automatic text categorization task, has been approached with different methods or combinations of methods, including machine learning (naïve Bayes, neural networks, etc.), linguistically motivated methods (syntactic parsing, semantic tagging), and information retrieval.

Methods: In the present study, we evaluate whether the argumentative structure of scientific articles can improve the effectiveness of a categorizer that combines linguistically motivated and information retrieval methods. Our argumentative classifier, which uses representation levels inherited from the field of discourse analysis, classifies the sentences of an abstract into four classes: PURPOSE, METHODS, RESULTS, and CONCLUSION. For the evaluation, the OHSUMED collection, a sample of MEDLINE, is used as a benchmark. For each abstract in the collection, the output of the argumentative classifier, i.e. the labeling of each sentence with an argumentative class, is used to modify the original ranking of the MeSH categorizer.

Results: The most effective combination (+2%, p<0.003) strongly overweights the METHODS section and moderately overweights the RESULTS and CONCLUSION sections.
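
The abstract does not say how the argumentative labels modify the MeSH ranking, so the following is only a minimal sketch of one plausible reading: each sentence contributes per-term scores, and those scores are multiplied by a weight that depends on the sentence's argumentative class. The function name `rerank`, the weight values, and the example scores are all hypothetical and chosen for illustration; only the qualitative pattern (METHODS strongly overweighted, RESULTS and CONCLUSION moderately) comes from the reported result.

```python
from collections import defaultdict

# Hypothetical weights (not from the paper): METHODS is strongly overweighted,
# RESULTS and CONCLUSION moderately, PURPOSE is left at the baseline.
SECTION_WEIGHTS = {
    "PURPOSE": 1.0,
    "METHODS": 2.0,
    "RESULTS": 1.3,
    "CONCLUSION": 1.3,
}


def rerank(sentences, weights=SECTION_WEIGHTS):
    """Re-rank candidate MeSH terms using argumentative sentence labels.

    sentences: list of (label, {mesh_term: score}) pairs, one per sentence,
               where the scores are assumed to come from a base categorizer.
    weights:   mapping from argumentative label to a multiplier; labels
               missing from the mapping get a neutral weight of 1.0.
    Returns the MeSH terms sorted by decreasing weighted score
    (ties broken alphabetically).
    """
    totals = defaultdict(float)
    for label, term_scores in sentences:
        w = weights.get(label, 1.0)
        for term, score in term_scores.items():
            totals[term] += w * score
    return sorted(totals, key=lambda t: (-totals[t], t))


# Toy abstract with made-up scores for two candidate terms.
example = [
    ("PURPOSE", {"Neoplasms": 0.6, "Mice": 0.2}),
    ("METHODS", {"Mice": 0.4}),
    ("RESULTS", {"Neoplasms": 0.1}),
]

baseline = rerank(example, weights={})  # every sentence weighted equally
weighted = rerank(example)              # METHODS sentences count double
```

In this toy case the unweighted ranking puts "Neoplasms" first, while overweighting the METHODS sentence moves "Mice" to the top, which illustrates how a section-dependent weight can change the final ordering.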

Conclusion: Although modest, the improvement brought by argumentative features for text categorization confirms that discourse analysis methods could benefit text mining in scientific digital libraries.

LinkOut - more resources

Full Text Sources
- IOS Press
