AMIA Annu Symp Proc. 2007 Oct 11;2007:31-5.

A model for indexing medical documents combining statistical and symbolic knowledge

Paul Avillach et al. AMIA Annu Symp Proc. 2007.

Abstract

Objectives: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context.

Methods: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs).
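
The abstract does not give the scoring formula, so the following is only a minimal sketch of how statistical and symbolic knowledge might be combined to rank the terms of one document. Everything in it is an assumption made for illustration: the function names `cooccurrence_stats` and `rank_terms`, the `boost` parameter, the use of a conditional co-occurrence frequency as the statistical signal, and the toy ICD-10 codes. The `related` set stands in for semantic relationships that would, in the setting described, come from the UMLS.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(corpus):
    """Count, over a domain corpus, how many documents contain each term
    and how many contain each unordered pair of terms."""
    term_df = Counter()
    pair_df = Counter()
    for doc in corpus:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_df, pair_df


def rank_terms(doc, term_df, pair_df, related, boost=2.0):
    """Rank the terms of one document, most significant first.

    Hypothetical scoring rule (not taken from the paper): a term's score
    is the sum, over the other terms of the document, of the estimated
    P(term | other), multiplied by `boost` when a symbolic relationship
    links the two terms.
    """
    terms = set(doc)
    scores = {}
    for t in terms:
        score = 0.0
        for u in terms - {t}:
            pair = frozenset((t, u))
            # Statistical knowledge: co-occurrence frequency in the domain.
            assoc = pair_df[pair] / term_df[u] if term_df[u] else 0.0
            # Symbolic knowledge: a known semantic relationship.
            if pair in related:
                assoc *= boost
            score += assoc
        scores[t] = score
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(terms, key=lambda t: (-scores[t], t))


# Toy corpus of coded summaries (illustrative data only).
corpus = [["I21", "I10"], ["I21", "E11"], ["I21", "I10", "E11"], ["I10"]]
term_df, pair_df = cooccurrence_stats(corpus)
related = {frozenset(("I21", "I10"))}  # assumed symbolic relationship
ranking = rank_terms(["I21", "I10", "E11"], term_df, pair_df, related)
print(ranking)
```

In this sketch the corpus statistics play the contextualizing role described above: the same set of symbolic relationships yields different rankings when the co-occurrence counts come from a different domain.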

Results: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases.
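
The reported figure is a top-2 success rate. As a rough sketch of how such a measure could be computed, assuming each summary comes with one reference "most important" term and one ranked list of terms, the hypothetical helper `top_k_success_rate` below counts the summaries whose reference term appears among the first `k` ranked terms. The data are invented for illustration and are not the study's results.

```python
def top_k_success_rate(rankings, principal_terms, k=2):
    """Fraction of documents whose reference term is ranked in the top k.

    rankings        -- one ranked list of terms per document
    principal_terms -- the reference (most important) term per document
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for ranked, ref in zip(rankings, principal_terms) if ref in ranked[:k]
    )
    return hits / len(rankings)


# Invented example: 3 of 4 documents have their reference term in the top 2.
rankings = [["A", "B", "C"], ["B", "A"], ["C", "B", "A"], ["A"]]
principal_terms = ["B", "A", "A", "A"]
rate = top_k_success_rate(rankings, principal_terms, k=2)
print(rate)
```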

Conclusions: The use of several terminologies leads to more precise indexing. The improvement in the performance of the model's implementation obtained by using semantic relationships is encouraging.

Figures

Figure 1. Intra- and inter-referential information processing model.
Figure 2. Intra- and inter-referential information processing model adapted to our test situation.
Figure 3. Simplified architecture flow-chart illustrating the implementation of the model and the score calculation method.
Figure 4. Success rate in finding the principal diagnosis in 1st or 2nd position according to the number of ICD-10 codes in the standardized discharge summaries. *p < 10^-4
