J Am Med Inform Assoc. 2007 May-Jun;14(3):253-63.
doi: 10.1197/jamia.M2233. Epub 2007 Feb 28.

Essie: a concept-based search engine for structured biomedical text

Nicholas C Ide et al. J Am Med Inform Assoc. 2007 May-Jun.

Abstract

This article describes the algorithms implemented in the Essie search engine, which currently serves several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by the observation that query terms are often conceptually related to terms in a document without actually occurring in the document text. Essie's performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a useful approach for information retrieval in the biomedical domain.

Figures

Figure 1
Abstract diagram of Essie’s scoring algorithm. Term occurrences are weighted by: (1) the similarity to the user’s query, and (2) the importance of the field where they are found. For example, in a search for “heart attack,” a document with “heart attack” in the title (point A) would score higher than a document with “myocardial infarction” in the abstract (point B).
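To make the weighting in Figure 1 concrete, here is a minimal Python sketch. The field weights, the similarity values, and the noisy-OR rule that combines hit weights into a document score are all assumptions for illustration; the caption states only that occurrences are weighted by query similarity and field importance, not the exact formulas Essie uses.

```python
from dataclasses import dataclass

# Hypothetical field weights; the actual values Essie uses are not given here.
FIELD_WEIGHTS = {"title": 1.0, "abstract": 0.5, "body": 0.25}


@dataclass
class Hit:
    doc_id: str
    field: str
    similarity: float  # 1.0 for the exact query text, lower for more distant variants


def hit_weight(hit: Hit) -> float:
    """Weight one occurrence by its similarity to the query and its field's importance."""
    return hit.similarity * FIELD_WEIGHTS.get(hit.field, 0.0)


def rank_documents(hits: list[Hit]) -> list[tuple[str, float]]:
    """Combine hit weights per document with a noisy-OR (an assumption) and sort by score."""
    miss = {}  # per document: probability that none of its hits signals relevance
    for hit in hits:
        miss[hit.doc_id] = miss.get(hit.doc_id, 1.0) * (1.0 - hit_weight(hit))
    scores = {doc_id: 1.0 - p for doc_id, p in miss.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


hits = [
    Hit("A", "title", 1.0),     # "heart attack" in the title (point A)
    Hit("B", "abstract", 0.8),  # "myocardial infarction" in the abstract (point B)
]
ranked = rank_documents(hits)
print(ranked)
```

Document A ranks above document B, matching the ordering described in the caption. A noisy-OR keeps every score between 0 and 1 and lets several hits in one document raise its score, but any monotone combination would reproduce the same ordering in this small example.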
Figure 2
Index building and related preprocessing. Token adjacency indexes are derived from the corpus and support efficient searches for arbitrary phrases. Word variants are extracted primarily from the Unified Medical Language System (UMLS) (additional compound words and plurals are mined from the corpus), and are used in term expansion. Synonymy is extracted from the UMLS and is used for concept expansion.
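The role of a token adjacency index can be illustrated with a positional inverted index, a swapped-in stand-in rather than Essie's actual index format: each token maps to the (document, position) pairs where it occurs, and a phrase matches wherever each successive token appears exactly one position further along. The tokenizer and function names below are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplistic lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def build_index(corpus: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Map each token to the set of (doc_id, position) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].add((doc_id, pos))
    return index


def phrase_hits(index, phrase: str) -> list[tuple[str, int]]:
    """Return (doc_id, start) for every occurrence of the phrase."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    # Start from occurrences of the first token, then keep only those
    # where every later token sits at the expected adjacent position.
    candidates = set(index.get(tokens[0], set()))
    for offset, token in enumerate(tokens[1:], start=1):
        postings = index.get(token, set())
        candidates = {(doc, start) for doc, start in candidates
                      if (doc, start + offset) in postings}
    return sorted(candidates)


corpus = {
    "d1": "acute myocardial infarction in adults",
    "d2": "infarction myocardial imaging",
}
index = build_index(corpus)
print(phrase_hits(index, "myocardial infarction"))
```

Only `d1` matches, at position 1; `d2` contains both words, but not adjacently in the queried order.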
Figure 3
Search processing. Queries are parsed to extract search syntax and search texts. Syntax operators can control query expansion, but the default is relaxation expansion, which extends concept and term expansion. Expansion results in a large set of variations of the original search text, all of which are searched as phrases. Hits in the corpus are collected, and the documents containing them are scored, ranked, and returned.
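The expansion step in Figure 3 can be sketched as generating phrase variants and then searching each one as a phrase. In this minimal sketch, `WORD_VARIANTS` and `CONCEPT_SYNONYMS` are small hypothetical tables standing in for the UMLS-derived data; relaxation expansion, query syntax operators, and scoring are not modeled.

```python
from itertools import product

# Hypothetical lexical tables standing in for UMLS-derived data.
WORD_VARIANTS = {
    "attack": ["attack", "attacks"],
    "infarction": ["infarction", "infarctions"],
}
CONCEPT_SYNONYMS = {
    "heart attack": ["myocardial infarction", "mi"],
}


def term_expansion(phrase: str) -> set[str]:
    """Replace each word by each of its variants (Cartesian product)."""
    options = [WORD_VARIANTS.get(word, [word]) for word in phrase.lower().split()]
    return {" ".join(combo) for combo in product(*options)}


def concept_expansion(phrase: str) -> set[str]:
    """Add synonyms of the whole phrase, then term-expand every phrase."""
    key = phrase.lower()
    phrases = {key, *CONCEPT_SYNONYMS.get(key, [])}
    expanded = set()
    for p in phrases:
        expanded |= term_expansion(p)
    return expanded


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's tokens occur contiguously in the text."""
    tokens, target = text.lower().split(), phrase.split()
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def search(query: str, corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each matching document to the expanded phrases found in it."""
    variants = concept_expansion(query)
    results = {}
    for doc_id, text in corpus.items():
        found = {v for v in variants if contains_phrase(text, v)}
        if found:
            results[doc_id] = found
    return results


corpus = {
    "d1": "Risk after myocardial infarction",
    "d2": "Heart attacks in athletes",
}
results = search("heart attack", corpus)
print(results)
```

Neither document contains the literal query text, yet both are found: `d1` through the concept synonym and `d2` through a term variant.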
Figure 4
A search expansion tree. Leaf nodes load lists of occurrences (also called hits) for tokens as found in the token adjacency indexes. Adjacent and merge nodes build up multitoken phrase hits. The stretch operation extends hits to include optional extra tokens on the right. Evaluation of the entire tree produces hits for the term expansion of “non-hodgkin’s lymphoma.”
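The expansion tree in Figure 4 can be approximated by a small set of node classes that each return hits as (document, start, end) spans. This is an interpretation of the caption, not Essie's code: `Merge` is assumed to take the union of alternative hit sets, and `Stretch` is assumed to extend a hit by at most one optional token immediately to its right. The tokenizer, which splits “non-hodgkin’s” into `non`, `hodgkin`, `s`, is likewise an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

Hit = tuple[str, int, int]          # (doc_id, start, end), end exclusive
Index = dict[str, set[tuple[str, int]]]


def build_index(corpus: dict[str, str]) -> Index:
    """Map each alphanumeric token to the (doc_id, position) pairs where it occurs."""
    index: Index = {}
    for doc_id, text in corpus.items():
        for pos, tok in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(tok, set()).add((doc_id, pos))
    return index


class Node:
    def evaluate(self, index: Index) -> set[Hit]:
        raise NotImplementedError


@dataclass
class Leaf(Node):
    token: str

    def evaluate(self, index: Index) -> set[Hit]:
        # One-token hits loaded straight from the index.
        return {(doc, pos, pos + 1) for doc, pos in index.get(self.token, set())}


@dataclass
class Adjacent(Node):
    left: Node
    right: Node

    def evaluate(self, index: Index) -> set[Hit]:
        # Join left hits to right hits that start exactly where the left hit ends.
        right_ends: dict[tuple[str, int], list[int]] = {}
        for doc, start, end in self.right.evaluate(index):
            right_ends.setdefault((doc, start), []).append(end)
        return {(doc, start, e2)
                for doc, start, end in self.left.evaluate(index)
                for e2 in right_ends.get((doc, end), [])}


@dataclass
class Merge(Node):
    children: list[Node]

    def evaluate(self, index: Index) -> set[Hit]:
        # Alternatives: union of every child's hits.
        hits: set[Hit] = set()
        for child in self.children:
            hits |= child.evaluate(index)
        return hits


@dataclass
class Stretch(Node):
    child: Node
    optional: frozenset[str]

    def evaluate(self, index: Index) -> set[Hit]:
        # Keep each hit, and also a copy extended by one optional token on the right.
        hits = self.child.evaluate(index)
        extra = {(doc, pos) for tok in self.optional for doc, pos in index.get(tok, set())}
        return hits | {(doc, s, e + 1) for doc, s, e in hits if (doc, e) in extra}


corpus = {
    "d1": "Non-Hodgkin's lymphoma in children",
    "d2": "non hodgkin lymphomas",
    "d3": "hodgkin lymphoma",
}
index = build_index(corpus)
tree = Adjacent(
    Adjacent(Leaf("non"), Stretch(Leaf("hodgkin"), frozenset({"s"}))),
    Merge([Leaf("lymphoma"), Leaf("lymphomas")]),
)
print(sorted(tree.evaluate(index)))
```

With this tokenizer, the tree matches both “Non-Hodgkin’s lymphoma” and “non hodgkin lymphomas” but not “hodgkin lymphoma”, because the `non` leaf is required.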
