Nucleic Acids Res. 2019 Jul 2;47(W1):W594-W599.
doi: 10.1093/nar/gkz289.

LitSense: making sense of biomedical literature at sentence level

Alexis Allot et al. Nucleic Acids Res.

Abstract

Literature search is a routine practice in scientific studies, as new discoveries build on knowledge from the past. Current tools (e.g. PubMed, PubMed Central), however, generally require significant effort in query formulation and optimization (especially when searching full-length articles) and do not allow direct retrieval of specific statements, which is key for tasks such as comparing/validating new findings against previous knowledge and performing evidence attribution in biocuration. Thus, we introduce LitSense, the first web-based system that specializes in sentence retrieval for biomedical literature. LitSense provides unified access to PubMed and PMC content, with over half a billion sentences in total. Given a query, LitSense returns best-matching sentences using both a traditional term-weighting approach, which up-weights sentences that contain more of the rare terms in the user query, and a novel neural embedding approach, which enables the retrieval of semantically relevant results without explicit keyword match. LitSense provides a user-friendly interface that helps users quickly browse the returned sentences in context and/or further filter search results by section or publication date. LitSense also employs PubTator to highlight biomedical entities (e.g. genes/proteins) in the sentences for better result visualization. LitSense is freely available at https://www.ncbi.nlm.nih.gov/research/litsense.
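
The term-weighting idea in the abstract (sentences score higher when they contain more of the query's rare terms) can be illustrated with a minimal sketch. This is not LitSense's or Solr's actual scoring function: the tokenizer, the smoothed IDF formula, the helper names (`tokenize`, `idf_table`, `idf_score`), and the three toy sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def idf_table(sentences):
    # Document frequency is counted per sentence; rarer terms get a larger weight.
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    # Assumed smoothed IDF; real engines use their own variants.
    return {term: math.log(1 + n / count) for term, count in df.items()}

def idf_score(query, sentence, idf):
    # Sum the IDF weights of the distinct query terms that occur in the sentence.
    sentence_terms = set(tokenize(sentence))
    return sum(idf.get(t, 0.0) for t in set(tokenize(query)) if t in sentence_terms)

# Toy corpus (hypothetical sentences, not taken from PubMed).
sentences = [
    "BRCA1 mutations increase the risk of breast cancer.",
    "The risk of cancer increases with age.",
    "Zebrafish are a model organism for development.",
]
idf = idf_table(sentences)
query = "BRCA1 cancer risk"
ranked = sorted(sentences, key=lambda s: idf_score(query, s, idf), reverse=True)
```

In this toy corpus, "brca1" appears in one sentence while "cancer" and "risk" appear in two, so a match on "brca1" contributes more to the score than a match on either of the other terms.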

Figures

Figure 1.
System overview. LitSense has two main parts: ‘sentence indexing’ and ‘search’. We first obtain PubMed and PMC documents from the BioC repository (https://www.ncbi.nlm.nih.gov/research/bionlp/APIs). After removing irrelevant documents (a), sections in each article are normalized to semantic categories (b), and then the text is split into sentences (c). The extracted sentences are stored in Solr and used for learning semantic vectors via sent2vec. Given a user query (A), Solr retrieves sentences, using the inverse document frequency (IDF) ranking in Solr (B), and the top-ranked sentences are subsequently re-ranked by semantic vector similarity scores (C). Finally, the system displays the results through the web interface (D).
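
The search half of this pipeline (steps B and C) can be sketched as a two-stage procedure: shortlist by a term-weighting score, then reorder the shortlist by vector similarity. The sketch below is an assumption-laden illustration, not the LitSense implementation: the candidate scores and three-dimensional vectors are made-up stand-ins for Solr scores and sent2vec embeddings, `rerank` and `top_k` are hypothetical names, and cosine similarity is assumed as the similarity measure.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(query_vec, candidates, top_k=2):
    # candidates: list of (sentence, term_weight_score, embedding) tuples.
    # Stage 1: keep only the top_k candidates by term-weighting score.
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Stage 2: reorder the shortlist by similarity to the query embedding.
    return sorted(shortlist, key=lambda c: cosine(query_vec, c[2]), reverse=True)

# Hypothetical candidates with made-up scores and toy 3-dimensional embeddings.
candidates = [
    ("sentence A", 3.2, [0.1, 0.9, 0.0]),
    ("sentence B", 2.5, [0.8, 0.2, 0.1]),
    ("sentence C", 0.4, [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]
result = [c[0] for c in rerank(query_vec, candidates, top_k=2)]
```

Here sentence C is closest to the query vector but never reaches the second stage because its term-weighting score is too low to make the shortlist, which is the trade-off of re-ranking only the top-ranked results.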
Figure 2.
LitSense user interface. Users can enter queries into the search bar (a), filter results by section (b) or publication date (c), and show/hide the highlights of bio-entities (d). The middle column displays retrieved sentences. The circle icon under each sentence indicates the predicted relevance level, from orange (likely to be relevant) to blue (likely to be irrelevant) (e). The same line also shows the provenance of the sentence (f), the PubMed/PMC ID (g), a button to use this sentence as the query for a new search (h), and citation information (shown by clicking ‘+Article details’) (i). Clicking ‘See in Abstract/FullText’ opens the entire document with the retrieved sentence highlighted (j).
