BMC Bioinformatics. 2019 Aug 16;20(1):429.
doi: 10.1186/s12859-019-2958-3.

VIST - a Variant-Information Search Tool for precision oncology

Jurica Ševa et al. BMC Bioinformatics.

Abstract

Background: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information about how variations are associated with disease progression and possible interventions. Clinicians largely use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile.

Results: VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs and uses machine learning based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST's ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document's content.
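
For orientation, the "pure vector space model" baseline mentioned above can be sketched roughly as TF-IDF weighting combined with cosine similarity. This is a minimal illustration and not VIST's code: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy documents are all assumptions made here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, one common variant among several.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Ties are broken by original document order.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy collection, not taken from the paper.
docs = [
    "BRAF V600E melanoma vemurafenib response",
    "yeast cell cycle regulation",
    "BRAF signaling in development",
]
print(rank("BRAF V600E", docs))
```

A ranking of this kind rewards term overlap only; it has no notion of whether a matching abstract is clinically relevant, which is the gap the abstract says VIST's machine-learning-based scoring is meant to address.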

Conclusion: Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in the standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, heavily relying on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/.

Keywords: Biomedical information retrieval; Clinical relevance; Document classification; Document retrieval; Document triage; Personalized oncology.

Conflict of interest statement

Co-author Ulf Leser is an associate editor of BMC Bioinformatics. He was not involved in any form in the scientific assessment of this manuscript. Otherwise, the authors declare that they have no competing interests.

Figures

Fig. 1
VIST System Architecture. Left: VIST backend with indexed and preprocessed documents. Right: VIST web interface for query processing and result presentation.
Fig. 2
VIST web interface. Top: Search bar for entering queries. Left: Filter options (by keywords, genes, journals, cancer type, and year of publication). Main pane: List of matching documents, ranked by score according to clinical relevance. Matching clinical trials are available as a second tab.
Fig. 3
Detailed view of a matching document in VIST. Entities (genes, drugs, variations) recognized by VIST’s NER modules are highlighted. Sentences are colored according to the probability of carrying the main message of the abstract (key phrases).
Fig. 4
Precision (P), Recall (R) and F1 scores of three evaluated classification tasks, i.e., classification by relatedness to cancer, by clinical relevance, and by cancer type. MTL: Multi-Task Learning; HATT: Hierarchical Attention Network; SVM: Support Vector Machine; RF: Random Forest.
Fig. 5
Evaluation results based on the UserStudy data set: Precision at k (P@k) and recall at k (R@k) of three different ranking schemes, i.e., PubMed, KeywordScore, and VIST SVM. Here, k refers to the k’th document in a ranked list that is also contained in the reference list.
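
As a rough aid to reading P@k and R@k, the sketch below computes both under the standard rank-cutoff definition, where k is a position in the ranked list. Note that this is an assumption: the caption above defines k relative to documents also contained in the reference list, so the paper's exact computation may differ. The function name and the document identifiers are hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (P@k, R@k) for a ranked list under the standard cutoff definition.

    ranked_ids:   document ids in ranked order, best first
    relevant_ids: ids in the reference (gold) list
    k:            rank cutoff, must be positive
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant_ids)
    # Count how many of the top-k ranked documents are in the reference list.
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    # Precision divides by k even if fewer than k documents were returned.
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking and reference list, not data from the paper.
ranked = ["d3", "d1", "d7", "d2", "d9"]
reference = {"d1", "d2", "d5"}

for k in (2, 4):
    p, r = precision_recall_at_k(ranked, reference, k)
    print(f"k={k}: P@k={p:.2f}, R@k={r:.2f}")
```

Comparing ranking schemes then amounts to running the same function over each scheme's ranked output against one shared reference list.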
