Nucleic Acids Res. 2019 Jul 2;47(W1):W571-W577.
doi: 10.1093/nar/gkz393.

Geneshot: search engine for ranking genes from arbitrary text queries

Alexander Lachmann et al. Nucleic Acids Res.

Abstract

The frequency with which genes are studied correlates with the prior knowledge accumulated about them. This creates an imbalance in research attention: some genes are intensively investigated while others are ignored. Geneshot is a search engine developed to illuminate this gap and to promote attention to the under-studied genome. Through a simple web interface, Geneshot enables researchers to enter arbitrary search terms and receive ranked lists of genes relevant to those terms. The returned ranked gene lists contain genes previously published in association with the search terms, as well as genes predicted to be associated with the terms based on data integration from multiple sources. The search results are presented with interactive visualizations. To predict gene function, Geneshot uses gene-gene similarity matrices computed from processed RNA-seq data or from gene-gene co-occurrence data obtained from multiple sources. In addition, Geneshot can be used to analyze the novelty of gene sets and to augment gene sets with additional relevant genes. The Geneshot web server and API are freely and openly available from https://amp.pharm.mssm.edu/geneshot.
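
The idea of predicting genes from a gene-gene similarity matrix can be illustrated with a small sketch. It assumes a hypothetical input `gene_to_pmids` that maps each gene to the set of publications mentioning it, uses the Jaccard similarity of those sets as a stand-in for a co-occurrence matrix, and scores each candidate gene by its mean similarity to a set of seed genes (for example, genes returned by a literature search). The Jaccard measure and the mean-similarity rule are assumptions made for illustration, not Geneshot's published procedure.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_similarity(gene_to_pmids):
    """Pairwise Jaccard similarity of genes based on shared publications.

    gene_to_pmids: dict mapping a gene symbol to a set of publication IDs
    (hypothetical input). Returns a dict of dicts with sim[a][b] in [0, 1].
    """
    sim = defaultdict(dict)
    for a, b in combinations(sorted(gene_to_pmids), 2):
        pa, pb = gene_to_pmids[a], gene_to_pmids[b]
        union = pa | pb
        # Jaccard index: shared publications / all publications of either gene
        score = len(pa & pb) / len(union) if union else 0.0
        sim[a][b] = score
        sim[b][a] = score
    return dict(sim)


def predict_genes(seed_genes, sim, top_k=10):
    """Rank non-seed genes by their mean similarity to the seed genes."""
    seeds = [g for g in seed_genes if g in sim]
    if not seeds:
        return []
    scores = {}
    for gene in sim:
        if gene in seeds:
            continue
        scores[gene] = sum(sim[gene].get(s, 0.0) for s in seeds) / len(seeds)
    # Highest mean similarity first; ties broken alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    gene_to_pmids = {
        "TP53": {1, 2, 3},
        "MDM2": {1, 2},
        "CDKN1A": {2, 3},
        "ACTB": {9},
    }
    sim = cooccurrence_similarity(gene_to_pmids)
    print(predict_genes(["TP53"], sim))
```

Because `predict_genes` only reads `sim[gene][seed]`, a co-expression matrix derived from RNA-seq data could be passed in the same dict-of-dicts shape without changing the scoring function.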

Figures

Figure 1.
Geneshot user interface for the PubMed querying tab. (A) Search engine input section. (B) Scatter plot of the number of publications that mention both the gene and the search terms against the normalized values (left); mentions of the gene with and without the search terms over time (right). (C) Tables providing ranked lists of relevant genes based on GeneRIF (left) and predictions based on AutoRIF co-occurrence (right).
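
The two quantities plotted in panel B can be sketched as follows, assuming (hypothetically) that the normalized value is the fraction of a gene's publications that also match the query. The inputs `gene_to_pmids` and `term_pmids` are illustrative and are not part of the Geneshot API.

```python
def gene_term_counts(gene_to_pmids, term_pmids):
    """Per-gene co-mention counts and normalized values for one query.

    gene_to_pmids: dict gene -> set of PMIDs that mention the gene
    term_pmids: set of PMIDs matching the search terms
    Both inputs are hypothetical. Returns dict gene -> (co_mentions, normalized),
    where normalized = co_mentions / all publications of the gene (assumed).
    """
    results = {}
    for gene, pmids in gene_to_pmids.items():
        co_mentions = len(pmids & term_pmids)
        if co_mentions == 0:
            continue  # gene never appears together with the query terms
        results[gene] = (co_mentions, co_mentions / len(pmids))
    return results
```

A gene mentioned in 4 publications, 2 of which match the query, would get the pair (2, 0.5).
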
Figure 2.
Median area under the receiver operating characteristic curve (AUC) distributions for predicting genes associated with terms from 16 Enrichr gene set libraries. The libraries are labeled as either data-driven or manually curated. Predictions were made using four gene–gene similarity matrices created from Tagger, GeneRIF, AutoRIF and ARCHS4.
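
The AUC for a single term can be computed with the Mann-Whitney formulation: the probability that a randomly chosen gene annotated to the term receives a higher prediction score than a randomly chosen unannotated gene. The sketch below assumes hypothetical inputs (a per-term dictionary of prediction scores and a library mapping terms to gene sets) and takes the median across a library's terms. It illustrates the metric generically and is not the benchmarking code behind the figure.

```python
from statistics import median


def roc_auc(scores, positives):
    """ROC AUC via the Mann-Whitney pairwise statistic.

    scores: dict gene -> predicted association score for one term
    positives: set of genes annotated to that term in the library
    Ties count as half a correctly ordered pair.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(score_table, library):
    """Median AUC over all terms of one gene set library.

    score_table: dict term -> {gene: score} (hypothetical predictions)
    library: dict term -> set of annotated genes
    """
    return median(roc_auc(score_table[t], genes) for t, genes in library.items())
```

The pairwise loop is quadratic in the number of genes; for genome-scale scoring, a rank-based computation (or an existing implementation such as scikit-learn's `roc_auc_score`) would be the practical choice.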
Figure 3.
KEGG pathway gene members recovered by Geneshot given only the pathway terms. (A) Fraction of pathway gene members recovered with the Geneshot literature search for all 263 KEGG pathway terms using the AutoRIF settings. (B) Total predicted pathway members recovered using the gene function prediction method with the ARCHS4 gene–gene co-expression correlations. (C) Additional pathway members not recovered by the original Geneshot search but recovered using the ARCHS4 gene–gene co-expression correlations. The inputs for the predictions were top-ranked gene lists of different sizes returned by the literature search with the AutoRIF settings. Ranking was performed with three methods: total counts, normalized counts, and a combined score that multiplies the total counts by the normalized counts.
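
The three ranking methods named in the Figure 3 caption can be sketched as follows. The input `stats` is a hypothetical dictionary of per-gene (total count, normalized count) pairs, and alphabetical tie-breaking is an added assumption that makes the order deterministic.

```python
def rank_genes(stats, method="combined"):
    """Rank genes from (total_count, normalized_count) pairs.

    stats: dict gene -> (total, normalized) (hypothetical input)
    method: "total", "normalized", or "combined" (total * normalized)
    Returns gene names, best first; ties are broken alphabetically.
    """
    score_fns = {
        "total": lambda total, norm: total,
        "normalized": lambda total, norm: norm,
        "combined": lambda total, norm: total * norm,
    }
    if method not in score_fns:
        raise ValueError(f"unknown method: {method}")
    score = score_fns[method]
    return sorted(stats, key=lambda g: (-score(*stats[g]), g))
```

Slicing the result, e.g. `rank_genes(stats, "combined")[:k]` for several values of `k`, yields top-ranked input lists of different sizes. The combined score favors genes that are both frequently co-mentioned and specific to the query, whereas total counts alone favor heavily studied genes.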
