Wellcome Open Res. 2017 Jul 10;1:25.
doi: 10.12688/wellcomeopenres.10210.2. eCollection 2016.

SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data

Aravind Venkatesan et al. Wellcome Open Res.

Abstract

The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This presents a great challenge for scientists in searching and assimilating the facts described in those papers. In particular, biological databases depend on curators to add highly precise and useful information, which is usually extracted by reading research articles. There is therefore an urgent need to improve the linking of literature to the underlying data, thereby minimising the effort spent browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays them on research articles. The aim is to help researchers and curators using Europe PMC find key concepts more easily and to provide links to related resources or tools, bridging the gap between literature and biological data.

Keywords: Biocuration; Data Integration; Open Access; RDF; SPARQL; SciLite; Semantic Web; Text-Mining; Web Annotations.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Overview of how text mining results are incorporated into SciLite.
Figure 2. The figure illustrates a sample annotation of protein MMP9 described in an article (PMC4676863). The figure lists the vocabularies used to represent the text-mined annotations. The annotation consists of a link for the tagged entity (Body - UniProt: P52176) and the mentions of the entity (Target) in the text snippet. The text is represented by: prefix, the text that occurs before the tagged entity; exact, the tagged entity itself (MMP9); and postfix, the text snippet that occurs after the tagged entity.
Figure 3. An illustration of a sample GeneRIF (gene function) annotation (PMC4676863). The figure lists the vocabularies used to represent the annotation. The annotation consists of a body (a text phrase about protein mTOR) and a target (a data source link for the described protein, UniProt: P09237).
Figure 4. An illustration of the semi-automated feedback mechanism to improve annotations. Erroneous annotations reported by users are used by the helpdesk at Europe PMC to prepare a report. This report is used in two ways: a) for a quick fix, by deleting the particular annotation; b) in the longer term, to refine the text-mining algorithms.
Figure 5. The screenshot shows the front-end rendering of various annotation types for an article on Europe PMC.
Figure 6. A screenshot showing the 3D molecular structure for a given PDB accession number.
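
The annotation layout described in the Figure 2 caption (a body linking to the tagged entity, and a target that locates the mention through prefix, exact and postfix text) can be sketched roughly as follows. This is a minimal Python illustration, not SciLite's actual serialisation. The dictionary keys, the helper names `make_annotation` and `locate`, the UniProt URI form and the example sentence are all assumptions made for illustration. Only the accession P52176, the entity MMP9 and the article ID PMC4676863 come from the caption.

```python
import json


def make_annotation(article_id, prefix, exact, postfix, body_uri):
    """Build a dict shaped like the annotation described in the Figure 2 caption.

    The key names are hypothetical; they mirror the caption's terms
    (body, target, prefix, exact, postfix), not a confirmed schema.
    """
    return {
        "type": "Annotation",
        "body": body_uri,  # link for the tagged entity
        "target": {
            "source": article_id,  # article in which the mention occurs
            "selector": {"prefix": prefix, "exact": exact, "postfix": postfix},
        },
    }


def locate(text, selector):
    """Return the offset of `exact` where it sits between prefix and postfix, or -1."""
    needle = selector["prefix"] + selector["exact"] + selector["postfix"]
    start = text.find(needle)
    return -1 if start == -1 else start + len(selector["prefix"])


# Hypothetical sentence; it is NOT quoted from PMC4676863.
sentence = "Expression of MMP9 was increased in the samples."

annotation = make_annotation(
    article_id="PMC4676863",
    prefix="Expression of ",
    exact="MMP9",
    postfix=" was increased",
    body_uri="http://purl.uniprot.org/uniprot/P52176",  # assumed URI form
)

offset = locate(sentence, annotation["target"]["selector"])
print(json.dumps(annotation, indent=2))
print(offset, sentence[offset:offset + len("MMP9")])
```

Storing the surrounding text alongside the exact match lets a renderer tell apart repeated mentions of the same term, which a bare string search could not do.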
