Getting more out of biomedical documents with GATE's full lifecycle open source text analytics
- PMID: 23408875
- PMCID: PMC3567135
- DOI: 10.1371/journal.pcbi.1002854
Abstract
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type, with yearly download rates in the tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine: first, genome-wide association studies, where GATE has contributed to the discovery of a head and neck cancer mutation association; second, medical records analysis, which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort; and third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing into a rather comprehensive ecosystem that brings together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances made by many people (the majority outside the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Conflict of interest statement
The authors have declared that no competing interests exist.
References
- Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) GATE: an architecture for development of robust HLT applications. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 7–12 July 2002. Stroudsburg, PA, USA: Association for Computational Linguistics, ACL '02, pp. 168–175. doi:10.3115/1073083.1073112. URL http://gate.ac.uk/sale/acl02/acl-main.pdf.
- Cunningham H, Maynard D, Bontcheva K, Tablan V, Aswani N, et al. (2011) Text Processing with GATE (Version 6). The University of Sheffield. Available: http://tinyurl.com/gatebook.