MedScan, a natural language processing engine for MEDLINE abstracts
- PMID: 12967967
- DOI: 10.1093/bioinformatics/btg207
Abstract
Motivation: The importance of extracting biomedical information from scientific publications is well recognized. A number of information extraction systems for the biomedical domain have been reported, but none of them have become widely used in practical applications. Most proposals to date make rather simplistic assumptions about the syntactic aspect of natural language. There is an urgent need for a system that has broad coverage and performs well in real-text applications.
Results: We present a general biomedical domain-oriented NLP engine called MedScan that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence. The engine utilizes a specially developed context-free grammar and lexicon. Preliminary evaluation of the system's performance, accuracy, and coverage yielded encouraging results. Further approaches for increasing the coverage and reducing the parsing ambiguity of the engine, as well as its application to information extraction, are discussed.