Framework for automatic information extraction from research papers on nanocrystal devices
- PMID: 26665057
- PMCID: PMC4660922
- DOI: 10.3762/bjnano.6.190
Abstract
To support nanocrystal device development, we have been working on a computational framework that utilizes information in research papers on nanocrystal devices. For this purpose, we developed an annotated corpus called "NaDev" (Nanocrystal Device Development) and proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims to extract information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx have not been examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and a list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. Under loose agreement, in which a term that intersects with a correct term of the same information category counts as a correct identification (in many cases, appropriate head nouns such as temperature or pressure match between the two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with that of human annotators for information categories with rich domain knowledge information (source material). For other information categories, given the relatively large number of terms that exist in only one paper, recall of individual information categories is not high (39-73%), although precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers.
Based on these results, we discuss future research plans for improving the performance of the system.
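The difference between strict and loose agreement reported above can be illustrated with a small sketch. This is not the NaDevEx evaluation code: the `Term` type, the category labels, and the use of character-offset overlap as the meaning of "intersect" are assumptions made only for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Term(NamedTuple):
    """A hypothetical annotated term: character span plus information category."""
    start: int      # offset where the term begins (inclusive)
    end: int        # offset where the term ends (exclusive)
    category: str   # illustrative label, e.g. "source_material"


def intersects(a: Term, b: Term) -> bool:
    """Loose agreement (assumed): same category and overlapping spans."""
    return a.category == b.category and a.start < b.end and b.start < a.end


def precision_recall(
    predicted: Sequence[Term], gold: Sequence[Term], loose: bool = False
) -> Tuple[float, float]:
    """Return (precision, recall) under strict or loose agreement."""
    # Strict agreement requires identical span and category.
    match = intersects if loose else (lambda a, b: a == b)
    # Precision: share of predicted terms that match some gold term.
    hit_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall: share of gold terms matched by some predicted term.
    hit_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented): the second prediction covers only the head noun
# of a longer gold term, so it counts under loose agreement only.
gold = [
    Term(0, 20, "source_material"),
    Term(30, 48, "parameter"),
    Term(60, 70, "parameter"),
]
predicted = [
    Term(0, 20, "source_material"),
    Term(38, 48, "parameter"),
]

strict = precision_recall(predicted, gold)
loose = precision_recall(predicted, gold, loose=True)
print("strict:", strict)
print("loose: ", loose)
```

On this toy data, strict agreement gives precision 0.5 and recall 1/3, while loose agreement gives precision 1.0 and recall 2/3, which mirrors the direction of the shift reported in the abstract (89%/69% to 95%/74%).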
Keywords: annotated corpus; automatic information extraction; nanocrystal device development; nanoinformatics; text mining.