PLoS One. 2017 Mar 3;12(3):e0173132.
doi: 10.1371/journal.pone.0173132. eCollection 2017.

Text mining for improved exposure assessment

Kristin Larsson et al. PLoS One.

Abstract

Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is extremely time-consuming, and gaining an overview of the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy created specifically for exposure information. Natural Language Processing (NLP) techniques were used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can assist researchers by facilitating information retrieval and classification, enabling the recognition of data gaps, and providing an overview of the available scientific literature through chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.
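To make the idea of supervised abstract classification concrete, the following is a minimal sketch and not the authors' system. The abstract does not name the learning algorithm, so a multinomial naive Bayes model over plain bag-of-words tokens is used here as a stand-in, and the richer semantic and syntactic features are not reproduced. The class `NaiveBayes`, the category names and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for real NLP features."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of abstracts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical annotated examples; the real training set had nearly 3700 abstracts.
train_texts = [
    "Urinary phthalate metabolites were measured in pregnant women",
    "Blood lead levels in children were determined",
    "Lead concentrations in house dust and soil samples",
    "Hexachlorobenzene in outdoor air and river water samples",
]
train_labels = ["biomonitoring", "biomonitoring", "environmental", "environmental"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Lead measured in soil and dust samples"))
```

A real taxonomy classifier would typically assign several categories per abstract, for example by training one binary model per taxonomy node; this sketch assigns a single label only to keep the example short.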

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Chemical risk assessment.
The process of a chemical risk assessment includes exposure assessment, hazard identification, hazard characterization and risk characterization [1, 2].
Fig 2. The NLP pipeline for automatic classification of document abstracts.
Chem: Chemical lists, MeSH: Medical Subject Headings, GR: Grammatical Relations, LBOW: Lemmatized Bag of Words, N.Bigrams: Noun Bigrams, VC: Verb Clusters, NE: Named Entities.
Fig 3. Results of the intrinsic evaluation.
The color coding is based on F-scores (green: >75%; yellow: 50–75%; red: <50%).
Fig 4. Publication profiles of exposure information about 4-NP, HCB and lead.
The percentages of the total number of abstracts retrieved from PubMed and considered relevant for the full taxonomy are presented. The total number of abstracts was 130 for 4-NP, 722 for HCB and 7753 for lead.
Fig 5. Publication profiles for exposure biomarkers and exposure routes for different phthalate esters.
Fig 6. Publication profiles for effect biomarkers related to exposure to different phthalate esters.
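The publication profiles in Figs 4 to 6 report, per taxonomy category, a percentage of the abstracts retrieved for a chemical. One plausible way to compute such a profile is sketched below. It assumes each abstract has already been assigned a set of categories and that the denominator is the total number of abstracts; the function name `publication_profile` and the category names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def publication_profile(abstract_labels):
    """Map each category to the percentage of abstracts assigned to it.

    abstract_labels: one collection of category names per abstract.
    An abstract with several categories counts once for each of them,
    so the percentages need not sum to 100.
    """
    total = len(abstract_labels)
    if total == 0:
        return {}
    counts = Counter(cat for labels in abstract_labels for cat in set(labels))
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical classifier output for four abstracts about one chemical.
labels_per_abstract = [
    {"biomonitoring"},
    {"biomonitoring", "environmental"},
    {"environmental"},
    set(),  # an abstract assigned to no category
]

profile = publication_profile(labels_per_abstract)
for category, pct in sorted(profile.items()):
    print(f"{category}: {pct:.1f}%")
```

Comparing such profiles across chemicals is what makes sparsely covered categories, and hence possible data gaps, visible.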

References

    1. FAO/WHO. Application of risk analysis to food standard issues. Report of the joint FAO/WHO consultation. Geneva; 1995.
    2. NRC. Risk assessment in the federal government. Managing the process. Washington: National Academy Press; 1983.
    3. Angerer J, Ewers U, Wilhelm M. Human biomonitoring: State of the art. International Journal of Hygiene and Environmental Health. 2007;210(3–4):201–28. doi: 10.1016/j.ijheh.2007.01.024
    4. Hunter L, Cohen KB. Biomedical Language Processing: What's Beyond PubMed? Molecular Cell. 2006;21(5):589–94. doi: 10.1016/j.molcel.2006.02.012
    5. Simpson MS, Demner-Fushman D. Biomedical text mining: A survey of recent progress. Mining Text Data: Springer; 2012. pp. 465–517.
