J Am Med Inform Assoc. 2011 Sep-Oct;18(5):631-8.

doi: 10.1136/amiajnl-2010-000022. Epub 2011 Jun 27.

Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

Taxiarchis Botsis¹, Michael D Nguyen, Emily Jane Woo, Marianthi Markatou, Robert Ball

Affiliation

¹ Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, Maryland 20852, USA. taxiarchis.botsis@fda.hhs.gov

PMID: 21709163
PMCID: PMC3168300
DOI: 10.1136/amiajnl-2010-000022

Taxiarchis Botsis et al. J Am Med Inform Assoc. 2011 Sep-Oct.

Abstract

Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.

Design: We selected 6034 VAERS reports for H1N1 vaccine that medical officers had classified as potentially positive (N(pos)=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained on the remaining two feature representations.

Measurements: Classifier performance was evaluated using macro-averaged recall, precision, and F-measure, together with Friedman's test; misclassification error rate analysis was also performed.
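
As an illustration of the macro-averaged metrics named above, the following is a minimal sketch rather than the study's evaluation code. It assumes binary labels (1 = potentially positive, 0 = negative) and takes the macro F-measure as the mean of the per-class F-measures, which is one common convention; the function name `macro_scores` is hypothetical.

```python
def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Return macro-averaged (recall, precision, F-measure) over the given labels.

    Assumption: macro F-measure is the unweighted mean of per-class F-measures.
    """
    recalls, precisions, f_measures = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        # Per-class counts, treating `label` as the class of interest.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)

        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0

        recalls.append(recall)
        precisions.append(precision)
        f_measures.append(f_measure)

    n = len(labels)
    # Each class contributes equally, regardless of how many reports it has.
    return sum(recalls) / n, sum(precisions) / n, sum(f_measures) / n
```

Because every class carries equal weight, the rare potentially positive class influences the macro-averaged scores as much as the much larger negative class does, which matters for a corpus as imbalanced as the one described in the Design paragraph.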

Results: The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).

Conclusion: Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.

Conflict of interest statement

Competing interests: None.

Figures

**Figure 1**
Initially, medical officers use specific MedDRA preferred terms (PT) or other keywords to extract the Vaccine Adverse Event Reporting System (VAERS) case reports of interest (usually a few thousand). Manual review requires two steps: (i) review of each case report (mainly the symptom and laboratory text fields) and (ii) review of the medical record (MR) for a much smaller portion of case reports. For example, for anaphylaxis, which was investigated in the current study, the PT and keyword search returned 6034 case reports, which manual review reduced to 237; the medical records for these 237 reports were obtained and reviewed, resulting in 100 confirmed anaphylaxis cases.

**Figure 2**
An example of text mining processes for a case report for anaphylaxis using either the dictionary (left branch of the diagram) or the lexicon and the grammar rules (right branch of the diagram). The output of each case report is a vector of lemmas (type I vector), a vector of low-level patterns (type II vector), or a set of high-level patterns. The two types of vectors are extended by one position to include the class label for the report. The rule-based classifier classifies this report as potentially positive based on the identification of a high-level pattern (‘class’=1, ie, potentially positive). GI, gastrointestinal; MCDV, major cardiovascular; MDERM, major dermatologic; mDERM, minor dermatologic; MRESP, major respiratory; mRESP, minor respiratory.
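
The type I vector described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch only: the dictionary entries, the function name `type_i_vector`, and the use of exact lowercase token matching in place of real lemmatization are all assumptions, not the study's implementation.

```python
import re

# Hypothetical dictionary of lemmas; the study's actual dictionary is not given here.
DICTIONARY = ["hives", "wheezing", "hypotension", "swelling", "vomiting"]


def type_i_vector(symptom_text, class_label, dictionary=DICTIONARY):
    """Build a binary presence vector over the dictionary, extended by the class label.

    Assumption: exact lowercase token matching stands in for lemmatization.
    """
    tokens = set(re.findall(r"[a-z]+", symptom_text.lower()))
    features = [1 if lemma in tokens else 0 for lemma in dictionary]
    # The final position holds the class label (1 = potentially positive, 0 = negative).
    return features + [class_label]


vector = type_i_vector("Patient developed hives and wheezing within minutes.", 1)
```

Vectors of this shape, one per report, are the kind of input a machine learning classifier would be trained on, with the last position serving as the target.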
