Factoid Question Answering with Distant Supervision

Hongzhi Zhang¹,², Xiao Liang¹, Guangluan Xu¹,², Kun Fu¹,²,³, Feng Li¹, Tinglei Huang¹

Affiliations

¹ Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China.
² School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China.
³ Institute of Electronics, Chinese Academy of Sciences, Suzhou, Suzhou 215123, China.

PMID: 33265529
PMCID: PMC7512957
DOI: 10.3390/e20060439

Hongzhi Zhang et al. Entropy (Basel). 2018 Jun 5;20(6):439. doi: 10.3390/e20060439.

Abstract

Automatic question answering (QA), which can greatly facilitate access to information, is an important task in artificial intelligence. Recent years have witnessed the development of QA methods based on deep learning. However, training deep neural networks requires a large amount of data, and annotating training data for factoid QA in new domains or languages is laborious. In this paper, a distantly supervised method is proposed to automatically generate QA pairs. Additional effort is made to let the generated questions reflect the query interests and expression styles of users by exploiting community QA data. Specifically, the generated questions are selected according to the estimated probability that they would be asked. Because a model trained on monotonous synthetic questions is very sensitive to variations in question wording, diverse paraphrases of questions are mined from community QA data. Experimental results show that a model trained solely on data generated via distant supervision and on mined paraphrases can answer real-world questions with an accuracy of 49.34%. When only limited annotated training data is available, incorporating the generated data yields significant improvements. An improvement of 1.35 absolute points is still observed on WebQA, a dataset with large-scale annotated training samples.

Keywords: distant supervision; question answering; question paraphrase; reading comprehension.
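
The abstract does not spell out how QA pairs are generated or selected, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: facts are taken to be (subject, predicate, object) triples, questions come from one hand-written template per predicate, a passage counts as evidence when it mentions both the subject and the answer, and the probability of a question being asked is approximated by predicate frequency in community QA questions. All data, names, and the `min_prob` threshold are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical toy data for illustration only; none of it comes from the paper.
TRIPLES = [
    ("Mount Everest", "height", "8,848 m"),
    ("Mount Everest", "first_ascent", "1953"),
    ("France", "capital", "Paris"),
]
TEMPLATES = {
    "height": "How high is {subject}?",
    "first_ascent": "When was {subject} first climbed?",
    "capital": "What is the capital of {subject}?",
}
PASSAGES = [
    "Mount Everest rises to 8,848 m above sea level.",
    "Paris is the capital and largest city of France.",
]
# Assumed: predicates already identified in community QA questions.
COMMUNITY_PREDICATES = ["capital", "capital", "height", "capital"]


def ask_probability(predicates):
    """Estimate how likely each predicate is to be asked about (relative frequency)."""
    counts = Counter(predicates)
    total = sum(counts.values())
    return {pred: count / total for pred, count in counts.items()}


def generate_qa_pairs(triples, templates, passages, probs, min_prob=0.1):
    """Distantly supervised generation: keep a (question, passage, answer) triple
    when the passage mentions both the subject and the answer string."""
    pairs = []
    for subject, predicate, answer in triples:
        # Skip questions that users are unlikely to ask, or that have no template.
        if probs.get(predicate, 0.0) < min_prob or predicate not in templates:
            continue
        question = templates[predicate].format(subject=subject)
        for passage in passages:
            if subject in passage and answer in passage:
                pairs.append(
                    {"question": question, "evidence": passage, "answer": answer}
                )
    return pairs


if __name__ == "__main__":
    probs = ask_probability(COMMUNITY_PREDICATES)
    for pair in generate_qa_pairs(TRIPLES, TEMPLATES, PASSAGES, probs):
        print(pair)
```

Plain string matching produces noisy labels, since a passage can mention both strings without expressing the fact; that noise is the usual trade-off of distant supervision against manual annotation.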

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
The structure of the QA model.

**Figure 2**
Statistics of WebQA and training data generated via distant supervision.

**Figure 3**
Distribution of the paraphrased predicates.

**Figure 5**
Factoid QA via distant supervision.

**Figure 6**
Improved factoid QA with distant supervision.

**Figure 7**
Curves of training loss and validation accuracy. SL denotes supervised learning. Pre-training + SL denotes a model that is pre-trained on generated data and then trained on the annotated data. SL+ denotes a model trained simultaneously on generated and annotated data.
