Coupled generation
- PMID: 36465716
- PMCID: PMC9718422
- DOI: 10.1080/01621459.2020.1844719
Abstract
Instance generation creates representative examples to interpret a learning model, as in regression and classification. For example, representative sentences of a topic of interest describe that topic concretely for sentence categorization. In such a situation, a large number of unlabeled observations may be available in addition to labeled data; for example, many unclassified text corpora (unlabeled instances) are available alongside only a few classified sentences (labeled instances). In this article, we introduce a novel generative method, called a coupled generator, producing instances given a specific learning outcome, based on indirect and direct generators. The indirect generator uses the inverse principle to yield the corresponding inverse probability, enabling instance generation by leveraging unlabeled data. The direct generator learns the distribution of an instance given its learning outcome. The coupled generator then selects the better of the indirect and direct generators, and is designed to enjoy the benefits of both and deliver higher generation accuracy. For sentence generation given a topic, we develop an embedding-based regression/classification in conjunction with an unconditional recurrent neural network for the indirect generator, whereas a conditional recurrent neural network is natural for the corresponding direct generator. Moreover, we derive finite-sample generation error bounds for the indirect and direct generators to reveal the generative aspects of both methods, thereby explaining the benefits of the coupled generator. Finally, we apply the proposed methods to a real benchmark of abstract classification and demonstrate that the coupled generator composes reasonably good sentences from a dictionary to describe a specific topic of interest.
Keywords: Classification; Natural language processing; Numerical embeddings; Semisupervised generation; Unstructured data.
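To make the mechanism described above concrete, here is a minimal Python sketch of the coupled-generator idea, assuming toy Gaussian and logistic models in place of the paper's recurrent neural networks and embedding-based classifier; every function and model below is a hypothetical stand-in, not the authors' implementation.

```python
import numpy as np

# Toy illustration of a coupled generator. We generate an instance x for a
# target outcome y via two routes and keep whichever performs better:
#   direct:   score candidates under p(x | y) learned directly
#             (stand-in for a conditional RNN)
#   indirect: score candidates by the inverse principle, p(x | y) ∝ p(y | x) p(x),
#             combining a classifier p(y | x) with an unconditional model p(x)
#             that could be trained on abundant unlabeled data

rng = np.random.default_rng(0)

def log_p_x(x):
    """Unconditional model of instances (toy stand-in for an unlabeled-data model)."""
    return -0.5 * np.sum(x ** 2)

def log_p_y_given_x(y, x):
    """Classifier (toy logistic model standing in for embedding-based classification)."""
    logit = x.sum()
    return y * logit - np.log1p(np.exp(logit))

def log_p_x_given_y_direct(x, y):
    """Direct conditional model of x given y (toy stand-in for a conditional RNN)."""
    mu = 1.0 if y == 1 else -1.0
    return -0.5 * np.sum((x - mu) ** 2)

def generate(y, candidates, route):
    """Return the best candidate instance for outcome y under the chosen route."""
    if route == "indirect":  # inverse principle: p(x|y) ∝ p(y|x) p(x)
        scores = [log_p_y_given_x(y, x) + log_p_x(x) for x in candidates]
    else:                    # direct conditional model
        scores = [log_p_x_given_y_direct(x, y) for x in candidates]
    return candidates[int(np.argmax(scores))]

# Coupled generator: evaluate both routes on held-out labeled data, keep the better.
candidates = [rng.normal(size=3) for _ in range(200)]
held_out = [(rng.normal(loc=(1.0 if y else -1.0), size=3), int(y))
            for y in rng.integers(0, 2, size=50)]

def route_error(route):
    # Crude proxy for generation error: squared distance between the generated
    # instance and held-out instances sharing the same outcome.
    errs = [np.sum((generate(y, candidates, route) - x) ** 2) for x, y in held_out]
    return float(np.mean(errs))

best = min(("indirect", "direct"), key=route_error)
print("coupled generator selects:", best)
```

The selection step mirrors the paper's motivation: since neither route dominates in finite samples, comparing their empirical generation errors and keeping the winner lets the coupled generator inherit the strengths of both.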
Similar articles
- Enhancing Text Generation via Parse Tree Embedding. Comput Intell Neurosci. 2022 Jun 10;2022:4096383. doi: 10.1155/2022/4096383. PMID: 35720896. Free PMC article.
- Embedding Learning. J Am Stat Assoc. 2022;117(537):307-319. doi: 10.1080/01621459.2020.1775614. Epub 2020 Jul 20. PMID: 36936129. Free PMC article.
- Fast and scalable neural embedding models for biomedical sentence classification. BMC Bioinformatics. 2018 Dec 22;19(1):541. doi: 10.1186/s12859-018-2496-4. PMID: 30577747. Free PMC article.
- Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization. Sensors (Basel). 2022 Dec 26;23(1):249. doi: 10.3390/s23010249. PMID: 36616847. Free PMC article.
- Adversarial active learning for the identification of medical concepts and annotation inconsistency. J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18. PMID: 32687985.