Coupled generation

Ben Dai et al.

J Am Stat Assoc. 2022;117(539):1243-1253. doi: 10.1080/01621459.2020.1844719. Epub 2021 Jan 4.

Abstract

Instance generation creates representative examples to interpret a learning model, as in regression and classification. For example, representative sentences of a topic of interest describe that topic specifically for sentence categorization. In such a situation, a large number of unlabeled observations may be available in addition to the labeled data; for example, many unclassified text corpora (unlabeled instances) are available alongside only a few classified sentences (labeled instances). In this article, we introduce a novel generative method, called a coupled generator, that produces instances given a specific learning outcome, based on indirect and direct generators. The indirect generator uses the inverse principle to yield the corresponding inverse probability, enabling instance generation that leverages unlabeled data. The direct generator learns the distribution of an instance given its learning outcome. The coupled generator then seeks the better of the indirect and direct generators; it is designed to enjoy the benefits of both and deliver higher generation accuracy. For sentence generation given a topic, we develop an embedding-based regression/classification in conjunction with an unconditional recurrent neural network for the indirect generator, whereas a conditional recurrent neural network is natural for the corresponding direct generator. Moreover, we derive finite-sample generation error bounds for the indirect and direct generators to reveal the generative aspects of both methods, thus explaining the benefits of the coupled generator. Finally, we apply the proposed methods to a real benchmark of abstract classification and demonstrate that the coupled generator composes reasonably good sentences from a dictionary to describe a specific topic of interest.
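The "inverse principle" behind the indirect generator can be read as an application of Bayes' rule. In notation assumed here for illustration (not taken from the paper), with x an instance (a sentence) and y its learning outcome (a topic):

    p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \propto p(y \mid x)\, p(x),

so p(y | x) can be estimated from the few labeled pairs (e.g., by the embedding-based classifier), while p(x) can be estimated from the abundant unlabeled instances (e.g., by the unconditional recurrent neural network). The direct generator instead models p(x | y) from the labeled data alone.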

Keywords: Classification; Natural language processing; Numerical embeddings; Semisupervised generation; Unstructured data.


Figures

Fig. 1. A sentence generated by the indirect and direct RNN generators in (20) and (15). The displayed RNN architecture generates the sentence "The instantaneous loss bound of SYMBOL implies only convergence in probability" with topic "MISC" word by word; ht is the hidden node of the RNNs in (20) and (15), and h0 is the initial hidden state, which is zero under (15) and "MISC" under (20).
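To make the role of the initial hidden state concrete, here is a minimal PyTorch-style sketch of an RNN generator whose h0 is zero (unconditional, mirroring (15)) or a topic embedding (conditional, mirroring (20)). All names, sizes, and the choice of a GRU cell are illustrative assumptions, not the paper's specification.

    import torch
    import torch.nn as nn

    class SentenceRNN(nn.Module):
        """Hypothetical sketch: a word-level RNN generator whose initial
        hidden state is zero (unconditional, as under (15)) or a topic
        embedding (conditional, as under (20))."""

        def __init__(self, vocab_size, n_topics, hidden_size=128):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, hidden_size)
            self.topic_emb = nn.Embedding(n_topics, hidden_size)
            self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def forward(self, words, topic=None):
            # words: (batch, seq) word indices; topic: (batch,) topic indices or None.
            if topic is None:
                # Indirect/unconditional generator: zero initial hidden state.
                h0 = torch.zeros(1, words.size(0), self.rnn.hidden_size)
            else:
                # Direct/conditional generator: topic embedding as initial hidden state.
                h0 = self.topic_emb(topic).unsqueeze(0)
            h, _ = self.rnn(self.word_emb(words), h0)
            return self.out(h)  # next-word logits at each position

Generation then proceeds word by word, sampling the next word from the logits and feeding it back as the next input, as described in the caption above.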


