Poetry Robot Based on CNN and VAE: Random Poetry Generation

By 苏剑林 | March 24, 2018

A few days ago I wrote a popular-science introduction to VAE, which was well received by some readers. But aren't you tired of every introduction featuring nothing more than an MNIST-level demo? Don't worry, this time I'm bringing you a more classic VAE toy: a poetry-composing robot.

Why do I say "more classic"? In the previous article, we mentioned that images generated by VAE tend to be blurrier than those generated by GANs, meaning VAE is at a disadvantage in the image "battle." However, in the realm of text generation, VAE has won quite handsomely. This is because GANs attempt to directly train a discriminator (metric); however, for text, this metric is likely discrete and non-differentiable, making pure GANs very difficult to train. VAE does not have this step; it operates by reconstructing the input, a process that can be performed for both images and text. Therefore, text generation is a basic, direct application for VAE, much like image generation; for (current) GANs, however, it remains a symbol of hardship and a persistent "headache."

Well, in ancient times Cao Zhi composed poetry in seven steps; today we have VAE generating poetry randomly. Let's begin~

Model

For many people, poetry is a wonderful thing. The beauty lies in the fact that most people don't truly understand poetry, yet everyone has a superficial understanding of what poetry looks like. Therefore, as long as the generated "poetry" looks somewhat decent, we usually believe the robot can compose poetry. Thus, the so-called poetry robot is a pure toy; being able to compose a few lines of poetry does not mean its general language generation ability is high, nor does it mean our understanding of NLP has deepened significantly.

CNN + VAE

The toy in this article is actually a fairly simple model: essentially a 1D CNN combined with a VAE. Since the length of the generated lines is fixed, I used pure CNN for both the encoder and the decoder. The model structure looks roughly like this:

[Figure: CNN + VAE poetry generation model]

Specifically, each character is first embedded as a vector, stacked CNNs then encode the sequence, and pooling produces a single encoding vector. From this vector the mean and variance are computed, defining a normal distribution from which the latent vector is re-sampled (the reparameterization trick). In the decoding stage, since there is only one encoded vector but several characters have to be produced, I first attach several different fully connected layers to obtain multiple outputs, followed by further fully connected layers.
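As a rough illustration, here is a minimal Keras sketch of how these pieces might fit together. The layer sizes, the use of plain Conv1D in place of the gated convolutions described in the next subsection, and the exact shape of the decoder are my own assumptions for readability, not taken from the linked code:

```python
from keras.layers import (Input, Embedding, Conv1D, GlobalAveragePooling1D,
                          Dense, Lambda, Reshape, Concatenate)
from keras.models import Model
from keras import backend as K

seq_len, vocab_size, embed_dim, latent_dim = 10, 5000, 128, 64  # hypothetical sizes

# Encoder: embed the characters, stack 1D convolutions, pool to a single vector
x_in = Input(shape=(seq_len,))
h = Embedding(vocab_size, embed_dim)(x_in)
h = Conv1D(embed_dim, 3, padding='same', activation='relu')(h)
h = Conv1D(embed_dim, 3, padding='same', activation='relu')(h)
h = GlobalAveragePooling1D()(h)

# Mean and log-variance of q(z|x), then re-sample with the reparameterization trick
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    mean, log_var = args
    eps = K.random_normal(shape=(K.shape(mean)[0], latent_dim))
    return mean + K.exp(log_var / 2) * eps

z = Lambda(sampling)([z_mean, z_log_var])

# Decoder: several different dense branches map the single z to one
# softmax over the vocabulary per output position
branches = [Dense(vocab_size, activation='softmax')(Dense(embed_dim, activation='relu')(z))
            for _ in range(seq_len)]
branches = [Reshape((1, vocab_size))(b) for b in branches]
x_out = Concatenate(axis=1)(branches)   # (batch, seq_len, vocab_size)

vae = Model(x_in, x_out)
```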

GCNN

The CNN used here is not the ordinary CNN+ReLU, but the GCNN (gated CNN) proposed by Facebook. The idea is simply to apply two convolutions of the same shape to the input, one with no activation function and one with a sigmoid activation, and multiply their outputs elementwise; the sigmoid branch thus acts as a "gate."
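For concreteness, such a gated convolution block can be written in a few lines of Keras. This is just a minimal sketch of the idea (no residual connection or other refinements), not the exact layer used in the code linked below:

```python
from keras.layers import Conv1D, Multiply

def gated_conv1d(x, filters, kernel_size=3):
    # Two convolutions of the same shape: a linear one and a sigmoid "gate",
    # multiplied together elementwise.
    linear = Conv1D(filters, kernel_size, padding='same')(x)
    gate = Conv1D(filters, kernel_size, padding='same', activation='sigmoid')(x)
    return Multiply()([linear, gate])
```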

The first time I saw GCNN was in the paper "Language Modeling with Gated Convolutional Networks", and I saw it again in "Convolutional Sequence to Sequence Learning". There is also a brief introduction in my post "Sharing a Slide: Fancy Natural Language Processing".

Based on actual tests, GCNN performs significantly better than ordinary CNN+ReLU on many NLP tasks.

Experiment

The experiment was done with Python 2.7 and Keras (TensorFlow backend)~

Code

With the discussion above, plus Keras's built-in VAE example, implementing the whole model is not hard. For demonstration purposes I only used the simplest five-character poetry, and I did not require a complete poem to be generated, only a single line of 10 characters (two five-character phrases), which is really more like composing couplets.
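The training objective is the standard VAE one: per-character reconstruction cross-entropy plus the KL divergence of q(z|x) from the standard normal prior. A minimal sketch of that loss (the exact weighting and the way it is attached to the model in the linked script may differ):

```python
from keras import backend as K

def vae_loss(x_true, x_decoded, z_mean, z_log_var):
    # x_true: (batch, seq_len) integer character ids
    # x_decoded: (batch, seq_len, vocab_size) per-position softmax probabilities
    xent = K.sum(K.sparse_categorical_crossentropy(x_true, x_decoded), axis=-1)
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(xent + kl)
```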

Code: https://github.com/bojone/vae/blob/master/vae_shi.py

The training corpus is the Complete Tang Poems (Quan Tangshi), which has also been placed on GitHub. The model has not undergone much hyperparameter tuning; that is left for interested readers to tinker with and improve.

Training

To observe the changes in the generated poetry lines during the training process, I wrote an evaluator. The effect is shown below:

[Figure: Poetry robot training process]

As can be seen, as the number of training iterations increases, the quality of the poetry lines indeed improves.
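One natural way to build such an evaluator is a Keras callback that samples a few latent vectors after every epoch and decodes them. The sketch below assumes a standalone decoder model named decoder and an id-to-character mapping id2char, both hypothetical names rather than the exact code in the linked script:

```python
import numpy as np
from keras.callbacks import Callback

class PoemEvaluator(Callback):
    """Print a few randomly generated lines at the end of every epoch."""
    def __init__(self, decoder, id2char, latent_dim, n_samples=3):
        super(PoemEvaluator, self).__init__()
        self.decoder = decoder        # standalone decoder model: z -> (seq_len, vocab) softmax
        self.id2char = id2char        # maps character ids back to characters
        self.latent_dim = latent_dim
        self.n_samples = n_samples

    def on_epoch_end(self, epoch, logs=None):
        # Sample z ~ N(0, I) and greedily take the most likely character at each position
        z = np.random.randn(self.n_samples, self.latent_dim)
        probs = self.decoder.predict(z)
        for p in probs:
            print(''.join(self.id2char[i] for i in p.argmax(axis=-1)))
```

Passing an instance of this callback to fit(..., callbacks=[...]) then prints a handful of sample lines after each epoch, which is how the snapshots above were obtained.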

Testing

Below are some poetry lines randomly generated by the trained model. Strictly speaking they are not great; after all, this is just a mapping from random numbers to poetry lines, essentially "producing poetry at random"~ Still, these "lines" look reasonably decent in terms of antithesis and tonal patterns (pingze).

出上无花客,相来一日时。
从瞻大车策,萧盖偃车矛。
今见青衣去,萧凉白叶风。
帝城今不战,征罪在天兵。
鹤仰临山里,逶留出太回。
画关斜过水,残色迥过杨。
上道皆有战,四门不如兵。
涧烟含雨沥,风影动风风。
登回一落景,一处更相期。
芳酒不无醉,长楼酒更春。
天月满云管,楚女长南闻。
明今今有矣,不得道无生。
朝明开绿菊,香影下红枝。
世外相扁处,逍遥无旧目。
唯闻含玉色,讵见月光光。
春风将乐节,风服未谁娱。
万年君何在,今年酒未新。
回辔参相召,乘歌会使程。
今有千人醉,不然一子心。
今风泛云会,清日白衣新。

Note: The demo model only generates single lines of poetry, so there is no connection between the lines above.

Conclusion

The experiment in this article was not intended to produce high-quality poetry, but to serve as a demonstration of VAE-based text generation. There are similar poetry robots online, but most of the popular ones are based on an "RNN + language model" approach, which generally needs some seed words as input in order to complete a poem. VAE, by contrast, genuinely maps random numbers straight to poetry lines~