By 苏剑林 | January 14, 2019
Origin

A few days ago, I saw an article titled "This Brain-Twisting Couplet AI Is Driving Everyone Crazy" on the Quantum Bit (量子位) WeChat official account. I found it quite interesting, and more importantly, the author had organized and released the dataset, so I decided to try it myself.
"Writing couplets" can be viewed as a sentence generation task, which can be completed using seq2seq, similar to my previous post "Playing with Keras: seq2seq Automatic Title Generation," just with a slight modification to the input. The method used in the aforementioned article was also seq2seq, which seems to be the standard approach.
However, if we think about it further, we will find that compared with general sentence generation tasks, "writing couplets" is much more regular: 1. the upper and lower lines (上联/下联) have the same number of characters; 2. the characters of the upper and lower lines correspond almost one-to-one by position. Writing couplets can therefore be treated directly as a sequence labeling task, following the same approach as word segmentation or named entity recognition (NER). This is the starting point of this article.
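To make the reformulation concrete: in the sequence labeling view, each character of the lower line is simply the classification label for the character at the same position in the upper line. A minimal Python sketch, using the training-set couplet quoted later in this post:

# The i-th character of the lower line is the class label
# for the i-th character of the upper line.
upper = u'晚风摇树树还挺'  # upper line: the input sequence
lower = u'晨露润花花更红'  # lower line: one label per position

assert len(upper) == len(lower)  # regularity 1: equal length
print(list(zip(upper, lower)))
# [('晚', '晨'), ('风', '露'), ('摇', '润'), ('树', '花'), ...]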
Speaking of which, this article actually has very little "technical content": sequence labeling is a very common task, much simpler than general seq2seq. Sequence labeling means taking a sequence of vectors as input and producing another sequence, typically of the same length, as output, i.e., classifying "each frame" of the sequence. Related concepts can be explored further in the article "A Brief Introduction to Conditional Random Fields (CRF) with a Pure Keras Implementation."
I will introduce the model while writing the code. Readers who need to further understand the underlying basic knowledge can also refer to: "[Chinese Word Segmentation Series] 4. seq2seq Character Labeling Based on Bidirectional LSTM," "[Chinese Word Segmentation Series] 6. Chinese Word Segmentation Based on Fully Convolutional Networks," and "Poetry-Writing Robot Based on CNN and VAE: Random Poem Generation."
The model code we use is as follows:
from keras.models import Model
from keras.layers import Input, Embedding, Dropout, Dense, Conv1D, Lambda
from keras import backend as K

# `chars` is the character vocabulary and `char_size` the embedding
# dimension, both built during preprocessing; the +1 leaves room for
# an extra id (e.g. padding).
x_in = Input(shape=(None,))  # variable-length sequence of character ids
x = x_in
x = Embedding(len(chars)+1, char_size)(x)
x = Dropout(0.25)(x)

# six gated-convolution residual blocks
x = gated_resnet(x)
x = gated_resnet(x)
x = gated_resnet(x)
x = gated_resnet(x)
x = gated_resnet(x)
x = gated_resnet(x)

# per-position softmax over the character vocabulary
x = Dense(len(chars)+1, activation='softmax')(x)

model = Model(x_in, x)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam')
Where gated_resnet is the gated convolution module I defined (also introduced in the article "A Reading Comprehension Question-Answering Model Based on CNN: DGCNN"):
def gated_resnet(x, ksize=3):
    # Gated convolution + residual: the Conv1D outputs twice the input
    # channels; the first half acts as the gate g, the second half as
    # the candidate update h.
    x_dim = K.int_shape(x)[-1]
    xo = Conv1D(x_dim*2, ksize, padding='same')(x)
    return Lambda(lambda x: x[0] * K.sigmoid(x[1][..., :x_dim])
                  + x[1][..., x_dim:] * K.sigmoid(-x[1][..., :x_dim]))([x, xo])
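Written out, the Lambda layer computes, with the Conv1D output split into a gate half $g$ and a candidate half $h$, and using $\sigma(-g) = 1 - \sigma(g)$:

$$\text{output} = x \otimes \sigma(g) + h \otimes \big(1 - \sigma(g)\big), \qquad [g; h] = \text{Conv1D}(x)$$

so each position smoothly interpolates between keeping the input and replacing it with the convolved candidate, which is what makes the block residual-friendly.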
And that's it! The rest is just data preprocessing. Of course, readers can also try replacing gated_resnet with a standard bidirectional LSTM, but in my experiments the bidirectional LSTM did not perform as well as gated_resnet and was also slower, so I abandoned it here.
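For readers who want to run that comparison, the swap is a one-liner per block; the unit count below is only a sketch chosen to keep the output dimension at char_size, not a tuned configuration:

from keras.layers import LSTM, Bidirectional

# Replace each `x = gated_resnet(x)` call with a bidirectional LSTM;
# halving the units keeps the concatenated output at char_size.
x = Bidirectional(LSTM(char_size // 2, return_sequences=True))(x)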
The training dataset comes from: https://github.com/wb14123/couplet-dataset, thanks to the author for organizing it.
Complete code: https://github.com/bojone/seq2seq/blob/master/couplet_by_seq_tagging.py
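For orientation, here is a minimal preprocessing sketch. It assumes the dataset's train/in.txt and train/out.txt layout, in which each line is one half of a couplet with characters separated by spaces (check the repository's README for the exact format), and it reserves id 0 for padding, which is this sketch's convention for the +1 in the model's vocabulary sizes:

import numpy as np

def load_pairs(in_path='train/in.txt', out_path='train/out.txt'):
    # Each line is one half of a couplet, characters separated by spaces.
    with open(in_path, encoding='utf-8') as f:
        uppers = [line.split() for line in f]
    with open(out_path, encoding='utf-8') as f:
        lowers = [line.split() for line in f]
    return uppers, lowers

uppers, lowers = load_pairs()
chars = sorted({c for s in uppers + lowers for c in s})
char2id = {c: i + 1 for i, c in enumerate(chars)}  # id 0 is the padding id

def encode(sentences, maxlen):
    # Pad (or truncate) every sentence to `maxlen` with the reserved id 0.
    ids = np.zeros((len(sentences), maxlen), dtype='int32')
    for i, s in enumerate(sentences):
        s = s[:maxlen]
        ids[i, :len(s)] = [char2id[c] for c in s]
    return ids

Training then amounts to fitting encode(uppers, maxlen) against encode(lowers, maxlen)[..., None], the trailing axis being added to match sparse_categorical_crossentropy.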
Training process: [training log figure omitted]
Partial results: [figure with sample couplets generated by the model omitted]
It seems to have a bit of flair. Note that "晚风摇树树还挺" (The evening wind sways the trees, yet the trees stand firm) is an upper line from the training set; the reference lower line is "晨露润花花更红" (Morning dew moistens the flowers, the flowers grow redder), while the model gave "夜雨敲花花更香" (Night rain taps the flowers, the flowers grow more fragrant). This shows that the model is not simply memorizing the training set but has some level of "understanding"; I even feel the lower line the model produced is somewhat more vivid.
Overall, the basic character-to-character correspondence seems learnable, but the results still lack a sense of global coherence. The overall effect is not as good as the two below, but as a small toy it should be satisfactory.
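For completeness, decoding in this setup is just a per-position argmax over the character vocabulary. A minimal inference helper might look like the following (id2char inverts the char2id mapping from the earlier sketch, and out-of-vocabulary characters are simply dropped; both are simplifications of this sketch):

id2char = {i: c for c, i in char2id.items()}

def couplet_match(upper):
    # Encode the upper line, predict one distribution per position,
    # and take the argmax character as the lower-line character.
    x = np.array([[char2id[c] for c in upper if c in char2id]])
    probas = model.predict(x)[0]  # shape: (length, len(chars)+1)
    return ''.join(id2char.get(i, '') for i in probas.argmax(axis=-1))

print(couplet_match(u'晚风摇树树还挺'))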
Finally, there isn't much to summarize. I simply felt that writing couplets could be treated as a sequence labeling task, so I tried a sequence labeling model to see how it would go, and the results turned out okay. Of course, to do better, the model would need adjusting; one could also consider introducing attention, and during decoding more prior knowledge would need to be introduced to ensure the output complies with the rules of couplets. I leave these tasks to interested readers.
Cite as: Su Jianlin. (Jan. 14, 2019). "Couplet Robot Based on CNN and Sequence Labeling" [Blog post]. Retrieved from https://kexue.fm/archives/6270