【The Incredible Word2Vec】6. Keras Version of Word2Vec

By 苏剑林 | August 06, 2017

Preface

After seeing the TF version of Word2Vec I wrote earlier, Yin Shen from the Keras group asked whether there was a Keras version. In fact, I had written one before making the TF version, but I didn't keep it, so I rewrote it; the new version is more efficient and the code is cleaner. This is a pure Keras implementation of Word2Vec, with the same principles as in "【The Incredible Word2Vec】5. Tensorflow Version of Word2Vec". I'm releasing it now since someone will probably find it useful (for example, for adding extra inputs of your own to build a better word vector model).

Since Keras simultaneously supports multiple backends like TensorFlow, Theano, and CNTK, this is equivalent to implementing Word2Vec for multiple frameworks. Well, it sounds quite high-end when you think about it that way, haha~

Code

GitHub: https://github.com/bojone/tf_word2vec/blob/master/word2vec_keras.py

Key Points

Above is the code for the CBOW model; if you need Skip-Gram, please modify it yourself. Keras code is simple enough that this is easy to do.
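To make the CBOW/Skip-Gram distinction concrete, here is a small sketch (a hypothetical illustration, not code from the linked repo) of how the training pairs differ for the same sentence: CBOW predicts the center word from its surrounding context, while Skip-Gram predicts each context word from the center word.

```python
def cbow_pairs(tokens, window=2):
    """CBOW: (many context words, one center-word target) per position."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-Gram: (one center word, one context-word target) per context slot."""
    pairs = []
    for i, center in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, ctx))
    return pairs

sentence = ["the", "quick", "brown", "fox", "jumps"]
print(cbow_pairs(sentence)[2])   # (['the', 'quick', 'fox', 'jumps'], 'brown')
print(len(skipgram_pairs(sentence)))   # one pair per (center, context) combination
```

Swapping the model's input/target wiring from the first pattern to the second is essentially the whole CBOW-to-Skip-Gram modification.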

Looking through the code, you will find that the model-building part takes less than 10 lines. In fact, writing the CBOW model is very simple; the only difficulty lies in implementing a sampled version of softmax (randomly selecting several targets to softmax over instead of computing the full softmax) to improve efficiency. In Keras, this is done by writing the Dense layer by hand rather than using the built-in Dense layer. The specific steps are:

1. Generate random integers via random_uniform to serve as negative-sample IDs, then concatenate them with the target input to form the sampled class set;
2. Use an Embedding layer to store the softmax weights;
3. Gather the weights corresponding to the sampled IDs into a small matrix, then use the K backend for matrix multiplication, effectively implementing a sampled version of the Dense layer.

Look over the code a few times and you will understand.
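The mechanics of this sampled Dense layer can be illustrated outside Keras. The sketch below is a hypothetical NumPy illustration of the three steps above (not the code from the repo): sample negative IDs, concatenate them with the true target, gather the corresponding rows of the full softmax weight matrix, and multiply by the hidden vector to get logits over just the sampled classes.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_neg = 1000, 64, 15

# Full softmax weights, as they would be stored in an Embedding layer.
W = rng.normal(size=(vocab_size, embed_dim))
b = np.zeros(vocab_size)

# Hidden vector h: in CBOW this would be the average of the context vectors.
h = rng.normal(size=embed_dim)
target = 42  # the true center word's ID

# Step 1: random negative-sample IDs, concatenated with the target,
# so the sampled class set has (1 + num_neg) entries and index 0 is positive.
neg_ids = rng.integers(0, vocab_size, size=num_neg)
sample_ids = np.concatenate([[target], neg_ids])

# Steps 2-3: gather the sampled rows into a small matrix, then a small
# matrix product replaces the full (vocab_size x embed_dim) one.
W_small = W[sample_ids]                 # shape: (1 + num_neg, embed_dim)
logits = W_small @ h + b[sample_ids]    # shape: (1 + num_neg,)

# Softmax over only the sampled classes; training would push probs[0] up.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

In the Keras version, the gather corresponds to indexing into the Embedding layer's weights and the small matrix product is done with the K backend, but the arithmetic is the same.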

Finally, in terms of running speed, it certainly cannot compete with the Gensim version or the original Word2Vec. The point of using Keras is its high flexibility; everyone should keep this in mind.