A More Unique Word Vector Model (6): Code, Sharing, and Conclusion

By 苏剑林 | November 19, 2017

Series

A More Unique Word Vector Model (1): simpler glove

A More Unique Word Vector Model (2): Modeling Language

A More Unique Word Vector Model (3): Description-Related Models

A More Unique Word Vector Model (4): Solving the Model

A More Unique Word Vector Model (5): Interesting Results

A More Unique Word Vector Model (6): Code, Sharing, and Conclusion

Code

The implementation for this article is available at: https://github.com/bojone/simpler_glove

The source code is adapted from Stanford's original GloVe implementation, and my modifications are minor, because the main difficulty lies in counting co-occurrence frequencies, and the Stanford authors have already provided a classic, well-crafted implementation of that step, for which I am grateful. In fact, I am not very familiar with C, so my modifications may not be well written; I welcome corrections from experts.

In addition, to reproduce the "interesting results" from the previous post, I have added simpler_glove.py to the GitHub repository. It wraps a class that directly reads the model files (in txt format) exported by the C version of simpler glove and provides some commonly used functions for convenient calling.
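For reference, here is a minimal sketch of what such a loader might look like. The class and method names are hypothetical rather than the actual interface of simpler_glove.py, and the sketch assumes the txt format follows the GloVe convention of one word per line followed by its vector components:

```python
# Hypothetical loader sketch; simpler_glove.py's real interface may differ.
# Assumed file format (GloVe convention): "word v1 v2 ... vd" per line.
import numpy as np

class WordVectors:
    def __init__(self, path, encoding='utf-8'):
        self.words, vecs = [], []
        with open(path, encoding=encoding) as f:
            for line in f:
                parts = line.rstrip().split(' ')
                if len(parts) < 2:
                    continue  # skip blank or malformed lines
                self.words.append(parts[0])
                vecs.append(np.asarray(parts[1:], dtype='float32'))
        self.vectors = np.stack(vecs)                 # (vocab_size, dim)
        self.word2id = {w: i for i, w in enumerate(self.words)}

    def most_similar(self, word, topn=10):
        """Nearest neighbours ranked by cosine similarity."""
        v = self.vectors[self.word2id[word]]
        sims = self.vectors.dot(v) / (
            np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(v))
        order = np.argsort(-sims)
        return [(self.words[i], float(sims[i]))
                for i in order if self.words[i] != word][:topn]
```

Recall from the previous post that in this model the inner product itself carries a direct probabilistic meaning, so ranking by the unnormalized dot product self.vectors.dot(v) is also natural here, not only cosine similarity.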

Sharing

Here is a set of Chinese word vectors trained with the model in this article. The training corpus is Baidu Baike, 1 million articles in total, with a vocabulary of roughly 300,000 words and a word vector dimension of 128. One special treatment was applied during tokenization: all numbers and English words were split into individual digits and letters (a minimal sketch of this preprocessing follows the download link below). Friends who need them for experiments can download them here:

Link: http://pan.baidu.com/s/1jIb3yr8

Password: 1ogw
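The splitting treatment mentioned above is easy to reproduce. Below is a minimal sketch of one way to do it, applied after ordinary Chinese tokenization; the exact rules used when preparing the shared vectors may differ:

```python
# Sketch: break runs of ASCII digits/letters into single-character tokens.
# This matches the spirit, not necessarily the exact rules, of the
# preprocessing used for the shared word vectors.
import re

def split_ascii(tokens):
    out = []
    for tok in tokens:
        if re.fullmatch(r'[0-9A-Za-z]+', tok):
            out.extend(tok)   # e.g. '2017' -> '2', '0', '1', '7'
        else:
            out.append(tok)
    return out

print(split_ascii(['GloVe', '模型', '2017', '年']))
# ['G', 'l', 'o', 'V', 'e', '模型', '2', '0', '1', '7', '年']
```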

Conclusion

This series can be considered a fairly complete exploration of word vector models, and it is also a product of my "theoretical perfectionism." Fortunately, I ended up with a model that looks theoretically elegant, which has partly cured that perfectionism. As for its experimental performance, applications, and so on, these remain to be verified through future use.

Most of the derivations in this series can be imitated to explain the experimental behavior of word2vec's skip-gram model, and interested readers can try this. In fact, skip-gram does exhibit performance similar to that of the model here, including the properties of the resulting word vectors.
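As a starting point for such a comparison, here is a minimal sketch that trains a skip-gram model with gensim and inspects nearest neighbours. The toolkit choice and parameters are my own assumptions (parameter names follow gensim ≥ 4); in practice one would feed in the same tokenized corpus used for simpler glove:

```python
# Sketch: train a word2vec skip-gram model for comparison with the
# simpler glove vectors. gensim and all parameters are assumptions;
# the series does not prescribe a toolkit.
from gensim.models import Word2Vec

# Toy corpus; replace with the real tokenized corpus in practice.
corpus = [
    ['the', 'king', 'rules', 'the', 'kingdom'],
    ['the', 'queen', 'rules', 'the', 'kingdom'],
    ['a', 'man', 'walks', 'in', 'the', 'city'],
    ['a', 'woman', 'walks', 'in', 'the', 'city'],
] * 50  # repeated so the co-occurrence statistics are non-trivial

model = Word2Vec(corpus, vector_size=128, window=5, min_count=1,
                 sg=1,       # sg=1 selects the skip-gram architecture
                 epochs=20)

print(model.wv.most_similar('king', topn=3))
```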

Overall, combining theory with experiment is a wonderful thing, but it is also exhausting work: the content above alone cost me several months of thinking.

If reprinting, please include the original address: https://kexue.fm/archives/4681