[Share] A 10-Million-Entry Baidu Zhidao Corpus

By Su Jianlin (苏剑林) | January 30, 2018

Release

January 30, 2018

Quantity

Total of 10 million entries

Format

[
  {
    "url": "http://zhidao.baidu.com/question/565618371557484884.html",
    "question": "What are the vocational colleges for learning to be a clerk?",
    "tags": [
      "School",
      "Junior College",
      "Institution Information"
    ]
  },
  {
    "url": "http://zhidao.baidu.com/question/2079794100345438428.html",
    "question": "Is there a difference between online gambling and gambling in Macau?",
    "tags": [
      "Network",
      "Macau",
      "Gambling"
    ]
  }
]
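
As a rough illustration of working with this format, the following Python sketch loads entries and counts how often each tag occurs. It assumes the uncompressed file is a single UTF-8 JSON array shaped like the sample above; the encoding, the helper names `load_entries` and `tag_counts`, and any file name you pass in are assumptions for illustration, not details stated in this post.

```python
import json
from collections import Counter


def load_entries(path):
    """Load the corpus file, assumed to be one UTF-8 encoded JSON array."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tag_counts(entries):
    """Count how often each tag appears across all entries."""
    counts = Counter()
    for entry in entries:
        # Use .get so an entry without a "tags" key is simply skipped.
        counts.update(entry.get("tags", []))
    return counts


# A two-entry sample in the format shown above, so the sketch runs
# without the real file.
sample = json.loads("""
[
  {
    "url": "http://zhidao.baidu.com/question/565618371557484884.html",
    "question": "What are the vocational colleges for learning to be a clerk?",
    "tags": ["School", "Junior College", "Institution Information"]
  },
  {
    "url": "http://zhidao.baidu.com/question/2079794100345438428.html",
    "question": "Is there a difference between online gambling and gambling in Macau?",
    "tags": ["Network", "Macau", "Gambling"]
  }
]
""")

print(len(sample), "entries")
print(tag_counts(sample).most_common(3))
```

Since the uncompressed data is over 2GB, calling `load_entries` on the full file will likely need several gigabytes of memory; a streaming JSON parser may be a better fit on smaller machines.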

Purpose

Left to your imagination.

Source

Collected by the author through several months of continuous monitoring and crawling.

Instructions

This data is shared for learning and research purposes only. Please do not use it for any commercial or illegal purpose. Users are solely responsible for any adverse consequences arising from improper use of this corpus.

Author

Su Jianlin (http://kexue.fm)

Download

Link: https://pan.baidu.com/s/1zzDobW9FY7JXP6c_9QChdg Password: 7shl

Compressed size: 300MB+; Uncompressed size: 2GB+

When republishing, please include the address of this article: https://kexue.fm/archives/5067
