By 苏剑林 | August 13, 2016
Python is basically the only programming language I currently use for work, computation, and data mining (the exception is symbolic computation, where I use Mathematica). Python's built-in functionality is not especially powerful on its own; its strength lies in the vast number of third-party libraries. When choosing a third-party library, I always weigh it carefully and try to pick the simplest and most intuitive one (I'm not that smart and can't handle things that are too complex). For data processing, I use Numpy and Pandas the most; these two are definitely top-tier libraries. Scipy deserves a mention too, but I rarely use it directly; it is usually called indirectly through Pandas. For visualization, it goes without saying: Matplotlib. For modeling, I build deep learning models directly in Keras, which has become a very popular deep learning framework. For text mining, I usually add jieba (word segmentation) and Gensim (topic modeling, including models like word2vec). For machine learning there is also the popular Scikit-Learn, though I rarely use it. For networking, I use requests for web crawling, a very user-friendly HTTP library; when I write a website, I use bottle, a single-file micro-framework where you define everything yourself. I don't write large-scale websites, of course, only simple interfaces. Finally, for parallelism, I generally stick with multiprocessing.
However, none of the above are what this article intends to recommend. What I want to recommend are two libraries that can permeate your daily coding. They provide functionality we need most of the time without adding much code, and they are truly eye-opening.
1. tqdm
The introduction to tqdm only needs one GIF.
Simply put, it displays progress bars. It looks good, its usage is very intuitive (just wrap the iterable you are looping over in tqdm), and it adds almost no overhead to the original program. It is literally "too powerful and beautiful" (tài qiáng dà měi, whose pinyin initials happen to spell tqdm)! It makes writing long-running programs much more comfortable!
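As a rough illustration of the "wrap the iterable" usage, here is a minimal sketch. The function slow_square and the desc label are made up for this example, and the try/except fallback is only there so the snippet still runs on a machine without tqdm installed.

```python
import time

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: a no-op wrapper.
    def tqdm(iterable, **kwargs):
        return iterable


def slow_square(x):
    """Hypothetical stand-in for a slow unit of work."""
    time.sleep(0.01)
    return x * x


results = []
# Wrapping the iterable in tqdm() is the only change to the loop.
for x in tqdm(range(50), desc="squaring"):
    results.append(slow_square(x))

print(len(results))
```

The loop body stays exactly as it was; only the iterable in the for statement changes.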
2. retry
Exactly as its name suggests, retry is used to implement retry functionality. Many times we need a retry function; for example, when writing a web crawler, network issues can occasionally cause a crawl to fail, necessitating a retry. Usually, I would write it like this (retry 5 times, with a two-second interval):
import time

def do_something():
    # xxx code here
    pass

for i in range(5):
    try:
        do_something()
        break
    except Exception:  # not a bare except, so Ctrl+C still interrupts the loop
        time.sleep(2)
This is somewhat cumbersome. With retry, you only need:
from retry import retry

@retry(tries=5, delay=2)
def do_something():
    # xxx code here
    pass

do_something()
In other words, you just need to add the @retry decorator before the function definition.
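To get a feel for what a decorator like this does behind the scenes, here is a minimal standard-library sketch. It is not the retry library's actual implementation; the names simple_retry and flaky are hypothetical and exist only for this illustration.

```python
import functools
import time


def simple_retry(tries=5, delay=2, exceptions=(Exception,)):
    """Hypothetical decorator: call the function up to `tries` times."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: let the last error propagate
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@simple_retry(tries=5, delay=0.01)
def flaky():
    """Fails twice, then succeeds, to imitate a flaky network call."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = flaky()
print(result, calls["count"])  # prints: ok 3
```

One design choice in this sketch: after the final failed attempt it re-raises the exception, whereas the hand-written loop above simply gives up silently.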
Python is indeed absolutely worry-free~