Running Python Scripts on Your Phone on a Schedule

By 苏剑林 | Oct 21, 2015

Without a doubt, data is the foundation of data analysis. For ordinary people like us, the most natural way to obtain large amounts of data is web scraping. For me, the most straightforward tool for writing a crawler is Python: a practical crawler takes only a few lines of code, and the result is very clean. (See "Recording the Process of Crawling Taobao/Tmall Comment Data".)

Where should the crawler reside?

The next question is: where should this crawler run? To collect data that updates daily, it usually needs to run once a day at a fixed time. Keeping your own computer on for this is not very realistic, since computers do get turned off eventually. Some readers might think of a cloud server; that works, but it costs extra money. Inspired by the great Xiaoxia, I started thinking about running it on a router. Some high-end routers can take USB storage and be flashed with OpenWrt, a Linux-based router system on which Python can be installed just as on an ordinary Linux machine. This approach was very attractive to me, but I am not familiar with compiling in a Linux environment, let alone on a router. On top of that, router hardware is quite weak, typically only 16 MB of flash and 64 MB of RAM, and without a good deal of patience it is hard to put up with.

I also thought about buying a Raspberry Pi for this. The current Raspberry Pi 2 Model B already has 1 GB of RAM, making it essentially a fully functional miniature PC, and it is not expensive, so it is definitely worth playing with. But cost aside, buying a Raspberry Pi just to host one crawler felt wasteful. After weighing various ideas, a thought suddenly struck me yesterday: apart from laptops, the smart devices we use most are our phones. Why not run it on a phone? Our phones are connected to Wi-Fi practically all day, and while we sleep at night the phone sits idle. Running the crawler on a phone seemed like a perfect fit!

A Difficult Exploration

With the idea in hand, I got to work. Of course, the entire process of exploration was rather painful.

First, the Python environment on the phone. This part is easy because QPython exists: a feature-rich Android Python environment that works out of the box. It differs little from Python on a PC, apart from the difficulty of installing many third-party libraries. Since the crawler needs to collect Chinese text, I chose QPython3 (Python 3 strings are Unicode by default, which makes handling Chinese far less painful than in Python 2). SL4A and PY4A are reportedly also options, but they have not been updated in a long time and are more complex to configure, so I did not bother trying them.
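You can see the difference directly in the QPython3 console; in Python 3, Chinese text is just an ordinary string:

    # Python 3 strings are Unicode, so Chinese needs no special decoding.
    s = "爬虫测试"
    print(len(s))             # 4 characters, not a byte count
    print(s.encode("utf-8"))  # encode explicitly only when you need bytes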

Then comes writing the crawler. When writing it, take care to use only the libraries that ship with QPython3. I used to write crawlers with the requests library, which supports both Python 2.x and 3.x, but QPython3 does not include it. Online tutorials usually teach urllib2; that library exists only in 2.x and was merged into urllib in 3.x, so you need to replace urllib2 with urllib.request. Better still, QPython3 ships with the re module for regular expressions, which is essential for writing crawlers! It also has the csv module, which is handy for saving results. I won't give my specific crawler code here; please explore that yourself.
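That said, the overall skeleton is simple. Here is a minimal sketch of the fetch-parse-save pattern using only the built-in modules above; the URL, regular expression, and output path are placeholders (this is not my actual crawler), so replace them with your own:

    import csv
    import re
    import urllib.request

    # Placeholder target: swap in the page you actually want to crawl.
    URL = "https://example.com/"

    # Fetch the page; many sites reject urllib's default user agent,
    # so send a browser-like one.
    req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req, timeout=30).read().decode("utf-8", "ignore")

    # Placeholder pattern: grabs the contents of <title>...</title>.
    titles = re.findall(r"<title>(.*?)</title>", html, re.S)

    # Append results to a CSV on the SD card, one row per match.
    with open("/sdcard/crawl_results.csv", "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for t in titles:
            writer.writerow([t.strip()])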

Next comes the hardest step: how do you make the crawler run on a schedule on the phone?

Over yesterday and today I looked up a lot of material and tried many approaches, failing again and again, before finally working out a feasible method.

Terminal Emulator Settings

The first step is execution. Install "Terminal Emulator" (I don't know whether other similar apps can do this). It is an Android terminal emulator that gives your phone a Linux-style terminal. Most importantly, in its Settings you can choose to start with root, and you can also customize the initial command line. In the initial command field (the "Initial command" option in its settings screen), enter:
/data/data/com.hipipal.qpy3/files/bin/qpython.sh /sdcard/com.hipipal.qpyplus/scripts3/582.py
The part before the space is QPython3's script launcher; the trailing .py path is the crawler script we wrote. After filling it in, close Terminal Emulator and reopen it. If the specified crawler script runs automatically when the terminal opens, you have succeeded.
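If nothing runs, the usual culprit is a typo in one of the two paths. A quick sanity check you can run from inside QPython3 (using the paths assumed above; yours may differ) is:

    import os

    # The launcher and script referenced in the initial command;
    # adjust these if your QPython3 install uses different paths.
    paths = [
        "/data/data/com.hipipal.qpy3/files/bin/qpython.sh",
        "/sdcard/com.hipipal.qpyplus/scripts3/582.py",
    ]
    for p in paths:
        print(p, "exists" if os.path.exists(p) else "MISSING")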

The second step is scheduling. Install an app like "Schedule Master"; there are many such apps, so just pick one. As long as it can launch a specified app at a set time, we can create a scheduled task that opens Terminal Emulator.
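Since the scheduled runs happen while you sleep, it helps if the script leaves a trace. One simple trick (my own habit, not something the apps above require) is to append a timestamped line to a log file at the top of the crawler:

    import datetime

    # Append a timestamp on every run, so checking the log in the
    # morning confirms whether the overnight schedule actually fired.
    with open("/sdcard/crawler_run.log", "a", encoding="utf-8") as log:
        log.write("started at %s\n" % datetime.datetime.now().isoformat())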

Conclusion

Here is a screenshot of it running:

[Screenshot: the crawler running on the phone]

By now, we have written a Python crawler on Android and set it up to run automatically on a schedule. In short: QPython runs the Python script, and an app like Schedule Master launches apps at set times. How do we get QPython to execute our particular script? By specifying a startup command in Terminal Emulator, which turns Terminal Emulator into an app dedicated to running this crawler (admittedly a bit unfair to Terminal Emulator, like using a sledgehammer to crack a nut, but I couldn't think of a better way).