By 苏剑林 | May 24, 2015
Finally, I can catch a breath~~ Friends who follow Scientific Space may have noticed that updates have been quite slow lately. It all started with this year's winter break... At the end of January, for various reasons combined with my own interests, I looked for an internship. The job was Python programming. The recruitment post on the South China University of Technology forum was written very concisely, and I submitted an equally concise resume. Unexpectedly, I received a reply and was hired. I started working in February, and only after joining did I realize that the company was a fairly well-known domestic e-commerce enterprise. My main job was data mining... Although I had a little experience with Python, I was basically a novice at data mining, so I could only learn on the job, frantically catching up on data mining knowledge. Along the way I learned a great deal about data mining. Mind you, before this I didn't know what a "feature" was, or what "logistic regression" or "SVM" meant... I was completely ignorant back then.
The new semester arrived quickly. Since I have fewer courses this semester, I have managed to keep attending classes while interning, which is still ongoing. After school started, I wondered whether the data mining knowledge I had picked up at work could play a role in my coursework. Consequently, I went on to participate in several data mining competitions and projects, including: the "Liangjian Cup" of the School of Physics and Optoelectronics, a scientific research project of the School of Mathematical Sciences (Employment Demand Analysis Based on Data Mining), research data analysis for the School of Information Technology in Education, and the "Teddy Cup" National Data Mining Challenge (I participated in it last year too, but only won a consolation prize). These four projects, plus work, plus my regular courses, came wave after wave, making this semester exceptionally full, and I have benefited greatly from them. Today was the deadline for submitting the Teddy Cup paper. When I uploaded it and clicked "Submit," the thought "Finally, I can catch a breath" welled up in my mind.
Of course, where there is effort, there is reward. Throughout the process, I feel I have gained immensely. On one hand, through this combination of work, competitions, and projects, my ability to process data with Python has greatly improved, and my understanding of Python has deepened significantly. On the other hand, what struck me most was that in two data mining competitions, I came across an algorithm called "deep learning" (built on multilayer neural networks) and cutting-edge data mining techniques like "autoencoders." In fact, when I was learning algorithms like logistic regression and SVM, I was very confused: why are there so many algorithms? Why is there no universal and effective algorithm? Why do these algorithms work at all? I had no clear answers at first. It wasn't until I encountered neural networks and deep learning that I finally found a satisfying answer. Neural networks (deep learning is, broadly speaking, neural networks) are exactly the kind of universal and effective algorithm I had been looking for; their principle is to fit data with a composition of many simple functions. If I have the chance, I will share my experience with neural networks in another article.
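To make the "composition of functions" idea concrete, here is a minimal sketch in Python with numpy, assuming hypothetical random weights and made-up dimensions: a two-layer network is literally one simple function applied inside another, and "deeper" learning just means composing more of them.

```python
import numpy as np

def sigma(z):
    # Sigmoid nonlinearity; without it, composing linear maps
    # would collapse back into a single linear map.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights for illustration only.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1: R^3 -> R^4
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # layer 2: R^4 -> R^2

def f(x):
    # The network is a composite function:
    # f(x) = sigma(W2 @ sigma(W1 @ x + b1) + b2)
    return sigma(W2 @ sigma(W1 @ x + b1) + b2)

print(f(np.array([0.5, -1.0, 2.0])))
```

Training then means adjusting W1, b1, W2, b2 so that this composite function fits the data.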
Here, I am willing to spend a bit more space on deep learning, because it is truly worth understanding, even if only conceptually. It is said to be the algorithm currently closest to artificial intelligence, bar none. The reason it can be called universal is simple: it was designed from the start to simulate the human thinking process (problems are posed by people, and people can think; if an algorithm can simulate humans, the algorithm itself must be very powerful). A very important characteristic of human thinking is abstraction. You may not have noticed that "abstraction" is actually a process of information loss; it is just that the information lost is what we consider unimportant. For example, when we study basketballs, volleyballs, footballs, and so on, we can abstract some of their shared properties, such as that they are all spheres and all sports equipment. This abstraction lets us recognize the commonalities of things while reducing the brain's workload and improving processing efficiency. Deep learning algorithms achieve exactly this process for computers: using multilayer neural networks, they construct autoencoders to simulate the human process of "abstraction"! (Of course, in such a short space, readers only need to grasp the concept.)
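To see "abstraction as information loss" in code, here is a minimal sketch of an autoencoder's structure, assuming untrained random weights and made-up dimensions: the encoder squeezes the input through a low-dimensional bottleneck, deliberately discarding information, and training would adjust the weights so that the decoder can still reconstruct the input, i.e. so that only the unimportant information gets lost.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 8))   # encoder: R^8 -> R^2 (the bottleneck)
W_dec = rng.normal(size=(8, 2))   # decoder: R^2 -> R^8

x = rng.normal(size=8)            # some input vector
code = sigma(W_enc @ x)           # compressed, "abstracted" representation
x_hat = W_dec @ code              # attempted reconstruction

# Reconstruction error: training an autoencoder means minimizing this
# (e.g. by gradient descent), which forces the 2-dim code to keep
# only the most important features of x.
print(np.sum((x - x_hat) ** 2))
```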
The semester is halfway through, and the data mining content is mostly behind me. So, for me, what is next?
Back to my mathematics and physics!
Although deep learning is interesting and promising, what I admire most is still the people who proposed the deep learning algorithms. While learning deep learning, I also tried, as much as possible, to understand its essence from a mathematical perspective. For that reason, I love mathematics all the more; I am more inclined to work on theory and to continue the mathematics and physics that I love. In thinking about mathematical and physical theories, I feel an inexplicable sense of pleasure and achievement. I prefer to build my ideas with just a pen and a piece of paper, rather than having to sit in front of a computer to do so (although doing mathematics and physics inevitably involves computers, that is a different feeling). I am not sure what career I will choose in the future; I might even end up working in data mining or machine learning, but no matter what, that love for mathematics and physics will always be there.
Therefore, it is time to continue my study of science.