ARXIV Math Paper Distribution: Partial Differential Equations are the Most Popular!

By 苏剑林 | Nov 13, 2015

I have been recommended for admission to the graduate program in pure mathematics at Sun Yat-sen University. Although this major is quite theoretical, I will keep up my interest in data analysis, computer science, and related fields. Recently I had the idea of combining my major with data mining, so I crawled the math papers posted on ARXIV over five years (2010 to 2014), collecting each paper's title, categories, year, and month, and ran a simple analysis of the "market" of mathematics in recent years. Since ARXIV is currently the world's largest electronic database of scientific preprints, I believe conclusions drawn from its data are representative to a certain extent.

Of course, this article is mainly a practice piece in web crawling and basic data analysis, and it does not dig out any particularly valuable insights. At the end of the post, I have attached the data I crawled so that interested readers can analyze it further.

Overall Situation

Over these five years, the total number of math papers on ARXIV was 135,009, an average of about 27,000 papers per year, or about 74 papers per day.
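As a quick check of these averages, the arithmetic can be reproduced in a few lines of Python. The only assumption here is that the period covers every calendar day from 2010-01-01 through 2014-12-31, which includes one leap day.

```python
from datetime import date

total_papers = 135_009  # total stated above
years = 5

# Days from 2010-01-01 up to (not including) 2015-01-01; 2012 is a leap year.
days = (date(2015, 1, 1) - date(2010, 1, 1)).days

papers_per_year = total_papers / years
papers_per_day = total_papers / days

print(f"{papers_per_year:.1f} papers per year, {papers_per_day:.1f} papers per day")
```

Both values round to the figures quoted above.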

In terms of categories, the top fifteen categories by paper count are:

| Category | Paper Count |
| --- | --- |
| Analysis of PDEs (math.AP) | 9417 |
| Probability (math.PR) | 9064 |
| Combinatorics (math.CO) | 8937 |
| Mathematical Physics (math-ph) | 8852 |
| Information Theory (cs.IT) | 8215 |
| Algebraic Geometry (math.AG) | 7524 |
| Number Theory (math.NT) | 6789 |
| Differential Geometry (math.DG) | 6495 |
| Dynamical Systems (math.DS) | 4834 |
| Functional Analysis (math.FA) | 4375 |
| Numerical Analysis (math.NA) | 4058 |
| Optimization and Control (math.OC) | 4015 |
| Classical Analysis and ODEs (math.CA) | 3511 |
| Representation Theory (math.RT) | 3431 |
| Geometric Topology (math.GT) | 3256 |

In a sense, this table reflects how popular the various directions in mathematics are. First place goes to Analysis of PDEs. It is somewhat related to Mathematical Physics in fourth place; together they largely represent applications of PDEs, especially in fields such as physics and biology. Second is Probability. Almost every phenomenon in our world involves randomness, which naturally drives work in this direction, so its popularity is no surprise. Third is Combinatorics, which represents discrete mathematics. Fifth is Information Theory, whose rank is probably a result of the growth of data mining in recent years. These are followed by algebraic geometry, number theory, differential geometry, dynamical systems, functional analysis, numerical analysis, optimization and control, and so on, all of which are active frontier fields in mathematics.

Next, I split the titles into words to see which ones appear most frequently. As expected, the most common are stop words such as of, and, the, for, in, with, a, and on, which carry no special meaning. After removing these stop words, the results are:

equations(5172), groups(4782), spaces(4531), systems(4422), random(3980), functions(3906), quantum(3817), equation(3720), algebras(3686), theory(3459), graphs(3437), problem(3337), finite(3275), model(3216), solutions(3097), theorem(3014), operators(2880), linear(2718), generalized(2622), type(2579), group(2565), space(2402), manifolds(2363), analysis(2315), stochastic(2278), problems(2235), models(2161), surfaces(2156), applications(2060), nonlinear(2017), approach(1961), local(1930), polynomials(1922), method(1919), fields(1886), differential(1882), new(1874), optimal(1869), function(1854), boundary(1789), number(1768), sets(1766), curves(1751)

The first word, equations, and the eighth, equation, likely correspond to the PDE category. They are followed by groups (group theory), spaces, and so on, which likely reflect mainstream methods in current mathematical research: placing the objects of study in suitable spaces and studying them with functional analysis and abstract algebra (especially group theory). Interestingly, quantum also ranks high, indicating that mathematical research motivated by quantum theory is flourishing as well. Readers can interpret the other results for themselves.
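The word count described above could be sketched roughly as follows. This is only a minimal illustration, not the script actually used for this post: the tokenization rule, the function name `title_word_counts`, and the sample titles are all assumptions, and the stop-word set contains only the eight words named above. No stemming is applied, which matches the fact that equation and equations are counted separately in the results.

```python
import re
from collections import Counter

# Only the stop words mentioned in the text; a real list would be longer.
STOP_WORDS = {"of", "and", "the", "for", "in", "with", "a", "on"}


def title_word_counts(titles, stop_words=STOP_WORDS):
    """Count lowercase alphabetic words across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Hypothetical titles for illustration, not taken from the crawled data.
sample_titles = [
    "On the Cauchy problem for nonlinear wave equations",
    "Random walks on finite groups",
    "Nonlinear equations and quantum groups",
]

counts = title_word_counts(sample_titles)
for word, n in counts.most_common(3):
    print(f"{word}({n})")
```

Running the same function over the full list of crawled titles would give a ranking in the same `word(count)` form as the list above.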

Yearly Changes

Having looked at the overall situation, we can examine the yearly changes. First, the total number of papers per year shows that the volume of articles is increasing every year:

[Figure: Total papers per year]

Then, let's look at the five categories with the most papers in each of the five years to see which fields are gradually becoming more popular.

2010 Math Physics (1619), Probability (1437), Algebraic Geometry (1358), PDE Analysis (1319), Combinatorics (1297)

2011 Math Physics (1809), Probability (1671), Combinatorics (1605), PDE Analysis (1545), Algebraic Geometry (1414)

2012 Math Physics (2005), PDE Analysis (1878), Combinatorics (1826), Probability (1824), Information Theory (1616)

2013 PDE Analysis (2211), Probability (2027), Combinatorics (2020), Information Theory (1958), Math Physics (1773)

2014 PDE Analysis (2464), Combinatorics (2189), Probability (2105), Information Theory (2008), Math Physics (1646)
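A per-year ranking like the one above can be produced by grouping the crawled records by year and counting categories within each year. The sketch below is a minimal illustration under assumptions: it supposes each record is a `(year, category)` pair with one category per paper, and the function name and toy records are hypothetical rather than the actual data format of the attached file.

```python
from collections import Counter, defaultdict


def top_categories_by_year(records, n=5):
    """Return {year: [(category, count), ...]} with the n most common categories."""
    per_year = defaultdict(Counter)
    for year, category in records:
        per_year[year][category] += 1
    return {year: counts.most_common(n) for year, counts in sorted(per_year.items())}


# Hypothetical toy records, not the real crawl.
records = [
    (2013, "math.AP"), (2013, "math.AP"), (2013, "math.PR"),
    (2014, "math.CO"), (2014, "math.AP"), (2014, "math.AP"), (2014, "math.AP"),
]

top = top_categories_by_year(records, n=1)
print(top)
```

With the real records and `n=5`, each year's list would correspond to one line of the ranking above.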

It can be seen that for the first three years, the Mathematical Physics direction ranked first in the number of papers. However, in the last two years, while the total number of papers increased, the count for Mathematical Physics saw a significant decline. This seems to indicate that the Mathematical Physics direction may have hit a bottleneck. In contrast, Analysis of PDEs has increased year by year and gradually moved into first place, showing that research into partial differential equations remains a mainstream field in contemporary mathematics.

Which categories are growing the fastest? Below I have picked those that I believe are particularly representative.

The first is Systems and Control (cs.SY). The paper counts for these five years were 9, 96, 112, 139, 135. Somewhat related to this is Optimization and Control (math.OC), with counts of 423, 545, 778, 980, 1289 over the five years.

[Figure: Systems and Control (cs.SY)]

[Figure: Optimization and Control (math.OC)]

Furthermore, Numerical Analysis is becoming increasingly popular. Its paper count has grown every year at a fairly high rate; the counts for the five years were 435, 571, 778, 1012, 1262. These trends suggest that the combination of mathematics and computer science is one of the mainstream directions in mathematical development. Other categories reflecting this trend include Computational Physics (physics.comp-ph), Computational Geometry (cs.CG), and Computer Vision and Pattern Recognition (cs.CV).

[Figure: Numerical Analysis]

[Figure: Related categories]

I used a simple metric to measure the growth rate of a category:

\[ \sum_{n=2010}^{2013}\frac{\text{Number of papers in year } (n+1)}{\text{Number of papers in year } n} \]

To be clear, this metric is very simple and not necessarily accurate; it is only meant to give an intuitive impression. The categories with the largest growth rates according to this metric are listed below, in descending order. Strikingly, many of these fields have some connection to computer science. I believe this is not a coincidence.

| Category | 2010 | 2011 | 2012 | 2013 | 2014 |
| --- | --- | --- | --- | --- | --- |
| Earth and Planetary Astrophysics (astro-ph.EP) | 1 | 11 | 12 | 3 | 7 |
| Systems and Control (cs.SY) | 9 | 96 | 112 | 139 | 135 |
| Other Condensed Matter (cond-mat.other) | 8 | 3 | 8 | 1 | 7 |
| Databases (cs.DB) | 1 | 6 | 5 | 1 | 2 |
| Other Statistics (stat.OT) | 1 | 6 | 4 | 4 | 5 |
| Cellular Automata and Lattice Gases (nlin.CG) | 5 | 1 | 8 | 3 | 1 |
| Computation and Language (cs.CL) | 3 | 3 | 1 | 4 | 14 |
| History and Philosophy of Physics (physics.hist-ph) | 6 | 9 | 1 | 4 | 12 |
| Social and Information Networks (cs.SI) | 4 | 11 | 19 | 21 | 15 |
| Neural and Evolutionary Computing (cs.NE) | 5 | 6 | 4 | 15 | 9 |
| Cell Behavior (q-bio.CB) | 2 | 6 | 3 | 2 | 4 |
| Software Engineering (cs.SE) | 1 | 3 | 4 | 4 | 2 |
| Networking and Internet Architecture (cs.NI) | 29 | 44 | 56 | 76 | 118 |
| Physics and Society (physics.soc-ph) | 5 | 11 | 15 | 15 | 15 |
| Chemical Physics (physics.chem-ph) | 7 | 8 | 4 | 13 | 8 |
| High Energy Physics - Lattice (hep-lat) | 3 | 7 | 11 | 9 | 7 |
| Discrete Mathematics (cs.DM) | 54 | 88 | 125 | 187 | 152 |
| Machine Learning (stat.ML) | 30 | 43 | 60 | 86 | 91 |
| Optimization and Control (math.OC) | 423 | 545 | 778 | 980 | 1289 |
| Computational Physics (physics.comp-ph) | 15 | 20 | 35 | 48 | 40 |
| Data Structures and Algorithms (cs.DS) | 35 | 50 | 81 | 90 | 99 |
| Numerical Analysis (math.NA) | 435 | 571 | 778 | 1012 | 1262 |
| Cryptography and Security (cs.CR) | 21 | 37 | 35 | 66 | 40 |
| Numerical Analysis (cs.NA) | 26 | 26 | 47 | 44 | 61 |
| Quantitative Methods (q-bio.QM) | 7 | 14 | 15 | 8 | 12 |
| Adaptation and Self-Organizing Systems (nlin.AO) | 9 | 13 | 23 | 15 | 18 |
| Computer Vision and Pattern Recognition (cs.CV) | 14 | 22 | 24 | 25 | 34 |
| Computational Geometry (cs.CG) | 20 | 29 | 45 | 55 | 45 |
| Solar and Stellar Astrophysics (astro-ph.SR) | 2 | 6 | 6 | 5 | 1 |
| Artificial Intelligence (cs.AI) | 8 | 13 | 22 | 15 | 15 |
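The growth metric is easy to reproduce from the yearly counts. The sketch below applies it to four rows of the table; the function name `growth_score` is my own, and the code assumes that no year has a count of zero, since that would cause a division by zero.

```python
def growth_score(counts):
    """Sum of year-over-year ratios: count in year n+1 divided by count in year n."""
    return sum(later / earlier for earlier, later in zip(counts, counts[1:]))


# Yearly counts for 2010 to 2014, copied from the table above.
counts_by_category = {
    "math.NA": [435, 571, 778, 1012, 1262],
    "math.OC": [423, 545, 778, 980, 1289],
    "cs.SY": [9, 96, 112, 139, 135],
    "astro-ph.EP": [1, 11, 12, 3, 7],
}

# Rank categories from largest to smallest score.
ranked = sorted(
    counts_by_category,
    key=lambda cat: growth_score(counts_by_category[cat]),
    reverse=True,
)

for cat in ranked:
    print(f"{cat}: {growth_score(counts_by_category[cat]):.2f}")
```

The resulting order (astro-ph.EP, cs.SY, math.OC, math.NA) agrees with the order of these rows in the table. It also shows why the metric should be read with care: a single jump from a very small count dominates the sum, as with the step from 1 to 11 papers for astro-ph.EP, which alone contributes 11 to its score.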

Attachments Download

Finally, I'm providing the files I crawled; interested readers can take them for further analysis.

arxiv.zip

When reposting, please include the address of this article: https://kexue.fm/archives/3511