The Intuitive Origin and Clever Uses of Entropy

By 苏剑林 | February 20, 2016

In my previous post "Can't Afford 'Entropy': From Entropy, the Maximum Entropy Principle to the Maximum Entropy Model (Part 1)", I introduced entropy from a more "professional" perspective and gave it an interpretation. However, as a measure of uncertainty, entropy also has a more everyday, intuitive origin. This article supplements that part and, along the way, presents a few clever applications.

The Intuitive Origin of Entropy

Consider natural numbers composed of the ten digits 0–9. If we require the number to be less than 10,000, there are naturally 10,000 possibilities. If we say "a certain natural number less than 10,000," then any number from 0 to 9,999 could appear, so 10,000 becomes a measure of the uncertainty of this event. Similarly, consider a sequence of length $m$ composed of $n$ different elements (where elements can be reused). There are $n^m$ possible cases for this sequence, and here $n^m$ also serves as a measure of the uncertainty of this situation.

$n^m$ has an exponential form, so the numbers quickly become enormous. We therefore take its logarithm to get $m \log n$, which can equally serve as a measure of uncertainty. This is consistent with our original definition of entropy, because:

\[ m \log n = -\sum_{i=1}^{n^m} \frac{1}{n^m} \log \frac{1}{n^m} \]

Readers might wonder: since both $n^m$ and $m \log n$ can serve as measures of uncertainty, why choose $m \log n$ rather than $n^m$? The answer is additivity. The measure obtained after taking the logarithm is additive: the uncertainty of two independent choices is simply the sum of their individual uncertainties (a sequence of length $m_1 + m_2$ has uncertainty $m_1 \log n + m_2 \log n$), which makes calculations convenient. Of course, additivity is only a matter of convenience rather than necessity; if the $n^m$ form were used, it would instead be multiplicative.
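
To make this concrete, here is a minimal numerical sketch (the specific values of $n$ and $m$ are illustrative, not from the article) checking that the entropy of the uniform distribution over the $n^m$ sequences equals $m \log n$, and that the logarithmic measure is additive:

```python
import math

# Illustrative values: sequences of length m over n distinct elements,
# e.g. 4-character strings built from the digits 0-9.
n, m = 10, 4

num_outcomes = n ** m          # n^m equally likely sequences
p = 1.0 / num_outcomes         # probability of each sequence

# Shannon entropy of the uniform distribution: minus the sum, over the n^m
# outcomes, of p*log(p)
shannon_entropy = -num_outcomes * p * math.log(p)
print(shannon_entropy, m * math.log(n))   # both equal 4*log(10), about 9.2103

# Additivity: concatenating two independent sequences of lengths m1 and m2
# gives uncertainty m1*log(n) + m2*log(n) = (m1 + m2)*log(n).
m1, m2 = 3, 5
print(math.log(n ** (m1 + m2)), m1 * math.log(n) + m2 * math.log(n))
```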

Application 1: Efficiency of Sorting Algorithms

Efficient sorting algorithms are essential for programmers. Let us estimate the average-case efficiency of the theoretically most efficient sorting algorithm. Suppose we want to sort $n$ numbers from smallest to largest. Since this must be a general algorithm, it has to handle the general case, so we assume the $n$ numbers are distinct. These numbers have $n!$ possible orderings, so the entropy is $\log (n!)$. Using Stirling's formula, we have:

\[ \log (n!) \sim \log\left[\sqrt{2\pi n}\left(\frac{n}{e}\right)^n\right] = n \log n - n + \frac{1}{2}\log(2\pi n) = \mathcal{O}(n \log n) \]

After sorting from smallest to largest, only one arrangement remains, so the entropy is 0. From an information-theoretic perspective, sorting is therefore a process that takes the entropy from $\mathcal{O}(n \log n)$ down to 0. Each elementary operation, typically swapping the positions of two numbers, changes the entropy by at most a bounded amount, so even assuming every operation is beneficial, the number of operations, and hence the time required, is proportional to $\mathcal{O}(n \log n)$. Of course, this is an average-case characterization, and it explains why the best average-case complexity of sorting algorithms is $\mathcal{O}(n \log n)$.
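
As a quick sanity check (the particular values of $n$ below are arbitrary), the following sketch compares the exact $\log(n!)$ with Stirling's approximation and with the leading $n \log n$ term:

```python
import math

# Compare the exact permutation entropy log(n!) with Stirling's approximation
# and with the leading n*log(n) term, for a few arbitrary values of n.
for n in (10, 100, 1000, 10000):
    exact = math.lgamma(n + 1)                                   # log(n!) without overflow
    stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    leading = n * math.log(n)
    print(f"n={n:6d}  log(n!)={exact:11.1f}  Stirling={stirling:11.1f}  n*log(n)={leading:11.1f}")
```

The exact value and Stirling's approximation agree very closely, while the $n \log n$ term captures the growth rate, which is all the argument above needs.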

Application 2: How Many Times to Shuffle a Deck of Cards

For card players, shuffling is a very common task. A new deck of cards is usually ordered. To mix them up, a common method is the riffle shuffle, where the deck is split into two halves and then interleaved. The question is: how many times must we riffle shuffle to consider the cards mixed? (Problem source: http://duodaa.com/blog/index.php/archives/463/)

Admittedly, there are many ambiguities here. First, the process of a "riffle shuffle" is not precisely defined; second, what "mixed up" means is not defined either. But we do not need precision, since we are not actually shuffling cards with mathematics; a simple estimate using information entropy will do. Shuffling is roughly the inverse of sorting: for $n$ cards, a sufficiently random arrangement has entropy $\mathcal{O}(n \log n)$. A single riffle shuffle rearranges the $n$ cards in one of at most about $2^n$ ways (a cut point plus an interleaving of the two halves), so the entropy it injects per shuffle is roughly proportional to $n$. Therefore, after about $\mathcal{O}(\log n)$ shuffles, we can probably consider the deck mixed. The key result here is $\log n$, which is consistent with the results provided at http://duodaa.com/blog/index.php/archives/463/.
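
Here is a rough numerical sketch of this estimate for a standard 52-card deck (the figure of roughly $n$ bits per shuffle is the crude bound from the argument above, not a precise value):

```python
import math

n = 52  # standard deck size

# Total entropy of a fully shuffled deck, in bits: log2(n!)
total_bits = math.lgamma(n + 1) / math.log(2)

# A single riffle shuffle has at most about 2^n outcomes (a cut point plus an
# interleaving of the two halves), so it injects at most about n bits of entropy.
bits_per_shuffle = n

print(total_bits)                     # about 225.6 bits in a random ordering of 52 cards
print(total_bits / bits_per_shuffle)  # about 4.3 shuffles from this crude bound

# The finer analysis of riffle shuffles gives on the order of (3/2)*log2(n)
# shuffles, about 8.6 for n = 52: the same O(log n) scale as the crude estimate.
print(1.5 * math.log2(n))
```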

Reflections

Although the treatment in this article is somewhat rough, the message is quite clear: entropy really is a magical thing!