By 苏剑林 | October 25, 2022
In recent days, the "four ducks in a semicircle" problem has been circulating widely on the internet:
Four ducks in a semicircle problem
Many readers may have tried to solve it upon seeing it, and even Teacher Li Yongle dedicated a lesson to this problem (refer to "Four ducks in a circular pond, what is the probability they are in the same semicircle?"). As for the problem itself, the answer is not particularly difficult and can be calculated using many methods. What is slightly more challenging is its generalized version—as described in the title of this article—where the number of ducks is generalized to $n$ and the semicircle is generalized to a sector with a central angle of $\theta$. Even more interesting is that when $\theta \leq \pi$, there are still relatively elementary solutions, but as soon as $\theta > \pi$, the complexity begins to "increase dramatically"...
First, it should be noted that we are treating the ducks as abstract points sampled uniformly at random within the circle. We will not consider generalizations like the "surface area occupied by the ducks." Regarding the "four ducks in a semicircle," it is very easy to generalize, for instance, to higher-dimensional spaces:
What is the probability that $n$ points uniformly distributed within a $d$-dimensional hypersphere lie within the same $d$-dimensional hyper-hemisphere?
The answer to this generalization was actually given in a paper published in 1962 titled "A Problem in Geometric Probability". Another generalization is the subject of this article's title:
What is the probability that $n$ points uniformly distributed within a circle lie within the same sector with a central angle of $\theta$?
This article primarily discusses this problem, which can be equivalently transformed into (readers might want to think for themselves how this transformation works):
If $n$ points are chosen uniformly and randomly on a circumference, dividing the circumference into $n$ arcs, what is the probability that the longest arc subtends a central angle greater than $2\pi - \theta$?
Furthermore, it can be equivalently transformed into:
If $n-1$ points are chosen uniformly and randomly on a line segment of unit length, dividing the segment into $n$ segments, what is the probability that the longest segment length is greater than $1 - \frac{\theta}{2\pi}$?
Actually, the distribution involved in the last equivalent formulation has already been well-discussed, for example, in the Zhihu post "What is the expected length of the longest segment if points are taken randomly in (0, 1) to divide the interval into n segments?". For the convenience of the reader, I will reorganize the explanation here. It will involve several steps, and the entire process is somewhat long, but it is necessary for finding a general solution (applicable when $\theta > \pi$).
As a side note, the extremes of multiple random variables belong to the category of "Order Statistics," which are quite common in machine learning. For example, the nearest neighbor estimation of entropy introduced in "EAE: Autoencoder + BN + Max Entropy = Generative Model" and the general construction of discrete distributions introduced in "Constructing Discrete Probability Distributions from the Perspective of Reparameterization" can both be reduced to this.
Starting from the last equivalent expression, let the $n-1$ random points divide the unit line segment into $n$ segments with lengths from left to right denoted as $x_1, x_2, \cdots, x_n$ (note: these are ordered by position, not by size). Denote their joint probability density function as $p_n(x_1, x_2, \cdots, x_n)$. The answer in the Zhihu link is based directly on the fact that "$p_n(x_1, x_2, \cdots, x_n)$ is a uniform distribution," but I felt that this fact is not entirely trivial, so I will derive it here.
First, note two facts:
1. $x_1, x_2, \cdots, x_n$ are subject to the constraint $x_1 + x_2 + \cdots + x_n = 1$, so there are only $n-1$ degrees of freedom. We take the first $n-1$ variables as independent variables.
2. $p_n(x_1, x_2, \cdots, x_n)$ is a probability density, not a probability. $p_n(x_1, x_2, \cdots, x_n)dx_1 dx_2 \cdots dx_{n-1}$ is the probability.
To understand that "$p_n(x_1, x_2, \cdots, x_n)$ is a uniform distribution," let's start with $n=2$. In this case, a single point is randomly chosen from $(0, 1)$ to divide the segment into two parts. Since the point is sampled from a uniform distribution (probability density is 1) and the point corresponds one-to-one with $(x_1, x_2)$, we have $p_2(x_1, x_2) = 1$.
Next, consider $n=3$. The corresponding probability is $p_3(x_1, x_2, x_3)dx_1 dx_2$. There are two possibilities:
1. Sample one point first, dividing the segment into two parts $(x_1, x_2 + x_3)$. This probability is $p_2(x_1, x_2 + x_3) dx_1$. Then sample another point, dividing the $x_2 + x_3$ segment into two parts $(x_2, x_3)$. This probability is $dx_2$. Multiplying them gives $p_2(x_1, x_2 + x_3) dx_1 dx_2$.
2. Sample one point first, dividing the segment into two parts $(x_1 + x_2, x_3)$. This probability is $p_2(x_1 + x_2, x_3) dx_2$. Then sample another point, dividing the $x_1 + x_2$ segment into two parts $(x_1, x_2)$. This probability is $dx_1$. Multiplying them gives $p_2(x_1 + x_2, x_3) dx_1 dx_2$.
Adding them together:
\begin{equation}p_3(x_1, x_2, x_3)dx_1 dx_2 = p_2(x_1, x_2 + x_3) dx_1 dx_2 + p_2(x_1 + x_2, x_3) dx_1 dx_2 = 2dx_1 dx_2\end{equation}Thus $p_3(x_1, x_2, x_3)=2$ is also a uniform distribution. By induction, we obtain:
\begin{equation}p_n(x_1,x_2,\cdots,x_n) = (n - 1) p_{n-1}(x_1,x_2,\cdots,x_{n-1}) = \cdots = (n - 1)!\end{equation}With the joint distribution, we can find the marginal distribution for any $k$ variables. This prepares us for the "inclusion-exclusion principle" in the next section. Since the joint distribution is uniform, the variables are symmetric; without loss of generality, we calculate the marginal distribution for the first $k$ variables.
By definition:
\begin{equation}p_n(x_1,x_2,\cdots,x_k) = \int \cdots \int p_n(x_1,x_2,\cdots,x_n) dx_{k+1}\cdots dx_{n-1}\end{equation}Care must be taken with the integration limits. The lower limit is naturally $0$. Because of the constraint $x_1 + x_2 + \cdots + x_n = 1$, given $x_1, x_2, \cdots, x_i$, the domain of $x_{i+1}$ is $0 \sim 1 - (x_1 + x_2 + \cdots + x_i)$. Thus, the precise form is:
\begin{equation}\int_0^{1 - (x_1 + x_2 + \cdots + x_k)} \cdots \left[\int_0^{1 - (x_1 + x_2 + \cdots + x_{n-2})} p_n(x_1,x_2,\cdots,x_n) dx_{n-1}\right] \cdots dx_{k+1}\end{equation}Since $p_n(x_1,x_2,\cdots,x_n)=(n-1)!$ is a constant, the above expression can be integrated sequentially. The final result is:
\begin{equation}p_n(x_1,x_2,\cdots,x_k) = \frac{(n-1)!}{(n - k - 1)!}[1 - (x_1 + x_2 + \cdots + x_k)]^{n-k-1}\end{equation}At this point, we can calculate the probability that $x_1, x_2, \cdots, x_k$ are each greater than given thresholds $c_1, c_2, \cdots, c_k$ (where $c_1 + c_2 + \cdots + c_k \leq 1$):
\begin{equation}\begin{aligned} &\, \qquad P_n(x_1 > c_1,x_2 > c_2,\cdots,x_k > c_k) = \int_{c_1}^{1 - (c_2 + c_2 + \cdots + c_k)} \cdots \\ &\, \left[\int_{c_{k-1}}^{1 - (x_1 + x_2 + \cdots + x_{k-2} + c_k)}\left[\int_{c_k}^{1 - (x_1 + x_2 + \cdots + x_{k-1})} p_n(x_1,x_2,\cdots,x_k) dx_k\right]dx_{k-1}\right] \cdots dx_1 \end{aligned}\end{equation}The upper integration limits here are derived under the constraint $x_1 + x_2 + \cdots + x_n = 1$, similar to before. Like $p_n(x_1,x_2,\cdots,x_n)$, this can be integrated sequentially. The final result is surprisingly simple:
\begin{equation}P_n(x_1 > c_1,x_2 > c_2,\cdots,x_k > c_k) = [1 - (c_1 + c_2 + \cdots + c_k)]^{n-1}\end{equation}Everything is now ready for the "inclusion-exclusion principle" to deliver the "final blow." Recall that our goal is to find the probability of the longest segment length, i.e., for a certain threshold $x$, we want to calculate $P_n(\max(x_1,x_2,\cdots,x_n) > x)$. The event $\max(x_1,x_2,\cdots,x_n) > x$ is itself the union of several events:
\begin{equation}\max(x_1,x_2,\cdots,x_n) > x \quad\Leftrightarrow\quad x_1 > x \,\color{red}{\text{OR}}\, x_2 > x \,\color{red}{\text{OR}}\, \cdots \,\color{red}{\text{OR}}\, x_n > x \end{equation}The key is that when $x < \frac{1}{2}$, these events are not disjoint. Therefore, we cannot simply add the individual probabilities. We must use the "Inclusion-Exclusion Principle":
\begin{equation}\begin{aligned} \text{Prob(at least one point} > x) = &\, \text{Sum of probabilities that any one point} > x \\ &\, \quad \color{red}{-} \text{Sum of probabilities that any two points are} > x \\ &\, \quad\quad \color{red}{+} \text{Sum of probabilities that any three points are} > x \\ &\, \quad\quad\quad \color{red}{-} \text{Sum of probabilities that any four points are} > x \\ &\, \quad\quad\quad\quad \color{red}{+} \cdots \end{aligned}\end{equation}Based on the previous results, the probability that any $k$ chosen points are each greater than $x$ is $(1-kx)^{n-1}$. The number of ways to choose $k$ points is $C_n^k$. Substituting these into the formula, we get:
\begin{equation}\begin{aligned} P_n(\max(x_1,x_2,\cdots,x_n) > x) =&\, \sum_{k=1, 1 - kx > 0}^n (-1)^{k-1} C_n^k (1-kx)^{n-1} \\ =&\, C_n^1 (1 - x)^{n-1} - C_n^2 (1 - 2x)^{n-1} + \cdots \end{aligned}\end{equation}For the problem stated at the beginning of the article, let $x = 1 - \frac{\theta}{2\pi}$. When $x > \frac{1}{2}$ (i.e., $\theta < \pi$), all subsequent terms $1 - kx$ for $k \geq 2$ are less than 0, meaning they do not actually exist. Thus, the answer is at its simplest:
\begin{equation}C_n^1 (1 - x)^{n-1} = n \left(\frac{\theta}{2\pi}\right)^{n-1}\end{equation}When $x < \frac{1}{2}$, terms are added or subtracted depending on the specific magnitude of $x$. The smaller $x$ is, the more terms are involved. This is what Teacher Li Yongle meant when he wrote "the situation becomes significantly more complicated when $\theta$ is greater than 180 degrees."
We can also find its expectation, which is the problem discussed in the Zhihu question. There is another interesting case: clearly, when $x < \frac{1}{n}$, it always holds that $1 - kx > 0$ for all $k \leq n$, meaning all terms must be used:
\begin{equation}P_n(\max(x_1,x_2,\cdots,x_n) > x) = \sum_{k=1}^n (-1)^{k-1} C_n^k (1-kx)^{n-1}\end{equation}However, according to the "pigeonhole principle," when a unit line segment is divided into $n$ segments, the longest segment must be at least $\frac{1}{n}$. This implies $P_n(\max(x_1,x_2,\cdots,x_n) > \frac{1}{n}) = 1$. Therefore, when $x < \frac{1}{n}$, it must be that:
\begin{equation}\sum_{k=1}^n (-1)^{k-1} C_n^k (1-kx)^{n-1} = 1\end{equation}Moreover, since the left-hand side is just a simple algebraic polynomial, it must be analytic. Consequently, this identity actually holds for all $x$. Interested readers might want to try proving it directly from an algebraic perspective.
This article discussed the general solution to the generalization of the "four ducks in a semicircle" problem, where the main idea utilized is the inclusion-exclusion principle.