Seeking a Smooth Maximum Function

By 苏剑林 | May 02, 2015

In optimization problems, the most direct method to find the maximum or minimum of a function is to take the derivative and then compare the magnitudes of the various extreme values. However, the functions we want to optimize are not always differentiable—for example, functions containing the maximum function $\max(x, y)$. In such cases, we must resort to other approaches. One very clever idea is to approximate these non-differentiable functions with a differentiable one, allowing us to find an approximate optimal value using standard extremum methods. The task of this article is to explore a simple and useful function that can serve as an approximation for the maximum function while possessing derivatives of multiple orders. Below is the derivation process provided by the author.

In mathematical analysis, I have already learned a formula regarding the maximum function; namely, when $x \geq 0$ and $y \geq 0$, we have: $$\max(x,y)=\frac{1}{2}\left(|x+y|+|x-y|\right)\tag{1}$$ Therefore, to seek a smooth maximum function, we can first consider finding a function that can approximately represent the absolute value $|x|$. This allows us to reduce the problem from two dimensions to one. So, which function can we use?

It is quite difficult to discover which function to use simply by direct observation, so let us break the problem down into simpler steps. We take the derivative of $f(x)=|x|$; except for the point $x=0$, it can be differentiated smoothly everywhere: $$f'(x) = \left\{\begin{aligned}1,&\,x > 0\\ -1,&\, x < 0\end{aligned}\right.\tag{2}$$ This is a simple piecewise function. In physics, such functions are very common. The one most closely related to it should be the unit step function $\theta(x)$: $$\theta(x) = \left\{\begin{aligned}1,&\,x > 0\\ 0,&\, x < 0\end{aligned}\right.\tag{3}$$ Then: $$f'(x)=2\theta(x)-1\tag{4}$$ Next, we only need to seek an approximation for $\theta(x)$. Physicists have already provided existing functions for us; a relatively simple form is [Source: Wikipedia]: $$\theta(x)=\lim_{k\to +\infty} \frac{1}{1+e^{-k x}}\tag{5}$$ Thus, we can take $\frac{1}{1+e^{-k x}}$ as the approximation function. Substituting this into equation $(4)$, we obtain $\frac{2e^{k x}}{1+e^{k x}}-1$. Integrating this yields: \begin{align} f(x)&=\frac{2}{k}\ln(1+e^{kx})-x\\ &=\frac{1}{k}\left[\ln(1+e^{kx})+\ln(1+e^{-kx})\right]\\ &=\frac{1}{k}\ln(2+e^{kx}+e^{-kx})\tag{6} \end{align} It is not difficult to see that in the logarithmic part of equation $(6)$, when $k$ is sufficiently large, the influence of the constant $2$ is negligible. After removing it, we have a relatively simple absolute value function: $$|x|=\lim_{k\to +\infty} \frac{1}{k}\ln(e^{kx}+e^{-kx})\tag{7}$$ Combining equation $(7)$ with equation $(1)$, we get: $$\max(x,y)=\lim_{k\to +\infty} \frac{1}{2k}\left\{\ln[e^{k(x+y)}+e^{-k(x+y)}]+\ln[e^{k(x-y)}+e^{-k(x-y)}]\right\}\tag{8}$$ Equation $(8)$ can be further simplified to: $$\max(x,y)=\lim_{k\to +\infty} \frac{1}{2k}\ln(e^{2kx}+e^{-2kx}+e^{2ky}+e^{-2ky})\tag{9}$$ Furthermore, since equation $(1)$ holds when $x\geq 0$ and $y\geq 0$, the $e^{-2kx}$ and $e^{-2ky}$ terms in equation $(9)$ become unimportant. We remove them as well, obtaining: $$\max(x,y)=\lim_{k\to +\infty} \frac{1}{2k}\ln(e^{2kx}+e^{2ky})\tag{10}$$ Or written as: $$\max(x,y)=\lim_{k\to +\infty} \frac{1}{k}\ln(e^{kx}+e^{ky})\tag{11}$$ Equation $(11)$ is precisely the ideal maximum function we hoped to obtain. Although our derivation was based on $x\geq 0, y\geq 0$, it is easy to find that the formula remains valid even when negative numbers appear among $x$ and $y$! It can even be generalized to a maximum function for multiple variables: $$\max(x,y,z,\dots)=\lim_{k\to +\infty} \frac{1}{k}\ln(e^{kx}+e^{ky}+e^{kz}+\dots)\tag{12}$$

For more exhibitions regarding equation $(11)$, please read Matrix67's "How to Construct a Smooth Maximum Function": http://www.matrix67.com/blog/archives/2830

Observing the structure of equation $(11)$, we can see that it essentially does this: find a function that is monotonically increasing across the entire real number domain, with a growth rate faster than linear growth, then take the sum and finally take the inverse function. Thus, it is not difficult to construct similar functions: if we choose $y=x^{2k+1}$, we get: $$\max(x,y)=\lim_{k\to+\infty} \sqrt[2k+1]{x^{2k+1}+y^{2k+1}}\tag{13}$$ Of course, the accuracy (or rather, the convergence speed) of $(13)$ is nowhere near as good as $(11)$. Improving the accuracy is not difficult; for example: $$\max(x,y)=\lim_{k\to +\infty} \frac{1}{k}\ln\ln\left(e^{e^{kx}}+e^{e^{ky}}\right)\tag{14}$$ Considering both accuracy and simplicity, equation $(11)$ is likely the optimal choice.