By 苏剑林 | Oct 12, 2025
In the "Approximate Estimation" section of "Higher-Order MuP: Simpler but Smarter Spectral Condition Scaling", we "pre-spent" a conclusion: "For an $n\times m$ random matrix whose elements follow a standard normal distribution, its spectral norm is approximately $\sqrt{n}+\sqrt{m}$."
In this article, we supplement the discussion of this conclusion and provide a fast estimation method for the spectral norm of random matrices.
Suppose we have a random matrix $\boldsymbol{W}\in\mathbb{R}^{n\times m}$ whose entries are sampled i.i.d. from the standard normal distribution $\mathcal{N}(0,1)$. Our task is to estimate the spectral norm of $\boldsymbol{W}$, i.e. its largest singular value, and we take $\mathbb{E}[\|\boldsymbol{W}\|_2]$ as the final estimate.
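As a concrete reference point, the quantity being estimated is easy to compute exactly with NumPy. Here is a minimal sketch (the sizes $n, m$ and the seed are arbitrary illustrative choices):

```python
import numpy as np

n, m = 1024, 256  # illustrative sizes; any n, m work
rng = np.random.default_rng(0)

# W with i.i.d. N(0, 1) entries
W = rng.standard_normal((n, m))

# For a matrix, ord=2 gives the spectral norm, i.e. the largest singular value
print(np.linalg.norm(W, ord=2))
```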
First, it should be pointed out that the study of random matrices has grown into a specialized branch of its own. For the spectral norm of Gaussian random matrices, relevant keywords include the "Marchenko–Pastur distribution", the "Bai–Yin law", "Gordon's theorem", etc., which contain detailed results on the distribution of singular values. We do not intend to introduce these here; instead, we present a way to derive $\sqrt{n}+\sqrt{m}$ quickly.
This method is the author's simplification of Section 5.3.1 of "Introduction to the non-asymptotic analysis of random matrices". Strictly speaking, it is only a "popular science" explanation that helps one quickly see where $\sqrt{n}+\sqrt{m}$ comes from; a rigorous proof would require adding many tedious technical details, which we will not expand on here.
Our starting point is the identity \begin{equation}\|\boldsymbol{W}\|_2 = \max_{\|\boldsymbol{u}\|=1, \|\boldsymbol{v}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\label{eq:w-uv}\end{equation} where $\boldsymbol{u}\in\mathbb{R}^n,\boldsymbol{v}\in\mathbb{R}^m$. Computing $\|\boldsymbol{W}\|_2$ directly is usually not easy, so it is natural to look for some kind of approximation. We consider the following two "half-finished products", in which the vector not being maximized over is held fixed: \begin{equation}\max_{\|\boldsymbol{u}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\qquad\qquad \max_{\|\boldsymbol{v}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\end{equation} Intuitively, compared with Eq. $\eqref{eq:w-uv}$, each of these expressions does only half of the work. We make a bold assumption: each result is also half of the complete result. Superimposing them should therefore give a good approximation of the full maximum: \begin{equation}\|\boldsymbol{W}\|_2 = \max_{\|\boldsymbol{u}\|=1, \|\boldsymbol{v}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} \approx \max_{\|\boldsymbol{u}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\|\boldsymbol{v}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = \|\boldsymbol{W}\boldsymbol{v}\| + \|\boldsymbol{u}^{\top}\boldsymbol{W}\| \label{eq:core-approx}\end{equation} That is, we sample $\boldsymbol{u}$ and $\boldsymbol{v}$ uniformly from the unit hyperspheres of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, and the formula above then yields an approximation to the spectral norm of $\boldsymbol{W}$.
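Eq. $\eqref{eq:core-approx}$ is easy to test numerically. Below is a minimal sketch (the sizes, the seed, and the normalize-a-Gaussian method of sphere sampling are illustrative choices) comparing $\|\boldsymbol{W}\boldsymbol{v}\| + \|\boldsymbol{u}^{\top}\boldsymbol{W}\|$ against the true spectral norm:

```python
import numpy as np

n, m = 1024, 256  # illustrative sizes
rng = np.random.default_rng(0)
W = rng.standard_normal((n, m))

# Uniform samples on the unit spheres: normalize Gaussian vectors
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
v = rng.standard_normal(m)
v /= np.linalg.norm(v)

# ||W v|| + ||u^T W|| vs the exact spectral norm
approx = np.linalg.norm(W @ v) + np.linalg.norm(u @ W)
exact = np.linalg.norm(W, ord=2)
print(approx, exact)  # both should be close to sqrt(n) + sqrt(m)
```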
With this approximation, we can calculate \begin{equation}\mathbb{E}[\|\boldsymbol{W}\|_2]\approx\mathbb{E}[\|\boldsymbol{W}\boldsymbol{v}\|] + \mathbb{E}[\|\boldsymbol{u}^{\top}\boldsymbol{W}\|] \approx \sqrt{\mathbb{E}[\|\boldsymbol{W}\boldsymbol{v}\|^2]} + \sqrt{\mathbb{E}[\|\boldsymbol{u}^{\top}\boldsymbol{W}\|^2]}\end{equation} where the second $\approx$ swaps the expectation and the square root, an exchange that is accurate in high dimensions because the norms concentrate around their means. For the first term, \begin{equation}\mathbb{E}[\|\boldsymbol{W}\boldsymbol{v}\|^2] = \mathbb{E}[\boldsymbol{v}^{\top}\boldsymbol{W}^{\top}\boldsymbol{W}\boldsymbol{v}] = \boldsymbol{v}^{\top}\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]\boldsymbol{v} = \boldsymbol{v}^{\top}(n\boldsymbol{I}_m)\boldsymbol{v} = n\end{equation} Similarly, $\mathbb{E}[\|\boldsymbol{u}^{\top}\boldsymbol{W}\|^2]=m$, so \begin{equation}\mathbb{E}[\|\boldsymbol{W}\|_2]\approx\sqrt{n} + \sqrt{m}\end{equation} This result is actually very accurate (it can be verified through simulation experiments; see the sketch below). Specifically, if $n=ka, m=kb$, where $a, b$ are constants, then \begin{equation}\lim_{k\to\infty} \frac{\|\boldsymbol{W}\|_2}{\sqrt{n} + \sqrt{m}} = 1,\qquad \boldsymbol{W}\sim\mathcal{N}(0,1)\end{equation} which is essentially the content of the Bai–Yin law mentioned earlier. The reason it is so accurate is that we cheated: the most critical formula $\eqref{eq:core-approx}$ is essentially reverse-engineered from the known correct answer. Apart from Eq. $\eqref{eq:core-approx}$, the only conditions we used were $\mathbb{E}[\boldsymbol{W}^{\top}\boldsymbol{W}]=n\boldsymbol{I}_m$ and $\mathbb{E}[\boldsymbol{W}\boldsymbol{W}^{\top}]=m\boldsymbol{I}_n$, so the approximation can be expected to hold for any entry distribution with zero mean and unit variance.
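Here is the kind of simulation experiment referred to above, a small Monte Carlo sketch (trial counts and sizes are arbitrary) comparing the average spectral norm with $\sqrt{n}+\sqrt{m}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Average spectral norm over a few trials vs sqrt(n) + sqrt(m);
# the ratio approaches 1 as the sizes grow
for n, m in [(64, 16), (256, 64), (1024, 256)]:
    norms = [np.linalg.norm(rng.standard_normal((n, m)), ord=2)
             for _ in range(20)]
    print(n, m, np.mean(norms), np.sqrt(n) + np.sqrt(m))
```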
The spectral norm is the largest singular value; the same logic lets us estimate the minimum singular value as well. Of course, "minimum" here needs a more careful definition. Assuming $n\geq m$, the minimum singular value refers to the $m$-th singular value of $\boldsymbol{W}$ ordered from largest to smallest, which equals \begin{equation}\sigma_{\min}(\boldsymbol{W}) = \min_{\|\boldsymbol{v}\|=1}\max_{\|\boldsymbol{u}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\end{equation} Note that the $\min$ and $\max$ here cannot be interchanged, neither in their order nor in the variables they act on. As before, we take the sum of the two "half-finished products", one for the $\min$ and one for the $\max$, as the approximation: \begin{equation}\sigma_{\min}(\boldsymbol{W}) = \min_{\|\boldsymbol{v}\|=1}\max_{\|\boldsymbol{u}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v}\approx \min_{\|\boldsymbol{v}\|=1}\boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} + \max_{\|\boldsymbol{u}\|=1} \boldsymbol{u}^{\top}\boldsymbol{W} \boldsymbol{v} = -\|\boldsymbol{u}^{\top}\boldsymbol{W}\| + \|\boldsymbol{W}\boldsymbol{v}\| \end{equation} The rest of the derivation is the same as before, and the result is $\mathbb{E}[\sigma_{\min}(\boldsymbol{W})]\approx\sqrt{n}-\sqrt{m}$.
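The same kind of numerical check works for the minimum singular value; a small sketch (sizes illustrative, with $n\geq m$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256  # n >= m, illustrative sizes

# np.linalg.svd returns singular values in descending order,
# so the last one is the m-th largest, i.e. sigma_min
sigmas = np.linalg.svd(rng.standard_normal((n, m)), compute_uv=False)
print(sigmas[-1], np.sqrt(n) - np.sqrt(m))
```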
This article has presented a quick heuristic for estimating the spectral norm of random matrices, or more precisely, a popular-science-style explanation rather than a rigorous derivation. It could be made rigorous, but that would require adding many theoretical details, all of which have been skipped here.