By 苏剑林 | Oct 16, 2018
A few years ago, the author wrote an "Understanding Matrices" series based on his own modest understanding, including an article "Why Do Only Square Matrices Have Determinants?" which discussed the issue of determinants for non-square matrices. It presented views like "determinants of non-square matrices are not elegant" and "square matrix determinants are sufficient." This article revisits this question.
First, recall the determinant of a square matrix. Its most important value lies in its geometric meaning:
The absolute value of the determinant of an $n$-dimensional square matrix is equal to the hypervolume of the $n$-dimensional solid spanned by its row (or column) vectors.
This geometric meaning is the source of all the importance of determinants; related discussions can be found in "Bits and Pieces of Determinants." It is also the basis for our discussion of non-square matrix determinants.
For a square matrix $\boldsymbol{A}_{n \times n}$, it can be viewed as a combination of $n$ row vectors or $n$ column vectors. In either case, the absolute value of the determinant equals the hypervolume of the $n$-dimensional solid spanned by these $n$ vectors. In other words, for square matrices, the distinction between row and column vectors does not change the determinant.
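A one-line NumPy check of this row/column symmetry (a minimal sketch; the matrix entries are my own arbitrary example, not from the original article): transposing a square matrix swaps the row-vector and column-vector viewpoints, yet the determinant is unchanged.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Transposing swaps the row- and column-vector viewpoints,
# yet the determinant is unchanged: det(A) = det(A^T).
print(np.linalg.det(A), np.linalg.det(A.T))  # both -2.0
```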
For a non-square matrix $\boldsymbol{B}_{n \times k}$, the situation is different. Without loss of generality, assume $n > k$. We can view it either as $n$ $k$-dimensional row vectors or as $k$ $n$-dimensional column vectors. A determinant for a non-square matrix should carry the same meaning, namely the hypervolume of the solid spanned by these vectors.
Consider the first case: if viewed as $n$ $k$-dimensional row vectors, we must consider the hypervolume of the $n$-dimensional solid spanned by these $n$ vectors. However, since $n > k$, these $n$ vectors are necessarily linearly dependent, so they cannot span an $n$-dimensional solid; they span at most a $k$-dimensional one. Consequently, the $n$-dimensional hypervolume is trivially 0.
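As a quick numerical check (a minimal NumPy sketch; the matrix entries are arbitrary illustrative values), the Gram matrix of the rows, $\boldsymbol{B}\boldsymbol{B}^{\top}$, is singular whenever $n > k$, so the corresponding $n$-dimensional squared volume vanishes:

```python
import numpy as np

# Illustrative 3x2 matrix: three 2-dimensional row vectors (n=3 > k=2).
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Gram matrix of the rows: 3x3 but of rank at most 2, so its
# determinant (the squared 3-dimensional volume) is zero.
gram_rows = B @ B.T
print(np.linalg.det(gram_rows))  # ~0, up to floating-point error
```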
However, the second case is not so trivial. If viewed as $k$ $n$-dimensional column vectors, then although the vectors live in $n$ dimensions, they span a solid that is at most $k$-dimensional, and the $k$-dimensional hypervolume of this solid is not necessarily 0. Let us take this non-trivial volume as the definition of the determinant of a non-square matrix.
For the second case, there is a very clever definition that can leverage square matrix determinants:
\begin{equation}\|\det \boldsymbol{B}\| = \sqrt{\det (\boldsymbol{B}^{\top}\boldsymbol{B})}\label{eq:dingyi}\end{equation}Of course, this only defines the absolute value of the determinant, but in most cases the absolute value is all we need.
It can be observed that this definition is compatible with square matrix determinants, and we will further prove that this definition indeed preserves the geometric meaning of the determinant.
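To make the compatibility concrete, here is a minimal NumPy sketch (the helper name `nonsquare_det` and the matrix entries are my own illustrative choices): for a square matrix, $\sqrt{\det(\boldsymbol{A}^{\top}\boldsymbol{A})} = \sqrt{(\det\boldsymbol{A})^2} = |\det\boldsymbol{A}|$, so definition $\eqref{eq:dingyi}$ reduces to the usual absolute determinant.

```python
import numpy as np

def nonsquare_det(B):
    """Absolute 'determinant' of an n x k matrix (n >= k), per the definition above."""
    return np.sqrt(np.linalg.det(B.T @ B))

# For a square matrix the definition reduces to |det A|,
# since det(A^T A) = det(A)^2.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
print(nonsquare_det(A))       # 5.5
print(abs(np.linalg.det(A)))  # 5.5
```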
Let's calculate two examples. The first example considers an $n \times 1$ matrix:
\begin{equation}\boldsymbol{X} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix}\end{equation}According to definition $\eqref{eq:dingyi}$, we calculate:
\begin{equation}\|\det \boldsymbol{X}\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\end{equation}According to our definition, it should represent the "1-dimensional volume" of one $n$-dimensional column vector. By analogy, "1-dimensional volume" is length, and the formula above is exactly the vector norm formula. That is to say, in the $n \times 1$ case, definition $\eqref{eq:dingyi}$ is compatible with our expectation.
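Numerically (reusing the sketch above; the vector entries are arbitrary), the $n \times 1$ case indeed reproduces the Euclidean norm:

```python
import numpy as np

X = np.array([[3.0], [4.0], [12.0]])  # an arbitrary 3x1 column vector

# X^T X is the 1x1 matrix (x1^2 + ... + xn^2), so the definition gives the norm.
print(np.sqrt(np.linalg.det(X.T @ X)))  # 13.0
print(np.linalg.norm(X))                # 13.0
```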
The second example is an $n \times 2$ matrix:
\begin{equation}\boldsymbol{Z} = \begin{pmatrix}x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_n & y_n\end{pmatrix}=(\boldsymbol{x}, \boldsymbol{y})\end{equation}Calculating according to definition $\eqref{eq:dingyi}$, we obtain the final result as:
\begin{equation}\|\det \boldsymbol{Z}\| = \sqrt{\boldsymbol{x}^{\top}\boldsymbol{x}\,\boldsymbol{y}^{\top}\boldsymbol{y} - (\boldsymbol{x}^{\top}\boldsymbol{y})^2}\end{equation}It is not difficult to see that this result is exactly the area of the parallelogram spanned by $\boldsymbol{x}, \boldsymbol{y}$, because the area of the parallelogram calculated from the definition should be:
\begin{equation}\begin{aligned}\|\boldsymbol{x}\|\cdot\|\boldsymbol{y}\|\cdot\sin\theta =& \|\boldsymbol{x}\|\cdot\|\boldsymbol{y}\|\cdot\sqrt{1-\cos^2\theta}\\ =&\|\boldsymbol{x}\|\cdot\|\boldsymbol{y}\|\cdot\sqrt{1-\left(\frac{\boldsymbol{x}^{\top}\boldsymbol{y}}{\|\boldsymbol{x}\|\cdot\|\boldsymbol{y}\|}\right)^2}\\ =&\sqrt{\boldsymbol{x}^{\top}\boldsymbol{x}\,\boldsymbol{y}^{\top}\boldsymbol{y}-(\boldsymbol{x}^{\top}\boldsymbol{y})^2} \end{aligned}\end{equation}In other words, for an $n \times 2$ matrix, definition $\eqref{eq:dingyi}$ is also consistent with our expectation.
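To spot-check the $n \times 2$ case (a sketch with arbitrary 3-dimensional vectors), we can compare definition $\eqref{eq:dingyi}$ with the parallelogram area given by the cross product:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

Z = np.column_stack([x, y])              # the 3x2 matrix (x, y)
det_Z = np.sqrt(np.linalg.det(Z.T @ Z))  # the non-square determinant definition
area = np.linalg.norm(np.cross(x, y))    # |x cross y| = parallelogram area
print(det_Z, area)                       # both sqrt(29), about 5.385
```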
Now consider the general proof. For an $n \times k$ matrix $\boldsymbol{B}$:
\begin{equation}\boldsymbol{B}_{n \times k} = \begin{pmatrix}b_{11} & \dots & b_{1k}\\ b_{21} & \dots & b_{2k}\\ \vdots & \ddots & \vdots\\ b_{n1} & \dots & b_{nk}\end{pmatrix} = (\boldsymbol{b}_1,\dots,\boldsymbol{b}_k)\end{equation}with $n > k$. First, from the familiar Gram–Schmidt orthogonalization process, we know there exist an $n \times k$ semi-orthogonal matrix $\boldsymbol{U}_{n \times k}$ (whose $k$ columns are mutually orthogonal $n$-dimensional unit vectors) and a $k \times k$ upper triangular matrix $\boldsymbol{C}_{k \times k}$ such that:
\begin{equation}\boldsymbol{B}_{n \times k}=\boldsymbol{U}_{n\times k}\boldsymbol{C}_{k\times k}\end{equation}This is known in mathematics as the "QR decomposition." Since orthogonal transformations preserve lengths and angles, and hence the geometric quantities we care about, the (absolute) determinant of $\boldsymbol{B}_{n \times k}$ should equal that of $\boldsymbol{C}_{k \times k}$, namely $\|\det \boldsymbol{C}_{k\times k}\|$.
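This decomposition is directly available in NumPy (a minimal sketch; the random matrix is an arbitrary example): `np.linalg.qr` in its reduced mode returns exactly such a semi-orthogonal $\boldsymbol{U}$ and upper triangular $\boldsymbol{C}$, and we can verify $\|\det \boldsymbol{C}\| = \sqrt{\det(\boldsymbol{B}^{\top}\boldsymbol{B})}$ numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))          # an arbitrary 5x3 matrix (n=5, k=3)

U, C = np.linalg.qr(B, mode="reduced")   # B = U C; U: 5x3 semi-orthogonal, C: 3x3 upper triangular
print(np.allclose(U.T @ U, np.eye(3)))   # True: the columns of U are orthonormal
print(abs(np.linalg.det(C)))             # |det C| ...
print(np.sqrt(np.linalg.det(B.T @ B)))   # ... equals sqrt(det(B^T B))
```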
Thus, we have:
\begin{equation}\begin{aligned}\|\det \boldsymbol{B}_{n\times k}\| =& \|\det \boldsymbol{C}_{k\times k}\|\\ =& \sqrt{\det\left(\boldsymbol{C}_{k\times k}^{\top}\boldsymbol{C}_{k\times k}\right)}\\ =& \sqrt{\det\left[\left(\boldsymbol{U}_{n\times k}^{\top}\boldsymbol{B}_{n \times k}\right)^{\top}\left(\boldsymbol{U}_{n\times k}^{\top}\boldsymbol{B}_{n \times k}\right)\right]}\\ =& \sqrt{\det\left(\boldsymbol{B}_{n \times k}^{\top}\boldsymbol{B}_{n \times k}\right)} \end{aligned}\end{equation}Here the third line substitutes $\boldsymbol{C} = \boldsymbol{U}^{\top}\boldsymbol{B}$ (which follows from $\boldsymbol{U}^{\top}\boldsymbol{U} = \boldsymbol{I}$), and the last line uses $\boldsymbol{U}\boldsymbol{U}^{\top}\boldsymbol{B} = \boldsymbol{U}\boldsymbol{U}^{\top}\boldsymbol{U}\boldsymbol{C} = \boldsymbol{U}\boldsymbol{C} = \boldsymbol{B}$. Therefore, for an $n \times k$ matrix $\boldsymbol{B}$ with $n > k$, a non-trivial and reasonable determinant definition is $\sqrt{\det (\boldsymbol{B}^{\top}\boldsymbol{B})}$. Obviously, if $n < k$, the definition becomes $\sqrt{\det (\boldsymbol{B}\boldsymbol{B}^{\top})}$.
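For the $n < k$ case (a sketch with an arbitrary wide matrix), the roles of the two Gram matrices swap: $\boldsymbol{B}^{\top}\boldsymbol{B}$ becomes singular, while $\boldsymbol{B}\boldsymbol{B}^{\top}$ carries the non-trivial volume:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((2, 4))          # an arbitrary 2x4 matrix (n=2 < k=4)

# The 4x4 Gram matrix B^T B has rank at most 2, so its determinant vanishes,
# while det(B B^T) gives the non-trivial squared 2-dimensional volume.
print(np.linalg.det(B.T @ B))            # ~0
print(np.sqrt(np.linalg.det(B @ B.T)))   # non-trivial volume
```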
We started from geometric meaning to discuss the problem of determinants for non-square matrices, finally showing that formula $\eqref{eq:dingyi}$ can serve as a relatively reasonable definition for non-square determinants. Although theoretically $\eqref{eq:dingyi}$ only defines the absolute value of the determinant, it is sufficient for most scenarios.
As for applications of non-square determinants: we know that when changing variables in an integral, the Jacobian determinant guarantees the non-degeneracy of the transformation. Similarly, it might be possible to use non-square determinants to guarantee the non-degeneracy of dimension-raising or dimension-reducing (upsampling or downsampling) transformations. Of course, this is only a preliminary idea; the author is still thinking about such problems and welcomes interested readers to discuss.