By 苏剑林 | March 1, 2019
From the article "Appreciation of the Identity det(exp(A)) = exp(Tr(A))", we learned that the matrix $\exp(\boldsymbol{A})$ is always invertible, with inverse $\exp(-\boldsymbol{A})$. The problem is that $\exp(\boldsymbol{A})$ is only a formal definition; by itself it has little practical value, since evaluating the series requires computing every power $\boldsymbol{A}^n$.
Are there any specific examples? Yes. This article will construct an explicit, always invertible matrix.
The logic is actually very simple. Suppose $\boldsymbol{x}, \boldsymbol{y}$ are two $k$-dimensional column vectors. Then $\boldsymbol{x}\boldsymbol{y}^{\top}$ is a $k \times k$ matrix. Let's consider:
\begin{equation}
\begin{aligned}
\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)=&\sum_{n=0}^{\infty}\frac{\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)^n}{n!}\\
=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}+\frac{\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}}{2}+\frac{\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}}{6}+\dots
\end{aligned}
\end{equation}
Noticing that
\begin{equation}\boldsymbol{y}^{\top}\boldsymbol{x}=\langle \boldsymbol{x},\boldsymbol{y}\rangle\end{equation}
is actually just a scalar, so $\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)^n = \langle \boldsymbol{x},\boldsymbol{y}\rangle^{n-1}\,\boldsymbol{x}\boldsymbol{y}^{\top}$ for $n \geq 1$, and we can continue to simplify:
\begin{equation}
\begin{aligned}\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}\left(1+\frac{\langle \boldsymbol{x},\boldsymbol{y}\rangle}{2}+\frac{\langle \boldsymbol{x},\boldsymbol{y}\rangle^2}{6}+\dots\right)\\
=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}\left(\frac{e^{\langle \boldsymbol{x},\boldsymbol{y}\rangle}-1}{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\right)
\end{aligned}
\end{equation}
Now this matrix is very concrete and easy to compute, since it only involves a scalar exponential. The factor $(e^{s}-1)/s$ in the parentheses, with $s = \langle \boldsymbol{x},\boldsymbol{y}\rangle$, has a removable singularity at $s=0$, where its value is 1.
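As a quick sanity check, here is a minimal NumPy sketch of this closed form (the function name `exp_outer` is my own invention), compared against SciPy's general matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def exp_outer(x, y):
    """Closed form of exp(x y^T): I + (e^s - 1)/s * x y^T, with s = <x, y>."""
    s = x @ y
    # (e^s - 1)/s has a removable singularity at s = 0, where it equals 1;
    # np.expm1 keeps the numerator accurate for small s.
    coef = 1.0 if s == 0 else np.expm1(s) / s
    return np.eye(len(x)) + coef * np.outer(x, y)

# Compare against the general matrix exponential on a random example.
rng = np.random.default_rng(0)
x, y = rng.normal(size=4), rng.normal(size=4)
print(np.allclose(exp_outer(x, y), expm(np.outer(x, y))))  # True
```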
According to the identity $\det(\exp(\boldsymbol{A})) = \exp(\text{Tr}(\boldsymbol{A}))$, and since $\text{Tr}\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right) = \langle \boldsymbol{x},\boldsymbol{y}\rangle$, the determinant of this matrix is:
\begin{equation}\det\left(\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)\right)= e^{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\end{equation}
which is never zero, confirming that the matrix is indeed always invertible. Its inverse matrix is:
\begin{equation}\exp\left(-\boldsymbol{x}\boldsymbol{y}^{\top}\right)=\boldsymbol{I}-\boldsymbol{x}\boldsymbol{y}^{\top}\left(\frac{1 - e^{-\langle \boldsymbol{x},\boldsymbol{y}\rangle}}{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\right)\end{equation}
which is also an explicit result.
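Continuing the sketch above (reusing `exp_outer`), both the determinant identity and the explicit inverse can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=4)
s = x @ y
M = exp_outer(x, y)  # exp_outer from the sketch above
print(np.allclose(np.linalg.det(M), np.exp(s)))      # det(exp(xy^T)) = e^<x,y>
print(np.allclose(M @ exp_outer(-x, y), np.eye(4)))  # the stated inverse works
```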
Of course, a general matrix has $k^2$ independent parameters, whereas the matrix constructed here from two vectors only has $2k$ parameters, so its expressive power is likely insufficient. To enhance the expressive power, one could consider multiplying several such matrices together:
\begin{equation}\exp\left(\boldsymbol{x}_1\boldsymbol{y}_1^{\top}\right)\exp\left(\boldsymbol{x}_2\boldsymbol{y}_2^{\top}\right)\exp\left(\boldsymbol{x}_3\boldsymbol{y}_3^{\top}\right)\dots\end{equation}
Note that, since the matrices $\boldsymbol{x}_i\boldsymbol{y}_i^{\top}$ generally do not commute, this is generally not equal to:
\begin{equation}\exp\left(\boldsymbol{x}_1\boldsymbol{y}_1^{\top}+\boldsymbol{x}_2\boldsymbol{y}_2^{\top}+\boldsymbol{x}_3\boldsymbol{y}_3^{\top}+\dots\right)\end{equation}
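A small numerical check (again reusing `exp_outer` from the sketch above) makes both points concrete: the product differs from the exponential of the sum, yet remains explicitly invertible by inverting the factors in reverse order:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
x1, y1, x2, y2 = (rng.normal(size=4) for _ in range(4))
prod = exp_outer(x1, y1) @ exp_outer(x2, y2)      # exp_outer from above
summed = expm(np.outer(x1, y1) + np.outer(x2, y2))
print(np.allclose(prod, summed))  # False: the terms do not commute

# The product is still explicitly invertible: invert the factors in reverse order.
inv = exp_outer(-x2, y2) @ exp_outer(-x1, y1)
print(np.allclose(prod @ inv, np.eye(4)))  # True
```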
Well, after deriving all of this, what is it actually useful for?
Uh... I don't know what it's useful for either; just take it as something to appreciate.
(Actually, my original intention was to directly construct an invertible neural network through a constructive method. Following this ready-made matrix, constructing an invertible fully-connected network is not difficult, but I haven't figured out how to generalize it to convolutional layers. Once I have settled that generalization, I will return to discuss the applications of this matrix~)