By 苏剑林 | March 1, 2019
From the article "Appreciation of the Identity det(exp(A)) = exp(Tr(A))", we learned that the matrix $\exp(\boldsymbol{A})$ is always invertible, with inverse $\exp(-\boldsymbol{A})$. The problem is that $\exp(\boldsymbol{A})$ is only a formal definition; by itself it has little practical value, since evaluating the series requires computing every power $\boldsymbol{A}^n$.
Are there any specific examples? Yes. This article will construct an explicit, always invertible matrix.
The logic is actually very simple. Suppose $\boldsymbol{x}, \boldsymbol{y}$ are two $k$-dimensional column vectors. Then $\boldsymbol{x}\boldsymbol{y}^{\top}$ is a $k \times k$ matrix. Let's consider:
\begin{equation}
\begin{aligned}
\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)=&\sum_{n=0}^{\infty}\frac{\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)^n}{n!}\\
=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}+\frac{\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}}{2}+\frac{\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}\boldsymbol{x}\boldsymbol{y}^{\top}}{6}+\dots
\end{aligned}
\end{equation}
Noticing that
\begin{equation}\boldsymbol{y}^{\top}\boldsymbol{x}=\langle \boldsymbol{x},\boldsymbol{y}\rangle\end{equation}
is actually just a scalar, so $\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)^n = \langle \boldsymbol{x},\boldsymbol{y}\rangle^{n-1}\,\boldsymbol{x}\boldsymbol{y}^{\top}$ for $n \geq 1$, and we can continue to simplify:
\begin{equation}
\begin{aligned}\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}\left(1+\frac{\langle \boldsymbol{x},\boldsymbol{y}\rangle}{2}+\frac{\langle \boldsymbol{x},\boldsymbol{y}\rangle^2}{6}+\dots\right)\\
=&\boldsymbol{I}+\boldsymbol{x}\boldsymbol{y}^{\top}\left(\frac{e^{\langle \boldsymbol{x},\boldsymbol{y}\rangle}-1}{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\right)
\end{aligned}
\end{equation}
Now this matrix is very concrete and easy to compute, since it only involves a scalar exponential. The factor $(e^{s}-1)/s$ in the parentheses, with $s = \langle \boldsymbol{x},\boldsymbol{y}\rangle$, has a removable singularity at $s=0$, where its value is 1.
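As a quick sanity check, here is a minimal NumPy sketch of this closed form (the function name `exp_outer` is my own invention), compared against SciPy's general matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def exp_outer(x, y):
    """Closed form of exp(x y^T): I + (e^s - 1)/s * x y^T, with s = <x, y>."""
    s = x @ y
    # (e^s - 1)/s has a removable singularity at s = 0, where it equals 1;
    # np.expm1 keeps the numerator accurate for small s.
    coef = 1.0 if s == 0 else np.expm1(s) / s
    return np.eye(len(x)) + coef * np.outer(x, y)

# Compare against the general matrix exponential on a random example.
rng = np.random.default_rng(0)
x, y = rng.normal(size=4), rng.normal(size=4)
print(np.allclose(exp_outer(x, y), expm(np.outer(x, y))))  # True
```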
According to the identity $\det(\exp(\boldsymbol{A})) = \exp(\text{Tr}(\boldsymbol{A}))$, and since $\text{Tr}\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right) = \langle \boldsymbol{x},\boldsymbol{y}\rangle$, the determinant of this matrix is:
\begin{equation}\det\left(\exp\left(\boldsymbol{x}\boldsymbol{y}^{\top}\right)\right)= e^{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\end{equation}
which is never zero, confirming that the matrix is indeed always invertible. Its inverse matrix is:
\begin{equation}\exp\left(-\boldsymbol{x}\boldsymbol{y}^{\top}\right)=\boldsymbol{I}-\boldsymbol{x}\boldsymbol{y}^{\top}\left(\frac{1 - e^{-\langle \boldsymbol{x},\boldsymbol{y}\rangle}}{\langle \boldsymbol{x},\boldsymbol{y}\rangle}\right)\end{equation}
which is also an explicit result.
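Continuing the sketch above (reusing `exp_outer`), both the determinant identity and the explicit inverse can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=4)
s = x @ y
M = exp_outer(x, y)  # exp_outer from the sketch above
print(np.allclose(np.linalg.det(M), np.exp(s)))      # det(exp(xy^T)) = e^<x,y>
print(np.allclose(M @ exp_outer(-x, y), np.eye(4)))  # the stated inverse works
```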
Of course, a general matrix has $k^2$ independent parameters, whereas the matrix constructed here from two vectors only has $2k$ parameters, so its expressive power is likely insufficient. To enhance the expressive power, one could consider multiplying several such matrices together:
\begin{equation}\exp\left(\boldsymbol{x}_1\boldsymbol{y}_1^{\top}\right)\exp\left(\boldsymbol{x}_2\boldsymbol{y}_2^{\top}\right)\exp\left(\boldsymbol{x}_3\boldsymbol{y}_3^{\top}\right)\dots\end{equation}
Note that, since the matrices $\boldsymbol{x}_i\boldsymbol{y}_i^{\top}$ generally do not commute, this is generally not equal to:
\begin{equation}\exp\left(\boldsymbol{x}_1\boldsymbol{y}_1^{\top}+\boldsymbol{x}_2\boldsymbol{y}_2^{\top}+\boldsymbol{x}_3\boldsymbol{y}_3^{\top}+\dots\right)\end{equation}
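A small numerical check (again reusing `exp_outer` from the sketch above) makes both points concrete: the product differs from the exponential of the sum, yet remains explicitly invertible by inverting the factors in reverse order:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
x1, y1, x2, y2 = (rng.normal(size=4) for _ in range(4))
prod = exp_outer(x1, y1) @ exp_outer(x2, y2)      # exp_outer from above
summed = expm(np.outer(x1, y1) + np.outer(x2, y2))
print(np.allclose(prod, summed))  # False: the terms do not commute

# The product is still explicitly invertible: invert the factors in reverse order.
inv = exp_outer(-x2, y2) @ exp_outer(-x1, y1)
print(np.allclose(prod @ inv, np.eye(4)))  # True
```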
Well, after deriving all of this, what is it actually useful for?
Uh... I don't know what it's useful for either; just take it as something to appreciate.
(Actually, my original intention was to directly construct an invertible neural network through a constructive method. Following this ready-made matrix, constructing an invertible fully-connected network is not difficult, but I haven't figured out how to generalize it to convolutional layers. Once I have settled that generalization, I will return to discuss the applications of this matrix~)