Appreciation of the Identity \( \det(\exp(\boldsymbol{A})) = \exp(\text{Tr}(\boldsymbol{A})) \)

By 苏剑林 | February 18, 2019

The theme of this article is an interesting identity regarding the determinant of a matrix:

\begin{equation}\det(\exp(\boldsymbol{A})) = \exp(\text{Tr}(\boldsymbol{A}))\label{eq:main}\end{equation}

This identity appears in many mathematical and physical calculations; the author has run into it several times in the literature.

Note that the left-hand side involves taking the exponential of a matrix and then computing its determinant, both of which are computationally intensive operations. The right-hand side is simply the trace of the matrix (a scalar) followed by a scalar exponential. The computational costs of the two sides differ by an immense factor, yet the two are exactly equal! This is undoubtedly a fascinating fact.
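This surprising equality is also easy to check numerically. Below is a minimal sketch using NumPy and SciPy (scipy.linalg.expm is SciPy's matrix-exponential routine; the random test matrix is just an illustration):

```python
# Numerical sanity check of det(exp(A)) = exp(Tr(A)).
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

lhs = np.linalg.det(expm(A))  # matrix exponential, then determinant
rhs = np.exp(np.trace(A))     # scalar trace, then scalar exponential
print(np.allclose(lhs, rhs))  # True: both sides agree to machine precision
```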

Therefore, this article aims to appreciate this identity thoroughly.

Matrix Exponential

To appreciate this identity, some preparatory work is required. First, how should we understand \( \exp(\boldsymbol{A}) \)? Generally, it is defined according to the standard Taylor series expansion of \( e^x \):

\begin{equation}\exp(\boldsymbol{A})=\sum_{n=0}^{\infty}\frac{\boldsymbol{A}^n}{n!}\end{equation}

Here, \( \boldsymbol{A} \) is a \( k \times k \) matrix. It can be proven that this definition converges for any matrix, making it a well-defined concept.
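As a quick illustration, the series can be summed directly for a small matrix; here is a minimal sketch (the truncation at 30 terms and the comparison against scipy.linalg.expm are just for demonstration):

```python
# Matrix exponential via the truncated Taylor series sum_n A^n / n!.
import numpy as np
from scipy.linalg import expm

def exp_series(A, n_terms=30):
    """Sum the first n_terms terms of sum_{n>=0} A^n / n!."""
    result = np.zeros_like(A)
    term = np.eye(A.shape[0])      # the n = 0 term is the identity
    for n in range(n_terms):
        result += term
        term = term @ A / (n + 1)  # turn A^n/n! into A^{n+1}/(n+1)!
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.allclose(exp_series(A), expm(A)))  # True
```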

With this definition, we can directly write the solution to a system of linear differential equations with constant coefficients:

\begin{equation}\frac{d}{dt}\boldsymbol{x}=\boldsymbol{A}\boldsymbol{x}\quad\Rightarrow \quad \boldsymbol{x}=\exp(t\boldsymbol{A})\boldsymbol{x}_0\end{equation}

Of course, this result is primarily of theoretical value because, in practical calculations, you still have to painstakingly compute \( \boldsymbol{A}^2, \boldsymbol{A}^3, \dots \) one by one.
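Still, the closed-form solution can be checked against a general-purpose ODE solver; a minimal sketch (the matrix, initial value, and tolerances here are arbitrary choices):

```python
# Solve dx/dt = A x via x(t) = exp(tA) x(0), and cross-check with solve_ivp.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t = 1.5

x_exact = expm(t * A) @ x0  # closed-form solution
sol = solve_ivp(lambda s, x: A @ x, (0.0, t), x0, rtol=1e-10, atol=1e-12)
print(np.allclose(x_exact, sol.y[:, -1]))  # True
```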

Is there a simpler calculation scheme? Yes, it becomes easier if \( \boldsymbol{A} \) is diagonalizable. Diagonalizability means:

\begin{equation}\boldsymbol{A}=\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1}\end{equation}

where \( \boldsymbol{P} \) is an invertible matrix and \( \boldsymbol{\Lambda}=\text{diag}(\lambda_1,\dots,\lambda_k) \) is a diagonal matrix. This case is simple because:

\begin{equation}\boldsymbol{A}^n=\boldsymbol{P}\boldsymbol{\Lambda}^n\boldsymbol{P}^{-1}\end{equation}

And since \( \boldsymbol{\Lambda} \) is a diagonal matrix, \( \boldsymbol{\Lambda}^n \) only requires raising each diagonal element to the power of \( n \). Therefore:

\begin{equation}\begin{aligned}\exp(\boldsymbol{A})=&\exp(\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1})\\ =&\sum_{n=0}^{\infty}\frac{\left(\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1}\right)^n}{n!}=\sum_{n=0}^{\infty}\frac{\boldsymbol{P}\boldsymbol{\Lambda}^n\boldsymbol{P}^{-1}}{n!}\\ =&\boldsymbol{P}\left(\sum_{n=0}^{\infty}\frac{\boldsymbol{\Lambda}^n}{n!}\right)\boldsymbol{P}^{-1}\\ =&\boldsymbol{P}\exp(\boldsymbol{\Lambda})\boldsymbol{P}^{-1} \end{aligned}\end{equation}

Here, \( \exp(\boldsymbol{\Lambda})=\text{diag}\left(e^{\lambda_1},\dots,e^{\lambda_k}\right) \).
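In code, the diagonalization route looks like this; a minimal sketch using NumPy's eigendecomposition (a random real matrix is diagonalizable with probability 1, so it serves as a test case):

```python
# exp(A) = P exp(Lambda) P^{-1} via eigendecomposition.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

lam, P = np.linalg.eig(A)  # A = P diag(lam) P^{-1}, possibly complex
expA = P @ np.diag(np.exp(lam)) @ np.linalg.inv(P)
print(np.allclose(expA, expm(A)))  # True (imaginary parts cancel)
```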

However, it should be noted that although the definition of the matrix exponential mimics the power series of the real exponential, its algebra is different: for two matrices \( \boldsymbol{A} \) and \( \boldsymbol{B} \), in general,

\begin{equation}\exp(\boldsymbol{A}+\boldsymbol{B})\neq \exp(\boldsymbol{A})\exp(\boldsymbol{B})\end{equation}

A sufficient condition for equality is \( \boldsymbol{A}\boldsymbol{B}=\boldsymbol{B}\boldsymbol{A} \), i.e., that the two matrices commute. In other words, when mixing several matrices in one expression, many familiar real-number formulas carry over only if the matrices involved commute.
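A concrete numerical illustration (the specific matrices are arbitrary; any non-commuting pair would do):

```python
# exp(A + B) vs exp(A) exp(B): equality requires AB = BA.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(A + B), expm(A) @ expm(B)))  # False: AB != BA

M = np.array([[1.0, 2.0], [3.0, 4.0]])  # M and M @ M always commute
print(np.allclose(expm(M + M @ M), expm(M) @ expm(M @ M)))  # True
```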

Since \( \boldsymbol{A} \) and \( -\boldsymbol{A} \) are obviously commutative, we have:

\begin{equation}\boldsymbol{I}=\exp(\boldsymbol{A}-\boldsymbol{A})=\exp(\boldsymbol{A})\exp(-\boldsymbol{A})\end{equation}

This means \( \exp(\boldsymbol{A}) \) is always invertible, and its inverse matrix is \( \exp(-\boldsymbol{A}) \).
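A one-line check of this fact (the test matrix is arbitrary):

```python
# exp(-A) is the inverse of exp(A).
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.allclose(expm(A) @ expm(-A), np.eye(2)))  # True
```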

Matrix Functions

In fact, many real functions can be extended to matrices through their power series, yielding matrix functions such as:

\begin{equation}\begin{aligned}\sin(\boldsymbol{A})=&\boldsymbol{A}-\frac{\boldsymbol{A}^3}{3!}+\frac{\boldsymbol{A}^5}{5!}-\frac{\boldsymbol{A}^7}{7!}+\dots\\ \cos(\boldsymbol{A})=&\boldsymbol{I}-\frac{\boldsymbol{A}^2}{2!}+\frac{\boldsymbol{A}^4}{4!}-\frac{\boldsymbol{A}^6}{6!}+\dots \end{aligned}\end{equation}

Similarly, if \( \boldsymbol{A}\boldsymbol{B}=\boldsymbol{B}\boldsymbol{A} \), then:

\begin{equation}\sin(\boldsymbol{A}+\boldsymbol{B})=\sin(\boldsymbol{A})\cos(\boldsymbol{B})+\sin(\boldsymbol{B})\cos(\boldsymbol{A})\end{equation}
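SciPy ships matrix versions of these functions (scipy.linalg.sinm and scipy.linalg.cosm), so the addition formula can be spot-checked; a minimal sketch, taking \( \boldsymbol{B} \) to be a polynomial in \( \boldsymbol{A} \) so that the two commute:

```python
# Sine addition formula for commuting matrices.
import numpy as np
from scipy.linalg import sinm, cosm

A = np.array([[0.5, 0.2], [0.1, 0.3]])
B = A @ A  # a polynomial in A commutes with A

lhs = sinm(A + B)
rhs = sinm(A) @ cosm(B) + sinm(B) @ cosm(A)
print(np.allclose(lhs, rhs))  # True
```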

We discussed the exponential function earlier; naturally, where there is an exponential, there is a logarithm. There are generally two definitions for the matrix logarithm. The first definition is: if a matrix \( \boldsymbol{B} \) satisfies \( \exp(\boldsymbol{B})=\boldsymbol{A} \), then \( \boldsymbol{B} \) is called a logarithm of matrix \( \boldsymbol{A} \).

However, according to this definition, the matrix logarithm is not unique, even when restricted to real matrices. For example, for \( \boldsymbol{A}=\begin{pmatrix}\cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha\end{pmatrix} \), any \( (\alpha + 2\pi n)\begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix} \) is its logarithm, where \( n \) is any integer.
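This non-uniqueness is easy to verify numerically; a minimal sketch with an arbitrary angle:

```python
# Infinitely many logarithms of a 2D rotation matrix.
import numpy as np
from scipy.linalg import expm

alpha = 0.7
J = np.array([[0.0, -1.0], [1.0, 0.0]])
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])

for n in (-1, 0, 1, 2):
    print(np.allclose(expm((alpha + 2 * np.pi * n) * J), R))  # True for every n
```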

The other definition mimics the power-series expansion of the real logarithm:

\begin{equation}\ln (\boldsymbol{I}+\boldsymbol{A}) = \sum_{n=1}^{\infty}(-1)^{n-1}\frac{\boldsymbol{A}^n}{n}\end{equation}

This definition is simple and the result is unique, but the series only converges for small enough matrices; a sufficient condition is \( \Vert \boldsymbol{A}\Vert_2 < 1 \), where \( \Vert\cdot\Vert_2 \) is the spectral norm of the matrix (refer to the "Matrix Norm" section in "Lipschitz Constraint in Deep Learning: Generalization and Generative Models"). When this condition holds, the logarithm defined this way satisfies:

\begin{equation}\exp(\ln (\boldsymbol{I}+\boldsymbol{A})) = \boldsymbol{I}+\boldsymbol{A}\end{equation}

In other words, the definition is self-consistent.
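Here is a minimal sketch of this self-consistency check (the truncation length and the 0.1 scale, chosen to keep \( \Vert\boldsymbol{A}\Vert_2 < 1 \), are illustrative):

```python
# ln(I + A) via the truncated series, checked against exp(ln(I+A)) = I + A.
import numpy as np
from scipy.linalg import expm

def log_series(A, n_terms=100):
    """Truncated sum_{n>=1} (-1)^{n-1} A^n / n; needs ||A||_2 < 1."""
    result = np.zeros_like(A)
    An = np.eye(A.shape[0])
    for n in range(1, n_terms + 1):
        An = An @ A  # A^n
        result += (-1) ** (n - 1) * An / n
    return result

rng = np.random.default_rng(2)
A = 0.1 * rng.standard_normal((3, 3))  # small enough spectral norm
print(np.allclose(expm(log_series(A)), np.eye(3) + A))  # True
```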

The diagonalization technique discussed for the matrix exponential applies equally to any matrix function defined via power series, such as:

\begin{equation}\ln \left(\boldsymbol{I}+\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1}\right) = \boldsymbol{P}\ln(\boldsymbol{I}+\boldsymbol{\Lambda})\boldsymbol{P}^{-1}\end{equation}

Here, \( \ln(\boldsymbol{I}+\boldsymbol{\Lambda})=\text{diag}\big(\ln(1+\lambda_1),\dots,\ln(1+\lambda_k)\big) \).
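The same shortcut can be compared against scipy.linalg.logm, SciPy's reference matrix logarithm; a minimal sketch:

```python
# ln(I + A) = P ln(I + Lambda) P^{-1} via eigendecomposition.
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(3)
A = 0.1 * rng.standard_normal((3, 3))

lam, P = np.linalg.eig(A)  # A = P diag(lam) P^{-1}
log_IA = P @ np.diag(np.log(1 + lam)) @ np.linalg.inv(P)
print(np.allclose(log_IA, logm(np.eye(3) + A)))  # True
```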

det(exp(A)) = exp(Tr(A))

After so much preparation, we can finally get to the main point. For identity \(\eqref{eq:main}\), if the matrix is diagonalizable, the proof is not difficult. This is because:

\begin{equation}\begin{aligned}\text{Left Side}=&\det(\exp(\boldsymbol{A}))\\ =&\det\big(\boldsymbol{P}\exp(\boldsymbol{\Lambda})\boldsymbol{P}^{-1}\big)\\ =&\det(\boldsymbol{P}) \det(\exp(\boldsymbol{\Lambda})) \underbrace{\det(\boldsymbol{P}^{-1})}_{=1/\det(\boldsymbol{P})}\\ =&\det(\exp(\boldsymbol{\Lambda}))\\ =&e^{\lambda_1 + \dots + \lambda_k} \end{aligned}\end{equation}

And:

\begin{equation}\begin{aligned}\text{Right Side}=&\exp(\text{Tr}(\boldsymbol{A}))\\ =&\exp(\text{Tr}(\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1}))\\ =&\exp(\text{Tr}(\boldsymbol{P}^{-1}\boldsymbol{P}\boldsymbol{\Lambda}))\quad [\text{The order can be swapped because for square matrices } \boldsymbol{A}, \boldsymbol{B}, \text{Tr}(\boldsymbol{AB})=\text{Tr}(\boldsymbol{BA})]\\ =&\exp(\text{Tr}(\boldsymbol{\Lambda}))\\ =&e^{\lambda_1 + \dots + \lambda_k} \end{aligned}\end{equation}
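Numerically, all three quantities coincide, since the trace equals the sum of the eigenvalues; a minimal sketch with a random matrix:

```python
# Left side, right side, and e^{sum of eigenvalues} are the same number.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
lam = np.linalg.eigvals(A)

print(np.linalg.det(expm(A)))   # left side
print(np.exp(np.trace(A)))      # right side
print(np.exp(lam.sum()).real)   # e^{lambda_1 + ... + lambda_k}
```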

What if the matrix is not diagonalizable? One could prove that the diagonalizable matrices are dense in the set of all matrices and then handle the general case by a limiting argument, but that is obviously cumbersome. Is there a more direct and elegant proof? Yes! And it is extremely ingenious, which is exactly why I wrote this article.

This ingenious proof requires us to consider a function with parameter \( t \):

\begin{equation}f(t)=\det(\exp(t\boldsymbol{A}))\label{eq:detexp}\end{equation}

Then we find its derivative (this requires the derivative of a determinant, i.e., Jacobi's formula; for the derivation, please refer to "Derivative of the Determinant"):

\begin{equation}\begin{aligned}\frac{d}{dt}f(t)=&f(t)\text{Tr}\left(\exp(-t\boldsymbol{A})\underbrace{\frac{d}{dt}\exp(t\boldsymbol{A})}_{=\exp(t\boldsymbol{A})\boldsymbol{A}}\right)\\ =&f(t)\text{Tr}(\boldsymbol{A})\end{aligned}\end{equation}

Note that \( \text{Tr}(\boldsymbol{A}) \) is just a number, so we have obtained an ordinary differential equation for \( f(t) \)!! Its solution is:

\begin{equation}f(t)=C\exp(t\,\text{Tr}(\boldsymbol{A}))\end{equation}

From equation \(\eqref{eq:detexp}\), we can see that \( f(0)=1 \), thus \( C=1 \), so \( f(t)=\exp(t\,\text{Tr}(\boldsymbol{A})) \). Therefore, we have proven:

\begin{equation}\det(\exp(t\boldsymbol{A}))=\exp(t\,\text{Tr}(\boldsymbol{A}))\end{equation}

Setting \( t=1 \), we obtain the identity \(\eqref{eq:main}\).

Isn't it wonderful?
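As a final check of the key step, the differential equation \( f'(t)=\text{Tr}(\boldsymbol{A})\,f(t) \) can be verified by finite differences; a minimal sketch with an arbitrary matrix and step size:

```python
# Finite-difference check of f'(t) = Tr(A) f(t) for f(t) = det(exp(tA)).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.2, 1.0], [-0.5, 0.1]])
f = lambda t: np.linalg.det(expm(t * A))

t, h = 0.8, 1e-6
deriv = (f(t + h) - f(t - h)) / (2 * h)        # central difference
print(np.allclose(deriv, np.trace(A) * f(t)))  # True
```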

By taking the logarithm of both sides of \(\eqref{eq:main}\) and letting \( \exp(\boldsymbol{A})=\boldsymbol{B} \), we can obtain another common form of this identity:

\begin{equation}\ln\det(\boldsymbol{B}) = \text{Tr}(\ln (\boldsymbol{B}))\end{equation}
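This form, too, is easy to verify with scipy.linalg.logm, provided \( \boldsymbol{B} \) has a well-defined (principal) logarithm; here \( \boldsymbol{B}=\exp(\boldsymbol{A}) \) for a random \( \boldsymbol{A} \), which is just an illustration:

```python
# ln det(B) = Tr(ln B), checked via slogdet and logm.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)
B = expm(0.5 * rng.standard_normal((4, 4)))

sign, logdet = np.linalg.slogdet(B)  # det(B) = sign * exp(logdet); sign = +1 here
print(np.allclose(logdet, np.trace(logm(B)).real))  # True
```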

That concludes this appreciation session. Thank you for reading!