What can the matrix sign function mcsgn compute?

By 苏剑林 | June 23, 2025

In the article "The Derivative of msign", we formally introduced two matrix sign functions $\newcommand{msign}{\mathop{\text{msign}}}\msign$ and $\newcommand{mcsgn}{\mathop{\text{mcsgn}}}\mcsgn$, where $\msign$ is the core operation of Muon, while $\mcsgn$ is used to solve the Sylvester equation. So, what else can $\mcsgn$ do besides solving the Sylvester equation? This article aims to summarize the answers to this question.

Two Signs

Let $\boldsymbol{M}\in\mathbb{R}^{n\times m}$ be a matrix. There are two kinds of matrix sign functions:

\begin{gather} \msign(\boldsymbol{M}) = (\boldsymbol{M}\boldsymbol{M}^{\top})^{-1/2}\boldsymbol{M}= \boldsymbol{M}(\boldsymbol{M}^{\top}\boldsymbol{M})^{-1/2} \\[6pt] \mcsgn(\boldsymbol{M}) = (\boldsymbol{M}^2)^{-1/2}\boldsymbol{M}= \boldsymbol{M}(\boldsymbol{M}^2)^{-1/2} \end{gather}

The first kind applies to matrices of any shape, while the second applies only to square matrices. The exponent $^{-1/2}$ denotes the inverse of the matrix square root; when the matrix is singular, the pseudo-inverse is used instead (see "Pseudo-inverse"). In general, $\msign$ and $\mcsgn$ yield different results, but they coincide when $\boldsymbol{M}$ is a symmetric matrix.

The difference between them is: if $\boldsymbol{M}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\top}$ where $\boldsymbol{U},\boldsymbol{V}$ are orthogonal matrices, then $\msign(\boldsymbol{M}) = \boldsymbol{U}\msign(\boldsymbol{\Sigma})\boldsymbol{V}^{\top}$; if $\boldsymbol{M}=\boldsymbol{P}\boldsymbol{\Lambda}\boldsymbol{P}^{-1}$ where $\boldsymbol{P}$ is an invertible matrix, then $\mcsgn(\boldsymbol{M})=\boldsymbol{P}\mcsgn(\boldsymbol{\Lambda})\boldsymbol{P}^{-1}$. Simply put, one possesses orthogonal invariance while the other possesses similarity invariance; one turns all non-zero singular values into 1, while the other turns all non-zero eigenvalues into $\pm 1$.
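
These two invariances are easy to check numerically. Below is a minimal NumPy/SciPy sketch: `msign` is a helper computed via SVD, and `scipy.linalg.signm` (SciPy's matrix sign function) stands in for $\mcsgn$, which is valid here because the test matrices are built to have real eigenvalues.

```python
import numpy as np
from scipy.linalg import signm  # matrix sign function, used as mcsgn here

rng = np.random.default_rng(0)
n = 6

def msign(M):
    # msign via SVD: replace every singular value by 1 (full-rank case)
    U, s, Vh = np.linalg.svd(M)
    return U @ Vh

# For a symmetric matrix, msign and mcsgn coincide
S = rng.standard_normal((n, n))
S = S + S.T
print(np.allclose(msign(S), signm(S)))  # True

# Similarity invariance: mcsgn(P diag(lam) P^{-1}) = P diag(sign(lam)) P^{-1}
P = rng.standard_normal((n, n))
lam = np.array([2.0, 0.5, 1.0, -1.0, -3.0, -0.2])
M = P @ np.diag(lam) @ np.linalg.inv(P)
expected = P @ np.diag(np.sign(lam)) @ np.linalg.inv(P)
print(np.allclose(signm(M), expected))  # True
```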

For computing $\msign$, see "Newton-Schulz Iteration for msign Operator (Part 1)" and "Newton-Schulz Iteration for msign Operator (Part 2)", which give a GPU-efficient iteration. As for $\mcsgn$, since eigenvalues can be complex, the general case is quite complicated. However, when the eigenvalues of $\boldsymbol{M}$ are all real (which covers almost all scenarios where $\mcsgn$ is used), the $\msign$ iteration can be reused:

\begin{equation}\newcommand{tr}{\mathop{\text{tr}}}\boldsymbol{X}_0 = \frac{\boldsymbol{M}}{\sqrt{\tr(\boldsymbol{M}^2)}},\qquad \boldsymbol{X}_{t+1} = a_{t+1}\boldsymbol{X}_t + b_{t+1}\boldsymbol{X}_t^3 + c_{t+1}\boldsymbol{X}_t^5\end{equation}
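
Here is a minimal NumPy sketch of this iteration. The constant coefficients $(a,b,c) = (3.4445, -4.7750, 2.0315)$ are borrowed from Muon's $\msign$ iteration as an illustrative choice (the articles referenced above discuss better per-step schedules), and they drive the eigenvalues only approximately to $\pm 1$:

```python
import numpy as np

def mcsgn_ns(M, steps=10, coeffs=(3.4445, -4.7750, 2.0315)):
    """Quintic Newton-Schulz-style iteration for mcsgn.

    Assumes all eigenvalues of M are real, so that tr(M^2) >= 0 and the
    normalization puts the eigenvalues inside [-1, 1], where the
    iteration pushes them toward +/-1."""
    a, b, c = coeffs
    X = M / np.sqrt(np.trace(M @ M))
    for _ in range(steps):
        X2 = X @ X
        X = a * X + (b * X2 + c * X2 @ X2) @ X  # a*X + b*X^3 + c*X^5
    return X
```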

We won't expand on further properties here; next, we focus on the applications of $\mcsgn$.

Block Identities

Historically, $\mcsgn$ was introduced to solve equations—not just the Sylvester equation, but also the more general Algebraic Riccati Equation. The original paper is "Solving the algebraic Riccati equation with the matrix sign function".

Consider the block matrix $\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}$. We have $\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1} = \begin{bmatrix}\boldsymbol{0} & \boldsymbol{I} \\ -\boldsymbol{I} & \boldsymbol{X}\end{bmatrix}$. It can be verified that:

\begin{equation}\begin{bmatrix}\boldsymbol{0} & \boldsymbol{I} \\ -\boldsymbol{I} & \boldsymbol{X}\end{bmatrix}\begin{bmatrix}\boldsymbol{A} & \boldsymbol{C} \\ \boldsymbol{D} & \boldsymbol{B}\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}=\begin{bmatrix}\boldsymbol{B} + \boldsymbol{D}\boldsymbol{X} & -\boldsymbol{D} \\ \boldsymbol{X}\boldsymbol{D}\boldsymbol{X} + \boldsymbol{X}\boldsymbol{B} - \boldsymbol{A}\boldsymbol{X} - \boldsymbol{C} & \boldsymbol{A} - \boldsymbol{X}\boldsymbol{D}\end{bmatrix}\end{equation}

If

\begin{equation}\boldsymbol{X}\boldsymbol{D}\boldsymbol{X} + \boldsymbol{X}\boldsymbol{B} - \boldsymbol{A}\boldsymbol{X} - \boldsymbol{C} = \boldsymbol{0}\label{eq:riccati}\end{equation}

then

\begin{equation}\begin{bmatrix}\boldsymbol{A} & \boldsymbol{C} \\ \boldsymbol{D} & \boldsymbol{B}\end{bmatrix}=\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\begin{bmatrix}\boldsymbol{B} + \boldsymbol{D}\boldsymbol{X} & -\boldsymbol{D} \\ \boldsymbol{0} & \boldsymbol{A} - \boldsymbol{X}\boldsymbol{D}\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1}\end{equation}
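
This factorization is easy to sanity-check numerically: pick $\boldsymbol{A},\boldsymbol{B},\boldsymbol{D},\boldsymbol{X}$ at random and define $\boldsymbol{C}$ so that \eqref{eq:riccati} holds by construction. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A, B, D, X = (rng.standard_normal((n, n)) for _ in range(4))
C = X @ D @ X + X @ B - A @ X  # forces the Riccati equation to hold

I, Z = np.eye(n), np.zeros((n, n))
T = np.block([[X, -I], [I, Z]])
mid = np.block([[B + D @ X, -D], [Z, A - X @ D]])
big = np.block([[A, C], [D, B]])
print(np.allclose(big, T @ mid @ np.linalg.inv(T)))  # True
```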

Equation \eqref{eq:riccati} is the Algebraic Riccati Equation. Applying $\mcsgn$ to both sides of the factorization above and using its similarity invariance, we obtain the identity:

\begin{equation}\begin{aligned} \mcsgn\left(\begin{bmatrix}\boldsymbol{A} & \boldsymbol{C} \\ \boldsymbol{D} & \boldsymbol{B}\end{bmatrix}\right)=&\,\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\mcsgn\left(\begin{bmatrix}\boldsymbol{B} + \boldsymbol{D}\boldsymbol{X} & -\boldsymbol{D} \\ \boldsymbol{0} & \boldsymbol{A} - \boldsymbol{X}\boldsymbol{D}\end{bmatrix}\right)\begin{bmatrix}\boldsymbol{X} & - \boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1} \\[6pt] =&\,\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\begin{bmatrix}\mcsgn(\boldsymbol{B} + \boldsymbol{D}\boldsymbol{X}) & \boldsymbol{Y} \\ \boldsymbol{0} & \mcsgn(\boldsymbol{A} - \boldsymbol{X}\boldsymbol{D})\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1} \end{aligned}\end{equation}

The second equality utilizes the properties of (block) triangular matrices. The eigenvalues of a triangular matrix are its diagonal elements, so taking the $\mcsgn$ of a triangular matrix also results in a triangular matrix, where the diagonal elements equal the $\mathop{\text{csgn}}$ of the original diagonal elements. This property also holds for block triangular matrices, thus the result takes the form of the second equality, where $\boldsymbol{Y}$ is an undetermined matrix.
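
As a quick numerical check of this triangular property (again using `scipy.linalg.signm` as the $\mcsgn$ implementation, valid since a real triangular matrix has real eigenvalues):

```python
import numpy as np
from scipy.linalg import signm

rng = np.random.default_rng(2)
T = np.triu(rng.standard_normal((4, 4)))
np.fill_diagonal(T, [1.5, -2.0, 0.8, -0.5])  # well-separated real eigenvalues
S = signm(T)
print(np.allclose(S, np.triu(S)))                    # still upper triangular
print(np.allclose(np.diag(S), np.sign(np.diag(T))))  # diagonal -> its csgn
```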

Some Results

Below we further simplify based on specific situations to obtain some results that might be useful.

First Example

Assume $\boldsymbol{D}=\boldsymbol{0}$, $\boldsymbol{B}$ is positive definite, and $\boldsymbol{A}$ is negative definite. With $\boldsymbol{D}=\boldsymbol{0}$, the middle matrix is block diagonal, and $\mcsgn$ maps block-diagonal matrices to block-diagonal matrices, so $\boldsymbol{Y}=\boldsymbol{0}$; moreover, $\mcsgn(\boldsymbol{B})=\boldsymbol{I}$ and $\mcsgn(\boldsymbol{A})=-\boldsymbol{I}$. Then:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{A} & \boldsymbol{C} \\ \boldsymbol{0} & \boldsymbol{B}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\begin{bmatrix}\boldsymbol{I} & \boldsymbol{0} \\ \boldsymbol{0} & -\boldsymbol{I}\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1} = \begin{bmatrix}-\boldsymbol{I} & 2\boldsymbol{X} \\ \boldsymbol{0} & \boldsymbol{I}\end{bmatrix}\end{equation}

This means that the solution to the Sylvester equation $\boldsymbol{X}\boldsymbol{B} - \boldsymbol{A}\boldsymbol{X} = \boldsymbol{C}$ can be read directly from $\mcsgn\left(\begin{bmatrix}\boldsymbol{A} & \boldsymbol{C} \\ \boldsymbol{0} & \boldsymbol{B}\end{bmatrix}\right)$.
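
As a sketch, this can be checked against SciPy's Sylvester solver. Note that `scipy.linalg.solve_sylvester(a, b, q)` solves $\boldsymbol{a}\boldsymbol{X}+\boldsymbol{X}\boldsymbol{b}=\boldsymbol{q}$, so we pass $-\boldsymbol{A}$ as the first argument:

```python
import numpy as np
from scipy.linalg import signm, solve_sylvester

rng = np.random.default_rng(3)
n, m = 4, 3
Q = rng.standard_normal((n, n))
A = -Q @ Q.T - np.eye(n)  # negative definite
R = rng.standard_normal((m, m))
B = R @ R.T + np.eye(m)   # positive definite
C = rng.standard_normal((n, m))

big = np.block([[A, C], [np.zeros((m, n)), B]])
X = signm(big)[:n, n:] / 2  # top-right block of mcsgn equals 2X

print(np.allclose(X, solve_sylvester(-A, B, C)))  # XB - AX = C -> True
```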

Second Example

Assume $\boldsymbol{A}=\boldsymbol{B}=\boldsymbol{0}$, $\boldsymbol{D}=\boldsymbol{I}$, and $\boldsymbol{C}$ is a positive definite matrix. The Riccati equation then simplifies to $\boldsymbol{X}^2 = \boldsymbol{C}$, i.e., $\boldsymbol{X}=\boldsymbol{C}^{1/2}$. Since the eigenvalues of $\boldsymbol{X}$ are all positive, $\mcsgn(\boldsymbol{X})=\boldsymbol{I}$ and $\mcsgn(-\boldsymbol{X})=-\boldsymbol{I}$, so:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\begin{bmatrix}\boldsymbol{I} & \boldsymbol{Y} \\ \boldsymbol{0} & - \boldsymbol{I}\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1} = \begin{bmatrix}-\boldsymbol{X}\boldsymbol{Y}-\boldsymbol{I} & 2\boldsymbol{X} + \boldsymbol{X}\boldsymbol{Y}\boldsymbol{X} \\ -\boldsymbol{Y} & \boldsymbol{Y}\boldsymbol{X} + \boldsymbol{I}\end{bmatrix}\end{equation}

Note that $\mcsgn$ is an odd function, and an odd function of a block anti-diagonal matrix is again block anti-diagonal (odd powers of such a matrix are themselves block anti-diagonal). The diagonal blocks above must therefore vanish, giving $\boldsymbol{Y}\boldsymbol{X} + \boldsymbol{I}=\boldsymbol{0}$, from which we solve $\boldsymbol{Y} = -\boldsymbol{X}^{-1} = -\boldsymbol{C}^{-1/2}$. Substituting this back into the above equation, we get:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C}^{1/2} \\ \boldsymbol{C}^{-1/2} & \boldsymbol{0}\end{bmatrix}\end{equation}

This indicates that $\mcsgn$ can also be used to calculate the square root and inverse square root of a matrix. More generally, if the eigenvalues of the matrix $\boldsymbol{A}\boldsymbol{B}$ are non-negative, then:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{0} & \boldsymbol{A} \\ \boldsymbol{B} & \boldsymbol{0}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C} \\ \boldsymbol{C}^{-1} & \boldsymbol{0}\end{bmatrix}\end{equation}

where $\boldsymbol{C}=\boldsymbol{A}(\boldsymbol{B}\boldsymbol{A})^{-1/2}$.
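
For instance, the positive definite case can be checked against `scipy.linalg.sqrtm` with a minimal sketch:

```python
import numpy as np
from scipy.linalg import signm, sqrtm

rng = np.random.default_rng(4)
n = 5
W = rng.standard_normal((n, n))
C = W @ W.T + np.eye(n)  # positive definite

Z, I = np.zeros((n, n)), np.eye(n)
S = signm(np.block([[Z, C], [I, Z]]))
print(np.allclose(S[:n, n:], sqrtm(C)))                 # C^{1/2}
print(np.allclose(S[n:, :n], np.linalg.inv(sqrtm(C))))  # C^{-1/2}
```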

Third Example

Assume $\boldsymbol{A}=\boldsymbol{B}=\boldsymbol{0}$ and $\boldsymbol{D}=\boldsymbol{C}^{\top}$. The Riccati equation then simplifies to $\boldsymbol{X}\boldsymbol{C}^{\top}\boldsymbol{X} = \boldsymbol{C}$, and it is easy to verify that $\boldsymbol{X}=\msign(\boldsymbol{C})$ solves it. We demonstrate only the most ideal case, where $\boldsymbol{C}$ is a full-rank square matrix: writing $\boldsymbol{C}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\top}$, we have $\boldsymbol{X}=\boldsymbol{U}\boldsymbol{V}^{\top}$, so $\boldsymbol{C}^{\top}\boldsymbol{X}=\boldsymbol{V}\boldsymbol{\Sigma}\boldsymbol{V}^{\top}$ and $\boldsymbol{X}\boldsymbol{C}^{\top}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{U}^{\top}$ are both positive definite, and we have:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C} \\ \boldsymbol{C}^{\top} & \boldsymbol{0}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}\begin{bmatrix}\boldsymbol{I} & \boldsymbol{Y} \\ \boldsymbol{0} & -\boldsymbol{I}\end{bmatrix}\begin{bmatrix}\boldsymbol{X} & -\boldsymbol{I} \\ \boldsymbol{I} & \boldsymbol{0}\end{bmatrix}^{-1}=\begin{bmatrix}-\boldsymbol{X}\boldsymbol{Y}-\boldsymbol{I} & 2\boldsymbol{X} + \boldsymbol{X}\boldsymbol{Y}\boldsymbol{X} \\ -\boldsymbol{Y} & \boldsymbol{Y}\boldsymbol{X} + \boldsymbol{I}\end{bmatrix}\end{equation}

By the same reasoning as in the previous example, $\boldsymbol{Y}\boldsymbol{X} + \boldsymbol{I}=\boldsymbol{0}$, so $\boldsymbol{Y} = -\boldsymbol{X}^{-1} = -\msign(\boldsymbol{C})^{\top} = -\msign(\boldsymbol{C}^{\top})$ (here $\boldsymbol{X}=\boldsymbol{U}\boldsymbol{V}^{\top}$ is orthogonal), and therefore:

\begin{equation}\mcsgn\left(\begin{bmatrix}\boldsymbol{0} & \boldsymbol{C} \\ \boldsymbol{C}^{\top} & \boldsymbol{0}\end{bmatrix}\right)=\begin{bmatrix}\boldsymbol{0} & \msign(\boldsymbol{C}) \\ \msign(\boldsymbol{C}^{\top}) & \boldsymbol{0}\end{bmatrix}\end{equation}

That is, $\mcsgn$ can also be used to calculate $\msign$. In fact, it can be directly proven that this equality holds for any matrix $\boldsymbol{C}$, but proving it from the perspective of solving the Riccati equation here would involve some tedious details, which the reader can supplement independently.
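
For a full-rank square $\boldsymbol{C}$, the identity is straightforward to verify numerically, comparing the blocks of $\mcsgn$ against $\msign$ computed by SVD:

```python
import numpy as np
from scipy.linalg import signm

rng = np.random.default_rng(5)
n = 4
C = rng.standard_normal((n, n))  # full rank with probability 1

U, s, Vh = np.linalg.svd(C)
msign_C = U @ Vh  # msign via SVD

Z = np.zeros((n, n))
S = signm(np.block([[Z, C], [C.T, Z]]))
print(np.allclose(S[:n, n:], msign_C))    # top-right block  = msign(C)
print(np.allclose(S[n:, :n], msign_C.T))  # bottom-left block = msign(C^T)
```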

Summary

This article organizes several identities related to $\mcsgn$ from the perspective of solving the Algebraic Riccati Equation: beyond the Riccati and Sylvester equations, $\mcsgn$ can compute matrix square roots and inverse square roots, and can even recover $\msign$.