[Understanding Riemannian Geometry] 2. From Pythagorean Theorem to Riemannian Metric

By 苏剑林 | October 14, 2016

Riemannian Metric

The word "geometry" comes from the Greek for "earth measurement." Since it is a matter of measurement, there must be a frame of reference, and one must know how to calculate distances.

With a reference, we can establish a coordinate system and record the coordinates of every point. As for calculating distances, we have the great Pythagorean theorem:

$$ds^2 = dx^2 + dy^2 \tag{1}$$

But we have overlooked two problems here.

The first problem is that we do not necessarily use a Cartesian coordinate system. If we use polar coordinates, then it should be:

$$ds^2 = dr^2 + r^2 d\theta^2 \tag{2}$$

Therefore, one can imagine that the most general form should be:

$$ds^2 = E(x^1, x^2)(dx^1)^2 + 2F(x^1, x^2)dx^1 dx^2 + G(x^1, x^2)(dx^2)^2 \tag{3}$$

What does this formula mean? It's simple: just as there is no reason to require the whole world to use the same currency, there is no need to require every part of the world to use the same coordinate system. A more reasonable approach is to let each location use its own coordinate system (local coordinate system) and then provide the method for calculating distances locally. Therefore, the above formula says that at position $(x^1, x^2)$, the formula for calculating the length of the vector $(dx^1, dx^2)$ (the local Pythagorean theorem) is $ds^2 = E(x^1, x^2)(dx^1)^2 + 2F(x^1, x^2)dx^1 dx^2 + G(x^1, x^2)(dx^2)^2$.
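As a quick check that the polar form (2) really is an instance of (3), one can substitute $x=r\cos\theta$, $y=r\sin\theta$ into (1). The labeling $x^1=r$, $x^2=\theta$ is only a convention chosen here for illustration:

$$\begin{aligned}
dx &= \cos\theta\,dr - r\sin\theta\,d\theta,\\
dy &= \sin\theta\,dr + r\cos\theta\,d\theta,\\
dx^2+dy^2 &= (\cos^2\theta+\sin^2\theta)\,dr^2 + 2r(\sin\theta\cos\theta-\cos\theta\sin\theta)\,dr\,d\theta + r^2(\sin^2\theta+\cos^2\theta)\,d\theta^2\\
&= dr^2 + r^2\,d\theta^2 .
\end{aligned}$$

Comparing with (3) gives $E=1$, $F=0$, $G=r^2$: the coefficient $G$ depends on position, even though the underlying plane is flat.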

The second problem is that we are not only studying 2D planes; we also need to study $n$-dimensional spaces. Thus, the most general formula is:

$$ds^2 = g_{\mu\nu}(\boldsymbol{x}) dx^{\mu} dx^{\nu} \tag{4}$$

Here $\boldsymbol{x}=(x^1,x^2,\dots,x^n)$, and the Einstein summation convention is used: an index that appears once as a superscript and once as a subscript in the same term is summed over. $g_{\mu\nu}$ is what we call the Riemannian metric. We can always choose the metric to be symmetric, i.e., $g_{\mu\nu}=g_{\nu\mu}$, without changing $ds^2$, because $dx^{\mu}dx^{\nu}$ is itself symmetric in $\mu,\nu$ and any antisymmetric part of $g_{\mu\nu}$ cancels in the sum. Based on our previous discussion, the whole expression for $ds^2$ simply records the different ways of calculating distance used at different positions in a high-dimensional space. In this way we restore the meaning of measurement to geometry.
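To make the double sum in (4) concrete, here is a minimal sketch in plain Python. The function names `line_element_sq` and `polar_metric` are made up for this illustration, and the polar metric of equation (2) is used as the example:

```python
def line_element_sq(g, dx):
    """Return ds^2 = g_{mu nu} dx^mu dx^nu for a metric matrix g and displacement dx."""
    n = len(dx)
    return sum(g[mu][nu] * dx[mu] * dx[nu] for mu in range(n) for nu in range(n))


def polar_metric(r, theta):
    """Metric of the flat plane in polar coordinates (r, theta), read off from equation (2)."""
    return [[1.0, 0.0],
            [0.0, r * r]]


# The same coordinate displacement (dr, dtheta), evaluated at two different radii.
dx = (0.1, 0.05)
ds2_near = line_element_sq(polar_metric(1.0, 0.0), dx)
ds2_far = line_element_sq(polar_metric(2.0, 0.0), dx)
print(ds2_near, ds2_far)
```

The same coordinate increments correspond to a longer physical displacement at the larger radius, which is exactly what a position-dependent $g_{\mu\nu}$ encodes.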

On the other hand, a Riemannian metric can also be seen as a standard of measurement. Different countries use different currencies, which cannot be compared directly; but if there is a formula that converts any country's currency into an equivalent amount of gold, then amounts in different currencies can be compared. The Riemannian metric plays a similar role. Rather than saying it provides different ways to calculate distance at different locations, it is better to say it unifies the way distance is calculated across all locations.

Incidentally, from the perspective of simply defining distance, we do not strictly have to use quadratic forms; for example, cubic forms, quartic forms, or even more complex forms could be used. However, in terms of practical value, we only study quadratic ones. Even so, there are already more problems than we can ever finish researching.

Some Examples

Under what circumstances does a non-constant Riemannian metric arise (or rather, become necessary)? As we saw earlier, converting Cartesian coordinates to polar coordinates results in a non-constant Riemannian metric. That is, even in a flat space, as long as a curvilinear coordinate system is used, a non-constant Riemannian metric will appear.

Furthermore, the Riemannian metric of a curved space must be non-constant. Are there specific examples? The most classic is probably the 2D sphere (meaning the two-dimensional surface of a sphere, not spherical coordinates on 3D space; the two should not be confused). If we set the radius to 1, the points of the sphere can be parametrized by the angles $\theta,\varphi$ as:

\begin{equation} \left\{\begin{aligned}&x=\sin\theta\cos\varphi\\ &y=\sin\theta\sin\varphi\\ &z=\cos\theta\end{aligned}\right. \tag{5} \end{equation}

Then its Riemannian metric is:

$$ds^2=dx^2+dy^2+dz^2=d\theta^2+\sin^2\theta d\varphi^2 \tag{6}$$
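Equation (6) can be sanity-checked numerically: for a small step $(d\theta, d\varphi)$, the squared straight-line distance in 3D between the two embedded points should nearly agree with $d\theta^2+\sin^2\theta\,d\varphi^2$. The sketch below assumes nothing beyond equations (5) and (6); the helper names and the sample point are arbitrary choices for illustration.

```python
import math


def embed(theta, phi):
    """Point on the unit sphere for angles (theta, phi), as in equation (5)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def chord_sq(theta, phi, dtheta, dphi):
    """Squared Euclidean distance in 3D between two nearby points on the sphere."""
    p = embed(theta, phi)
    q = embed(theta + dtheta, phi + dphi)
    return sum((a - b) ** 2 for a, b in zip(p, q))


def metric_sq(theta, dtheta, dphi):
    """ds^2 predicted by equation (6)."""
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2


theta, phi = 1.0, 0.3          # arbitrary sample point
dtheta, dphi = 1e-4, 2e-4      # small coordinate step
exact = chord_sq(theta, phi, dtheta, dphi)
predicted = metric_sq(theta, dtheta, dphi)
rel_err = abs(exact - predicted) / predicted
print(exact, predicted, rel_err)
```

The two values agree only to leading order in the step size, so the relative error shrinks as the step is made smaller but does not vanish for a finite step.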

Another vivid example comes from the second volume of The Feynman Lectures on Physics. Suppose we are in a flat space, but the temperature varies from place to place, and we measure with a ruler that has a very high thermal expansion coefficient. What happens? In hot regions the ruler expands, so the reading is smaller; in cold regions the ruler shrinks, so the reading is larger. The same distance might read 50cm in a hot region and 100cm in a cold one. A non-constant metric is therefore needed to reconcile the readings: multiply the 50 by 2, or divide the 100 by 2, or multiply the 50 by 4 and the 100 by 2, and so on. It is not hard to imagine that the Riemannian metric then takes the form (written here for the 3D case; the 2D case simply drops $dz$):

$$ds^2 = f(x,y,z)(dx^2+dy^2+dz^2) \tag{7}$$

This produces a curved space, although in this case it is not the space itself that is "curved" but the ruler. This example also appears in The Feynman Lectures on Gravitation; according to Feynman, it was invented by a student of Robertson. Because of this clear physical meaning, coordinates in which the metric takes the form (7) are also called "isothermal parameters" or an "isothermal coordinate system." In fact, this is somewhat similar to the idea that "motion is relative." A curved space may arise from the curvature of the space itself (like a sphere) or from the "curvature" of the ruler (like a ruler with thermal expansion), but the mathematical results are identical.
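The hot-ruler picture can be sketched numerically in 2D. Everything specific below is an assumption made for illustration, not taken from Feynman's text: a linear expansion law in which the ruler is $(1+\alpha T)$ times its nominal length, the value of `ALPHA`, and the temperature field `temperature(x, y) = x`. Under that law a true length $dl$ reads as $dl/(1+\alpha T)$, so $f = 1/(1+\alpha T)^2$ in equation (7).

```python
import math

ALPHA = 0.5  # hypothetical expansion coefficient per unit temperature


def temperature(x, y):
    """Hypothetical temperature field, relative to the ruler's calibration temperature."""
    return x  # hotter toward larger x


def conformal_factor(x, y):
    """f(x, y) in ds^2 = f (dx^2 + dy^2), assuming the ruler is (1 + ALPHA*T) times its nominal length."""
    return 1.0 / (1.0 + ALPHA * temperature(x, y)) ** 2


def measured_length(p, q, steps=10_000):
    """Length of the straight segment p -> q as read off the thermally expanding ruler."""
    (x0, y0), (x1, y1) = p, q
    dl = math.hypot(x1 - x0, y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps  # midpoint rule
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        total += math.sqrt(conformal_factor(x, y)) * dl
    return total


cold = measured_length((0.0, 0.0), (0.0, 1.0))   # along x = 0, where T = 0
hot = measured_length((2.0, 0.0), (2.0, 1.0))    # along x = 2, where T = 2
across = measured_length((0.0, 0.0), (2.0, 0.0)) # crosses the temperature gradient
print(cold, hot, across)
```

With these assumed numbers, two segments of the same true length read 1 and 0.5, mirroring the 100cm versus 50cm readings in the text.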

Local Cartesian Coordinate System

Now we attempt to describe the Riemannian metric in matrix form. Let $\boldsymbol{g}=g_{\mu\nu}, \boldsymbol{x}=x^{\alpha}, d\boldsymbol{x}=dx^\alpha$, where vectors are treated as column vectors, and we do not distinguish between the components of a vector and the vector itself. Therefore, the Riemannian metric can be written as:

$$ds^2 = d\boldsymbol{x}^T \boldsymbol{g}d\boldsymbol{x} \tag{8}$$

Note that the matrix $\boldsymbol{g}$ is symmetric and, for a Riemannian metric, positive definite. It can therefore be decomposed as $\boldsymbol{h}^T \boldsymbol{h}$ (for example, by a Cholesky factorization), where $\boldsymbol{h}$ is an invertible matrix of the same order as $\boldsymbol{g}$. In this case:

$$ds^2 = d\boldsymbol{x}^T \boldsymbol{h}^T \boldsymbol{h} d\boldsymbol{x}=\left(\boldsymbol{h}d\boldsymbol{x}\right)^T\left(\boldsymbol{h}d\boldsymbol{x}\right)=\left\|\boldsymbol{h}d\boldsymbol{x}\right\|^2 \tag{9}$$

That is to say, it ultimately transforms into the squared norm of $\boldsymbol{h}d\boldsymbol{x}$, and this norm is consistent with the Pythagorean theorem in flat space. We can consider that the matrix $\boldsymbol{h}$ exactly describes the local coordinate system; the vector $d\boldsymbol{x}$ under coordinate system $\boldsymbol{h}$ is exactly equivalent to the vector $\boldsymbol{h}d\boldsymbol{x}$ in the local Cartesian coordinate system. Or rather, $\boldsymbol{h}$ is the transformation matrix (Jacobian matrix) from the local coordinate system to a local Cartesian system.
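A small numerical sketch of equation (9) for the 2D case follows. It assumes $\boldsymbol{g}$ is symmetric and positive definite, and it picks one particular factor $\boldsymbol{h}$, the upper-triangular Cholesky factor; the sample metric and the function names are hypothetical.

```python
import math


def cholesky_upper(g):
    """Return upper-triangular h with h^T h = g for a symmetric positive-definite 2x2 g."""
    E, F, G = g[0][0], g[0][1], g[1][1]
    a = math.sqrt(E)
    b = F / a
    c = math.sqrt(G - b * b)
    return [[a, b],
            [0.0, c]]


def matvec(m, v):
    """Matrix-vector product for small dense matrices stored as nested lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


g = [[4.0, 2.0],
     [2.0, 3.0]]   # hypothetical metric at one point
dx = [0.3, -0.1]   # displacement in the local (non-Cartesian) coordinates

h = cholesky_upper(g)
local = matvec(h, dx)                  # components in a local Cartesian frame
norm_sq = sum(c * c for c in local)    # ordinary Pythagorean length squared
quad = sum(g[i][j] * dx[i] * dx[j] for i in range(2) for j in range(2))
print(h, norm_sq, quad)
```

The factor $\boldsymbol{h}$ is not unique: for any orthogonal matrix $\boldsymbol{Q}$, the product $\boldsymbol{Q}\boldsymbol{h}$ gives the same $\boldsymbol{g}$, which corresponds to rotating the local Cartesian frame.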

Having a transformation to a Cartesian coordinate system, we can define many geometric quantities, all of which are extended from flat space. For example, given a vector $\boldsymbol{A}=A^{\mu}$, its squared norm is:

$$\|\boldsymbol{h}\boldsymbol{A}\|^2 = \boldsymbol{A}^T \boldsymbol{h}^T\boldsymbol{h}\boldsymbol{A}=\boldsymbol{A}^T \boldsymbol{g}\boldsymbol{A}=g_{\mu\nu}A^{\mu}A^{\nu} \tag{10}$$

Given two vectors $\boldsymbol{A}$ and $\boldsymbol{B}$, their inner product is:

$$\left(\boldsymbol{h}\boldsymbol{A}\right)^T \left(\boldsymbol{h}\boldsymbol{B}\right)= \boldsymbol{A}^T \boldsymbol{h}^T\boldsymbol{h}\boldsymbol{B}=\boldsymbol{A}^T \boldsymbol{g}\boldsymbol{B}=g_{\mu\nu}A^{\mu}B^{\nu} \tag{11}$$

If you wish, you can also define the angle $\theta$ between them as:

$$\theta=\arccos \frac{g_{\mu\nu}A^{\mu}B^{\nu}}{\sqrt{g_{\mu\nu}A^{\mu}A^{\nu}}\sqrt{g_{\mu\nu}B^{\mu}B^{\nu}}} \tag{12}$$

At the same time, we can calculate the area of the parallelogram spanned by two vectors $\boldsymbol{A}$ and $\boldsymbol{B}$:

$$\left\|\boldsymbol{h}\boldsymbol{A}\right\| \times \left\|\boldsymbol{h}\boldsymbol{B}\right\|\times \sin\theta = \sqrt{(g_{\mu\nu}A^{\mu}A^{\nu})(g_{\mu\nu}B^{\mu}B^{\nu})-(g_{\mu\nu}A^{\mu}B^{\nu})^2} \tag{13}$$
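Formulas (11) to (13) translate directly into code. The sketch below evaluates them on the unit sphere at $\theta=\pi/6$, where by equation (6) the metric is $\mathrm{diag}(1, \sin^2\theta)=\mathrm{diag}(1, 1/4)$; the two vectors are arbitrary illustrative choices.

```python
import math


def inner(g, a, b):
    """Inner product g_{mu nu} a^mu b^nu, equation (11)."""
    n = len(a)
    return sum(g[i][j] * a[i] * b[j] for i in range(n) for j in range(n))


def angle(g, a, b):
    """Angle between a and b measured with the metric g, equation (12)."""
    return math.acos(inner(g, a, b) / math.sqrt(inner(g, a, a) * inner(g, b, b)))


def area(g, a, b):
    """Area of the parallelogram spanned by a and b, equation (13)."""
    return math.sqrt(inner(g, a, a) * inner(g, b, b) - inner(g, a, b) ** 2)


# Unit sphere at theta = pi/6, where sin(theta) = 1/2.
theta = math.pi / 6
g = [[1.0, 0.0],
     [0.0, math.sin(theta) ** 2]]
A = [1.0, 0.0]   # components along (theta, phi)
B = [1.0, 1.0]

ang = angle(g, A, B)
par_area = area(g, A, B)
print(ang, par_area)
```

In the coordinate plane these two component vectors would look 45 degrees apart and span unit area; measured with the metric, the angle is $\arctan(1/2)$ and the area is $1/2$.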

If we want to calculate (hyper)volume integrals, the volume element is:

$$\left|\det(\boldsymbol{h})\right|\prod_{\alpha} dx^{\alpha} = \sqrt{\det(\boldsymbol{g})} \prod_{\alpha} dx^{\alpha} = \sqrt{g}d\Omega \tag{14}$$

Here $g$ is shorthand for $\det(\boldsymbol{g})$, and $d\Omega$ is shorthand for $\prod_{\alpha} dx^{\alpha}$. Note that we have $\det(\boldsymbol{g})=\det(\boldsymbol{h}^T \boldsymbol{h})=(\det(\boldsymbol{h}))^2$. $\sqrt{g}$ is effectively a volume scaling factor. With this result, we can write: given $n$ vectors $\boldsymbol{A}^1,\boldsymbol{A}^2,\dots,\boldsymbol{A}^n$, writing them as column vectors, the hypervolume of the $n$-dimensional parallelotope they span is:

$$\sqrt{g}\left|\det(\boldsymbol{A}^1,\boldsymbol{A}^2,\dots,\boldsymbol{A}^n)\right|\tag{15}$$

Here $(\boldsymbol{A}^1,\boldsymbol{A}^2,\dots,\boldsymbol{A}^n)$ refers to arranging these $n$ column vectors into an $n\times n$ matrix.
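As an illustration of the volume element (14), the total area of the unit sphere can be recovered by integrating $\sqrt{g}$ over the coordinate rectangle $0<\theta<\pi$, $0<\varphi<2\pi$. For the metric (6), $\sqrt{g}=\sqrt{1\cdot\sin^2\theta}=|\sin\theta|$. The midpoint rule and the grid sizes below are arbitrary choices for this sketch.

```python
import math


def sqrt_det_g(theta, phi):
    """sqrt(det g) for the unit-sphere metric (6), g = diag(1, sin^2 theta)."""
    return abs(math.sin(theta))


def sphere_area(n_theta=400, n_phi=400):
    """Midpoint-rule integral of sqrt(g) dtheta dphi over 0 < theta < pi, 0 < phi < 2*pi."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += sqrt_det_g(theta, phi) * d_theta * d_phi
    return total


total_area = sphere_area()
print(total_area, 4.0 * math.pi)
```

The numerical result should approach $4\pi$ as the grid is refined, which is the familiar surface area of the unit sphere.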