By 苏剑林 | Oct 16, 2016
Vectors and Connection
Once we establish our own coordinate system at our location, we can perform various measurements. The result of a measurement might be a scalar, such as temperature or mass; these quantities remain the same regardless of the coordinate system you use. However, sometimes we measure vectors, such as velocity, acceleration, or force. These quantities are objective entities, but because the measurement results are expressed in terms of coordinate components, the components will be completely different if we change coordinates.
If all positions used the same coordinates, there would naturally be no controversy. However, as we have emphasized repeatedly, people at different locations may use different coordinate systems for various reasons. Therefore, when we write down a vector $A^{\mu}$, strictly speaking, we should also specify that it was measured at position $\boldsymbol{x}$: $A^{\mu}(\boldsymbol{x})$. We only omit this when there is no risk of ambiguity.
At this point, we are already able to perform some calculations. For instance, if $A^{\mu}$ is measured at $\boldsymbol{x}$, and the formula for calculating the square of the length element at $\boldsymbol{x}$ is $ds^2 = g_{\mu\nu} dx^{\mu} dx^{\nu}$, then the magnitude of $A^{\mu}$ is $\sqrt{g_{\mu\nu} A^{\mu}A^{\nu}}$, which is an objective entity.
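As a concrete sketch, this magnitude can be computed directly from the metric components. The unit-sphere metric in $(\theta, \phi)$ coordinates used below is my own illustrative choice, not something fixed by the text:

```python
import math

# Illustrative metric: the unit sphere in (theta, phi) coordinates,
# ds^2 = d(theta)^2 + sin^2(theta) d(phi)^2. Any metric g_{mu nu} would do.
def metric(theta, phi):
    return [[1.0, 0.0],
            [0.0, math.sin(theta) ** 2]]

def magnitude(A, x):
    """Objective length sqrt(g_{mu nu} A^mu A^nu) of a vector A measured at point x."""
    g = metric(*x)
    s = sum(g[mu][nu] * A[mu] * A[nu] for mu in range(2) for nu in range(2))
    return math.sqrt(s)

# The same components (0, 1) have different lengths at different latitudes,
# because the metric, not the raw components, carries the geometry:
print(magnitude([0.0, 1.0], (math.pi / 2, 0.0)))  # at the equator -> 1.0
print(magnitude([0.0, 1.0], (math.pi / 6, 0.0)))  # near the pole  -> approximately 0.5
```

The point is exactly the one made above: the components $A^{\mu}$ are coordinate-dependent, but the combination $g_{\mu\nu}A^{\mu}A^{\nu}$ is not.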
[Figure: different local coordinate systems can be established at each point on a sphere; at the very least, the vertical axes of these coordinate systems point in different directions.]
Sometimes we need to compare two vectors at different positions, which involves finding their difference. Specifically, because the coordinate systems at different positions are different, if we directly subtract the components of two vectors at different positions, the result is meaningless. This is much like comparing 5 Chinese Yuan with 5 US Dollars; we cannot conclude that "5 USD - 5 RMB = (5-5) = 0". While the currency case is just a matter of inconsistent units, which can be resolved by selecting a common unit, the measurement of a vector depends not only on units but also on the coordinate system itself. For example, if a plane flies from China to the United States and passes a certain location in China, the measured velocity is $(300, 300, 300)$. When it reaches a location in the United States, the measured velocity is also $(300, 300, 300)$, both using the unit "km/h". However, the measurement in China was done by someone in China, and the measurement in the US was done by someone in the US. As everyone knows, the US and China are located on opposite sides of the Earth, and the local coordinate systems they established are definitely different. Thus, we cannot say that the velocity difference of the plane between the two places is $(300-300, 300-300, 300-300)=(0,0,0)$, because the difference also depends on direction!
Simply put, the root of this problem is that different locations use different coordinate systems—the unit lengths of coordinates are different, and the directions of coordinate axes are different, etc. Therefore, we need to transform the vector coordinates from one location into the coordinates of another location before we can compare the components. Here, we only consider the transformation between two positions $\boldsymbol{x}$ and $\boldsymbol{x}+d\boldsymbol{x}$ separated by an infinitesimal distance—that is, how to express the vector $A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})$ measured at $\boldsymbol{x}+d\boldsymbol{x}$ (which has its own coordinate system) using the coordinate system at $\boldsymbol{x}$. Obviously, this involves a transformation matrix, so the key is determining this transformation matrix.
How do we find this matrix? Imagine that we can place a test vector at $\boldsymbol{x}$ and then move it to $\boldsymbol{x}+d\boldsymbol{x}$ to see what it becomes. With enough test vectors, we can determine the transformation matrix. We therefore need a set of natural test vectors to serve as our reference. We already know that geodesics are objective entities; in fact, geodesics provide the most natural reference imaginable. We can rewrite the geodesic equation as:
$$d\left(\frac{d x^{\mu} }{ds}\right)=-\Gamma_{\alpha\beta}^{\mu} \frac{d x^{\alpha} }{ds}d x^{\beta} \tag{28} $$
The meaning of this formula is that if there is a unit vector $\frac{dx^{\mu}}{ds}$ at the current position $\boldsymbol{x}$, then after moving a distance $d\boldsymbol{x}$ along the geodesic, the amount of change in the unit vector $\frac{dx^{\mu}}{ds}$ is $d\left(\frac{d x^{\mu} }{ds}\right)$, which is also equal to $-\Gamma_{\alpha\beta}^{\mu} \frac{d x^{\alpha} }{ds}d x^{\beta}$, i.e.,
$$\frac{dx^{\mu}}{ds}\quad\to\quad \frac{dx^{\mu}}{ds}-\Gamma_{\alpha\beta}^{\mu} \frac{d x^{\alpha} }{ds}d x^{\beta} \tag{29} $$
Thus, the coordinate transformation matrix (Jacobian matrix) from $\boldsymbol{x}$ to $\boldsymbol{x}+d\boldsymbol{x}$ is
$$\frac{\partial\left(\frac{dx^{\mu}}{ds}-\Gamma_{\alpha\beta}^{\mu} \frac{d x^{\alpha} }{ds}d x^{\beta}\right)}{\partial \frac{dx^{\nu}}{ds}}=\delta_{\nu}^{\mu}-\Gamma_{\nu\beta}^{\mu} d x^{\beta} \tag{30} $$
Consequently, to first order in $d\boldsymbol{x}$, the inverse coordinate transformation matrix from $\boldsymbol{x}+d\boldsymbol{x}$ back to $\boldsymbol{x}$ is
$$\label{lianluo}\delta_{\nu}^{\mu}+\Gamma_{\nu\beta}^{\mu} d x^{\beta} \tag{31} $$
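Indeed, multiplying $(30)$ and $(31)$ confirms that they are mutually inverse up to second-order terms in $d\boldsymbol{x}$:

$$\left(\delta_{\lambda}^{\mu}-\Gamma_{\lambda\beta}^{\mu} d x^{\beta}\right)\left(\delta_{\nu}^{\lambda}+\Gamma_{\nu\gamma}^{\lambda} d x^{\gamma}\right)=\delta_{\nu}^{\mu}+\Gamma_{\nu\gamma}^{\mu} d x^{\gamma}-\Gamma_{\nu\beta}^{\mu} d x^{\beta}+O(dx^2)=\delta_{\nu}^{\mu}+O(dx^2)$$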
Viewed this way, **it is highly appropriate to call the coefficients $\Gamma_{\alpha\beta}^{\mu}$ "connection coefficients," as they link the coordinates at $\boldsymbol{x}$ and $\boldsymbol{x}+d\boldsymbol{x}$ together.**
Knowing the coordinate transformation method, it is easy to see that if we place $A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})$ into the position $\boldsymbol{x}$ for measurement, the result will be
$$\left(\delta_{\nu}^{\mu}+\Gamma_{\nu\beta}^{\mu} d x^{\beta}\right) A^{\nu}(\boldsymbol{x}+d\boldsymbol{x})=A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})+\Gamma_{\nu\beta}^{\mu} A^{\nu}(\boldsymbol{x}+d\boldsymbol{x}) d x^{\beta} \tag{32} $$
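A numerical sketch of $(32)$, again on the unit sphere (my illustrative choice, with its standard Christoffel symbols $\Gamma^{\theta}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}_{\theta\phi}=\Gamma^{\phi}_{\phi\theta}=\cot\theta$): transporting the components of $A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})$ back to $\boldsymbol{x}$ reproduces the vector's objective length up to second order in $d\boldsymbol{x}$, whereas the raw components do not.

```python
import math

# Christoffel symbols of the unit sphere in (theta, phi) coordinates
# (index 0 = theta, 1 = phi); an illustrative choice of connection.
def gamma(theta):
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)   # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / math.tan(theta)   # Gamma^phi_{theta phi}
    return G

def transport_back(A, x, dx):
    """Express A^mu(x+dx) in the coordinate system at x via (delta + Gamma dx), eq. (32)."""
    G = gamma(x[0])
    return [A[m] + sum(G[m][n][b] * A[n] * dx[b] for n in range(2) for b in range(2))
            for m in range(2)]

def length_sq(A, theta):
    return A[0] ** 2 + (math.sin(theta) * A[1]) ** 2  # g_{mu nu} A^mu A^nu on the sphere

x, dx, A = (1.0, 0.5), (1e-3, 2e-3), (0.3, 0.7)
B = transport_back(A, x, dx)
# The transported components reproduce the objective length to first order:
err = abs(length_sq(B, x[0]) - length_sq(A, x[0] + dx[0]))
print(err)  # O(|dx|^2): much smaller than the O(|dx|) mismatch of the raw components
```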
Covariant Derivative
As mentioned earlier, there is a need to subtract vectors, and in the "Connection" section, we studied the coordinate transformation between two positions separated by an infinitesimal distance, obtaining the result in equation $(32)$: if we place $A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})$ at position $\boldsymbol{x}$ for measurement, the result will be
$$A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})+\Gamma_{\nu\beta}^{\mu} A^{\nu}(\boldsymbol{x}+d\boldsymbol{x}) d x^{\beta} \tag{33} $$
Now we can subtract directly, because both vectors are measured at the same position:
$$A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})+\Gamma_{\nu\beta}^{\mu} A^{\nu}(\boldsymbol{x}+d\boldsymbol{x}) d x^{\beta}-A^{\mu}(\boldsymbol{x}) \tag{34} $$
We can study the limiting case:
$$\lim_{d\boldsymbol{x} \to 0} \frac{A^{\mu}(\boldsymbol{x}+d\boldsymbol{x})+\Gamma_{\nu\beta}^{\mu} A^{\nu}(\boldsymbol{x}+d\boldsymbol{x}) d x^{\beta}-A^{\mu}(\boldsymbol{x})}{dx^{\beta}} \tag{35} $$
It is not difficult to see that the result will be
$$\frac{\partial A^{\mu}}{\partial x^{\beta}}+\Gamma_{\nu\beta}^{\mu} A^{\nu} \tag{36} $$
This is called the **covariant derivative** of the vector $A^{\mu}$, denoted as
$$A^{\mu}_{;\beta}=\frac{\partial A^{\mu}}{\partial x^{\beta}}+\Gamma_{\nu\beta}^{\mu} A^{\nu} \tag{37} $$
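To see the limit $(35)$ at work numerically, here is a sketch on the unit sphere (the sphere, its Christoffel symbols, and the test field $A^{\mu}=(\cos\phi,\sin\theta)$ are all my illustrative choices): the finite-difference quotient converges to the closed form given by $(37)$.

```python
import math

def A(theta, phi):
    # A hypothetical vector field on the sphere, components in (theta, phi) coordinates
    return [math.cos(phi), math.sin(theta)]

def gamma(theta):
    # Christoffel symbols of the unit sphere (index 0 = theta, 1 = phi)
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = 1.0 / math.tan(theta)
    return G

def cov_deriv(theta, phi, beta, h=1e-6):
    """Finite-difference version of (35): transport A(x+dx) back to x,
    subtract A(x), and divide by the step h taken along direction beta."""
    dx = [h if b == beta else 0.0 for b in range(2)]
    Ah = A(theta + dx[0], phi + dx[1])
    G = gamma(theta)
    moved = [Ah[m] + sum(G[m][n][b] * Ah[n] * dx[b] for n in range(2) for b in range(2))
             for m in range(2)]
    A0 = A(theta, phi)
    return [(moved[m] - A0[m]) / h for m in range(2)]

num = cov_deriv(1.0, 0.5, beta=0)
# Formula (37) for this field gives A^theta_{;theta} = 0 and
# A^phi_{;theta} = cos(theta) + cot(theta) sin(theta) = 2 cos(theta):
print(num)  # approximately [0.0, 2*cos(1.0)]
```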
And naturally, the following result is called the **covariant differential**:
$$D A^{\mu}=dA^{\mu}+\Gamma_{\nu\beta}^{\mu} A^{\nu}dx^{\beta} \tag{38} $$
Dividing by the line element $ds$, we obtain the derivative
$$\frac{D A^{\mu}}{Ds}=\frac{dA^{\mu}}{ds}+\Gamma_{\nu\beta}^{\mu} A^{\nu}\frac{dx^{\beta}}{ds} \tag{39} $$
It can be seen that the result depends on the choice of the curve $x^{\beta}(s)$ along which we differentiate. We generally take it to be a geodesic, in which case the expression above is called the **geodesic derivative** (the derivative along a geodesic).
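For instance (once more on the unit sphere, my illustrative choice), a meridian $\phi=\text{const}$ is a geodesic with arc length $s=\theta$, and integrating $\frac{DA^{\mu}}{Ds}=0$ along it parallel-transports the vector. A sketch:

```python
import math

def gamma(theta):
    # Christoffel symbols of the unit sphere (index 0 = theta, 1 = phi)
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = 1.0 / math.tan(theta)
    return G

def transport_along_meridian(A, theta0, theta1, steps=20000):
    """Euler-integrate dA^mu/ds = -Gamma^mu_{nu beta} A^nu dx^beta/ds along a
    meridian (phi fixed), where dx/ds = (1, 0) and s = theta."""
    h = (theta1 - theta0) / steps
    theta = theta0
    A = list(A)
    for _ in range(steps):
        G = gamma(theta)
        dA = [-sum(G[m][n][0] * A[n] for n in range(2)) for m in range(2)]
        A = [A[m] + h * dA[m] for m in range(2)]
        theta += h
    return A

A0 = [0.2, 1.0]
A1 = transport_along_meridian(A0, 0.5, 1.5)
# A^theta is unchanged and A^phi scales as sin(theta0)/sin(theta1):
print(A1[0], A1[1], math.sin(0.5) / math.sin(1.5))
# The objective length g_{mu nu} A^mu A^nu is preserved along the way:
print(A0[0] ** 2 + (math.sin(0.5) * A0[1]) ** 2,
      A1[0] ** 2 + (math.sin(1.5) * A1[1]) ** 2)
```

Length preservation under transport is precisely the sense in which this construction yields objective, coordinate-independent statements.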
Looking back at the whole process, our starting point for deriving the covariant derivative was: **Directly subtracting the components of vectors at two different positions is meaningless. If necessary, one must transform the vector from one position to the other. By studying the limiting case, we arrive at the covariant derivative. The covariant derivative is a definition of a derivative in space that has geometric significance; therefore, it is also an objective entity.**
In general tensor analysis or Riemannian geometry textbooks, there are many ways to derive the covariant derivative. Some textbooks adopt the following logic: since we know the specific form of the gradient in a Cartesian coordinate system, we start from the Cartesian system and obtain the derivative in other coordinate systems through the transformation laws. This seems reasonable, but it implicitly assumes the space is flat: although curvilinear coordinates are used, it presumes they can be transformed back into a Cartesian system by some transformation, and otherwise the approach would not hold. That curved spaces happen to yield the same result is, in a sense, a coincidence. If one were to derive the Riemann curvature with the same logic, the result would be identically zero, because the space was assumed flat from the start.
There are other derivation methods. Overall, I believe most methods seem "insufficiently geometric" and are more focused on algebraic manipulation. Here, the author has used a logic that is as geometric as possible.