Deriving the Continuity Equation and Fokker-Planck Equation using the Test Function Method

By 苏剑林 | February 11, 2023

In the article "Discussion on Generative Diffusion Models (6): ODEs in the General Framework", we derived the Fokker-Planck equation for SDEs, and in "Discussion on Generative Diffusion Models (12): 'Hardcore' Diffusion ODEs" we independently derived the continuity equation for ODEs. Both equations describe how the distribution of a random variable evolves as it follows an SDE/ODE, and the continuity equation is the special case of the Fokker-Planck equation with no diffusion term. When deriving the Fokker-Planck equation, we relied on a somewhat crude Taylor expansion of the Dirac delta function; the result was correct, but the method was unconventional. When deriving the continuity equation, we combined the Jacobian determinant with a Taylor expansion, which is a standard approach but difficult to generalize to the Fokker-Planck equation.

In this article, we introduce the "Test Function Method", one of the standard methods for deriving continuity equations and Fokker-Planck equations. Its derivation is more systematic, and it applies to a broader range of problems.

Integration by Parts

Before the formal derivation, we first introduce a key mathematical result used later—the high-dimensional generalization of integration by parts.

General tutorials usually limit the introduction of integration by parts to the one-dimensional case, namely: \begin{equation}\int_a^b uv'dx = uv|_a^b - \int_a^b vu'dx\end{equation} where $u, v$ are functions of $x$, and $'$ denotes the derivative with respect to $x$. We need a high-dimensional version of this. To that end, let's recall the derivation of one-dimensional integration by parts, which depends on the product rule for derivatives: \begin{equation}(uv)' = uv' + vu'\end{equation} Integrating both sides with respect to $x$ and rearranging gives the formula. For the high-dimensional case, we consider a similar formula: \begin{equation}\nabla\cdot(u\boldsymbol{v}) = \boldsymbol{v}\cdot\nabla u + u\nabla \cdot\boldsymbol{v}\end{equation} where $u$ is a scalar function of $\boldsymbol{x}$, $\boldsymbol{v}$ is a vector function of $\boldsymbol{x}$ (with the same dimension), and $\nabla$ denotes the gradient with respect to $\boldsymbol{x}$. Now we integrate both sides over a region $\Omega$: \begin{equation}\int_{\Omega}\nabla\cdot(u\boldsymbol{v})d\boldsymbol{x} = \int_{\Omega}\boldsymbol{v}\cdot\nabla u d\boldsymbol{x} + \int_{\Omega}u\nabla \cdot\boldsymbol{v} d\boldsymbol{x}\end{equation} According to the Gauss Divergence Theorem, the left side equals $\int_{\partial\Omega}u\boldsymbol{v}\cdot\hat{\boldsymbol{n}}dS$, where $\partial\Omega$ is the boundary of $\Omega$, $\hat{\boldsymbol{n}}$ is the outward unit normal vector of the boundary, and $dS$ is the area element. Thus, after rearranging, we have: \begin{equation}\int_{\Omega}\boldsymbol{v}\cdot\nabla u d\boldsymbol{x} = \int_{\partial\Omega}u\boldsymbol{v}\cdot\hat{\boldsymbol{n}}dS - \int_{\Omega}u\nabla \cdot\boldsymbol{v} d\boldsymbol{x}\label{eq:int-by-parts}\end{equation} This is the high-dimensional integration by parts formula we seek. 
Specifically, for a probability density function $p$, since it is non-negative and integrates to 1, we may assume that $p \to 0$ and $\nabla p \to \boldsymbol{0}$ at infinity (strictly speaking this does not follow from integrability alone, but it holds for all densities of practical interest). Therefore, if $\Omega$ is taken to be the entire space (an integral written without a region is understood to be over the whole space), the boundary term in equation $\eqref{eq:int-by-parts}$ vanishes. Substituting $u=p$ with $\boldsymbol{v}$ an arbitrary vector function, or $\boldsymbol{v}=\nabla p$ with $u$ an arbitrary scalar function, we obtain: \begin{align}\int\boldsymbol{v}\cdot\nabla p d\boldsymbol{x} =&\, - \int p\nabla \cdot\boldsymbol{v} d\boldsymbol{x}\label{eq:int-by-parts-p} \\ \int u\nabla \cdot\nabla p d\boldsymbol{x} = &\,-\int\nabla p\cdot\nabla u d\boldsymbol{x}\label{eq:int-by-parts-gp}\end{align} To make this conclusion fully rigorous, one could assume that the support of $p$ is compact. However, this is purely for mathematical rigor; for a general understanding, simply assuming $p \to 0$ and $\nabla p \to \boldsymbol{0}$ at infinity is sufficient.
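As a quick numerical sanity check of identity $\eqref{eq:int-by-parts-p}$, the sketch below (my own example, not from the original article) takes $p$ to be a standard 2D Gaussian and an arbitrary smooth vector field $\boldsymbol{v}$, and estimates both sides by Monte Carlo:

```python
import numpy as np

# Sanity check of  ∫ v·∇p dx = -∫ p ∇·v dx  for p = N(0, I) in 2D.
# Both integrals are rewritten as expectations under p:
#   ∫ v·∇p dx = E_p[v(x)·∇log p(x)]   (since ∇p = p ∇log p)
#   -∫ p ∇·v dx = -E_p[∇·v(x)]
rng = np.random.default_rng(0)
x = rng.standard_normal((10**6, 2))  # samples from p = N(0, I)

# Chosen vector field v(x) = (sin(x1), x1*x2), so ∇·v = cos(x1) + x1
v = np.stack([np.sin(x[:, 0]), x[:, 0] * x[:, 1]], axis=1)
div_v = np.cos(x[:, 0]) + x[:, 0]

# For the standard Gaussian, ∇log p(x) = -x
lhs = np.mean(np.sum(v * (-x), axis=1))  # ≈ ∫ v·∇p dx
rhs = -np.mean(div_v)                    # ≈ -∫ p ∇·v dx

print(lhs, rhs)  # both ≈ -exp(-1/2) ≈ -0.6065, up to Monte Carlo noise
```

Since the Gaussian decays rapidly, the boundary term vanishes and the two estimates agree up to sampling error.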

ODE Evolution

The principle of the test function method is that if for any function $\phi(\boldsymbol{x})$, the following holds: \begin{equation}\int f(\boldsymbol{x})\phi(\boldsymbol{x})d\boldsymbol{x} = \int g(\boldsymbol{x})\phi(\boldsymbol{x})d\boldsymbol{x}\end{equation} then $f(\boldsymbol{x})=g(\boldsymbol{x})$ must hold, where $\phi(\boldsymbol{x})$ is called the test function. A more rigorous treatment would specify the function space from which $\phi(\boldsymbol{x})$ is drawn and the precise sense in which the equality holds (e.g., pointwise, almost everywhere, in probability), but we will not delve into those details here.
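The intuition behind this principle can be illustrated numerically: choosing $\phi$ to be a narrow bump centered at a point $a$ makes $\int f(\boldsymbol{x})\phi(\boldsymbol{x})d\boldsymbol{x} \approx f(a)$, so equality for all such bumps forces $f$ and $g$ to agree pointwise. A minimal sketch (the function $f=\sin$, the bump center, and the width are arbitrary choices for illustration):

```python
import numpy as np

# Probing a function with a narrow Gaussian bump: as the width eps → 0,
# ∫ f(x) φ(x) dx → f(a), so agreement against all bumps pins f down.
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]

f = np.sin(x)
a, eps = 1.0, 0.01  # bump center and width (illustrative choices)
bump = np.exp(-(x - a) ** 2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

probe = np.sum(f * bump) * dx  # Riemann-sum estimate of ∫ f φ dx
print(probe, np.sin(a))  # probe ≈ sin(1) ≈ 0.8415
```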

For an ODE \begin{equation}\frac{d\boldsymbol{x}_t}{dt}=\boldsymbol{f}_t(\boldsymbol{x}_t)\label{eq:ode}\end{equation} we discretize it as: \begin{equation}\boldsymbol{x}_{t+\Delta t} = \boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t)\Delta t\label{eq:ode-diff}\end{equation} Then we have: \begin{equation}\phi(\boldsymbol{x}_{t+\Delta t}) = \phi(\boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t)\Delta t)\approx \phi(\boldsymbol{x}_t) + \Delta t\,\,\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)\end{equation} Taking the expectation on both sides, we get: \begin{equation}\int p_{t+\Delta t}(\boldsymbol{x}_{t+\Delta t})\phi(\boldsymbol{x}_{t+\Delta t}) d\boldsymbol{x}_{t+\Delta t}\approx \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t\end{equation} Since the result of the integration does not depend on the notation of the variable of integration, replacing $\boldsymbol{x}_{t+\Delta t}$ with $\boldsymbol{x}_t$ on the left side is equivalent: \begin{equation}\int p_{t+\Delta t}(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t\approx \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t\label{eq:change-var}\end{equation} Moving the first term on the right to the left and taking the limit $\Delta t \to 0$, we get: \begin{equation}\int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t\label{eq:dt-0}\end{equation} Applying the integration by parts formula $\eqref{eq:int-by-parts-p}$ to the right side gives: \begin{equation}\int \frac{\partial 
p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = -\int \Big[\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)\Big]\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t\end{equation} According to the equality principle of the test function method, we have: \begin{equation}\frac{\partial p_t(\boldsymbol{x}_t)}{\partial t} = -\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)\end{equation} This is known as the "Continuity Equation."
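The continuity equation can be verified directly on a solvable example. For the 1D ODE $\dot{x}=-x$ with $x_0\sim\mathcal{N}(0,1)$, the solution $x_t = x_0 e^{-t}$ has density $p_t=\mathcal{N}(0, e^{-2t})$, and we can check $\partial_t p_t = -\partial_x(p_t f)$ by finite differences (a sketch of my own, not from the original article):

```python
import numpy as np

# Finite-difference check of the continuity equation ∂p/∂t = -∂x(p·f)
# for dx/dt = f(x) = -x with x_0 ~ N(0, 1), whose exact density is
# p_t = N(0, e^{-2t}).
def p(x, t):
    s2 = np.exp(-2 * t)  # variance at time t
    return np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

x = np.linspace(-3, 3, 601)
t, h = 0.5, 1e-5

dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)  # ∂p/∂t
pf = lambda y: -y * p(y, t)                    # probability flux p·f, f = -x
div_flux = (pf(x + h) - pf(x - h)) / (2 * h)   # ∂x(p·f)

print(np.max(np.abs(dp_dt + div_flux)))  # ≈ 0: the equation is satisfied
```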

SDE Evolution

For an SDE \begin{equation}d\boldsymbol{x}_t = \boldsymbol{f}_t(\boldsymbol{x}_t) dt + g_t d\boldsymbol{w}\label{eq:sde}\end{equation} we discretize it as: \begin{equation}\boldsymbol{x}_{t+\Delta t} = \boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon},\quad \boldsymbol{\varepsilon}\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})\label{eq:sde-diff}\end{equation} Then: \begin{equation}\begin{aligned} \phi(\boldsymbol{x}_{t+\Delta t}) =&\, \phi(\boldsymbol{x}_t + \boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon}) \\ \approx&\, \phi(\boldsymbol{x}_t) + \left(\boldsymbol{f}_t(\boldsymbol{x}_t) \Delta t + g_t \sqrt{\Delta t}\boldsymbol{\varepsilon}\right)\cdot \nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) + \frac{1}{2} \left(g_t\sqrt{\Delta t}\boldsymbol{\varepsilon}\cdot \nabla_{\boldsymbol{x}_t}\right)^2\phi(\boldsymbol{x}_t) \end{aligned}\end{equation} Taking expectations on both sides, note that the right side must be averaged over both $\boldsymbol{x}_t$ and $\boldsymbol{\varepsilon}$. 
The expectation over $\boldsymbol{\varepsilon}$ can be calculated first, yielding: \begin{equation}\phi(\boldsymbol{x}_t) + \Delta t\,\,\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot \nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) + \frac{1}{2} \Delta t\,g_t^2\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) \end{equation} Thus: \begin{equation}\begin{aligned} &\,\int p_{t+\Delta t}(\boldsymbol{x}_{t+\Delta t})\phi(\boldsymbol{x}_{t+\Delta t}) d\boldsymbol{x}_{t+\Delta t}\\ \approx&\, \int p_t(\boldsymbol{x}_t)\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \Delta t\int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \int\frac{1}{2} \Delta t\,g_t^2 p_t(\boldsymbol{x}_t)\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t \end{aligned}\end{equation} Similar to equations $\eqref{eq:change-var}$ and $\eqref{eq:dt-0}$, taking the limit $\Delta t \to 0$ yields: \begin{equation}\int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t + \int\frac{1}{2} \,g_t^2 p_t(\boldsymbol{x}_t)\nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t\end{equation} Applying equation $\eqref{eq:int-by-parts-p}$ to the first term on the right, and applying equation $\eqref{eq:int-by-parts-gp}$ followed by equation $\eqref{eq:int-by-parts-p}$ to the second term, we get: \begin{equation}\int \frac{\partial p_t(\boldsymbol{x}_t)}{\partial t}\phi(\boldsymbol{x}_t) d\boldsymbol{x}_t = \int \left[-\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)+\frac{1}{2}g_t^2 \nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}p_t(\boldsymbol{x}_t)\right]\phi(\boldsymbol{x}_t)d\boldsymbol{x}_t\end{equation} According to the 
equality principle of the test function method: \begin{equation}\frac{\partial p_t(\boldsymbol{x}_t)}{\partial t} = -\nabla_{\boldsymbol{x}_t}\cdot\big(p_t(\boldsymbol{x}_t)\boldsymbol{f}_t(\boldsymbol{x}_t)\big)+\frac{1}{2}g_t^2 \nabla_{\boldsymbol{x}_t}\cdot\nabla_{\boldsymbol{x}_t}p_t(\boldsymbol{x}_t)\end{equation} This is the "Fokker-Planck Equation."
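As with the continuity equation, the Fokker-Planck equation can be checked on a solvable case. For the 1D Ornstein-Uhlenbeck process $dx = -x\,dt + dw$ (i.e. $f=-x$, $g=1$) started from $x_0\sim\mathcal{N}(0,1)$, the density stays Gaussian with variance $\sigma^2(t) = \tfrac{1}{2} + \tfrac{1}{2}e^{-2t}$. The following finite-difference sketch (my own example, not from the original article) verifies the equation on a grid:

```python
import numpy as np

# Finite-difference check of the Fokker-Planck equation
#   ∂p/∂t = -∂x(p·f) + (g²/2) ∂²x p
# for the OU process dx = -x dt + dw (f = -x, g = 1), x_0 ~ N(0, 1);
# the exact density is N(0, σ²(t)) with σ²(t) = 1/2 + (1/2) e^{-2t}.
def p(x, t):
    s2 = 0.5 + 0.5 * np.exp(-2 * t)
    return np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

x = np.linspace(-3, 3, 601)
t, h = 0.5, 1e-4

dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)                       # ∂p/∂t
d_flux = ((x + h) * p(x + h, t) - (x - h) * p(x - h, t)) / (2 * h)  # ∂x(x·p) = -∂x(p·f)
lap_p = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2            # ∂²x p

residual = dp_dt - (d_flux + 0.5 * lap_p)
print(np.max(np.abs(residual)))  # ≈ 0: the equation is satisfied
```

Setting $g=0$ drops the Laplacian term and recovers the continuity-equation check.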

Summary

This article introduced the test function method used to derive certain probability equations. The main content included the high-dimensional generalization of integration by parts, as well as the derivation process for the continuity equation of ODEs and the Fokker-Planck equation of SDEs.