Proving Noether’s theorem

In the present post I want to record some notes I made on the mathematical nuances involved in a proof of Noether’s theorem and the mathematical relevance of the theorem to some simple conservation laws in classical physics, namely, the conservation of energy and the conservation of linear momentum. Noether’s Theorem has important applications in a wide range of classical mechanics problems as well as in quantum mechanics and Einstein’s relativity theory. It is also used in the study of certain classes of partial differential equations that can be derived from variational principles.

The theorem was first published by Emmy Noether in 1918. An interesting book by Yvette Kosmann-Schwarzbach presents an English translation of Noether’s 1918 paper and discusses in detail the history of the theorem’s development and its impact on theoretical physics in the twentieth century (Kosmann-Schwarzbach, Y., 2011, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century, translated by Bertram Schwarzbach, Springer). At the time of writing, the book is freely downloadable online.

Mathematical setup of Noether’s theorem

The case I explore in detail here is that of a variational calculus functional of the form

S[y] = \int_a^b \mathrm{d}x F(x, y, y^{\prime})

where x is a single independent variable and y = (y_1, y_2, \ldots, y_n) is a vector of n dependent variables. The functional has stationary paths defined by the usual Euler-Lagrange equations of variational calculus. Noether’s theorem concerns how the value of this functional is affected by families of continuous transformations of the dependent and independent variables (e.g., translations, rotations) that are defined in terms of one or more real parameters. The case I explore in detail here involves transformations defined in terms of only a single parameter, call it \delta. The transformations can be represented in general terms as

\overline{x} = \Phi(x, y, y^{\prime}; \delta)

\overline{y}_k = \Psi_k(x, y, y^{\prime}; \delta)

for k = 1, 2, \ldots, n. The functions \Phi and \Psi_k are assumed to have continuous first derivatives with respect to all the variables, including the parameter \delta. Furthermore, the transformations must reduce to identities when \delta = 0, i.e.,

x \equiv \Phi(x, y, y^{\prime}; 0)

y_k \equiv \Psi_k(x, y, y^{\prime}; 0)

for k = 1, 2, \ldots, n. As concrete examples, translations and rotations are continuous differentiable transformations that can be defined in terms of a single parameter and that reduce to identities when the parameter takes the value zero.

Noether’s theorem concerns infinitesimally small changes in the dependent and independent variables, so we can take |\delta| \ll 1 and use perturbation theory to prove the theorem. Treating \overline{x} and \overline{y}_k as functions of \delta and Taylor-expanding them about \delta = 0 we get

\overline{x}(\delta) = \overline{x}(0) + \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

where

\phi(x, y, y^{\prime}) \equiv \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}

and

\overline{y}_k (\delta) = \overline{y}_k (0) + \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

where

\psi_k (x, y, y^{\prime}) \equiv \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0} for k = 1, 2, \ldots, n.

Noether’s theorem then says that whenever the functional S[y] is invariant under the above family of transformations, i.e., whenever

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

for all c and d such that a \leq c < d \leq b, where \overline{c} = \Phi(c, y(c), y^{\prime}(c); \delta) and \overline{d} = \Phi(d, y(d), y^{\prime}(d); \delta), then for each stationary path of S[y] the following equation holds:

\sum_{k = 1}^n \frac{\partial F}{\partial y_k^{\prime}}\psi_k + \bigg(F - \sum_{k = 1}^n y_k^{\prime}\frac{\partial F}{\partial y_k^{\prime}}\bigg)\phi = \mathrm{constant}

As illustrated below, this remarkable equation encodes a number of conservation laws in physics: conservation of energy, of linear momentum and of angular momentum follow when the relevant action functional is invariant under translations in time, translations in space and rotations in space, respectively. Thus, Noether’s theorem is often summarised as the statement that whenever a system has a continuous symmetry there is a corresponding quantity whose value is conserved.
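As a computational aside, once F, \phi and the \psi_k are known the conserved quantity can be written down mechanically. Here is a minimal SymPy sketch of this; the function name noether_charge and its argument layout are my own choices for illustration, not standard notation.

```python
import sympy as sp

def noether_charge(F, x, ys, phi, psis):
    """Noether's conserved quantity:
    sum_k dF/dy_k' * psi_k + (F - sum_k y_k' dF/dy_k') * phi."""
    yps = [sp.diff(yk, x) for yk in ys]           # the y_k'
    F_yps = [sp.diff(F, ypk) for ypk in yps]      # the dF/dy_k'
    charge = sum(Fp * ps for Fp, ps in zip(F_yps, psis))
    charge += (F - sum(ypk * Fp for ypk, Fp in zip(yps, F_yps))) * phi
    return sp.simplify(charge)

# Example: F = y'^2/2 with a time translation (phi = 1, psi = 0)
x = sp.symbols('x')
y = sp.Function('y')(x)
print(noether_charge(sp.diff(y, x)**2 / 2, x, [y], 1, [0]))
# -> -Derivative(y(x), x)**2/2
```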

Application of the theorem to familiar conservation laws in classical physics

It is, of course, not necessary to use the full machinery of Noether’s theorem for simple examples of conservation laws in classical physics. The theorem is most useful in unfamiliar situations in which it can reveal conserved quantities which were not previously known. However, going through the motions in simple cases clarifies how the mathematical machinery works in more sophisticated and less familiar situations.

To obtain the law of the conservation of energy in the simplest possible scenario, consider a particle of mass m moving along a straight line in a time-invariant potential field V(x) with position at time t given by the function x(t). The Lagrangian formulation of mechanics then says that the path followed by the particle will be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2 - V(x)\big)

The Euler-Lagrange equation for this functional would give Newton’s second law as the equation governing the particle’s motion. With regard to demonstrating energy conservation, we notice that the Lagrangian, which is more generally of the form L(t, x, \dot{x}) when there is a time-varying potential, here takes the simpler form L(x, \dot{x}) because there is no explicit dependence on time. Therefore we might expect the functional to be invariant under translations in time, and thus Noether’s theorem to hold. We will verify this. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \phi + O(\delta^2) \equiv t + \delta

and

\overline{x}(\delta) = x + \delta \cdot 0 + O(\delta^2) \equiv x

From the first equation we see that \phi = 1 in the case of a simple translation in time by an amount \delta, and from the second equation we see that \psi = 0, which simply reflects the fact that we are only translating in the time direction. The invariance of the functional under these transformations can easily be demonstrated by writing

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\overline{x}, \dot{\overline{x}}) = \int_{\overline{0}-\delta}^{\overline{T}-\delta} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t L(x, \dot{x})

where the limits in the second integral follow from the change of the time variable from \overline{t} to t. Thus, Noether’s theorem holds and with \phi = 1 and \psi = 0 the fundamental equation in the theorem reduces to

L - \dot{x}\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

Evaluating the terms on the left-hand side we get

\frac{1}{2}m\dot{x}^2 - V(x) - \dot{x} m\dot{x} =\mathrm{constant}

\iff

\frac{1}{2}m\dot{x}^2 + V(x) = E = \mathrm{constant}

which is of course the statement of the conservation of energy.
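This can also be verified symbolically. The following SymPy sketch (my own illustrative check, not part of the derivation above) computes the Noether charge L - \dot{x} \partial L/\partial \dot{x} and confirms that its time derivative vanishes once the equation of motion m\ddot{x} = -V^{\prime}(x) is imposed:

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')(t)
V = sp.Function('V')(x)

L = sp.Rational(1, 2) * m * sp.diff(x, t)**2 - V
# Noether charge for a time translation (phi = 1, psi = 0):
Q = L - sp.diff(x, t) * sp.diff(L, sp.diff(x, t))
print(sp.simplify(Q))   # -> -m*x'**2/2 - V(x), i.e. minus the total energy

# dQ/dt vanishes on solutions of the equation of motion m*x'' = -V'(x):
dQ = sp.diff(Q, t)
print(sp.simplify(dQ.subs(sp.diff(x, t, 2), -sp.diff(V, x) / m)))   # -> 0
```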

To obtain the law of conservation of linear momentum in the simplest possible scenario, assume now that the above particle is moving freely in the absence of any potential field, so V(x) = 0 and the only energy involved is kinetic energy. The path followed by the particle will now be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(\dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2\big)

The Euler-Lagrange equation for this functional would give Newton’s first law as the equation governing the particle’s motion (constant velocity in the absence of any forces). To get the law of conservation of linear momentum we will consider a translation in space rather than time, and check that the action functional is invariant under such translations. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \cdot 0 + O(\delta^2) \equiv t

and

\overline{x}(\delta) = x + \delta \psi + O(\delta^2) \equiv x + \delta

From the first equation we see that \phi = 0, reflecting the fact that we are only translating in the space direction, and from the second equation we see that \psi = 1 in the case of a simple translation in space by an amount \delta. The invariance of the functional under these transformations can easily be demonstrated by noting that \dot{\overline{x}} = \dot{x}, so we can write

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\dot{\overline{x}}) = \int_0^{T} \mathrm{d}t L(\dot{x})

since the limits of integration are not affected by the translation in space. Thus, Noether’s theorem holds and with \phi = 0 and \psi = 1 the fundamental equation in the theorem reduces to

\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

\iff

m\dot{x} = \mathrm{constant}

This is, of course, the statement of the conservation of linear momentum.
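Again this is easy to check with SymPy (an illustrative sketch of my own):

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')(t)
L = sp.Rational(1, 2) * m * sp.diff(x, t)**2

# Noether charge for a space translation (phi = 0, psi = 1):
print(sp.diff(L, sp.diff(x, t)))   # -> m*Derivative(x(t), t), the momentum p
```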

Proof of Noether’s theorem

To prove Noether’s theorem we will begin with the transformed functional

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

We will substitute into this the linearised forms of the transformations, namely

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

and

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

for k = 1, 2, \ldots, n, and then expand to first order in \delta. Note that the integration limits are, to first order in \delta,

\overline{c} = c + \delta \phi(c)

and

\overline{d} = d + \delta \phi(d)

Using the linearised forms of the transformations and writing \psi = (\psi_1, \psi_2, \ldots, \psi_n) we get

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \frac{\mathrm{d}x}{\mathrm{d}\overline{x}}

\frac{\mathrm{d}\overline{x}}{\mathrm{d}x} = 1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}

Inverting the second equation we get

\frac{\mathrm{d}x}{\mathrm{d}\overline{x}} = \big(1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big)^{-1} = 1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} + O(\delta^2)

Using this in the first equation we find, to first order in \delta,

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \big(1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big) = \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)

Making the necessary substitutions we can then write the transformed functional as

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

= \int_{\overline{c}-\delta \phi(c)}^{\overline{d}-\delta \phi(d)} \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

Treating F as a function of \delta and expanding about \delta = 0 to first order we get

F(\delta) = F(0) + \delta \frac{\partial F}{\partial \delta}\big|_{\delta = 0}

= F(x, y, y^{\prime}) + \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

Then using the expression for \frac{\mathrm{d}\overline{x}}{\mathrm{d}x} above, the transformed functional becomes

\int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \int_c^d \mathrm{d}x \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F(x, y, y^{\prime}) + O(\delta^2)

Ignoring the second order term in \delta we can thus write

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \delta \int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg)

Since the functional is invariant, however, this implies

\int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg) = 0

We now manipulate this equation by integrating the terms involving \frac{\mathrm{d}\phi}{\mathrm{d}x} and \frac{\mathrm{d}\psi_k}{\mathrm{d}x} by parts. We get

\int_c^d \mathrm{d}x \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} = \bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg]_c^d

- \int_c^d \mathrm{d}x \phi \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

and

\int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} = \bigg[\sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d - \int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\psi_k

Substituting these into the equation gives

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \bigg(\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can manipulate this equation further by expanding the integrand in the second term on the left-hand side. We get

\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

= \frac{\partial F}{\partial x} - \frac{\partial F}{\partial x} - \sum_{k=1}^n \frac{\partial F}{\partial y_k}y^{\prime}_k - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n y^{\prime}_k \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)

= \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

Thus, the equation becomes

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can now see at a glance that the second and third terms on the left-hand side must vanish because of the Euler-Lagrange expressions appearing in the brackets (which are identically zero on stationary paths). Thus we arrive at the equation

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d = 0

Since c and d are arbitrary, this shows that the expression inside the square brackets takes the same value at every point of the stationary path, i.e., it is constant, as per Noether’s theorem.
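The only step in the proof that involves real algebra is the expansion of \frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_k y^{\prime}_k \frac{\partial F}{\partial y^{\prime}_k}\big) into Euler-Lagrange form. The following SymPy sketch checks that identity for a concrete integrand (the particular F is arbitrary, chosen only for illustration):

```python
import sympy as sp

x, u, v = sp.symbols('x u v')          # u stands for y, v for y'
y = sp.Function('y')(x)
yp = sp.diff(y, x)

# Any concrete integrand F(x, y, y') will do for the check:
Fsym = x * u**2 * v + sp.sin(v) + sp.exp(x) * u
sub = {u: y, v: yp}

F    = Fsym.subs(sub)                  # F evaluated on the path
F_x  = sp.diff(Fsym, x).subs(sub)      # partial dF/dx
F_y  = sp.diff(Fsym, u).subs(sub)      # partial dF/dy
F_yp = sp.diff(Fsym, v).subs(sub)      # partial dF/dy'

lhs = F_x - sp.diff(F - yp * F_yp, x)  # the d/dx here is a total derivative
rhs = yp * (sp.diff(F_yp, x) - F_y)
print(sp.simplify(lhs - rhs))          # -> 0
```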

Alternative approaches to formulating geodesic equations on manifolds

A geodesic can be defined as an extremal path between two points on a manifold in the sense that it minimises or maximises some criterion of interest (e.g., minimises distance travelled, maximises proper time, etc). Such a path will satisfy some geodesic equations equivalent to the Euler-Lagrange equations of the calculus of variations. A geodesic can also be defined in a conceptually different way as the straightest possible path between two points on a manifold. In this case the path will satisfy geodesic equations derived by requiring parallel transport of a tangent vector along the path. Although these are conceptually different ways of defining geodesics, they are mathematically equivalent. In the present note I want to explore the derivation of geodesic equations in these two different ways and prove their mathematical equivalence.

Now, in the calculus of variations we typically define a system’s action S to be the time-integral of a Lagrangian L:

S \equiv \int^{t_B}_{t_A} L(q_i, \dot{q_i}) dt

where L(q_i, \dot{q_i}) says that the Lagrangian is a function of position coordinates q_i and velocities \dot{q_i} (and i ranges over however many coordinates there are). We find the trajectory that yields a desired extremal value of the action S as the one that satisfies the Euler-Lagrange equations

0 = \frac{d}{dt} \big(\frac{\partial L}{\partial \dot{q_i}} \big) - \frac{\partial L}{\partial q_i}

Let us now suppose that we are facing an exactly analogous situation in which there are two points on the manifold, A and B, and we are considering possible paths between them to try to find the extremal one. We can describe any path between A and B by specifying the coordinates of the points along it as functions of a parameter \sigma that goes from a value of 0 at A to a value of 1 at B, i.e., by specifying the functions x^{\mu}(\sigma). Noting that the line element can be written as

ds^2 = g_{\mu \gamma} dx^{\mu} dx^{\gamma}

we can write the length of a particular path as

s = \int \sqrt{ds^2} = \int^1_0 \sqrt{g_{\mu \gamma} \frac{dx^{\mu}}{d \sigma} \frac{dx^{\gamma}}{d \sigma}} d \sigma

Note that the metric is a function of the coordinates of points along the path, which in turn are functions of the parameter \sigma, i.e., g_{\mu \gamma} = g_{\mu \gamma}(x^{\alpha}(\sigma)). This situation is exactly analogous to the usual calculus of variations scenario because, writing \dot{x}^{\alpha} \equiv d x^{\alpha}/d \sigma, we see that we have a Lagrangian function

L(x^{\alpha}, \dot{x}^{\alpha}) = \sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}

and we hope to find the path x^{\mu}(\sigma) that makes the integral of the Lagrangian extreme. This will be the path that satisfies the Euler-Lagrange equations

0 = \frac{d}{d \sigma} \big(\frac{\partial L}{\partial \dot{x}^{\alpha}}\big) - \frac{\partial L}{\partial x^{\alpha}}

This corresponds to N separate differential equations in an N-dimensional manifold, one equation for each value of the index \alpha.

We can manipulate the Euler-Lagrange equations to get geodesic equations which are easier to use in particular contexts. First, note that

\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{\partial}{\partial \dot{x}^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}

= \frac{1}{2L} (g_{\mu \gamma} \ \delta^{\mu}_{\alpha} \ \dot{x}^{\gamma} + g_{\mu \gamma} \ \dot{x}^{\mu} \ \delta^{\gamma}_{\alpha})

because, for example, \partial \dot{x}^{\mu}/\partial \dot{x}^{\alpha} = \delta^{\mu}_{\alpha}. Also note that the metric is treated as a constant as it depends on x^{\alpha} not on \dot{x}^{\alpha}. Doing the sums over the Kronecker deltas we get

\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{2L}(g_{\alpha \gamma} \ \dot{x}^{\gamma} + g_{\mu \alpha} \ \dot{x}^{\mu})

= \frac{1}{2L}(g_{\alpha \mu} \ \dot{x}^{\mu} + g_{\alpha \mu} \ \dot{x}^{\mu})

= \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}

But notice that since

s = \int L d \sigma

we have

\frac{ds}{d \sigma} = L

so

\frac{1}{L} = \frac{d \sigma}{ds}

and we can write

\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma}

= g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma} \frac{d \sigma}{ds}

= g_{\alpha \mu} \frac{d x^{\mu}}{ds}

Next, we have

\frac{\partial L}{\partial x^{\alpha}} = \frac{\partial}{\partial x^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}

= \frac{1}{2L} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \dot{x}^{\mu} \dot{x}^{\gamma}

= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{d \sigma} \frac{d x^{\gamma}}{d \sigma} \frac{d \sigma}{ds}

= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}

Putting these results into the Euler-Lagrange equations we get

0 = \frac{d}{d \sigma} \big(g_{\alpha \mu} \frac{d x^{\mu}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}

Finally, multiplying through by d \sigma/ds we get

0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}

where I have also renamed \mu \rightarrow \beta in the first term to make it clearer that the Einstein summations in the first and second terms are independent. This is the first version of the geodesic equations, derived by requiring that the path between the points A and B should be extremal in the sense of satisfying the Euler-Lagrange equations of the calculus of variations.
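As a concrete illustration, this version of the geodesic equations is easy to generate symbolically for a given metric. The following SymPy sketch (my own check) does so for the unit 2-sphere with metric \mathrm{d}s^2 = \mathrm{d}\theta^2 + \sin^2 \theta \, \mathrm{d}\phi^2:

```python
import sympy as sp

s = sp.symbols('s')
th = sp.Function('theta')(s)
ph = sp.Function('phi')(s)

X = sp.Matrix([th, ph])
Xp = X.diff(s)
g = sp.diag(1, sp.sin(th)**2)    # unit 2-sphere metric

# First version: 0 = d/ds(g_{ab} x'^b) - (1/2) (dg_{mn}/dx^a) x'^m x'^n
for a in range(2):
    term1 = sp.diff(sum(g[a, b] * Xp[b] for b in range(2)), s)
    term2 = sp.Rational(1, 2) * sum(sp.diff(g[m, n], X[a]) * Xp[m] * Xp[n]
                                    for m in range(2) for n in range(2))
    print(sp.simplify(term1 - term2))
# theta-equation: theta'' - sin(theta)*cos(theta)*phi'^2
# phi-equation:   sin(theta)^2*phi'' + 2*sin(theta)*cos(theta)*theta'*phi'
```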

We will now derive a second version of the geodesic equations by requiring the geodesic to be a path that is locally straight. In differential geometry a path is defined as straight if it parallel transports its own tangent vector, i.e., if the tangent vector does not change as we move an infinitesimal step along the path. If we take an arbitrary point on the path to be x^{\mu} \ e_{\mu} and we take ds to be an infinitesimal displacement along the path, then a tangent vector to the path is

\frac{d x^{\mu}}{d \sigma} e_{\mu}

and we want

\frac{d}{ds}\big(\frac{d x^{\mu}}{d \sigma}e_{\mu} \big) = \frac{d^2 x^{\mu}}{ds d \sigma} e_{\mu} + \frac{d x^{\mu}}{d \sigma} \frac{d e_{\mu}}{ds} = 0

Multiplying through by d \sigma/ds this gives

\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d e_{\mu}}{ds} = 0

But

\frac{d e_{\mu}}{ds} = \frac{\partial e_{\mu}}{\partial x^{\gamma}} \frac{d x^{\gamma}}{d s}

= \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha}

Putting this into the equation gives

\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha} = 0

To enable us to factor out the basis vector we can rename the indices in the second term as \mu \rightarrow \alpha and \gamma \rightarrow \beta to get

\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} e_{\mu} = 0

\iff

\big[\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} \big] e_{\mu} = 0

\implies

\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = 0

This is the second version of the geodesic equations, derived by assuming that the path between the two points on the manifold is locally straight.

We now have two seemingly different versions of the geodesic equations, namely

0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}

and

0 = \frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta}

We will next show that they are in fact mathematically equivalent. Starting from the first version, we can expand out the brackets to get

0 = \frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}

\iff

0 = \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}

Now we rename the indices, sending the free index \alpha \rightarrow \sigma in every term and relabelling the dummy indices as follows: \sigma \rightarrow \alpha in the first term; \sigma \rightarrow \beta and \beta \rightarrow \alpha in the second term; \beta \rightarrow \mu in the third term; and \mu \rightarrow \alpha, \gamma \rightarrow \beta in the fourth term. We get

0 = \frac{1}{2}\frac{\partial g_{\sigma \beta}}{\partial x^{\alpha}}\frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\sigma \alpha}}{\partial x^{\beta}} \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2} - \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}} \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds}

We can write this as (renaming the dummy index \mu \rightarrow \nu in the last term)

0 = \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} \frac{1}{2} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}] + g_{\sigma \nu} \frac{d^2 x^{\nu}}{ds^2}

Finally, multiplying through by g^{\mu \sigma} and using the facts that

g^{\mu \sigma} \ g_{\sigma \nu} = \delta^{\mu}_{\hphantom{\mu} \nu}

and

\Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = \frac{1}{2} g^{\mu \sigma} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}]

we get

0 = \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} + \frac{d^2 x^{\mu}}{ds^2}

which is the second version of the geodesic equation. Thus, the two versions are equivalent as claimed.
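The equivalence can also be confirmed symbolically for a concrete metric. Continuing the 2-sphere example (again a SymPy sketch of my own), we can compute the Christoffel symbols from the metric and check that the second version reproduces the same equations as the first:

```python
import sympy as sp

s = sp.symbols('s')
th, ph = sp.Function('theta')(s), sp.Function('phi')(s)
X = sp.Matrix([th, ph])
Xp = X.diff(s)
g = sp.diag(1, sp.sin(th)**2)
ginv = g.inv()
n = 2

# Gamma^m_{ab} = (1/2) g^{m r} (d_a g_{b r} + d_b g_{r a} - d_r g_{a b})
Gamma = [[[sp.simplify(sum(sp.Rational(1, 2) * ginv[m, r] *
           (sp.diff(g[b, r], X[a]) + sp.diff(g[r, a], X[b]) - sp.diff(g[a, b], X[r]))
           for r in range(n)))
           for b in range(n)] for a in range(n)] for m in range(n)]

# Second version: 0 = x''^m + Gamma^m_{ab} x'^a x'^b
for m in range(n):
    eq = sp.diff(X[m], s, 2) + sum(Gamma[m][a][b] * Xp[a] * Xp[b]
                                   for a in range(n) for b in range(n))
    print(sp.simplify(eq))
# theta-equation: theta'' - sin(theta)*cos(theta)*phi'^2
# phi-equation:   phi'' + 2*(cos(theta)/sin(theta))*theta'*phi'
```

As expected, the two \phi equations differ only by the overall factor g_{\phi \phi} = \sin^2 \theta, which is exactly what multiplying through by the inverse metric removes.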

Simple variational setups yielding Newton’s second law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

K + U = E

In the present note I want to briefly set out the relevant calculations in a unified way emphasising the common underlying structure provided by the conservation of mechanical energy and the Calculus of Variations. The kinetic energy K will be taken to be

K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}

where \dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t} is the particle’s velocity, p = m\dot{x} is its momentum, and m is its mass. The potential energy U will be regarded as some function of x only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

S[x]= \int_{t_1}^{t_2} L(t, x, \dot{x}) dt = \int_{t_1}^{t_2} (K - U) dt

The function L(t, x, \dot{x}) = K - U is usually termed the Lagrangian in classical mechanics. The functional S[x] is usually called the action. The Euler-Lagrange equation for this Calculus of Variations problem is

\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0

and this is Newton’s Second Law in disguise! We have

\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F

\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p

and

\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, F = ma.
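The computation can be reproduced with SymPy in a few lines (an illustrative sketch, with the potential U left unspecified):

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')(t)
U = sp.Function('U')(x)

L = sp.Rational(1, 2) * m * sp.diff(x, t)**2 - U
# Euler-Lagrange equation: dL/dx - d/dt(dL/dxdot) = 0
eom = sp.diff(L, x) - sp.diff(sp.diff(L, sp.diff(x, t)), t)
print(sp.Eq(eom, 0))   # -> -dU/dx - m*x'' = 0, i.e. F = ma
```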

To obtain Schrödinger’s equation we introduce a function

\psi(x) = \exp\big(\frac{1}{\hbar}\int p\mathrm{d}x\big)

where p = m \dot{x} is again the momentum of the particle and \hbar is the reduced Planck constant from quantum mechanics. (Note that \int p \mathrm{d}x has units of \text{length}^2 \ \text{mass} \ \text{time}^{-1}, so we divide by \hbar, which has the same units, to make the argument of the exponential dimensionless). We then have

\ln \psi = \frac{1}{\hbar}\int p\mathrm{d}x

and differentiating both sides gives

\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p

so

p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2

Therefore we can write the kinetic energy as

K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2

and putting this into the conservation of mechanical energy equation gives

\frac{\hbar^2}{2m} \big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E

\iff

\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0

We now find the stationary function \psi of the functional

T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty}  \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x

The Euler-Lagrange equation for this Calculus of Variations problem is

\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0

and this is Schrödinger’s equation in disguise! We have

\frac{\partial M}{\partial \psi} = 2(U - E)\psi

\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}

and

\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}

so substituting these into the Euler-Lagrange equation we get

2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0

\iff

-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi

and this is the (time-independent) Schrödinger equation for a particle of mass m with fixed total energy E in a potential U(x) on the line -\infty < x < \infty.
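Once more the Euler-Lagrange computation can be checked mechanically (a SymPy sketch of my own, with U left unspecified):

```python
import sympy as sp

x, E = sp.symbols('x E')
hbar, m = sp.symbols('hbar m', positive=True)
psi = sp.Function('psi')(x)
U = sp.Function('U')(x)

M = hbar**2 / (2 * m) * sp.diff(psi, x)**2 + (U - E) * psi**2
# Euler-Lagrange equation: dM/dpsi - d/dx(dM/dpsi') = 0
eom = sp.diff(M, psi) - sp.diff(sp.diff(M, sp.diff(psi, x)), x)
print(sp.Eq(eom, 0))
# -> 2*(U - E)*psi - (hbar^2/m)*psi'' = 0, the time-independent Schroedinger equation
```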

Changing variables in the square of the angular momentum operator

A particle of mass m with position vector \mathbf{r} and velocity \mathbf{v} (with respect to some specified origin) has a linear momentum vector \mathbf{p} = m \mathbf{v} and angular momentum vector

\mathbf{L} = \mathbf{r} \times \mathbf{p} = m \mathbf{r} \times \mathbf{v}

where \times is the vector product operation. The magnitude of the angular momentum vector is L = rmv\sin \theta where \theta is the angle between \mathbf{r} and \mathbf{v}. The direction of \mathbf{L} is given by the right-hand rule when the vectors \mathbf{r} and \mathbf{v} are placed with their tails at the same point: one curls the fingers of the right hand in the direction of rotation of \mathbf{r} into \mathbf{v} and the thumb then points in the direction of \mathbf{L}.

One can find the components of \mathbf{L} in the x, y and z directions in Cartesian coordinates using

\mathbf{L} = \mathbf{r} \times \mathbf{p}

= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \ \\x & y & z \\ \ \\p_x & p_y & p_z \end{vmatrix}

= (yp_z - zp_y)\mathbf{i} + (zp_x - xp_z)\mathbf{j} + (xp_y - yp_x)\mathbf{k}

Therefore the components of \mathbf{L} in Cartesian coordinates are

L_x = yp_z - zp_y

L_y = zp_x - xp_z

L_z = xp_y - yp_x

In classical mechanics the angular momentum magnitude L can take a continuum of values but in quantum mechanics only certain discrete values for L are permissible. Furthermore, the linear momentum vector \mathbf{p} appearing in \mathbf{L} must obey Heisenberg’s uncertainty principle in each direction, i.e.,

\Delta x \Delta p_x \geq \frac{\hbar}{2}

in the x-direction and similarly for the y and z directions. These features of quantized variables like \mathbf{p} and \mathbf{L} make it necessary to do calculations with them in quantum mechanics using their operator representations. (For example, on the quantum scale one cannot calculate the expectation of momentum in the x-direction using an integral involving momentum as a function of x because no such function can exist: Heisenberg’s uncertainty principle prevents accurate knowledge of momentum when the value of x is known exactly. One must therefore use the operator representation of momentum rather than momentum as some function of x in order to calculate the expectation). It is a basic postulate of quantum mechanics that every observable quantity characterising a physical system can be represented by a quantum mechanical operator obtained by expressing the quantity in terms of \mathbf{r} and \mathbf{p} and then replacing the vector \mathbf{p} by - i \hbar \nabla where

\nabla = \mathbf{i} \frac{\partial}{\partial x} + \mathbf{j} \frac{\partial}{\partial y} + \mathbf{k} \frac{\partial}{\partial z}

and its components p_x, p_y and p_z by

p_x = - i \hbar \frac{\partial}{\partial x}

p_y = - i \hbar \frac{\partial}{\partial y}

p_z = - i \hbar \frac{\partial}{\partial z}

Taking this on board we can then write \mathbf{L} and L_x, L_y and L_z in quantum mechanical operator form in Cartesian coordinates as

\mathbf{L} = - i \hbar (\mathbf{r} \times \nabla)

L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)

L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)

L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)

In order to perform some calculations involving Schrödinger’s equation I needed to employ the square of the quantum mechanical angular momentum operator L^2, but in spherical polar coordinates rather than Cartesian coordinates, where

L^2 = L_x^2 + L_y^2 + L_z^2

I used the matrix calculus approach in my previous note to achieve the necessary change of variables in L^2. In the present note I want to record the details of this calculation as I have never seen this approach used elsewhere. (This change of variables can also be done using a scale factor method based on vector calculus which I will not go into here).

As in my previous note we begin the matrix calculus approach with the standard conversion equations for spherical polar coordinates:

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

Differentiating with respect to the vector (r, \theta, \phi) we get

\begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \\ \ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \\ \ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix}

We can then use the matrix version of the chain rule to write

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \\ \ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \\ \ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z} \end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z} \end{bmatrix}

We can solve this system by inverting the coefficient matrix to get

\begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \\ \ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \\ \ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}
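This inversion is tedious by hand but trivial to confirm with SymPy (an illustrative check of my own):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

x = r * sp.sin(th) * sp.cos(ph)
y = r * sp.sin(th) * sp.sin(ph)
z = r * sp.cos(th)

# Coefficient matrix of the system: rows are d/dr, d/dtheta, d/dphi of (x, y, z)
J = sp.Matrix([[sp.diff(c, v) for c in (x, y, z)] for v in (r, th, ph)])
print(sp.simplify(J.inv()))   # reproduces the inverse matrix above
```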

Using the equations in this system together with the standard conversion equations we then have

y \frac{\partial F}{\partial z} = r \sin \theta \sin \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \sin \phi \frac{\partial F}{\partial r} - \sin^2 \theta \sin \phi \frac{\partial F}{\partial \theta}

and

z \frac{\partial F}{\partial y} = r \cos \theta \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial r} + \cos^2 \theta \sin \phi \frac{\partial F}{\partial \theta} + \frac{\cos \theta \cos \phi}{\sin \theta}\frac{\partial F}{\partial \phi}

Subtracting the second expression from the first and dropping the function F from the partial derivatives (to leave the bare operator) we can then write the angular momentum operator in the x-direction in terms of spherical polar coordinates as

L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)

= - i \hbar \big( - \cot \theta \cos \phi \frac{\partial}{\partial \phi} - \sin \phi \frac{\partial}{\partial \theta} \big)

= i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big)

Similarly we have

z \frac{\partial F}{\partial x} = r \cos \theta \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial r} + \cos^2 \theta \cos \phi \frac{\partial F}{\partial \theta} - \frac{\cos \theta \sin \phi}{\sin \theta}\frac{\partial F}{\partial \phi}

and

x \frac{\partial F}{\partial z} = r \sin \theta \cos \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \cos \phi \frac{\partial F}{\partial r} - \sin^2 \theta \cos \phi \frac{\partial F}{\partial \theta}

Therefore

L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)

= - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)

Finally, we have

x \frac{\partial F}{\partial y} = r \sin \theta \cos \phi \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} + \cos^2 \phi \frac{\partial F}{\partial \phi}

and

y \frac{\partial F}{\partial x} = r \sin \theta \sin \phi \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} - \sin^2 \phi \frac{\partial F}{\partial \phi}

Therefore

L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)

= - i \hbar \frac{\partial}{\partial \phi}

Having obtained the components of \mathbf{L} in spherical polar coordinates we can now finally calculate the operator representation of L^2 as

L^2 = L_x^2 + L_y^2 + L_z^2

= \big[ i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big) \big]^2 + \big[ - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) \big]^2 + \big[ - i \hbar \frac{\partial}{\partial \phi} \big]^2

= - \hbar^2 \big[ \sin^2 \phi \frac{\partial^2}{\partial \theta^2} + \big(\sin \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big) + \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big)\big(\sin \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \cos^2 \phi \frac{\partial^2}{\partial \phi^2} \big]

- \hbar^2 \big[ \cos^2 \phi \frac{\partial^2}{\partial \theta^2} - \big(\cos \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) - \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)\big(\cos \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \sin^2 \phi \frac{\partial^2}{\partial \phi^2} \big]

- \hbar^2 \frac{\partial^2}{\partial \phi^2}

= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2} - \sin \phi \frac{1}{\sin^2 \theta} \cos \phi \frac{\partial}{\partial \phi} + \cot \theta \cos^2 \phi \frac{\partial}{\partial \theta}

+ \cos \phi \frac{1}{\sin^2 \theta} \sin \phi \frac{\partial}{\partial \phi} + \cot \theta \sin^2 \phi \frac{\partial}{\partial \theta}\big)

= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot \theta \frac{\partial}{\partial \theta} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2}\big)

= - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)
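A quick way to test this final operator (a SymPy sketch of my own; the helper name L2 is just for illustration) is to apply it to low-order spherical harmonics, which should be eigenfunctions with eigenvalue l(l+1)\hbar^2:

```python
import sympy as sp

th, ph, hbar = sp.symbols('theta phi hbar')

def L2(f):
    """The spherical polar form of L^2 derived above."""
    return -hbar**2 * (sp.diff(sp.sin(th) * sp.diff(f, th), th) / sp.sin(th)
                       + sp.diff(f, ph, 2) / sp.sin(th)**2)

Y10 = sp.cos(th)                          # Y_1^0 up to normalisation
Y11 = sp.sin(th) * sp.exp(sp.I * ph)      # Y_1^1 up to normalisation
print(sp.simplify(L2(Y10) / Y10))         # -> 2*hbar**2, i.e. l(l+1)*hbar^2 with l = 1
print(sp.simplify(L2(Y11) / Y11))         # -> 2*hbar**2
```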

Matrix calculus approach to changing variables in Laplace’s equation

In three-dimensional rectangular coordinates, the partial differential equation known as Laplace’s equation takes the form

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0

This equation is applicable to a wide range of problems in physics but it is often necessary to make a change of variables from rectangular to spherical polar coordinates in order to better match the spherical symmetry of particular contexts. One can work out this change of variables using a scale factor method based on the formula for the Laplacian of a function F in general orthogonal curvilinear coordinates:

\nabla^2 F = \frac{1}{h_1 h_2 h_3}\bigg[ \frac{\partial }{\partial x_1}\big( \frac{h_2 h_3}{h_1}\frac{\partial F}{\partial x_1} \big) + \frac{\partial }{\partial x_2}\big( \frac{h_1 h_3}{h_2}\frac{\partial F}{\partial x_2} \big) + \frac{\partial }{\partial x_3}\big( \frac{h_1 h_2}{h_3}\frac{\partial F}{\partial x_3} \big) \bigg]

This formula is derived using basic results from vector calculus and allows Laplace’s equation to be easily expressed in any orthogonal coordinate system once the form of the Euclidean metric is known in that coordinate system. (Orthogonal coordinate systems are those for which the coordinate basis vectors are always orthogonal at each point). The Euclidean metric in general orthogonal coordinates x_1, x_2, x_3 will take the form

ds^2 = h_1^2 dx_1^2 + h_2^2 dx_2^2 + h_3^2 dx_3^2

The coefficients h_1, h_2 and h_3 are the scale factors appearing in the Laplacian formula. In the metric, these convert the coordinate differentials dx_i into lengths h_i dx_i.

In the case of rectangular coordinates we have x_1 = x, x_2 = y, x_3 = z and h_1 = h_2 = h_3 = 1 so the Euclidean metric and the Laplacian formula reduce to the familiar forms

ds^2 = dx^2 + dy^2 + dz^2

and

\nabla^2 F = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2}

The scale factors are all equal to 1 in this case since the coordinate differentials are already in units of length. In the case of spherical polar coordinates (an orthogonal coordinate system) we can find the scale factors and hence the Laplacian by using the standard conversion equations

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

Taking the differentials of these, squaring them and adding them we find that the Euclidean metric in spherical polar coordinates takes the form

ds^2 = dr^2 + r^2 d \theta^2 + r^2 \sin^2 \theta d \phi^2

The scale factors in this case are therefore h_1 = 1, h_2 = r and h_3 = r \sin \theta. Putting these into the Laplacian formula we immediately get

\nabla^2 F = \frac{1}{r^2 \sin \theta} \bigg[ \frac{\partial }{\partial r}\big( \frac{r^2 \sin \theta}{1}\frac{\partial F}{\partial r} \big) + \frac{\partial }{\partial \theta}\big( \frac{r \sin \theta}{r}\frac{\partial F}{\partial \theta} \big) + \frac{\partial }{\partial \phi}\big( \frac{r}{r \sin \theta}\frac{\partial F}{\partial \phi} \big) \bigg]

= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}

Therefore Laplace’s equation in spherical polar coordinates is

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0
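As a quick check of this form of the Laplacian (a SymPy sketch of my own; the helper name laplacian_spherical is just for illustration), one can apply it to functions known to be harmonic away from the origin:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

def laplacian_spherical(F):
    """Laplacian in spherical polar coordinates, as derived above."""
    return (sp.diff(r**2 * sp.diff(F, r), r) / r**2
            + sp.diff(sp.sin(th) * sp.diff(F, th), th) / (r**2 * sp.sin(th))
            + sp.diff(F, ph, 2) / (r**2 * sp.sin(th)**2))

print(sp.simplify(laplacian_spherical(1 / r)))             # -> 0
print(sp.simplify(laplacian_spherical(r * sp.cos(th))))    # -> 0 (this is just z)
```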

I recently had occasion to use Laplace’s equation in spherical polar coordinates again and this time I decided to have a go at justifying the change of variables to myself using a matrix calculus approach instead of the above scale factor method. This turned out to be a laborious but interesting process. In the present note I want to record my calculations using the matrix calculus approach in detail as I have never seen this approach used elsewhere. The advantage of the matrix calculus approach is that it works for any coordinate system irrespective of whether it is orthogonal or not.

In the matrix calculus approach we use the standard conversion equations for spherical polar coordinates

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

to work out the coefficient matrix in the system

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \\ \ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \\ \ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z} \end{bmatrix}

\big(which of course is equivalent to the set of equations

\frac{\partial F}{\partial r} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial r} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial r}

\frac{\partial F}{\partial \theta} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \theta} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \theta}

\frac{\partial F}{\partial \phi} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \phi} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \phi} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \phi} \big)

We get the system

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z} \end{bmatrix}

We can solve this system either by inverting the coefficient matrix or by using Cramer’s rule. Using Cramer’s rule we get

\frac{\partial F}{\partial x} = \frac{\begin{vmatrix} \frac{\partial F}{\partial r} & \sin \theta \sin \phi & \cos \theta\\ \\ \ \frac{\partial F}{\partial \theta} & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ \frac{\partial F}{\partial \phi} & r \sin \theta \cos \phi & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \cos \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial \theta} - r \sin \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}

= \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}

and similarly

\frac{\partial F}{\partial y} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \frac{\partial F}{\partial r} & \cos \theta\\ \\ \ r \cos \theta \cos \phi & \frac{\partial F}{\partial \theta} & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & \frac{\partial F}{\partial \phi} & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \sin \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial \theta} + r \cos \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}

= \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}

and

\frac{\partial F}{\partial z} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \frac{\partial F}{\partial r}\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & \frac{\partial F}{\partial \theta}\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & \frac{\partial F}{\partial \phi}\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \\ \ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \\ \ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \cos \theta \sin \theta \frac{\partial F}{\partial r} - r \sin^2 \theta \frac{\partial F}{\partial \theta}}{r^2 \sin \theta}

= \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta}

We can write these solutions more concisely in matrix form as

\begin{bmatrix} \frac{\partial F}{\partial x}\\ \\ \ \frac{\partial F}{\partial y} \\ \\ \ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \\ \ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \\ \ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

The coefficient matrix here is, of course, the inverse of the coefficient matrix in the original system.

To find the second-order partials \frac{\partial^2F}{\partial x^2}, \frac{\partial^2F}{\partial y^2} and \frac{\partial^2F}{\partial z^2}, let G \equiv \frac{\partial F}{\partial x}, H \equiv \frac{\partial F}{\partial y} and I \equiv \frac{\partial F}{\partial z}. Then we need to find \frac{\partial G}{\partial x}, \frac{\partial H}{\partial y} and \frac{\partial I}{\partial z}. We can use the first, second and third equations respectively in the above matrix system to find these, with F replaced by G, H and I respectively. Thus,

\frac{\partial G}{\partial x} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial G}{\partial r}\\ \\ \ \frac{\partial G}{\partial \theta} \\ \\ \ \frac{\partial G}{\partial \phi} \end{bmatrix}

\frac{\partial H}{\partial y} = \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial H}{\partial r}\\ \\ \ \frac{\partial H}{\partial \theta} \\ \\ \ \frac{\partial H}{\partial \phi} \end{bmatrix}

\frac{\partial I}{\partial z} = \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix} \begin{bmatrix} \frac{\partial I}{\partial r}\\ \\ \ \frac{\partial I}{\partial \theta} \end{bmatrix}

But differentiating

G \equiv \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

with respect to r, \theta and \phi we get

\begin{bmatrix} \frac{\partial G}{\partial r}\\ \\ \ \frac{\partial G}{\partial \theta} \\ \\ \ \frac{\partial G}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix}  \begin{bmatrix} \sin \theta \cos \phi\\ \\ \ \frac{\cos \theta \cos \phi}{r} \\ \\ \ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix} + \begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \\ \ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\sin \phi \cos \theta}{r \sin^2 \theta}\\ \\ \ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

Similarly, differentiating

H \equiv \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

with respect to r, \theta and \phi we get

\begin{bmatrix} \frac{\partial H}{\partial r}\\ \\ \ \frac{\partial H}{\partial \theta} \\ \\ \ \frac{\partial H}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \\ \ \frac{\cos \theta \sin \phi}{r} \\ \\ \ \frac{\cos \phi}{r \sin \theta} \end{bmatrix} + \begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \\ \ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \phi \cos \theta}{r \sin^2 \theta}\\ \\ \ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

and differentiating

I \equiv \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \end{bmatrix}

with respect to r and \theta we get

\begin{bmatrix} \frac{\partial I}{\partial r}\\ \\ \ \frac{\partial I}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \\ \ -\frac{\sin \theta}{r} \end{bmatrix} + \begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \\ \ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \end{bmatrix}

Therefore we have

\frac{\partial G}{\partial x} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \cos \phi\\ \\ \ \frac{\cos \theta \cos \phi}{r} \\ \\ \ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix}

+ \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \\ \ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\sin \phi \cos \theta}{r \sin^2 \theta}\\ \\ \ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

and similarly

\frac{\partial H}{\partial y} = \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \\ \ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \\ \ \frac{\cos \theta \sin \phi}{r} \\ \\ \ \frac{\cos \phi}{r \sin \theta} \end{bmatrix}

+ \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \\ \ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \phi \cos \theta}{r \sin^2 \theta}\\ \\ \ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \\ \\ \ \frac{\partial F}{\partial \phi} \end{bmatrix}

and

\frac{\partial I}{\partial z} = \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \\ \ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \\ \ -\frac{\sin \theta}{r} \end{bmatrix} + \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \\ \ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \\ \ \frac{\partial F}{\partial \theta} \end{bmatrix}

Laplace’s equation is given by \frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} = 0 so we need to work out the three matrix equations above and sum them. The result will be a sum involving the nine partials \frac{\partial^2 F}{\partial r^2}, \frac{\partial^2F}{\partial \theta^2}, \frac{\partial^2F}{\partial \phi^2}, \frac{\partial^2F}{\partial r \partial \theta}, \frac{\partial^2F}{\partial r \partial \phi}, \frac{\partial^2F}{\partial \theta \partial \phi}, \frac{\partial F}{\partial r}, \frac{\partial F}{\partial \theta} and \frac{\partial F}{\partial \phi}. I found that the most convenient way to work out the sum was by collecting the coefficients of each of these nine partials individually in turn. The result is

\frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} =

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =

\big \{\sin^2 \theta \cos^2 \phi + \sin^2 \theta \sin^2 \phi + \cos^2 \theta \big \} \frac{\partial^2 F}{\partial r^2}

+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r^2} + \frac{\cos^2 \theta \sin^2 \phi}{r^2} + \frac{\sin^2 \theta}{r^2} \big \} \frac{\partial^2 F}{\partial \theta^2}

+ \big \{ \frac{\sin^2 \phi}{r^2 \sin^2 \theta} + \frac{\cos^2 \phi}{r^2 \sin^2 \theta}\big \} \frac{\partial^2 F}{\partial \phi^2}

+ \big \{ \frac{\sin \theta \cos \theta \cos^2 \phi}{r} + \frac{\sin \theta \cos \theta \sin^2 \phi}{r} - \frac{\sin \theta \cos \theta}{r}\big \} \frac{\partial^2 F}{\partial r \partial \theta}

+ \big \{ -\frac{\cos \phi \sin \phi}{r} + \frac{\cos \phi \sin \phi}{r} \big \} \frac{\partial^2 F}{\partial r \partial \phi}

+ \big \{ -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} \big \} \frac{\partial^2 F}{\partial \theta \partial \phi}

+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r} + \frac{\sin^2 \phi}{r} + \frac{\cos^2 \theta \sin^2 \phi}{r} + \frac{\cos^2 \phi}{r} + \frac{\sin^2 \theta}{r}\big \} \frac{\partial F}{\partial r}

+ \big \{ -\frac{2\sin \theta \cos \theta \cos^2 \phi}{r^2} + \frac{\cos \theta \sin^2 \phi}{r^2 \sin \theta} -\frac{2\sin \theta \cos \theta \sin^2 \phi}{r^2} + \frac{\cos \theta \cos^2 \phi}{r^2 \sin \theta} + \frac{2\cos \theta \sin \theta}{r^2} \big \} \frac{\partial F}{\partial \theta}

+ \big \{ \frac{\cos \phi \sin \phi}{r^2} + \frac{(1 + \cos^2 \theta) \cos \phi \sin \phi}{r^2 \sin^2 \theta} - \frac{\cos \phi \sin \phi}{r^2} - \frac{(1 + \cos^2 \theta) \cos \phi \sin \phi}{r^2 \sin^2 \theta} \big \} \frac{\partial F}{\partial \phi}

which simplifies to

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =

\frac{\partial^2 F}{\partial r^2} + \frac{1}{r^2} \frac{\partial^2 F}{\partial \theta^2} + \frac{1}{r^2 \sin^2 \theta} \frac{\partial^2 F}{\partial \phi^2} + \frac{2}{r} \frac{\partial F}{\partial r} + \frac{\cos \theta}{r^2 \sin \theta} \frac{\partial F}{\partial \theta}

= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}

Therefore Laplace’s equation in spherical polar coordinates is

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0

as before.
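As a quick sanity check on this formula, the following sympy sketch confirms that the familiar harmonic function F = 1/r satisfies the spherical polar form of Laplace's equation:

```python
# Sketch: check that F = 1/r satisfies Laplace's equation in spherical polars.
from sympy import symbols, sin, diff, simplify

r, theta, phi = symbols('r theta phi', positive=True)
F = 1/r

laplacian = (diff(r**2*diff(F, r), r)/r**2
             + diff(sin(theta)*diff(F, theta), theta)/(r**2*sin(theta))
             + diff(F, phi, 2)/(r**2*sin(theta)**2))

assert simplify(laplacian) == 0
```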

On classifying singularities, with a quick look at a Schwarzschild black hole

In mathematics, a singularity is a point at which a mathematical object, e.g., a function, is not defined or behaves badly in some way. Singularities can be isolated, e.g., removable singularities, poles and essential singularities, or nonisolated, e.g., branch cuts. For teaching purposes, I want to delve into some of the mathematical aspects of isolated singularities in this note using simple examples involving the complex sine function. (I will not consider nonisolated singularities in detail here). I will also very briefly look at how some singularities arise in the context of black hole physics in a short final section.

Definition: A function f has an isolated singularity at the point \alpha if f is analytic on a punctured open disc \{z: 0 < |z - \alpha| < r \}, where r > 0, but not at \alpha itself.

Note that a function f is analytic at a point \alpha if it is differentiable at every point of some open region containing \alpha. Strangely, a function can have a derivative at a point without being analytic there. For example, the function f(z) = |z|^2 has a derivative at z = 0 but at no other point, as can easily be verified using the Cauchy-Riemann equations; since there is no open region around 0 on which it is differentiable, this function is not analytic at z = 0. Also note, with regard to the definition of an isolated singularity, that the function MUST be analytic on the whole of the punctured open disc for the singularity to count as isolated. For example, despite appearances, the function

f(z) = \frac{1}{\sqrt{z}}

does not have an isolated singularity at z = 0, because it is impossible to define a punctured open disc centred at 0 on which f(z) is analytic (the principal square root function z \rightarrow \sqrt{z} is discontinuous everywhere on the negative real axis, so f(z) fails to be analytic there).

I find it appealing that all three types of isolated singularity (removable, poles and essential singularities) can be illustrated by using members of the following family of functions:

f(z) = \frac{\sin(z^m)}{z^n}

where m and n are integers. For example, if m = n = 1 we get

f_1(z) = \frac{\sin(z)}{z}

which has a removable singularity at z = 0. If m = 1, n = 3 we get

f_2(z) = \frac{\sin(z)}{z^3}

which has a pole of order 2 at z = 0. Finally, if m = -1, n = 0 we get

f_3(z) = \sin\big( \frac{1}{z} \big)

which has an essential singularity at z = 0. In each of these three cases, the function is not analytic at z = 0 but is analytic on a punctured open disc with centre 0, e.g., \{z: 0 < |z| < 1\} or indeed \mathbb{C} - \{0\} (which can be thought of as a punctured disc with infinite radius). In what follows I will use these three examples to delve into structural definitions of the three types of singularity. I will then explore their classification using Laurent series expansions.

Structural definitions of isolated singularities

Removable singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a removable singularity at \alpha if there is a function g which is analytic at \alpha such that

f(z) = g(z) for 0 < |z - \alpha| < r

We can see that g extends the analyticity of f to include \alpha, so we say that g is an analytic extension of f to the disc

\{z: |z - \alpha| < r \}

With removable singularities we always have that \lim_{z \rightarrow \alpha} f(z) exists since

\lim_{z \rightarrow \alpha} f(z) = g(\alpha)

(this will not be true for the other types of singularity) and the name of this singularity comes from the fact that we can effectively `remove’ the singularity by defining f(\alpha) = g(\alpha).

To apply this to the function

f_1(z) = \frac{\sin(z)}{z}

we first observe that the Maclaurin series expansion of \sin(z) is

\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots for z \in \mathbb{C}

Therefore we can write

f_1(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C} - \{0\}

If we then set

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

we see that g(z) extends the analyticity of f_1(z) to include z = 0. We also see that

\lim_{z \rightarrow 0} f_1(z) = g(0)

Therefore f_1(z) has a removable singularity at z = 0.

Poles of order k, k > 0

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a pole of order k at \alpha if there is a function g, analytic at \alpha with g(\alpha) \neq 0, such that

f(z) = \frac{g(z)}{(z - \alpha)^k} for 0 < |z - \alpha| < r

With poles of order k we always have that

f(z) \rightarrow \infty as z \rightarrow \alpha

(which distinguishes them from removable singularities)

and

\lim_{z \rightarrow \alpha} (z - \alpha)^k f(z)

exists and is nonzero (since \lim_{z \rightarrow \alpha} (z - \alpha)^k f(z) = g(\alpha) \neq 0).

To apply this to the function

f_2(z) = \frac{\sin(z)}{z^3}

we first observe that

f_2(z) = \frac{\sin(z)/z}{z^2} = \frac{g(z)}{z^2} for z \in \mathbb{C} - \{0\}

where g is the function

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

Since g(0) = 1 \neq 0, we see that f_2(z) behaves like \frac{1}{z^2} near z = 0 and

f_2(z) \rightarrow \infty as z \rightarrow 0

so the singularity at z = 0 is not removable. We also see that

\lim_{z \rightarrow 0} z ^2 f_2(z) = g(0) = 1

Therefore the function f_2(z) has a pole of order 2 at z = 0.

Essential singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has an essential singularity at \alpha if the singularity is neither removable nor a pole. Such a singularity cannot be removed in any way, including by multiplying by any power (z - \alpha)^k, hence the name.

With essential singularities we have that

\lim_{z \rightarrow \alpha} f(z)

does not exist, and f(z) does not tend to infinity as z \rightarrow \alpha.

To apply this to the function

f_3(z) = \sin\big( \frac{1}{z}\big)

we observe that if we restrict the function to the real axis and consider a sequence of points

z_n = \frac{2}{(2n + 1) \pi}

then we have that z_n \rightarrow 0 whereas

f_3(z_n) = \sin\big(\frac{(2n + 1) \pi}{2}\big) = (-1)^n

Therefore

\lim_{z \rightarrow 0} f_3(z)

does not exist, so the singularity is not removable, but it is also the case that

f_3(z) \not \rightarrow \infty as z \rightarrow 0

so the singularity is not a pole. Since it is neither a removable singularity nor a pole, it must be an essential singularity.
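The three structural criteria can be checked mechanically for these examples, e.g. with sympy's limit function (a sketch; the AccumBounds output in the last case is sympy's way of reporting a bounded oscillation rather than a limit):

```python
# Sketch: classify the three example singularities at z = 0 via limits.
from sympy import symbols, sin, limit

z = symbols('z')

print(limit(sin(z)/z, z, 0))          # 1: finite limit, so removable
print(limit(sin(z)/z**3, z, 0))       # oo: blows up, so a pole ...
print(limit(z**2*sin(z)/z**3, z, 0))  # 1: ... with z^2 f(z) finite and nonzero, so order 2
print(limit(sin(1/z), z, 0))          # AccumBounds(-1, 1): no limit, essential singularity
```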

Classification of isolated singularities using Laurent series

By Laurent’s Theorem, a function f which is analytic on an open annulus

A = \{z: 0 \leq r_1 < |z - \alpha| < r_2 \leq \infty \}

can be represented as an extended power series of the form

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

= \cdots + \frac{a_{-2}}{(z - \alpha)^2} + \frac{a_{-1}}{(z - \alpha)} + a_0 + a_1 (z - \alpha) + a_2 (z - \alpha)^2 + \cdots

for z \in A, which converges at all points in the annulus. It is an `extended’ power series because it involves negative powers of (z - \alpha). (The part of the power series involving negative powers is often referred to as the singular part. The part involving non-negative powers is referred to as the analytic part). This extended power series representation is the Laurent series about \alpha for the function f on the annulus A. Laurent series are also often used in the case when A is a punctured open disc, in which case we refer to the series as the Laurent series about \alpha for the function f.

The Laurent series representation of a function on an annulus A is unique. We can often use simple procedures, such as finding ordinary Maclaurin or Taylor series expansions, to obtain an extended power series and we can feel safe in the knowledge that the power series thus obtained must be the Laurent series.

Laurent series expansions can be used to classify singularities by virtue of the following result: If a function f has a singularity at \alpha and if its Laurent series expansion about \alpha is

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

then

(a) f has a removable singularity at \alpha iff a_n = 0 for all n < 0;

(b) f has a pole of order k at \alpha iff a_n = 0 for all n < -k and a_{-k} \neq 0;

(c) f has an essential singularity at \alpha iff a_n \neq 0 for infinitely many n < 0.

To apply this to our three examples, observe that the function

f_1(z) = \frac{\sin(z)}{z}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots

for z \in \mathbb{C} - \{0\}. This has no non-zero coefficients in its singular part (i.e., it only has an analytic part) so the singularity is a removable one.

The function

f_2(z) = \frac{\sin(z)}{z^3}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z^3} = \frac{1}{z^2} - \frac{1}{3!} + \frac{z^2}{5!} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n = 0 for all n < -2 and a_{-2} \neq 0, so the singularity in this case is a pole of order 2.

Finally, the function

f_3(z) = \sin\big( \frac{1}{z} \big)

has a singularity at 0 and its Laurent series expansion about 0 is

\sin \big(\frac{1}{z} \big) = \frac{1}{z} - \frac{1}{3! z^3} + \frac{1}{5! z^5} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n \neq 0 for infinitely many n < 0 so the singularity here is an essential singularity.
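For the first two examples, the Laurent coefficients can also be generated mechanically, e.g. with sympy's series function (a sketch; the singular part of \sin(1/z) has infinitely many terms, so it cannot be captured by a truncated expansion in the same way):

```python
# Sketch: the first few Laurent terms about z = 0 via sympy's series().
from sympy import symbols, sin, series

z = symbols('z')

print(series(sin(z)/z, z, 0, 8))
# 1 - z**2/6 + z**4/120 - z**6/5040 + O(z**8): no singular part, removable

print(series(sin(z)/z**3, z, 0, 6))
# z**(-2) - 1/6 + z**2/120 - z**4/5040 + O(z**6): a_{-2} != 0, pole of order 2
```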

Singularities in Schwarzschild black holes

One often hears about singularities in the context of black hole physics and I wanted to quickly look at singularities in the particular case of nonrotating black holes. A detailed investigation of the various singularities that appear in exact solutions of Einstein’s field equations was conducted in the 1960s and 1970s by Penrose, Hawking, Geroch and others. There is now a vast literature on this topic. The following discussion is just a quick look at how some ideas involving singularities appear in a simple black hole model.

The spacetime of a nonrotating spherical black hole is usually analysed using the Schwarzschild solution of the Einstein field equations for an isolated spherical mass m. In spherical coordinates this is the metric

\Delta \tau = \bigg[ \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2} \bigg\{\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)} + r^2(\Delta \theta)^2 + r^2 \sin^2 \theta (\Delta \phi)^2\bigg\} \bigg]^{1/2}

where

k = \frac{2mG}{c^2} and m is the mass of the spherically symmetric static object exterior to which the Schwarzschild metric applies. If we consider only radial motion (i.e., world lines for which \Delta \theta = \Delta \phi = 0) the Schwarzschild metric simplifies to

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2}\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)}

We can see that the \Delta r term in the metric becomes infinite at r = k so there is apparently a singularity here. However, this singularity is `removable’ by re-expressing the metric in a new set of coordinates, r and t^{\prime}, known as the Eddington-Finkelstein coordinates. The transformed metric has the form

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t^{\prime})^2 - \frac{2k \Delta t^{\prime} \Delta r}{cr} - \frac{(\Delta r)^2}{c^2}\big(1 + \frac{k}{r}\big)

which does not behave badly at r = k. In general relativity, this type of removable singularity is known as a coordinate singularity. Another example is the apparent singularity at the poles (latitude \pm 90^{\circ}) in spherical coordinates, which disappears when a different coordinate system is used.

Since the term \big(1 - \frac{k}{r} \big) in the Schwarzschild metric diverges as r \rightarrow 0, it appears that we also have a singularity at this point. This is not a removable singularity, and in terms of the earlier discussion above it can in fact be recognised as a pole of order 1 (also called a simple pole), since the divergence comes from the k/r term.

Different branch cuts for the principal argument, log and square root functions

For teaching purposes, I was trying to find different ways of proving the familiar result that the complex square root function f(z) = \sqrt{z} is discontinuous everywhere on the negative real axis. As I was working on alternative proofs, it became very clear to me how sensitive all the proofs were to the particular definition of the principal argument I was using, namely that the principal argument \theta = \text{Arg}z is the unique argument of z satisfying -\pi < \theta \leq \pi. In a sense, this definition manufactures the discontinuity of the complex square root function on the negative real axis, because the principal argument function itself is discontinuous here: the principal argument of a sequence of points approaching the negative real axis from above tends to \pi, whereas the principal argument of a sequence approaching the same point from below tends to -\pi. I realised that all the proofs I was coming up with were exploiting this discontinuity of the principal argument function.

However, this particular choice of principal argument function is completely arbitrary. An alternative is to say that the principal argument of z is the unique argument satisfying 0 \leq \theta < 2\pi, which we can call \text{Arg}_{2\pi} z. The effect of this choice of principal argument function is to make the complex square root function discontinuous everywhere on the positive real axis! It turns out that infinitely many different rays from the origin can serve as the set of discontinuities of the complex square root function, simply by choosing different definitions of the principal argument function. The same applies to the complex logarithm function. In this note I want to record some of my thoughts about this.

The reason for having to specify principal argument functions in the first place is that we need to make complex functions of complex variables single-valued rather than multiple-valued, to make them well-behaved with regard to operations like differentiation. Specifying a principal argument function in order to make a particular complex function single-valued is called choosing a branch of the function. If we specify the principal argument function to be f(z) = \text{Arg} z where -\pi < \text{Arg} z \leq \pi then we define the principal branch of the logarithm function to be

\text{Log} z = \text{log}_e |z| + i \text{Arg} z

for z \in \mathbb{C} - \{0\}, and the principal branch of the square root function to be

z^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z \big)

for z \in \mathbb{C} with z \neq 0.

If we define the functions \text{Log} z and z^{\frac{1}{2}} in this way they will be single-valued, but the cost of doing this is that they will not be continuous on the whole of the complex plane (essentially because of the discontinuity of the principal argument function, which both functions `inherit’). They will be discontinuous everywhere on the negative real axis. The negative real axis is known as a branch cut for these functions. Using this terminology, what I want to explore in this short note is the fact that different choices of branch for these functions will result in different branch cuts for them.

To begin with, let’s formally prove the discontinuity of the principal argument function f(z) = \text{Arg} z, z \neq 0, and then see how this discontinuity is `inherited’ by the principal logarithm and square root functions. For the purposes of the proof we can consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

Clearly, as n \rightarrow \infty, we have z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = \text{Arg} \big( |\alpha| \text{e}^{(-\pi + 1/n)i}\big)

= -\pi + \frac{1}{n}

\rightarrow -\pi

whereas

f(\alpha) = \text{Arg}\big(|\alpha| \text{e}^{\pi i} \big) = \pi

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the negative real axis.

Now consider how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the negative real axis depends crucially on the discontinuity of \text{Arg} z. We again consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

so that z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (- \pi + \frac{1}{n}) \big)

\rightarrow |\alpha|^{\frac{1}{2}} \text{e}^{-i \pi /2} = - i |\alpha|^{\frac{1}{2}}

whereas

f(\alpha) = \big( |\alpha| \text{e}^{i \pi}\big)^{\frac{1}{2}}

= |\alpha|^{\frac{1}{2}} \text{e}^{i \pi/2} = i |\alpha|^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the negative real axis.

Now suppose we choose a different branch for the principal logarithm and square root functions, say \text{Arg}_{2\pi} z which as we said earlier satisfies 0 \leq \text{Arg}_{2\pi} z < 2\pi. The effect of this is to change the branch cut of these functions to the positive real axis! The reason is that the principal argument function will now be discontinuous everywhere on the positive real axis, and this discontinuity will again be `inherited’ by the principal logarithm and square root functions.

To prove the discontinuity of the principal argument function f(z) = \text{Arg}_{2\pi} z on the positive real axis we can consider the sequence of points

z_n = \alpha \text{e}^{(2 \pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

We have z_n \rightarrow \alpha. However,

f(z_n) = \text{Arg}_{2\pi} \big(\alpha \text{e}^{(2\pi - 1/n)i}\big)

= 2\pi - \frac{1}{n}

\rightarrow 2\pi

whereas

f(\alpha) = \text{Arg}_{2\pi}(\alpha) = 0

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the positive real axis.

We can now again see how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the positive real axis depends crucially on the discontinuity of \text{Arg}_{2\pi} z there. We again consider the sequence of points

z_n = \alpha \text{e}^{(2\pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

so that z_n \rightarrow \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log}_{2\pi} z_n \big), where \text{Log}_{2\pi} z = \text{log}_e |z| + i \text{Arg}_{2\pi} z,

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg}_{2\pi} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (2 \pi - \frac{1}{n}) \big)

\rightarrow \alpha^{\frac{1}{2}} \text{e}^{i 2 \pi /2} = - \alpha^{\frac{1}{2}}

whereas

f(\alpha) = \alpha^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the positive real axis.

There are infinitely many other branches to choose from. In general, if \tau is any real number, we can define the principal argument function to be f(z) = \text{Arg}_{\tau} z where

\tau \leq \text{Arg}_{\tau} z < \tau + 2\pi

and this will give rise to a branch cut for the corresponding logarithm and square root functions consisting of a ray emanating from the origin and containing all those points z \neq 0 such that \text{arg}(z) = \tau modulo 2\pi.
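These branch-dependent discontinuities are easy to observe numerically. The sketch below defines a hypothetical helper sqrt_branch implementing the square root for the branch \tau \leq \text{Arg}_{\tau} z < \tau + 2\pi, and evaluates it just above and just below the two cuts discussed:

```python
# Sketch: the square root branch determined by tau <= arg(z) < tau + 2*pi,
# evaluated on either side of the branch cut.
import cmath
import math

def sqrt_branch(z, tau):
    r, theta = cmath.polar(z)           # theta is the (-pi, pi] argument
    while theta < tau:                  # shift theta into [tau, tau + 2*pi)
        theta += 2*math.pi
    while theta >= tau + 2*math.pi:
        theta -= 2*math.pi
    return cmath.rect(math.sqrt(r), theta/2)

eps = 1e-9
# tau = -pi: cut along the negative real axis (values jump between ~i and ~-i)
print(sqrt_branch(complex(-1, eps), -math.pi), sqrt_branch(complex(-1, -eps), -math.pi))
# tau = 0: cut along the positive real axis (values jump between ~1 and ~-1)
print(sqrt_branch(complex(1, eps), 0.0), sqrt_branch(complex(1, -eps), 0.0))
```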

A note on the quaternion rotation operator

Sir William Rowan Hamilton famously discovered the key rules of quaternion algebra while walking with his wife past a bridge in Dublin in 1843, an event now commemorated by a plaque. I needed to use the quaternion rotation operator recently and, while digging around the literature on this topic, I noticed that a lot of it is quite unclear and over-complicated. A couple of simple yet vital ideas would, if spelt out, make the key results seem much less mysterious, but they never seem to be mentioned. The vector notation often used in this area also tends to over-complicate things. In this note I want to record some thoughts about the quaternion rotation operator, bringing out the underlying ideas that (to me) dispel much of the mystery.

Quaternions are hypercomplex numbers of the form

q = a + bi + cj + dk

In many ways (as I will show below) they can usefully be thought about using familiar ideas for two-dimensional complex numbers x + yi in which i \equiv \sqrt{-1}.

In the case of quaternions, the identities i^2 = j^2 = k^2 = ijk = -1 discovered by Hamilton determine all possible products of i, j and k. They imply a cyclic relationship when calculating their products (similar to that of the cross products of the three-dimensional basis vectors i, j, k, which is why authors often use these basis vectors when defining quaternions).

Taking products clockwise one obtains

ij = k
jk = i
ki = j

and taking products anticlockwise one obtains

ik = -j
kj = -i
ji = -k

This algebraic structure leads to some interesting differences from the familiar algebra of ordinary two-dimensional complex numbers, particularly the non-commutativity of multiplication. Technically, one says that the set of all quaternions with the operations of addition and multiplication constitutes a non-commutative division ring, i.e., every non-zero quaternion has a multiplicative inverse but quaternion products are generally non-commutative.

I was interested to notice that one way in which this algebraic difference with ordinary complex numbers manifests itself is in taking the complex conjugate of products. With ordinary complex numbers z_1 = x_1 + y_1 i and z_2 = x_2 + y_2 i one obtains

\overline{z_1 \cdot z_2} = \overline{z_1} \cdot \overline{z_2}

since

z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1)i - y_1 y_2 = (x_1 x_2 - y_1 y_2) + (x_1 y_2 + x_2 y_1)i

and therefore

\overline{z_1 \cdot z_2} = (x_1 x_2 - y_1 y_2) - (x_1 y_2 + x_2 y_1)i

but this is the same as

\overline{z_1} \cdot \overline{z_2} = (x_1 - y_1 i)(x_2 - y_2 i) = x_1 x_2 - (x_1 y_2 + x_2 y_1)i - y_1 y_2

= (x_1 x_2 - y_1 y_2) - (x_1 y_2 + x_2 y_1)i

With the product of two quaternions q_1 and q_2 we get a different result:

\overline{q_1 \cdot q_2} = \overline{q_2} \cdot \overline{q_1}

In words, the complex conjugate of the product is the product of the complex conjugates in reverse order. To see this, let

q_1 = a_1 + b_1i + c_1j + d_1k

q_2 = a_2 + b_2i + c_2j + d_2k

Then

q_1 \cdot q_2 =

a_1a_2 + a_1b_2i + a_1c_2j + a_1d_2k

+ a_2b_1i - b_1b_2 + b_1c_2k - b_1d_2j

+ a_2c_1j - b_2c_1k - c_1c_2 + c_1d_2i

+ a_2d_1k + b_2d_1j - c_2d_1i - d_1d_2

=

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (a_1b_2 + a_2b_1 + c_1d_2 - c_2d_1)i

+ (a_1c_2 - b_1d_2 + a_2c_1 + b_2d_1)j

+ (a_1d_2 + b_1c_2 - b_2c_1 + a_2d_1)k

Therefore

\overline{q_1 \cdot q_2} =

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (c_2d_1 - a_1b_2 - a_2b_1 - c_1d_2)i

+ (b_1d_2 - a_1c_2 - a_2c_1 - b_2d_1)j

+ (b_2c_1 - a_1d_2 - b_1c_2 - a_2d_1)k

But this is the same as

\overline{q_2} \cdot \overline{q_1}

= (a_2 - b_2i - c_2j - d_2k)(a_1 - b_1i - c_1j - d_1k)

= a_1a_2 - a_2b_1i - a_2c_1j - a_2d_1k

- a_1b_2i - b_1b_2 + b_2c_1k - b_2d_1j

- a_1c_2j - b_1c_2k - c_1c_2 + c_2d_1i

- a_1d_2k + b_1d_2j - c_1d_2i - d_1d_2

=

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (c_2d_1 - a_1b_2 - a_2b_1 - c_1d_2)i

+ (b_1d_2 - a_1c_2 - a_2c_1 - b_2d_1)j

+ (b_2c_1 - a_1d_2 - b_1c_2 - a_2d_1)k
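The reversal rule can also be spot-checked numerically. Below is a minimal sketch (the helper names qmul and qconj are my own), with a quaternion a + bi + cj + dk represented as the tuple (a, b, c, d) and qmul implementing exactly the product expansion above:

```python
# Sketch: numerical spot-check that conj(q1*q2) = conj(q2)*conj(q1).
import random

def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + a2*b1 + c1*d2 - c2*d1,
            a1*c2 - b1*d2 + a2*c1 + b2*d1,
            a1*d2 + b1*c2 - b2*c1 + a2*d1)

def qconj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

q1 = tuple(random.uniform(-1, 1) for _ in range(4))
q2 = tuple(random.uniform(-1, 1) for _ in range(4))

lhs = qconj(qmul(q1, q2))
rhs = qmul(qconj(q2), qconj(q1))
assert all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs))
```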

The complex conjugate is used to define the length |q| of a quaternion

q = a + bi + cj + dk

as

|q| = \sqrt{\overline{q} \cdot q}

= \sqrt{(a - bi - cj - dk)(a + bi + cj + dk)}

= \sqrt{a^2 + b^2 + c^2 + d^2}

To find the inverse q^{-1} of a quaternion we observe that

q \cdot q^{-1} = 1

so

\overline{q} \cdot q \cdot q^{-1} = \overline{q}

\iff |q|^2 \cdot q^{-1} = \overline{q}

\iff q^{-1} = \frac{\overline{q}}{|q|^2}

For a quaternion

q = a + bi + cj + dk

let

r \equiv \sqrt{b^2 + c^2 + d^2}

A key result that helps to clarify the literature on quaternion rotation is that

\frac{bi + cj + dk}{r} = \sqrt{-1}

This seems mysterious at first but can easily be verified by confirming that when the term on the left hand side is multiplied by itself, the result is -1:

\frac{1}{r^2}(bi + cj + dk)(bi + cj + dk)

= \frac{1}{r^2}(-b^2 + bck - bdj -cbk - c^2 + cdi + dbj - dci - d^2)

= \frac{-b^2 - c^2 - d^2}{r^2}

= -\frac{r^2}{r^2} = -1

This result means that any quaternion of the above form can be written as

q = a + r \big( \frac{bi + cj + dk}{r} \big)

= a + r \sqrt{-1}

This is just a familiar two-dimensional complex number! It therefore has an angle \theta associated with it, given by the equations

\cos \theta = \frac{a}{|q|}

\sin \theta = \frac{r}{|q|}

\tan \theta = \frac{r}{a}

We can express the quaternion in terms of this angle as

q = |q|(\cos \theta + \frac{bi + cj + dk}{r} \sin \theta)

If q is a unit quaternion (i.e., |q| = 1) we then get that

q = \cos \theta + \frac{bi + cj + dk}{r} \sin \theta

This is the form that is needed in the context of quaternion rotation. It turns out that for any unit quaternion of this form and for any vector (v_1, v_2, v_3) \in \mathbb{R}^3, identified with the pure quaternion v_1i + v_2j + v_3k, the operation

L_q(v_1, v_2, v_3) = q \cdot (v_1, v_2, v_3) \cdot \overline{q}

will result in a rotation of the vector (v_1, v_2, v_3) through an angle 2 \theta about the vector (b, c, d) as the axis of rotation. The direction of rotation is given by the familiar right-hand rule, i.e., the thumb of the right hand points in the direction of the vector (b, c, d) and the fingers then curl in the direction of rotation.

As an example, suppose we want to rotate the vector (0, 0, 1) through 90^{\circ} about the vector (0, 1, 0) in the sense of the right-hand rule.

We would expect the result to be the vector (1, 0, 0). Using the quaternion rotation operator to achieve this, we would specify

2 \theta = 90^{\circ} \implies \theta = 45^{\circ}

\cos \theta = \frac{1}{\sqrt{2}}

\sin \theta = \frac{1}{\sqrt{2}}

\frac{bi + cj + dk}{r} = \frac{0i + 1j + 0k}{1} = j

q = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} j

\overline{q} = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} j

v_1i + v_2j + v_3k = 0i + 0j + 1k = k

The resulting vector would be

L_q(v_1, v_2, v_3) = q \cdot (v_1i + v_2j + v_3k) \cdot \overline{q}

= \big(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\big)(k)\big(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\big)

= \big(\frac{1}{\sqrt{2}}k + \frac{1}{\sqrt{2}}i\big)\big(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\big)

= \frac{1}{2}k + \frac{1}{2}i + \frac{1}{2}i - \frac{1}{2}k

= i

= 1i + 0j + 0k

This result is interpreted as the vector (1, 0, 0), which is exactly what we expected.
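The same rotation can be checked numerically with the quaternion product helpers from the earlier sketch (repeated here so the snippet is self-contained):

```python
# Sketch: rotate (0, 0, 1) by 90 degrees about (0, 1, 0) via q * v * conj(q).
import math

def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + a2*b1 + c1*d2 - c2*d1,
            a1*c2 - b1*d2 + a2*c1 + b2*d1,
            a1*d2 + b1*c2 - b2*c1 + a2*d1)

def qconj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def rotate(v, axis, angle):
    # axis must be a unit vector; the rotation angle is 2*theta, so theta = angle/2
    half = angle/2
    q = (math.cos(half), math.sin(half)*axis[0],
         math.sin(half)*axis[1], math.sin(half)*axis[2])
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), qconj(q))[1:]

print(rotate((0, 0, 1), (0, 1, 0), math.pi/2))  # approximately (1.0, 0.0, 0.0)
```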

Note that to achieve the same result through conventional matrix algebra we would have to use the rotation matrix

\begin{bmatrix} \cos \varphi & 0 & \sin \varphi\\ 0 & 1 & 0\\-\sin \varphi & 0 & \cos \varphi\end{bmatrix}

Setting \varphi = 90^{\circ} and applying this to the vector (0, 0, 1)^T we get

\begin{bmatrix} 0 & 0 & 1\\0 & 1 & 0\\-1 & 0 & 0\end{bmatrix} \begin{bmatrix}0\\0\\1\end{bmatrix}

= \begin{bmatrix}1\\0\\0\end{bmatrix}

This is the same result, but note that this approach is more cumbersome to implement because there are different rotation matrices for different axes of rotation and for different rotation conventions. The above rotation matrix happens to be the one required to achieve a rotation about the y-axis using the right-hand rule. In general, the correct rotation matrix would have to be computed each time to suit the particular rotation required. Once the angle of rotation and the axis of rotation are known, the quaternion rotation operator is easier to specify and implement.

A division algorithm for converting prime reciprocals into ternary numbers

For the purposes of some work I was doing involving ternary numbers, I became interested in finding a quick and easily programmable method for converting prime reciprocals into ternary representation. By trial and error I found a Euclidean-like division algorithm which works well, as illustrated by the following application to the calculation of the ternary representation of the prime reciprocal \frac{1}{7}:

3 = 0 \times 7 + 3

9 = 1 \times 7 + 2

6 = 0 \times 7 + 6

18 = 2 \times 7 + 4

12 = 1 \times 7 + 5

15 = 2 \times 7 + 1

The first equation always has 3 on the left hand side and the remainder in each equation is multiplied by 3 to obtain the left-hand side of the next equation. The procedure halts when a remainder is obtained which is equal to 1. From then on, the pattern of the coefficients of the prime denominator will repeat indefinitely. In the above example a remainder of 1 is obtained at the sixth step, and the pattern will repeat from then on, so looking at the coefficients of 7 we conclude that the ternary representation of \frac{1}{7} is 0.\dot{0}1021\dot{2}, where the dot notation is used to indicate that the string of digits between the dots repeats indefinitely. This is completely analogous to the fact that the decimal, i.e., base-10, representation of \frac{1}{7} has the repeating cycle structure 0.\dot{1}4285\dot{7}. Note that the cycle length is 6 in the ternary representation of \frac{1}{7} precisely because 6 is the order of 3 modulo 7. (The order of an integer a modulo a prime p is the smallest positive exponent d such that a^d \equiv 1 mod p. Fermat's little theorem guarantees that the order of any integer not divisible by p divides p - 1, and so is at most p - 1).

A further illustration is the computation of the ternary form of \frac{1}{13}, for which the division algorithm yields

3 = 0 \times 13 + 3

9 = 0 \times 13 + 9

27 = 2 \times 13 + 1

before the remainder repeats, giving the ternary representation 0.\dot{0}0\dot{2}. The cycle length for \frac{1}{13} is 3, which is due to the fact that 3 is the order of 3 mod 13.

In general, for a prime reciprocal \frac{1}{p}, the ternary cycle length will always be the order of 3 mod p, and the cycle will repeat from the first digit after the point.

To implement the above algorithm I wrote the following Python program which reads a list of primes from a text file (in this case, the first 100 primes greater than 3), then computes the ternary representation of the corresponding prime reciprocals, depositing the output in another text file. The output for each prime reciprocal is the first cycle of ternary digits (the cycle repeats indefinitely thereafter), as well as a statement of the cycle length.
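A minimal sketch consistent with that description might look as follows (the file names primes.txt and ternary_reciprocals.txt are placeholders; the division algorithm itself is exactly the one described above):

```python
# Sketch: ternary expansions of prime reciprocals via the division algorithm.

def ternary_cycle(p):
    """Repeating ternary digits of 1/p and the cycle length, for a prime p > 3."""
    digits = []
    remainder = 1
    while True:
        remainder *= 3                 # left-hand side of the next equation
        digits.append(remainder // p)  # coefficient of p at this step
        remainder %= p
        if remainder == 1:             # the digit pattern repeats from here on
            return digits, len(digits)

with open('primes.txt') as infile, open('ternary_reciprocals.txt', 'w') as outfile:
    for line in infile:
        if not line.strip():
            continue
        p = int(line)
        digits, cycle_length = ternary_cycle(p)
        outfile.write('1/%d = 0.(%s) repeating in base 3; cycle length %d\n'
                      % (p, ''.join(str(d) for d in digits), cycle_length))
```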

A variant of Einstein’s partial differential equation for Brownian motion

In a series of papers beginning in 1905, his `annus mirabilis’, Albert Einstein analysed Brownian motion and derived the following partial differential equation (known as the diffusion equation) from physical principles to describe the process:

\displaystyle D \frac{\partial^2 f}{\partial x^2} = \frac{\partial f}{\partial t}

This equation has as a solution the probability density function

\displaystyle f(x, t) = \frac{1}{\sqrt{4 \pi Dt}} \exp\bigg(-\frac{x^2}{4Dt}\bigg)

(see Einstein, A., 1956, Investigations on the Theory of Brownian Movement, New York: Dover, ISBN 978-0-486-60304-9, pp. 15-16).

Writing \displaystyle D = \frac{\sigma^2}{2} this formula can be rewritten as

\displaystyle f(x, t) = \frac{1}{\sqrt{2 \pi \sigma^2 t}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t}\bigg)

This is the density of the \displaystyle N(0, \sigma^2 t) distribution of the increments of a Wiener process \displaystyle \{X(t): t \geq 0, X(0) = 0\}, also commonly referred to as Brownian motion. This continuous-time stochastic process is symmetric about zero, continuous, and has stationary independent increments, i.e., the change from time \displaystyle u to time \displaystyle u + t, given by the random variable \displaystyle X(u, u + t) = X(u + t) - X(u), has the same \displaystyle N(0, \sigma^2 t) probability distribution as the change from time \displaystyle 0 to time \displaystyle t, given by the random variable \displaystyle X(0, t) = X(t), and the change is also independent of the history of the process before time \displaystyle u.

A continuous-time stochastic process which is also symmetric about zero and continuous like the Wiener process, but which has non-stationary increments, has a probability density function of the form

\displaystyle f(x, t) = \frac{1}{\sqrt{2 \pi \sigma^2 t^2}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t^2}\bigg)

The fact that time appears as a squared term in this formula rather than linearly is enough to destroy the increment-stationarity property of the Wiener process. This can be demonstrated by observing that increment-stationarity requires

\displaystyle X(u, u + t) \,{\buildrel d \over =}\, X(t)

and therefore by definition of \displaystyle X(u, u + t) we must also have

\displaystyle X(u + t) \,{\buildrel d \over =}\, X(u, u + t) + X(u) \,{\buildrel d \over =}\, X(t) + X(u) \sim N(0, \sigma^2 (t^2 + u^2))

but this is not true here since

\displaystyle X(u + t) \sim N(0, \sigma^2(u + t)^2)

We have a contradiction and must therefore conclude that

\displaystyle X(u, u + t) \,{\buildrel d \over \neq}\, X(t)

which means that this alternative continuous-time process does not have the increment-stationarity property of the Wiener process. (Note that this kind of contradiction does not arise with the Wiener process: we have \displaystyle X(u, u + t) + X(u) \sim N(0, \sigma^2(t + u)) which is the same as the distribution of \displaystyle X(u + t)).

Is there a partial differential equation describing this alternative continuous-time process analogous to the partial differential equation derived by Einstein for the standard Brownian motion process? I did indeed find such a partial differential equation for the alternative process, as follows. Taking partial derivatives of the probability density function

\displaystyle f(x, t) = \frac{1}{\sqrt{2 \pi \sigma^2 t^2}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t^2}\bigg)

we obtain

\displaystyle \frac{\partial f}{\partial t} = \frac{1}{\sqrt{2 \pi \sigma^2 t^2}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t^2}\bigg)\bigg\{\frac{x^2}{\sigma^2 t^3} - \frac{1}{t}\bigg\}

\displaystyle \frac{\partial f}{\partial x} = \frac{1}{\sqrt{2 \pi \sigma^2 t^2}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t^2}\bigg)\bigg\{-\frac{x}{\sigma^2 t^2}\bigg\}

\displaystyle \frac{\partial^2 f}{\partial x^2} = \frac{1}{\sqrt{2 \pi \sigma^2 t^2}} \exp\bigg(-\frac{x^2}{2 \sigma^2 t^2}\bigg)\bigg\{\frac{x^2}{(\sigma^2 t^2)^2} - \frac{1}{\sigma^2 t^2}\bigg\}

Comparing the expressions for \displaystyle \frac{\partial f}{\partial t} and \displaystyle \frac{\partial^2 f}{\partial x^2} we see that

\displaystyle \sigma^2 \frac{\partial^2 f}{\partial x^2} = \frac{1}{t}\frac{\partial f}{\partial t}

and this is the required variant of Einstein’s partial differential equation for the alternative continuous-time process.
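This can be confirmed symbolically, e.g. with the following sympy sketch:

```python
# Sketch: verify that the N(0, sigma^2 t^2) density satisfies the variant PDE
#   sigma^2 f_xx = (1/t) f_t.
from sympy import symbols, exp, sqrt, pi, diff, simplify

x = symbols('x', real=True)
t, sigma = symbols('t sigma', positive=True)

f = exp(-x**2/(2*sigma**2*t**2)) / sqrt(2*pi*sigma**2*t**2)

assert simplify(sigma**2*diff(f, x, 2) - diff(f, t)/t) == 0
```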

I was intrigued to find that a slight generalisation of this framework makes it applicable to quantum wave-packet dispersion. To see this, let \displaystyle \sigma(t^2) = \sqrt{a + bt^2}> 0 where a and b are some parameters. Then the partial differential equation

\displaystyle b \frac{\partial^2 f}{\partial x^2} = \frac{1}{t}\frac{\partial f}{\partial t}

has as a solution the probability density function

\displaystyle f(x, t) = \frac{1}{\sqrt{2 \pi \sigma^2(t^2)}} \exp\bigg(-\frac{x^2}{2 \sigma^2(t^2)}\bigg)

as can be verified by comparing the partial derivatives

\displaystyle \frac{\partial f}{\partial t} = \frac{1}{\sqrt{2 \pi \sigma^2 (t^2)}} \exp\bigg(-\frac{x^2}{2 \sigma^2 (t^2)}\bigg)\bigg\{\frac{bx^2 t}{\sigma^4(t^2)} - \frac{bt}{\sigma^2(t^2)}\bigg\}

and

\displaystyle \frac{\partial^2 f}{\partial x^2} = \frac{1}{\sqrt{2 \pi \sigma^2(t^2)}} \exp\bigg(-\frac{x^2}{2 \sigma^2(t^2)}\bigg)\bigg\{\frac{x^2}{\sigma^4(t^2)} - \frac{1}{\sigma^2(t^2)}\bigg\}

(As a check, note that all of this reduces to the previously obtained differential equation and probability density function when \displaystyle a = 0 and \displaystyle b = \sigma^2). Now, in quantum mechanics a wave representation of a moving body is obtained as a wave-packet consisting of a superposition of individual plane waves of different wavelengths (or equivalently, different wave numbers k = \frac{2 \pi}{\lambda}) in the form

\displaystyle \psi(x, t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi(k) e^{i(kx - \hbar k^2 t/2m)}dk

where \displaystyle \Phi(k) is the Fourier transform of the x-space wavefunction \displaystyle \psi(x, t) at \displaystyle t = 0, i.e.,

\displaystyle \Phi(k) = \int_{-\infty}^{\infty} \psi(x', 0)e^{-ikx'}dx'

The wave-packet \displaystyle \psi(x, t) disperses over time and it has been shown that the probability density as a function of time of such a moving body as the wave-packet disperses, given by \displaystyle |\psi(x, t)|^2, always becomes Gaussian (irrespective of the original shape of the wave-packet) and has the form of the probability density function above, i.e.,

\displaystyle |\psi(x, t)|^2 = \frac{1}{\sqrt{2 \pi \sigma^2(t^2)}} \exp\bigg(-\frac{x^2}{2 \sigma^2(t^2)}\bigg)

See, for example, Mita, K, 2007, Dispersion of non-Gaussian free particle wave packets, Am. J. Phys. 75 (10), who derives an expression for \displaystyle |\psi(x, t)|^2 like the one above with \displaystyle x^2 replaced by \displaystyle (x - \gamma_1)^2 and

\displaystyle \sigma^2(t^2) = \frac{\gamma_2^2 + (\hbar t/m)^2}{2\gamma_2}

(see equations (15) to (18) on page 952 of the paper). Therefore the partial differential equation

\displaystyle b \frac{\partial^2 f}{\partial x^2} = \frac{1}{t}\frac{\partial f}{\partial t}

has as a solution the probability density of a moving body undergoing quantum wave-packet dispersion as time progresses (with b = \frac{(\hbar/m)^2}{2\gamma_2} in the above set-up).
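As a final check, a sympy sketch along the same lines confirms that the Gaussian density with variance \sigma^2(t^2) = a + bt^2 does satisfy b \frac{\partial^2 f}{\partial x^2} = \frac{1}{t}\frac{\partial f}{\partial t}:

```python
# Sketch: verify b*f_xx = (1/t)*f_t for the density with variance a + b*t**2.
from sympy import symbols, exp, sqrt, pi, diff, simplify

x = symbols('x', real=True)
t, a, b = symbols('t a b', positive=True)

var = a + b*t**2
f = exp(-x**2/(2*var)) / sqrt(2*pi*var)

assert simplify(b*diff(f, x, 2) - diff(f, t)/t) == 0
```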