Markov’s inequality and the Cramér-Chernoff bounding method

Markov’s inequality is a basic tool for putting upper bounds on tail probabilities. To derive it, suppose Y is a non-negative random variable. We want to put an upper bound on the tail probability P(Y \ge t) where t > 0. We do this by observing that

t \cdot 1_{\{Y \ge t\}} \leq Y \cdot 1_{\{Y \ge t\}} \qquad \qquad \qquad (1)

where 1_{\{Y \ge t\}} is the indicator function which takes the value 1 when Y \geq t and the value 0 otherwise. (It is obvious that (1) above holds, simply by the definition of this indicator function). Taking expectations of both sides of (1) we get

t \cdot P(Y \geq t) \leq E[Y \cdot 1_{\{Y \geq t\}}]

But

E[Y \cdot 1_{\{Y \geq t\}}] = E[Y | Y \geq t] P(Y \geq t)

\leq E[Y | Y \geq t] P(Y \geq t) + E[Y | Y < t] P(Y < t)

= E[Y]

(the inequality holds because Y is non-negative, so the added term is non-negative, and the final equality is the law of total expectation), so we can write

t \cdot P(Y \geq t) \leq E[Y]

and so

P(Y \geq t) \leq \frac{E[Y]}{t} \qquad \qquad \qquad (2)

This is Markov’s inequality. It can be modified, and thereby made sharper for particular situations, by considering a non-decreasing and non-negative function \phi defined on an interval containing the values of Y. Then we have

P(\phi(Y) \geq \phi(t)) \leq \frac{E[\phi(Y)]}{\phi(t)} \qquad \qquad \qquad (3)

As an example of applying this modified version, note that we obtain Chebyshev’s inequality by using the function \phi(t) = t^2 and defining Y = |Z - E[Z]| (which is non-negative, as required). Putting these in (3) we get

P(|Z - E[Z]| \geq t) \leq \frac{Var[Z]}{t^2}

More generally, we can get a version of Markov’s inequality involving the Laplace transform of the random variable Z by specifying \phi(t) = e^{kt}, where k is a positive number. In this case, noting that P(\phi(Z) \geq \phi(t)) \equiv P(Z \geq t) since \phi is monotonically increasing, Markov’s inequality implies

P(Z \geq t) \leq \frac{E[e^{kZ}]}{e^{kt}} \qquad \qquad \qquad (4)

So, now the Laplace transform of Z appears in the upper bound.

The so-called Cramér-Chernoff bounding method determines the best possible bound for a tail probability that one can possibly obtain by using Markov’s inequality with an exponential function \phi(t) = e^{kt}. (This is one of the ways to understand intuitively how the entropy function in large deviation theory arises – see below). To explain this, begin by writing (4) above as

P(Z \geq t) \leq e^{-kt} E[e^{kZ}] \qquad \qquad \qquad (5)

Since this inequality holds for all k \geq 0, one may choose k to minimise the upper bound in (5). Define the log of the Laplace transform as

\lambda(k) = \ln E[e^{kZ}] \qquad \qquad \qquad (6)

for all k \geq 0. Then we can write (5) as

P(Z \geq t) \leq e^{-(kt - \lambda(k))} \qquad \qquad \qquad (7)

Since we are free to replace the right-hand side of (7) by its minimum, we can do so, and we thus obtain Chernoff’s inequality

P(Z \geq t) \leq e^{-\Lambda^{\ast}(t)} \qquad \qquad \qquad (8)

where the right-hand side of

\Lambda^{\ast}(t) \equiv \sup_{k \geq 0} (kt - \lambda(k)) \qquad \qquad \qquad (9)

is known as the Cramér transform of (6). A straightforward argument, given in the Appendix below, shows that we can extend the supremum in (9) over all \mathbb{R} in the definition of the Cramér transform:

\Lambda^{\ast}(t) \equiv \sup_{k \in \mathbb{R}} (kt - \lambda(k)) \qquad \qquad \qquad (10)

The expression on the right-hand side of (10) is known as the Fenchel-Legendre dual function of \lambda(k), and is exactly the Fenchel-Legendre transform that is used in large deviation theory. As discussed in the Appendix below, the Fenchel-Legendre transform coincides with the Cramér transform at every t \geq E[Z], while for t < E[Z] the restricted supremum in (9) is simply equal to zero.

If \lambda(k) is differentiable, the Cramér transform can be computed by maximising kt - \lambda(k) with respect to k. Setting the derivative with respect to k equal to zero gives the optimising value k^{\ast}, so that

\Lambda^{\ast}(t) \equiv (k^{\ast} t - \lambda(k^{\ast})) \qquad \qquad \qquad (11)

where k^{\ast} is such that

\lambda^{\prime}(k^{\ast}) = t \qquad \qquad \qquad (12)

The strict convexity of \lambda(k) implies that \lambda^{\prime} has an increasing inverse (\lambda^{\prime})^{-1}, so

k^{\ast} = (\lambda^{\prime})^{-1}(t) \qquad \qquad \qquad (13)

This formula can be used to compute the Cramér transform explicitly in simple cases. As an example, consider an exponential variate X with probability density

f(x) = \frac{1}{\mu} e^{-x/\mu}, \qquad x \geq 0 \qquad \qquad \qquad (14)

Then the log of the Laplace transform of the random variable is

\lambda(k) = \ln E[e^{kX}] = -\ln(1 - \mu k), \qquad k < 1/\mu \qquad \qquad \qquad (15)

Therefore

\lambda^{\prime}(k) = \frac{\mu}{1 - \mu k}

and so

k^{\ast} = (\lambda^{\prime})^{-1}(t) = \frac{1}{\mu} - \frac{1}{t} \qquad \qquad \qquad (16)

so that, for t \geq \mu = E[X], the Cramér transform is \Lambda^{\ast}(t) = k^{\ast}t - \lambda(k^{\ast}) = \frac{t}{\mu} - 1 - \ln\big(\frac{t}{\mu}\big).
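As a quick sanity check of this example, the following minimal Python sketch (the values of \mu and t are arbitrary illustrative choices, and numpy and scipy are assumed to be available) maximises kt - \lambda(k) numerically, compares the result with the closed form just derived, and confirms that the resulting Chernoff bound dominates a Monte Carlo estimate of the tail probability.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative values (arbitrary choices for this sketch): exponential mean mu, threshold t > mu
mu, t = 1.0, 3.0
rng = np.random.default_rng(0)

def lam(k):
    # lambda(k) = ln E[e^{kX}] = -ln(1 - mu*k), valid for k < 1/mu
    return -np.log(1.0 - mu * k)

# Maximise k*t - lambda(k) over 0 <= k < 1/mu to get the Cramer transform numerically
res = minimize_scalar(lambda k: -(k * t - lam(k)),
                      bounds=(0.0, 1.0 / mu - 1e-9), method="bounded")
Lambda_numerical = -res.fun
Lambda_closed_form = t / mu - 1.0 - np.log(t / mu)

# The Chernoff bound exp(-Lambda*(t)) should dominate the actual tail probability
samples = rng.exponential(mu, size=1_000_000)
print("Lambda* (numerical)   :", Lambda_numerical)
print("Lambda* (closed form) :", Lambda_closed_form)
print("Chernoff bound        :", np.exp(-Lambda_closed_form))
print("Monte Carlo P(X >= t) :", (samples >= t).mean())
```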

Appendix

The Cramér transform on the right-hand side of (9) is a non-negative function because the quantity kt - \lambda(k) equals zero at k = 0 (since \lambda(0) = 0), so the supremum can always be chosen to be at least zero. Since the exponential function is convex, Jensen’s inequality applies to it, and we can write

e^{E[X]} \leq E[e^{X}]

Therefore

\ln E[e^X] \geq E[X]

Applying this to our case, with X = k Z, we get

\lambda(k) \geq k E[Z] \qquad \qquad \qquad (17)

Therefore, we will have for all negative values of k that

k t - \lambda(k) \leq 0

whenever t \geq E[Z] (since for negative k we then have kt \leq kE[Z] \leq \lambda(k) by (17)). But the supremum on the right-hand side of (9) is always at least zero (take k = 0), so these negative values of k contribute nothing, and we can formally extend the supremum on the right-hand side of (9) over all k \in \mathbb{R} in the definition of the Cramér transform, thereby obtaining the Fenchel-Legendre transform in (10). Therefore, as stated in the main text, the Fenchel-Legendre transform will coincide with the Cramér transform at every t \geq E[Z].

To see that the supremum restricted to k \geq 0 in (9) is zero whenever t < E[Z], observe that for any k > 0 this inequality implies kt < kE[Z], but by Jensen’s inequality we will then have \lambda(k) \geq kE[Z] > kt, so kt - \lambda(k) < 0. Since the supremum over k \geq 0 includes the value zero at k = 0, the Cramér transform will indeed equal zero in this case.

Differentiating a double integral with respect to its upper limits

In relation to a calculation involving sums of random variables, I needed to differentiate with respect to \bar{T} a double integral of the form

\int_0^{\bar{T}} \int_0^{\bar{T}-T_1} f(T_1, T_2) \ dT_2 \  dT_1

The parameter \bar{T} appears in both upper limits, but appears on its own in the outer upper limit, and as \bar{T} - T_1 in the inner upper limit. I went through the motions of using Leibniz’s Rule and was interested to see how it worked in this case. I have not seen anything like this anywhere else online, so I want to record it here.

Leibniz’s Rule in the case of a single integral works thus:

\frac{d}{dx} \int_0^x f(t) dt = \frac{d}{dx} \big[F(x) - F(0)\big] = f(x)

Here, F is the antiderivative of f. The trick in the double integral case is to treat the inner integral like the function in the single integral case above. So, we differentiate the outer integral treating the inner integral as the function in the single integral case, then we differentiate the inner integral in the normal way. Thus, we write

\frac{d}{d\bar{T}} \int_0^{\bar{T}} \int_0^{\bar{T}-T_1} f(T_1, T_2) \ dT_2 \  dT_1

= \bigg(\int_0^{\bar{T}-T_1} f(T_1, T_2) \ dT_2 \bigg)\bigg|_{T_1 = \bar{T}} + \int_0^{\bar{T}} \frac{\partial}{\partial\bar{T}} \bigg( \int_0^{\bar{T}-T_1} f(T_1, T_2) \ dT_2\bigg) \  dT_1

= \int_0^{\bar{T} - \bar{T}} f(\bar{T}, T_2) \ dT_2  + \int_0^{\bar{T}}  f(T_1, \bar{T} - T_1) \  dT_1

= \int_0^{\bar{T}}  f(T_1, \bar{T} - T_1) \  dT_1

The first integral in the penultimate step vanishes because the upper limit is converted from \bar{T} - T_1 into \bar{T} - \bar{T} = 0 when the integrand is evaluated at T_1 = \bar{T} in the application of Leibniz’s Rule.
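As a quick check of this formula, here is a minimal sympy sketch (the test integrand f(T_1, T_2) = T_1 T_2^2 is an arbitrary choice made purely for illustration) comparing the derivative of the double integral computed directly with the single integral obtained above.

```python
import sympy as sp

T1, T2, Tbar = sp.symbols('T1 T2 Tbar', positive=True)
f = T1 * T2**2          # arbitrary test integrand for this sketch

inner = sp.integrate(f, (T2, 0, Tbar - T1))
double = sp.integrate(inner, (T1, 0, Tbar))

lhs = sp.diff(double, Tbar)                                  # differentiate the double integral directly
rhs = sp.integrate(f.subs(T2, Tbar - T1), (T1, 0, Tbar))     # the formula derived above

print(sp.simplify(lhs - rhs))   # prints 0, confirming the two expressions agree
```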

Bra-ket formalism of quantum mechanical measurement

The state of a quantum mechanical system is described by an element |\psi\rangle, called a ket, from a Hilbert space, i.e., a complex vector space equipped with an inner product and complete with respect to the norm induced by that inner product. Using bra-ket notation, the inner product of vectors |\phi\rangle and |\psi\rangle is represented as \langle\phi|\psi\rangle. In quantum mechanics, this inner product is in general a complex rather than a real number, and we have the rule \langle\phi|\psi\rangle = \langle\psi|\phi\rangle^{\ast}.

An observable is represented by a hermitian operator, say \hat{Q}. A general operator \hat{A} acting on a ket |\psi\rangle gives another ket, \hat{A} |\psi\rangle. The inner product rule is then \langle\phi|\hat{A}|\psi\rangle = \langle\psi|\hat{A}^{\dag}|\phi\rangle^{\ast}, where \hat{A}^{\dag} is the hermitian conjugate of \hat{A}. However, a hermitian operator \hat{Q} has the inner product rule \langle\phi|\hat{Q}|\psi\rangle = \langle\psi|\hat{Q}|\phi\rangle^{\ast}, and therefore \hat{Q} = \hat{Q}^{\dag}. This is the defining property of a hermitian operator, and it is straightforward to show that satisfaction of this inner product rule implies the eigenvalues of \hat{Q} must be real.

For an observable represented by the hermitian operator \hat{Q}, we can always find an orthonormal basis of the state space that consists of the eigenvectors of \hat{Q}. In the discrete spectrum case, for example, the eigenvectors of \hat{Q} might be \{|\phi_i\rangle\} and the corresponding eigenvalues \{k_i\}. (The discussion can easily be adapted to the continuous spectrum case by replacing summation by integration, and sequences of discrete coefficients by continuous functions, etc.) In order to perform the measurement of the observable on the quantum state |\psi\rangle, we would first need to expand the state using as basis vectors the eigenvectors of \hat{Q}. We can do this using a projection operator

\hat{P} = \sum_i |\phi_i\rangle\langle\phi_i| \qquad \qquad \qquad (1)

Thus the expansion of |\psi\rangle in terms of the eigenvectors of \hat{Q} would be given by

|\psi\rangle = \hat{P}|\psi\rangle = \sum_i |\phi_i\rangle \langle\phi_i|\psi\rangle = \sum_i a_i |\phi_i\rangle \qquad \qquad \qquad (2)

where the expansion coefficient a_i \equiv \langle \phi_i|\psi\rangle is the projection of |\psi\rangle on the axis represented by the i-th eigenvector of \hat{Q}. (Notice that the quantum state |\psi\rangle itself remains unchanged by the action of \hat{P}; summed over a complete set of eigenvectors, \hat{P} is simply the identity operator, which is why (2) holds. Only the representation of |\psi\rangle changes, according to which hermitian operator's eigenvectors are used as the basis).

Given the expansion of the ket

|\psi\rangle = \sum_i a_i |\phi_i\rangle \qquad \qquad \qquad (3)

we then have a corresponding expansion of the bra, given by

\langle\psi| = \sum_i \langle\phi_i| a_i^{\ast} \qquad \qquad \qquad (4)

The quantum state |\psi\rangle is normalised so that

\langle\psi|\psi\rangle = \bigg(\sum_j a_j^{\ast}\langle \phi_j|\bigg)\bigg(\sum_i a_i |\phi_i \rangle\bigg)

= \sum_j \sum_i a_j^{\ast} a_i \langle\phi_j|\phi_i\rangle

= \sum_i |a_i|^2 \langle\phi_i|\phi_i\rangle

= \sum_i |a_i|^2

= 1 \qquad \qquad \qquad (5)

where the collapse of the double sum (i.e., the absence of cross-products of distinct eigenvectors of \hat{Q}) and the penultimate equality in (5) both follow from the orthonormality of the eigenvectors of \hat{Q}. Thus, the sequence of squares of the absolute values of the expansion coefficients, \{|a_i|^2\}, can be regarded as representing a probability distribution.

We can use this probability distribution to calculate the expected value of the measurement of the observable \hat{Q}:

\langle \hat{Q} \rangle \equiv \langle \psi|\hat{Q}|\psi \rangle

= \bigg(\sum_j a_j^{\ast} \langle \phi_j |\bigg)\bigg(\sum_i a_i \hat{Q} |\phi_i \rangle \bigg)

= \bigg(\sum_j a_j^{\ast} \langle \phi_j |\bigg)\bigg(\sum_i a_i k_i |\phi_i \rangle \bigg)

= \sum_j \sum_i k_i a_j^{\ast} a_i \langle \phi_j|\phi_i\rangle

= \sum_i k_i |a_i|^2 \langle \phi_i|\phi_i\rangle

= \sum_i |a_i|^2 k_i \qquad \qquad \qquad (6)
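As a small numerical illustration of (5) and (6) (a minimal sketch assuming numpy is available; the hermitian operator and the state are randomly generated, purely for illustration), we can check that the expansion coefficients satisfy \sum_i |a_i|^2 = 1 and that \sum_i |a_i|^2 k_i reproduces \langle\psi|\hat{Q}|\psi\rangle.

```python
import numpy as np

rng = np.random.default_rng(1)

# A randomly chosen hermitian operator Q and normalised state |psi> (illustrative only)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Q = (A + A.conj().T) / 2
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)

k, phi = np.linalg.eigh(Q)     # eigenvalues k_i and eigenvectors |phi_i> (columns of phi)
a = phi.conj().T @ psi         # expansion coefficients a_i = <phi_i|psi>

print(np.sum(np.abs(a)**2))            # ~1 : normalisation, as in (5)
print(np.vdot(psi, Q @ psi).real)      # <Q> computed directly
print(np.sum(np.abs(a)**2 * k))        # <Q> as the weighted sum of eigenvalues, as in (6)
```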

In the case of two non-commuting hermitian operators, \hat{Q} and \hat{R}, we can easily derive Heisenberg’s uncertainty principle using this mathematical structure, as follows. Let

\langle \hat{Q} \rangle \equiv \langle \psi | \hat{Q} |\psi\rangle \qquad \qquad \qquad (7)

and

\langle \hat{R} \rangle \equiv \langle \psi | \hat{R} |\psi\rangle \qquad \qquad \qquad (8)

be the expectations of \hat{Q} and \hat{R}, computed as per (6). Let

\hat{Q}^{\prime} \equiv \hat{Q} - \langle \hat{Q} \rangle \qquad \qquad \qquad (9)

\hat{R}^{\prime} \equiv \hat{R} - \langle \hat{R} \rangle \qquad \qquad \qquad (10)

be corresponding deviations from the mean. (Note that \hat{Q}^{\prime} and \hat{R}^{\prime} must be hermitian if \hat{Q} and \hat{R} are, since they are obtained simply by subtracting a real number, understood as a multiple of the identity operator). Then the mean squared deviations are

\Delta q^2 \equiv \langle \psi |\hat{Q}^{\prime} \hat{Q}^{\prime} |\psi \rangle \equiv \langle \hat{Q}^{\prime \ 2} \rangle \qquad \qquad \qquad (11)

\Delta r^2 \equiv \langle \psi |\hat{R}^{\prime} \hat{R}^{\prime} |\psi \rangle \equiv \langle \hat{R}^{\prime \ 2} \rangle \qquad \qquad \qquad (12)

And by the Cauchy-Schwarz inequality, we can write

\Delta q^2 \Delta r^2

= \langle \hat{Q}^{\prime \ 2} \rangle \cdot \langle \hat{R}^{\prime \ 2} \rangle

\ge |\langle \hat{Q}^{\prime}\hat{R}^{\prime} \rangle|^2

= \bigg|\frac{1}{2} \langle \hat{Q}^{\prime}\hat{R}^{\prime} - \hat{R}^{\prime} \hat{Q}^{\prime}\rangle + \frac{1}{2} \langle \hat{Q}^{\prime}\hat{R}^{\prime} + \hat{R}^{\prime} \hat{Q}^{\prime}\rangle\bigg|^2

\ge \frac{1}{4} |\langle \hat{Q}^{\prime}\hat{R}^{\prime} - \hat{R}^{\prime} \hat{Q}^{\prime}\rangle|^2

= \frac{1}{4} |\langle [\hat{Q}^{\prime}, \hat{R}^{\prime}] \rangle|^2 \qquad \qquad \qquad (13)

Therefore we have

\Delta q \Delta r \ge \frac{1}{2} |\langle [\hat{Q}^{\prime}, \hat{R}^{\prime}] \rangle| \qquad \qquad \qquad (14)

(Notice that we introduced the modulus signs in (13) to allow for the fact that \langle \hat{Q}^{\prime}\hat{R}^{\prime}\rangle will in general be a complex number. The operator \hat{Q}^{\prime}\hat{R}^{\prime} - \hat{R}^{\prime} \hat{Q}^{\prime} is not hermitian, even though \hat{Q}^{\prime} and \hat{R}^{\prime} are; it is anti-hermitian, so its expectation value is purely imaginary, whereas the expectation value of the anticommutator \hat{Q}^{\prime}\hat{R}^{\prime} + \hat{R}^{\prime} \hat{Q}^{\prime} is real. The squared modulus in (13) is therefore the sum of the squared moduli of these two pieces, which is why the anticommutator term can be dropped to give the final inequality in (13)).

Finally, observe that

[\hat{Q}^{\prime}, \hat{R}^{\prime}] = \hat{Q}^{\prime}\hat{R}^{\prime} - \hat{R}^{\prime} \hat{Q}^{\prime} = [\hat{Q}, \hat{R}] \qquad \qquad \qquad (15)

Using this in (14) gives us the key inequality of the Heisenberg uncertainty principle:

\Delta q \Delta r \ge \frac{1}{2} |\langle [\hat{Q}, \hat{R}] \rangle| \qquad \qquad \qquad (16)

For example, in the case of the momentum and position operators in one dimension, we have

[\hat{P}, \hat{X}] = -i \hbar \qquad \qquad \qquad (17)

Putting (17) into (16) then gives the canonical inequality of the Heisenberg uncertainty principle:

\Delta p \Delta x \ge \frac{1}{2} \hbar
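A minimal numerical illustration of (16) (a sketch assuming numpy is available; the choice of the Pauli matrices \sigma_x and \sigma_y as \hat{Q} and \hat{R}, and the random state, are illustrative only) checks that \Delta q \Delta r \ge \frac{1}{2}|\langle[\hat{Q},\hat{R}]\rangle| holds.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

rng = np.random.default_rng(2)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

def expval(op):
    return np.vdot(psi, op @ psi)

Qp = sx - expval(sx).real * np.eye(2)   # deviation operators, as in (9) and (10)
Rp = sy - expval(sy).real * np.eye(2)

dq = np.sqrt(expval(Qp @ Qp).real)
dr = np.sqrt(expval(Rp @ Rp).real)
rhs = 0.5 * abs(expval(sx @ sy - sy @ sx))   # by (15), [Q', R'] = [Q, R]

print(dq * dr, ">=", rhs)   # the uncertainty inequality (16) should hold
```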

Free particle in a ring with a periodic boundary condition

We consider a free particle restricted to a ring of length L, with a complete lap around the ring taken to begin at position -L/2 and end at position L/2. The general time-independent Schrödinger equation (TISE) is

\big[\frac{\hat{p}^2}{2m} + V(x)\big]\psi = E\psi

where

\hat{p} = -i\hbar \frac{d}{dx}

We take V(x) = 0 in the ring and we assume the periodic boundary condition \psi(x + L) = \psi(x).

The TISE becomes

\psi^{\prime \prime}(x) + \frac{2mE}{\hbar^2} \psi = 0

which has general solution

\psi(x) = Ae^{i \sqrt{2mE}/\hbar\cdot x} + Be^{-i \sqrt{2mE}/\hbar\cdot x}

In principle, this allows for two independent solutions superposed with coefficients A and B. The two terms are running waves of the same wavelength travelling in opposite directions, and the periodic boundary condition \psi(x + L) = \psi(x) quantizes the allowed wave numbers in the same way for each term; unlike the hard-wall boundary conditions of the particle-in-a-box problem, it does not force a standing-wave combination. Since the quantization below allows the quantum number n to be negative as well as positive, the backward-travelling wave is already covered, so one of the two terms is superfluous for the purposes of developing the solution below, and we can set B = 0 and just focus on the forward travelling wave. The general solution then reduces to

\psi(x) = Ae^{i \sqrt{2mE}/\hbar\cdot x}

The periodic boundary condition requires \frac{\sqrt{2mE}}{\hbar} L to be an integer multiple of 2\pi, so using E_n = p_n^2/2m where E_n and p_n are the n-th energy and momentum states respectively, with n any integer (positive, negative, or zero), and noting that this expression for E_n as p_n^2/2m is allowed only because V(x) = 0 in the ring, we have

p_n = \hbar \frac{2 \pi n}{L}

and

E_n = \frac{\hbar^2 2 \pi^2 n^2}{m L^2}

Choosing units equivalent to setting \hbar = 1, we can write the n-th wave function as

\psi_n(x) = Ae^{i p_n x}

To find the normalising constant A, we write

\int_{-L/2}^{L/2} \psi_n(x) \cdot \psi_n^{\ast}(x) dx = |A|^2\int_{-L/2}^{L/2} dx = 1

so we can take A = \frac{1}{\sqrt{L}} (choosing the phase so that A is real and positive). The solutions for the particle in a ring are then travelling waves of the form

\psi_n(x) = \frac{1}{\sqrt{L}} e^{i p_n x}

with corresponding momentum and energy states

p_n = \frac{2 \pi n}{L}

and

E_n = \frac{2 \pi^2 n^2}{mL^2}

respectively.
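As a quick numerical check (a minimal sketch assuming numpy and scipy are available; the ring length L is an arbitrary illustrative value, in units with \hbar = 1), the following confirms that these travelling-wave solutions are normalised, mutually orthogonal, and satisfy the periodic boundary condition.

```python
import numpy as np
from scipy.integrate import quad

L = 2.0   # ring length (arbitrary illustrative value); units with hbar = 1

def psi(n, x):
    p_n = 2 * np.pi * n / L
    return np.exp(1j * p_n * x) / np.sqrt(L)

def overlap(n, m):
    # quad handles real integrands, so integrate real and imaginary parts separately
    re = quad(lambda x: (np.conj(psi(n, x)) * psi(m, x)).real, -L/2, L/2)[0]
    im = quad(lambda x: (np.conj(psi(n, x)) * psi(m, x)).imag, -L/2, L/2)[0]
    return re + 1j * im

print(abs(overlap(1, 1)))            # ~1 : normalisation
print(abs(overlap(1, 2)))            # ~0 : orthogonality of different momentum states
print(psi(3, -L/2), psi(3, L/2))     # equal values: periodic boundary condition satisfied
```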

Note that the solutions here are travelling waves, and are different from the standing wave solutions obtained for the more commonly encountered particle-in-a-box problem with left- and right-hand endpoints 0 and L respectively, and with non-periodic boundary conditions \psi(0) = \psi(L) = 0. When no boundary conditions are specified at all, i.e., the particle is not confined to a box or a ring, then with V(x) = 0 the TISE and its general solution are the same as those initially obtained above, but energy and momentum are no longer quantized, i.e., they can take any values along a continuum.

Some mathematics relating to phonons

A mass connected to a spring and executing simple harmonic motion will oscillate at a natural frequency which is independent of the initial position or velocity of the mass. The particular pattern of vibration at the natural frequency is referred to as the mode of vibration corresponding to that natural frequency. Obviously, there is only one natural frequency and one corresponding mode of vibration for a single mass on a spring. However, a system consisting of two coupled masses connected by springs will, in general, have two distinct natural frequencies (the natural frequencies are often referred to as harmonic frequencies, or simply as harmonics), and two distinct modes of vibration corresponding to these natural frequencies. In general, a system consisting of N masses connected by springs will have N natural frequencies and a distinct mode of vibration for each of these N natural frequencies.

With this in mind, it is interesting to observe that the displacements from equilibrium of an oscillating system of masses connected to springs can be described BOTH in terms of a coupled system of ODEs, AND as an uncoupled system of ODEs, with each independent ODE in the uncoupled system representing a distinct mode of vibration of the original system characterised by a distinct natural frequency. A specific example of this will be given shortly. Amazingly, the same kind of idea also applies to the Hamiltonian of a vibrating lattice of quantum oscillators. The Hamiltonian will initially be expressed in a complicated way involving coupling of the quantum oscillators. However, with some strategic Fourier transforms, the Hamiltonian will be re-expressed in terms of uncoupled entities, each entity representing a distinct mode of vibration of the original lattice characterised by a distinct natural frequency. The transformed Hamiltonian will look exactly the same as the Hamiltonian for an uncoupled set of quantum harmonic oscillators, and quasiparticles called phonons will emerge in this framework as discrete packets of vibrational energy. These phonons are closely analogous to photons as carriers of discrete packets of energy in the context of electromagnetism.

Before considering the case of quantum oscillators in a lattice, it is instructive to explore a specific example of the situation for ODEs describing a simple mass-spring system. Consider two identical pendulums, A and B, connected by a spring whose relaxed length l is exactly equal to the distance between the pendulum bobs.

Suppose the system is displaced from equilibrium by moving the pendulums away from their relaxed positions and releasing them. Consider an arbitrary moment at which the displacement of pendulum A is x_A and the displacement of pendulum B is x_B, both measured in the same direction from their respective equilibrium positions. Since the spring is being stretched by an amount x_A - x_B, the magnitude of the restoring force on A is

m \omega_0^2 x_A + k (x_A - x_B)

(i.e., the usual gravitational restoring force for a pendulum, plus the restoring force due to the spring). For B, the magnitude of the restoring force is

m \omega_0^2 x_B - k (x_A - x_B)

(the gravitational and spring restoring forces oppose each other in this case). Therefore, using the usual equations F = m a and F = -kx for a spring system, the equations of motion for A and B are

m \frac{d^2}{dt^2}x_A + m \omega_0^2 x_A + k (x_A - x_B) = 0

m \frac{d^2}{dt^2}x_B + m \omega_0^2 x_B - k (x_A - x_B) = 0

Dividing through by m, and letting \omega_c^2 = k/m, we can write these as

\frac{d^2}{dt^2}x_A + (\omega_0^2 + \omega_c^2) x_A - \omega_c^2 x_B = 0 \qquad \qquad \qquad (1)

\frac{d^2}{dt^2}x_B + (\omega_0^2 + \omega_c^2) x_B - \omega_c^2 x_A = 0 \qquad \qquad \qquad (2)

These are a pair of coupled ODEs, but we can easily manipulate equations (1) and (2) to obtain two independent ODEs from them. Adding the two equations gives

\frac{d^2}{dt^2}(x_A + x_B) + \omega_0^2 (x_A + x_B) = 0

and subtracting (2) from (1) gives

\frac{d^2}{dt^2}(x_A - x_B) + (\omega_0^2 + 2\omega_c^2) (x_A - x_B) = 0

Writing q_1 = x_A + x_B and q_2 = x_A - x_B, we can re-express these as

\frac{d^2}{dt^2}q_1 + \omega_0^2 q_1 = 0 \qquad \qquad \qquad \qquad \qquad (3)

\frac{d^2}{dt^2}q_2 + (\omega_0^2 + 2\omega_c^2) q_2 = 0 \qquad \qquad \qquad \  (4)

These are now two uncoupled ODEs for simple harmonic oscillations, the first one having natural frequency \omega_0 and the second one having natural frequency (\omega_0^2 + 2\omega_c^2)^{1/2}. They are independent oscillations because, according to equations (3) and (4), changes in q_1 occur independently of changes in q_2, and vice versa. They can be solved independently to get sinusoidal expressions for q_1 and q_2. For example, q_1 = C \cos(\omega_0 t) and q_2 = D \cos( (\omega_0^2 + 2\omega_c^2)^{1/2} t) are possible solutions. The transformed equations (3) and (4) represent another way of describing the two normal modes of vibration occurring in the original coupled system (1) and (2), one normal mode having frequency \omega_0, the other having the higher frequency (\omega_0^2 + 2\omega_c^2)^{1/2}.
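The normal-mode frequencies can also be read off numerically as the square roots of the eigenvalues of the coefficient matrix of the coupled system (1)-(2). A minimal numpy sketch (the values of \omega_0 and \omega_c are arbitrary illustrative choices):

```python
import numpy as np

omega0, omegac = 1.0, 0.5    # illustrative parameter values

# Equations (1)-(2) in matrix form: x'' + M x = 0
M = np.array([[omega0**2 + omegac**2, -omegac**2],
              [-omegac**2, omega0**2 + omegac**2]])

evals, evecs = np.linalg.eigh(M)
print(np.sqrt(evals))                                  # normal-mode frequencies from the matrix
print(omega0, np.sqrt(omega0**2 + 2 * omegac**2))      # the frequencies found analytically above
print(evecs)                                           # columns ~ (1,1)/sqrt(2) and (1,-1)/sqrt(2), i.e. q_1 and q_2
```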

The present note is concerned with a similar idea at the quantum level, where we imagine N quantum oscillators connected linearly by forces which act like little springs, i.e., a 1-D lattice of quantum oscillators. We want to transform the Hamiltonian for this coupled system into a Hamiltonian which looks like the Hamiltonian for a system of N independent quantum harmonic oscillators. Analogous to the transformed ODE system for the two-mass system considered above, the transformed Hamiltonian will be expressed in terms of the N possible modes of vibration of the original coupled system. The vibration energies in this reformulated Hamiltonian will be seen to consist of discrete `packets’ of energy, \hbar \omega_k, which can be added to, or subtracted from, particular quantum oscillators in the lattice using creation and annihilation operators, just as in the usual quantum harmonic oscillator framework. As stated earlier, these discrete packets of energy can be regarded as particles (or, rather, quasiparticles) analogous to photons in electromagnetism, and they are referred to as phonons in the solid state physics literature. A phonon is thus a quantum of vibrational energy, \hbar \omega_k, needed to move a lattice of coupled quantum oscillators up or down to different vibrational energy levels.

So, we consider a one-dimensional chain of N connected quantum particles, each of mass m, connected by forces which act like springs of unstretched length a and with spring constant K.

The j-th mass is normally at position R_j = ja, but can be displaced slightly by amount x_j. Writing the quantum mechanical momentum and position operators as \hat{p}_j = - i \hbar \frac{\partial}{\partial x_j} and \hat{x}_j respectively, the Hamiltonian operator for the system (a generalisation of the Hamiltonian for a collection of uncoupled quantum oscillators) is

\hat{H} = \sum_{j=0}^{N-1} \bigg[\frac{\hat{p}_j^{2}}{2m} + \frac{1}{2}K(\hat{x}_{j+1}-\hat{x}_j)^2 \bigg] \qquad \qquad \qquad (5)

Although this system consists of masses which are strongly coupled to their neighbours by springs, we will show that if the system is perturbed from an initial state where each mass is at position R_j = ja (e.g., by stretching and releasing some of the springs), the vibrations of the 1-D lattice behave as if the system consisted of a set of uncoupled quantum harmonic oscillators. In other words, the Hamiltonian in (5) can be re-expressed in the form

\hat{H} = \sum_{k=0}^{N-1} \hbar \omega_k \bigg(\hat{a}_k^{\dag} \hat{a}_k + \frac{1}{2}\bigg) \qquad \qquad \qquad (6)

where \hbar is the reduced Planck constant, \omega_k is a natural frequency of the system depending on k, and \hat{a}_k^{\dag} and \hat{a}_k are creation and annihilation operators respectively (as in the context of the quantum harmonic oscillator). The subscript k on the creation and annihilation operators reflects the fact that they act only on the k-th normal mode of the system, having no effect on the others. Thus, each normal mode is understood to have its own pair of creation and annihilation operators.

We begin by applying discrete Fourier transforms to the position and momentum operators \hat{x}_j and \hat{p}_j appearing in (5):

\hat{x}_j = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{x}_k e^{i k j a} \qquad \qquad \qquad \ \  (7)

\hat{x}_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{x}_j e^{-i k j a} \qquad \qquad \qquad (8)

\hat{p}_j = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{p}_k e^{i k j a} \qquad \qquad \qquad \  \  (9)

\hat{p}_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{p}_j e^{-i k j a} \qquad \qquad \qquad (10)

We impose periodic boundary conditions, so that

e^{i k j  a} = e^{i k (j + N) a} \qquad \qquad \qquad (11)

Therefore, we must have

e^{i k N a} = 1 \qquad \qquad \qquad (12)

so

k = \frac{2 \pi \ell}{Na} \qquad \qquad \qquad (13)

where \ell can take integer values between 0 and N-1 inclusive, i.e., \ell runs over the least non-negative residues modulo N. (We write \ell for this integer index to avoid confusion with the mass m.)

An important observation is that

\sum_{j=0}^{N-1} e^{i k j a} = N \delta_{k, 0} \qquad \qquad \qquad (14)

because when k = 0 we have a sum of N terms on the left-hand side each equal to 1, whereas when k \neq 0 we can use the formula for the sum of a geometric progression to write

\sum_{j=0}^{N-1} e^{i k j a} =  \sum_{j=0}^{N-1} (e^{i k a})^j = \frac{(e^{i k a})^N - 1}{e^{i k a} - 1} = 0

since (e^{i k a})^N = e^{i 2 \pi \ell} = 1. The result in (14) is used repeatedly in what follows.
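Identity (14) is easy to confirm numerically. A minimal numpy sketch (the values of N and a are arbitrary illustrative choices):

```python
import numpy as np

N, a = 8, 1.0                             # illustrative lattice size and spacing
ks = 2 * np.pi * np.arange(N) / (N * a)   # allowed wave vectors, as in (13)
j = np.arange(N)

for k in ks:
    s = np.sum(np.exp(1j * k * j * a))
    print(round(k, 3), np.round(s, 10))   # N for k = 0, ~0 for every other allowed k
```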

Note that (8) is the inverse transform for (7), and (10) is the inverse transform for (9). To see how this works, let us confirm that using (8) in the right-hand side of (7) gives \hat{x}_j. Making the substitution we get

\frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \bigg[\frac{1}{\sqrt{N}}\sum_{j^{\prime}=0}^{N-1}\hat{x}_{j^{\prime}} e^{- i k j^{\prime} a}\bigg] e^{i k j a}

= \frac{1}{N} \sum_{j^{\prime}=0}^{N-1} \hat{x}_{j^{\prime}} \bigg[\sum_{k=0}^{N-1} e^{- i k (j^{\prime}-j)a}\bigg]

= \frac{1}{N} \sum_{j^{\prime}=0}^{N-1} \hat{x}_{j^{\prime}} N \delta_{(j^{\prime}-j), 0}

(using (14))

= \hat{x}_j

as claimed, since the only non-zero term in the sum in the penultimate line is the one corresponding to j^{\prime} = j. Thus, (8) is indeed the inverse transform of (7), and likewise (10) is the inverse transform of (9).

Next, we observe that for the usual operators in (7) and (9) we have the commutation relations

[\hat{x}_j, \hat{p}_{j^{\prime}}] = i \hbar \delta_{j, j^{\prime}} \qquad \qquad \qquad (15)

The inverse transforms in (8) and (10) then imply that we have the following commutation relations for the inverse operators:

[\hat{x}_k, \hat{p}_{k^{\prime}}] = i \hbar \delta_{k, -k^{\prime}} \qquad \qquad \qquad (16)

To see this, observe that we have

[\hat{x}_k, \hat{p}_{k^{\prime}}] = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{x}_j e^{- i k j a} \cdot \frac{1}{\sqrt{N}} \sum_{j^{\prime}=0}^{N-1} \hat{p}_{j^{\prime}} e^{-i k^{\prime} j^{\prime} a}

- \frac{1}{\sqrt{N}} \sum_{j^{\prime}=0}^{N-1} \hat{p}_{j^{\prime}} e^{- i k^{\prime} j^{\prime} a} \cdot  \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{x}_j e^{- i k j a}

= \frac{1}{N} \sum_{j=0}^{N-1}  \sum_{j^{\prime}=0}^{N-1} e^{-i k j a} e^{- i k^{\prime} j^{\prime} a}  [\hat{x}_j, \hat{p}_{j^{\prime}}]

= \frac{1}{N} i \hbar \sum_{j = 0}^{N-1} e^{- i (k + k^{\prime}) j a}

(using (15))

= \frac{1}{N} i \hbar N \delta_{k, -k^{\prime}}

(using (14))

= i \hbar \delta_{k, -k^{\prime}}

as claimed.

With these results we are now in a position to work out the terms in the Hamiltonian in (5) above in terms of the transformed operators \hat{p}_k and \hat{x}_k. First, using (9), we have

\sum_{j=0}^{N-1} \hat{p}_j^2

= \sum_{j=0}^{N-1} \bigg(\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}\hat{p}_k e^{i k j a}\bigg) \bigg(\frac{1}{\sqrt{N}}\sum_{k^{\prime}=0}^{N-1}\hat{p}_{k^{\prime}} e^{i k^{\prime} j a}\bigg)

= \frac{1}{N} \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} \sum_{k^{\prime}=0}^{N-1} \hat{p}_k \hat{p}_{k^{\prime}} e^{i (k + k^{\prime}) j a}

= \frac{1}{N} \sum_{k=0}^{N-1} \sum_{k^{\prime}=0}^{N-1} \hat{p}_k \hat{p}_{k^{\prime}} \bigg(\sum_{j=0}^{N-1}e^{i (k + k^{\prime}) j a}\bigg)

(carrying out the spatial sum)

= \frac{1}{N} \sum_{k=0}^{N-1} \sum_{k^{\prime}=0}^{N-1} \hat{p}_k \hat{p}_{k^{\prime}} N \delta_{k, -k^{\prime}}

(using (14))

= \sum_{k=0}^{N-1} \sum_{k^{\prime}=0}^{N-1} \hat{p}_k \hat{p}_{k^{\prime}} \delta_{k, -k^{\prime}}

= \sum_{k=0}^{N-1} \hat{p}_k \hat{p}_{-k} \qquad \qquad \qquad (17)

where in the last step we used the Kronecker delta to carry out one of the momentum sums. This has the effect of setting k^{\prime} = -k (because any term not satisfying this will disappear), leaving us with a sum over a single index. (Here -k is to be understood modulo 2\pi/a, since the allowed wave vectors in (13) form a discrete set; equivalently, the integer index \ell is taken modulo N.)

Next, using (7), we have

\hat{x}_{j+1} = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{x}_k e^{i k (j+1)a}

= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{x}_k e^{i k j a} e^{i k a}

Subtracting (7) from this we get

\hat{x}_{j+1} - \hat{x}_j = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{x}_k e^{i k j a}(e^{i k a} - 1)

Using this to deal with the other terms in the Hamiltonian in (5) we get

\sum_{j=0}^{N-1} (\hat{x}_{j+1} - \hat{x}_j)^2

= \frac{1}{N} \sum_{j=0}^{N-1}  \sum_{k=0}^{N-1}  \sum_{k^{\prime}=0}^{N-1} \hat{x}_k \hat{x}_{k^{\prime}} e^{i (k + k^{\prime}) j a} (e^{i k a} - 1)(e^{i k^{\prime} a} - 1)

= \frac{1}{N} \sum_{k=0}^{N-1}  \sum_{k^{\prime}=0}^{N-1} \hat{x}_k \hat{x}_{k^{\prime}} (e^{i k a} - 1)(e^{i k^{\prime} a} - 1)\bigg(\sum_{j=0}^{N-1}e^{i (k + k^{\prime}) j a}\bigg)

= \frac{1}{N} \sum_{k=0}^{N-1}  \sum_{k^{\prime}=0}^{N-1} \hat{x}_k \hat{x}_{k^{\prime}} (e^{i k a} - 1)(e^{i k^{\prime} a} - 1)N\delta_{k, -k^{\prime}}

= \sum_{k=0}^{N-1} \hat{x}_k \hat{x}_{-k}  (e^{i k a} - 1)(e^{- i k a} - 1)

= \sum_{k=0}^{N-1} \hat{x}_k \hat{x}_{-k}  (1 - e^{i k a} - e^{- i k a} + 1)

= \sum_{k=0}^{N-1} \hat{x}_k \hat{x}_{-k}  \big(1 - \cos(ka) - i \sin(ka) - \cos(ka) + i \sin(ka) + 1\big)

= \sum_{k=0}^{N-1} \hat{x}_k \hat{x}_{-k} 2 \big(1 - \cos(ka) \big)

= \sum_{k=0}^{N-1} \hat{x}_k \hat{x}_{-k} \big(4 \sin^2\big(\frac{ka}{2}\big)\big) \qquad \qquad \qquad (18)

(using the result 1 - \cos(\theta) = 2\sin^2(\theta/2)).

Using (17) and (18), we can rewrite the Hamiltonian in (5) as

\hat{H} = \sum_{k=0}^{N-1} \bigg[\frac{1}{2m} \hat{p}_k \hat{p}_{-k} + \frac{1}{2} m \omega_k^2 \hat{x}_k \hat{x}_{-k} \bigg] \qquad \qquad \qquad (19)

where \omega_k^2 \equiv 4(K/m) \sin^2 (ka/2).
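These \omega_k coincide with the classical normal-mode frequencies of the same chain, which offers a quick sanity check. The following minimal numpy sketch (the values of N, K, m and a are arbitrary illustrative choices; the classical dynamical-matrix diagonalisation is a swapped-in verification, not part of the quantum derivation above) compares the eigenfrequencies of the periodic chain m\ddot{x}_j = K(x_{j+1} - 2x_j + x_{j-1}) with 2\sqrt{K/m}\,|\sin(ka/2)|.

```python
import numpy as np

N, K, m, a = 8, 1.0, 1.0, 1.0     # illustrative values

# Classical dynamical matrix for the periodic chain: m x_j'' = K(x_{j+1} - 2 x_j + x_{j-1})
D = (K / m) * (2 * np.eye(N)
               - np.roll(np.eye(N), 1, axis=0)
               - np.roll(np.eye(N), -1, axis=0))

evals = np.linalg.eigvalsh(D)
omega_numeric = np.sort(np.sqrt(np.clip(evals, 0.0, None)))   # clip guards against tiny negative round-off

ks = 2 * np.pi * np.arange(N) / (N * a)
omega_formula = np.sort(2 * np.sqrt(K / m) * np.abs(np.sin(ks * a / 2)))

print(np.allclose(omega_numeric, omega_formula))   # True: the dispersion relation matches
```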

Next, we observe from (10) that the Hermitian conjugate of \hat{p}_k is given by

\hat{p}_k^{\dag} = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{p}_j^{\dag} e^{i k j a}

But since \hat{p}_j is Hermitian, we have \hat{p}_j^{\dag} = \hat{p}_j. Therefore

\hat{p}_k^{\dag} = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \hat{p}_j e^{i k j a} = \hat{p}_{-k} \qquad \qquad \qquad (20)

A similar argument using (8) shows that

\hat{x}_k^{\dag} = \hat{x}_{-k} \qquad \qquad \qquad (21)

[Formally, the adjoint of an operator A is defined by \langle \psi, A \phi \rangle = \langle A^{\dag} \psi, \phi \rangle. In the case of a Hermitian operator, we have \langle \psi, A \phi \rangle = \langle A \psi, \phi \rangle. Therefore, A^{\dag} = A. This guarantees that the operator has real eigenvalues. In our case, by working out the inner products \langle \psi, \hat{p}_j \phi \rangle and \langle \hat{p}_j \psi, \phi \rangle, for example, we would find that they are equal, confirming that \hat{p}_j^{\dag} = \hat{p}_j].

Using these results with (16), we can write down the commutation relation between \hat{x}_k and \hat{p}_{k^{\prime}}^{\dag} as

[\hat{x}_k, \hat{p}_{k^{\prime}}^{\dag}] = [\hat{x}_k, \hat{p}_{-k^{\prime}}] = i \hbar \delta_{k, k^{\prime}} \qquad \qquad \qquad (22)

We can also write down the creation and annihilation operators (cf. the quantum harmonic oscillator) as

\hat{a}_k = \sqrt{\frac{m \omega_k}{2 \hbar}} \big(\hat{x}_k + \frac{i}{m \omega_k} \hat{p}_k\big) \qquad \qquad \qquad \  \  \  \   (23)

\hat{a}_k^{\dag} = \sqrt{\frac{m \omega_k}{2 \hbar}} \big(\hat{x}_{-k} - \frac{i}{m \omega_k} \hat{p}_{-k}\big) \qquad \qquad \qquad (24)

It is easy to show that these satisfy the commutation relations

[\hat{a}_k^{\dag}, \hat{a}_{k^{\prime}}^{\dag}] = 0 \qquad \qquad \qquad \  \  \ (25)

[\hat{a}_k, \hat{a}_{k^{\prime}}] = 0 \qquad \qquad \qquad \  \  \ (26)

[\hat{a}_k, \hat{a}_{k^{\prime}}^{\dag}] = \delta_{k, k^{\prime}} \qquad \qquad \qquad (27)

Therefore, creation and annihilation operators with subscript k act only on the normal mode with the same wave vector k; operators belonging to different wave vectors commute with one another, so the different modes behave independently. To demonstrate (27) in the case k^{\prime} = k, we have

[\hat{a}_k, \hat{a}_k^{\dag}] = \hat{a}_k \hat{a}_k^{\dag} -  \hat{a}_k^{\dag}  \hat{a}_k

= \frac{m \omega_k}{2 \hbar} \bigg[\big(\hat{x}_k + \frac{i}{m \omega_k} \hat{p}_k\big) \big(\hat{x}_{-k} - \frac{i}{m \omega_k} \hat{p}_{-k}\big) -  \big(\hat{x}_{-k} - \frac{i}{m \omega_k} \hat{p}_{-k}\big) \big(\hat{x}_k + \frac{i}{m \omega_k} \hat{p}_k\big)\bigg]

= \frac{m \omega_k}{2 \hbar} \bigg[-\frac{i}{m \omega_k}\hat{x}_k\hat{p}_{-k} + \frac{i}{m\omega_k}\hat{p}_k\hat{x}_{-k} - \frac{i}{m\omega_k}\hat{x}_{-k}\hat{p}_k + \frac{i}{m\omega_k}\hat{p}_{-k}\hat{x}_k\bigg]

(the \hat{x}_k\hat{x}_{-k} and \hat{p}_k\hat{p}_{-k} terms cancel between the two products, since the position operators commute among themselves, as do the momentum operators)

= \frac{m \omega_k}{2 \hbar} \bigg[-\frac{i}{m\omega_k}[\hat{x}_k, \hat{p}_{-k}] + \frac{i}{m\omega_k}[\hat{p}_k, \hat{x}_{-k}]\bigg]

= \frac{m \omega_k}{2 \hbar} \bigg[-\frac{i}{m\omega_k}(i \hbar) + \frac{i}{m\omega_k}(-i\hbar)\bigg]

= 1

as required.

We can invert equations (23) and (24). Replacing k by -k in (24), and noting that \omega_{-k} = \omega_k (since \omega_k^2 = 4(K/m)\sin^2(ka/2) is unchanged when k is replaced by -k), we get

\hat{a}_{-k}^{\dag} = \sqrt{\frac{m \omega_k}{2\hbar}}\big(\hat{x}_k - \frac{i}{m\omega_k}\hat{p}_k\big)

so

\hat{x}_k = \sqrt{\frac{2\hbar}{m\omega_k}}\hat{a}_{-k}^{\dag} + \frac{i}{m\omega_k}\hat{p}_k \qquad \qquad \qquad (28)

Using (28) to substitute for \hat{x}_k in (23) we get, with some easy manipulations,

\hat{p}_k = -i\sqrt{\frac{m\hbar\omega_k}{2}}\big(\hat{a}_k - \hat{a}_{-k}^{\dag}\big) \qquad \qquad \qquad (29)

And using (29) to substitute for \hat{p}_k in (28) we get

\hat{x}_k = \sqrt{\frac{\hbar}{2m\omega_k}}\big(\hat{a}_{-k}^{\dag} + \hat{a}_k\big) \qquad \qquad \qquad (30)

Therefore, using (29), we deduce

\frac{1}{2m}\hat{p}_k\hat{p}_{-k}

= \frac{1}{2m}\bigg( -i\sqrt{\frac{m\hbar\omega_k}{2}}\big(\hat{a}_k - \hat{a}_{-k}^{\dag}\big) \bigg)\bigg( -i\sqrt{\frac{m\hbar\omega_k}{2}}\big(\hat{a}_{-k} - \hat{a}_k^{\dag}\big) \bigg)

= -\big(\frac{\hbar \omega_k}{4}\big)\big(\hat{a}_k \hat{a}_{-k} - \hat{a}_k \hat{a}_k^{\dag} - \hat{a}_{-k}^{\dag} \hat{a}_{-k} + \hat{a}_{-k}^{\dag} \hat{a}_k^{\dag}\big) \qquad \qquad \qquad (31)

and, similarly, using (30) we deduce

\frac{1}{2}m\omega_k^2 \hat{x}_k \hat{x}_{-k}

=  \frac{1}{2}m\omega_k^2 \bigg( \sqrt{\frac{\hbar}{2m\omega_k}}\big(\hat{a}_{-k}^{\dag} + \hat{a}_k\big) \bigg)\bigg( \sqrt{\frac{\hbar}{2m\omega_k}}\big(\hat{a}_{k}^{\dag} + \hat{a}_{-k}\big) \bigg)

= \big(\frac{\hbar \omega_k}{4}\big) \big(\hat{a}_k \hat{a}_{-k} + \hat{a}_k \hat{a}_k^{\dag} + \hat{a}_{-k}^{\dag} \hat{a}_{-k} + \hat{a}_{-k}^{\dag} \hat{a}_k^{\dag}\big) \qquad \qquad \qquad (32)

Adding (31) and (32), the \hat{a}_k \hat{a}_{-k} and \hat{a}_{-k}^{\dag} \hat{a}_k^{\dag} terms cancel and we get

\frac{\hbar \omega_k}{2}\big(\hat{a}_k\hat{a}_k^{\dag} + \hat{a}_{-k}^{\dag} \hat{a}_{-k}\big) \qquad \qquad \qquad (33)

Therefore the Hamiltonian in (19) becomes

\hat{H} = \sum_{k=0}^{N-1} \bigg[\frac{\hbar \omega_k}{2} \big(\hat{a}_k\hat{a}_k^{\dag} + \hat{a}_{-k}^{\dag} \hat{a}_{-k}\big) \bigg] \qquad \qquad \qquad (34)

But the sum over k of \hat{a}_{-k}^{\dag} \hat{a}_{-k} is the same as the sum over k of \hat{a}_{k}^{\dag} \hat{a}_{k}, since replacing k by -k merely relabels the allowed wave vectors (modulo 2\pi/a). Therefore, re-indexing this term in the Hamiltonian in (34) we get

\hat{H} = \sum_{k=0}^{N-1} \bigg[\frac{\hbar \omega_k}{2} \big(\hat{a}_k\hat{a}_k^{\dag} + \hat{a}_{k}^{\dag} \hat{a}_{k}\big) \bigg] \qquad \qquad \qquad (35)

Finally, using the commutator in (27) with k^{\prime} = k, we have

\hat{a}_k  \hat{a}_k^{\dag} = 1 +  \hat{a}_k^{\dag} \hat{a}_k

Therefore, the Hamiltonian in (35) can be written as

\hat{H} = \sum_{k=0}^{N-1} \bigg[\hbar \omega_k \bigg(\hat{a}_{k}^{\dag} \hat{a}_{k} + \frac{1}{2}\bigg) \bigg] \qquad \qquad \qquad (36)

which is the same as (6). This result shows that the Hamiltonian for a linear chain of coupled quantum oscillators can be expressed in terms of modes of vibration which behave like independent and uncoupled quantum harmonic oscillators. The k-th mode of vibration can accept increments to its energy in integer multiples of a basic quantum of energy, \hbar \omega_k. As in the standard quantum harmonic oscillator model, these quanta of energy behave like particles, and we call these quanta phonons in the present context.

Quick way to explain the concept of a solid angle

I was struggling to explain the concept of a solid angle to a student. I found that the following approach, by analogy with plane angles, succeeded.

In the case of a plane angle d \theta (in radians) subtended by an arc length d a, the following relationship holds:

\frac{d \theta}{2 \pi} = \frac{da}{2 \pi r} \qquad \qquad \qquad (1)

On the left-hand side we have the ratio of d \theta to the circumference of a unit circle, 2 \pi. On the right-hand side, we have the ratio of the arc length d a to the circumference of a circle of radius r, 2 \pi r. So, the idea is that d \theta is a measure of the part of the circumference of the unit circle at the apex that is covered by the projection of d a on to this unit circle. From (1) we get

d \theta = \frac{da}{r} \qquad \qquad \qquad (2)

(or the familiar formula for calculating arc lengths, da = r d \theta).

A solid angle is simply a 3D analogue of this.

Instead of a unit circle at the apex, we have a unit sphere with surface area 4 \pi. The solid angle d \Omega (in steradians) is subtended by the area element d A, and these satisfy the following relationship:

\frac{d \Omega}{4 \pi} = \frac{d A}{4 \pi r^2} \qquad \qquad \qquad (3)

On the left-hand side we have the ratio of d \Omega to the surface area of the unit sphere, 4 \pi. On the right-hand side, we have the ratio of the area element d A to the surface area of a sphere of radius r, 4 \pi r^2. So, the idea is that d \Omega is a measure of the part of the surface area of the unit sphere at the apex that is covered by the projection of d A on to this unit sphere. From (3), we get the familiar formula for solid angles,

d \Omega = \frac{d A}{r^2} \qquad \qquad \qquad (4)

It is possible to solve some simple problems involving solid angles purely by symmetry considerations. For example, to work out the solid angle subtended at the centre of a cube by one of its faces, we note that the cube has six faces of equal area and together they must account for the entire surface area of the unit sphere at the apex (the cube's centre). Therefore, a single cube face must account for one-sixth of the surface area of the unit sphere at the apex, so we have

\frac{\Omega}{4 \pi} = \frac{1}{6}

\implies

\Omega = \frac{2 \pi}{3}

Therefore, a single cube face subtends a solid angle of 2 \pi/3 steradians at the centre of the cube.
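The same answer can be checked by a quick Monte Carlo experiment (a minimal Python sketch assuming numpy is available): sample directions uniformly over the unit sphere centred at the cube's centre and count the fraction that point towards a given face.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Sample directions uniformly on the unit sphere (normalised Gaussian vectors)
v = rng.normal(size=(n, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# A ray from the centre of the cube [-1, 1]^3 hits the face x = +1 exactly when the
# x component is positive and is the largest coordinate in absolute value.
hits = (v[:, 0] > 0) & (v[:, 0] >= np.abs(v[:, 1])) & (v[:, 0] >= np.abs(v[:, 2]))

print(4 * np.pi * hits.mean())   # ~ 2*pi/3 ≈ 2.094 steradians
print(2 * np.pi / 3)
```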

Useful methods in Laplace transform theory

A Laplace transform L(f) of a function f(t) is defined by the equation

L(f) = \int_0^{\infty} f(t) e^{-pt} dt = F(p) \qquad \qquad \qquad (1)

The idea is that a Laplace transform takes as an input a function of t, f(t), and yields as the output a function of p, F(p).

In many applications of this idea, for example when applying a Laplace transform to the solution of a differential equation, we need to find the inverse of the transform to obtain the required solution. This can often be done by consulting tables of Laplace transforms and their inverses in reference manuals, but we can also invert Laplace transforms using something called the Bromwich integral. I want to explore the latter approach in this note, as it involves the useful trick of solving integrals by converting them to contour integration problems on the complex plane. I also want to record a useful trick for changing the order of integration in double integrals which is employed in the proof of a convolution theorem for Laplace transforms.

Differential equations can be solved using Laplace transforms by finding the transforms of derivatives y^{\prime} = dy/dt, y^{\prime \prime} = d^2y/dt^2, etc. To find L(y^{\prime}), we use the above definition of L(f) and integrate by parts as follows:

L(y^{\prime}) = \int_0^{\infty} y^{\prime}(t) e^{-pt} dt

= \big[e^{-pt} y(t) \big]_0^{\infty} + p\int_0^{\infty} e^{-pt}y(t)dt

= -y(0) + pL(y) = pY - y_0  \qquad \qquad \qquad (2)

where for simplicity we have written L(y) = Y and y(0) = y_0, and where we assume \text{Re} \ p is large enough that e^{-pt} y(t) \rightarrow 0 as t \rightarrow \infty, so that the boundary term at the upper limit vanishes.

To find L(y^{\prime \prime}), we think of y^{\prime \prime} as (y^{\prime})^{\prime} and substitute y^{\prime} for y in (2) to get

L(y^{\prime \prime}) = pL(y^{\prime}) - y^{\prime}(0)

= p(pL(y) - y(0)) - y^{\prime}(0)

= p^2 L(y) - py(0) - y^{\prime}(0)

= p^2 Y - py_0 - y_0^{\prime}  \qquad \qquad \qquad (3)

Continuing this process, we can also obtain the transforms of higher derivatives. Using these results, we can now solve differential equations. For example, suppose we want to solve

y^{\prime \prime} + 4y^{\prime} + 4y = t^2 e^{-2t}  \qquad \qquad \qquad (4)

with initial conditions y_0 = 0 and y_0^{\prime} = 0. We simply take the Laplace transform of each term in the equation. We get

p^2 Y - py_0 - y_0^{\prime} + 4pY - 4y_0 + 4Y = L(t^2 e^{-2t}) \qquad \qquad \qquad (5)

Using the given initial conditions, the left-hand side of (5) reduces to (p+2)^2 Y. We can obtain the Laplace transform on the right-hand side of (5) by consulting a reference table, or by integrating. Following the latter approach, we integrate by parts once to get

L(t^2e^{-2t}) = \int_0^{\infty} t^2 e^{-2t}e^{-pt}dt = \int_0^{\infty} t^2e^{-(p+2)t}dt = \frac{2}{(p+2)}\int_0^{\infty}t e^{-(p+2)t}dt

and then integrate by parts again to get

\int_0^{\infty} t e^{-(p+2)t} dt = \frac{1}{(p+2)^2}

Combining these, we get

L(t^2 e^{-2t}) = \frac{2}{(p+2)^3}

Alternatively, from standard reference tables we find that

L(t^k e^{-at}) = \frac{k!}{(p+a)^{k+1}} \qquad \qquad \qquad (6)

Therefore

L(t^2 e^{-2t}) = \frac{2}{(p+2)^3}

Using these results in (5) we get

(p+2)^2 Y = \frac{2}{(p+2)^3}

and so

Y = \frac{2}{(p+2)^5} \qquad \qquad \qquad (7)

Recall that Y = L(y), so what we need to do now to solve the problem is to apply an inverse Laplace transform to both sides of (7). This would give

y = L^{-1}\big(\frac{2}{(p+2)^5}\big) \qquad \qquad \qquad (8)

We could evaluate the right-hand side of (8) using the formula from the reference table in (6). We see from this formula that we need k=4, a=2, so we can immediately conclude that

y = L^{-1}\big(\frac{2}{(p+2)^5}\big) = \frac{2t^4e^{-2t}}{4!} = \frac{t^4e^{-2t}}{12} \qquad \qquad \qquad (9)

This is the required solution to the second-order differential equation in (4) above. What I want to do now is explore the Bromwich integral for Laplace transforms, and use it to confirm the answer in (9) by obtaining the inverse Laplace transform of \frac{2}{(p+2)^5} via the residue theorem from complex analysis.
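Before doing that, here is a quick symbolic check of (9) (a minimal sketch assuming sympy is available): it verifies that the Laplace transform of the claimed solution reproduces (7), and that the solution satisfies the differential equation (4) with the given initial conditions.

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)

y = t**4 * sp.exp(-2*t) / 12   # candidate solution from (9)

# Forward transform: should reproduce Y = 2/(p+2)^5 from (7), up to simplification
print(sp.laplace_transform(y, t, p, noconds=True))

# Check that y solves y'' + 4y' + 4y = t^2 e^{-2t} with y(0) = y'(0) = 0
print(sp.simplify(sp.diff(y, t, 2) + 4*sp.diff(y, t) + 4*y - t**2*sp.exp(-2*t)))   # 0
print(y.subs(t, 0), sp.diff(y, t).subs(t, 0))                                      # 0 0
```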

In the definition of the Laplace transform in (1) above, we now let p be a complex number, say p = z = x + iy. Then the Laplace transform becomes

F(p) = F(z) = F(x + iy) = \int_0^{\infty} e^{-pt} f(t) dt

= \int_0^{\infty} e^{-(x+iy)t} f(t) dt = \int_0^{\infty} e^{-xt} f(t) e^{-iyt} dt \qquad \qquad \qquad (10)

where x = \text{Re} \ p > k for some real k. [We must have some restriction on \text{Re} \ p to make the integral converge at infinity. The restriction depends on what the function f(t) is, but it is always of the form \text{Re} \ p > k for some real k. To illustrate this idea, consider the case f(t) = 1. Then we have

\int_0^{\infty} e^{-pt} dt = \big[ -\frac{1}{p}e^{-pt}\big]_0^{\infty} = \frac{1}{p}

The final equality here holds only for p > 0, because otherwise e^{-pt} would not vanish at the upper limit of the integral. If p is complex, then \text{Re} \ p > 0 would be required in this case]. Now, (10) has a form similar to that of a Fourier transform

g(\alpha) = \frac{1}{2 \pi}\int_{-\infty}^{\infty} f(t) e^{-i \alpha t} dt \qquad \qquad \qquad (11)

with inverse

f(t) = \int_{-\infty}^{\infty} g(\alpha) e^{i \alpha t} d \alpha \qquad \qquad \qquad (12)

F(p) in (10) corresponds to g(\alpha) in (11), and f(t) in (11) corresponds to \phi(t) = 2 \pi e^{-xt} f(t) in (10). Pursuing this analogy with a Fourier transform, the inverse transform for (10), corresponding to (12), would then be

\phi(t) = \int_{-\infty}^{\infty} F(x + iy) e^{i y t} dy \qquad \qquad \qquad (13)

Using the definition of \phi(t), we can write this as

f(t) = \frac{1}{2 \pi} e^{xt} \int_{-\infty}^{\infty} F(x+iy) e^{i y t} dy

= \frac{1}{2 \pi} \int_{-\infty}^{\infty} F(x+iy) e^{(x+iy)t} dy \qquad \qquad \qquad (14)

for t > 0. Since x = c is a constant in this setup, we have dz = d(x+iy) = idy, so we can write (14) as

f(t) = \frac{1}{2 \pi i} \int_{c-i\infty}^{c+i\infty} F(x+iy) e^{(x+iy)t} dz

= \frac{1}{2 \pi i} \int_{c-i\infty}^{c+i\infty} F(z) e^{zt} dz \qquad \qquad \qquad (15)

where the notation means that we integrate along a vertical line x = c in the z plane. [This can be any vertical line on which x = c > k as required by the restriction on \text{Re} \ p]. The integral (15) for the inverse Laplace transform is called the Bromwich integral.

To use (15) to evaluate f(t) for a given F(p), we exploit the fact that we can evaluate integrals in the complex plane by using the residue theorem. We imagine a contour consisting of a vertical straight line and a semicircle enclosing an area to the left of x = c. We evaluate (15) by using this contour.

We restrict F(z) to be of the form P(z)/Q(z), with P(z) and Q(z) polynomials, and Q(z) of degree at least one higher than P(z). The value of the integral around the closed contour is determined by the singularities lying inside it, in accordance with the residue theorem. Specifically, the residue theorem says that the integral around the contour equals 2 \pi i times the sum of the residues of F(z)e^{zt} at its poles. As the radius of the semicircle goes to infinity, the integral along the straight line becomes an improper integral from c - i\infty to c + i\infty. However, on the semicircular arc the radius appears to a higher degree in the denominator of F(z) than in the numerator, so the integral around the semicircle goes to zero as the radius becomes infinite. We are left only with the improper integral along the straight line, and the residue theorem then assures us that this must equal 2 \pi i times the sum of the residues of F(z)e^{zt} at its poles. Cancelling the factor 2 \pi i from (15), we conclude that

f(t) = \text{the sum of the residues of } F(z) e^{zt} \text{ at all poles.} \qquad \qquad \qquad (16)

So, to find f(t) from the Laplace transform F(p), we simply construct the complex function F(z)e^{zt}, and then sum the residues at all the poles of this constructed function. Note that we must include all poles in (16), i.e., we must choose x = c such that all the poles of F(z)e^{zt} lie inside the contour to the left of the vertical line we are integrating along. To find the residues at each pole, we can use the following rule:

When a function f(z) has a pole of order n at z_0, the residue at this pole can be obtained by multiplying f(z) by (z - z_0)^n, differentiating the result n-1 times, dividing by (n-1)!, and evaluating the resulting expression at z = z_0.

We can now apply this approach to find the inverse of the transform

F(p) = \frac{2}{(p+2)^5}

We have

F(z)e^{zt} = \frac{2e^{zt}}{(z+2)^5}

This has a pole of order 5 at z = -2. Therefore, multiplying by (z+2)^5 we get 2e^{zt}, differentiating this result n - 1 = 4 times we get 2t^4e^{zt}, and dividing by (n - 1)! = 4! = 24 we get \frac{t^4 e^{zt}}{12}. Finally, evaluating the result at z = -2 gives the answer

f(t) = \frac{t^4e^{-2t}}{12}

which exactly matches the answer we got in (9) above.

We also have a convolution theorem for Laplace transforms, the proof of which I want to explore here. The convolution theorem says that the Laplace transform of a convolution of functions is the product of the Laplace transforms of the individual functions in the convolution. So, if g(t) and h(t) are functions, and G(p) and H(p) are their corresponding Laplace transforms, then the convolution theorem says

G(p)H(p) = L\bigg[\int_0^t g(t - \tau)h(\tau) d\tau \bigg] \qquad \qquad \qquad (17)

where

(g \ast h)(t) = \int_0^t g(t - \tau)h(\tau) d\tau \qquad \qquad \qquad (18)

is the convolution of g and h. To prove (17), note that from the definition of a Laplace transform in (1) above we have

G(p)H(p) = \int_0^{\infty} e^{-pt} g(t) dt \int_0^{\infty} e^{-pt} h(t) dt

We will now rewrite this by replacing t by different dummy variables of integration so that we can write the product of the two integrals as a double integral. We then have

G(p)H(p) = \int_0^{\infty} e^{-p \sigma} g(\sigma) d\sigma \int_0^{\infty} e^{-p \tau} h(\tau) d\tau

= \int_0^{\infty} \int_0^{\infty} e^{-p(\sigma+\tau)}g(\sigma)h(\tau)d\sigma d\tau \qquad \qquad \qquad (19)

We now make the change of variables \sigma + \tau = t in the inner integral in (19), i.e., the integral with respect to \sigma, so that \tau is treated as fixed. Then \sigma = t - \tau, d\sigma = dt, and when \sigma = 0 we have t = \tau, while when \sigma = \infty we have t = \infty. Then (19) becomes

G(p)H(p) = \int_{\tau=0}^{\infty} \int_{t=\tau}^{\infty} e^{-pt} g(t-\tau)h(\tau) dt d\tau \qquad \qquad \qquad (20)

We will now change the order of integration. To interpret what is going on in (20), picture the (t, \tau) plane with t on the horizontal axis and \tau on the vertical axis: the double integral in (20) is over the region to the right of the diagonal line t = \tau, i.e., the region 0 \leq \tau \leq t < \infty. Within a given horizontal strip of width d\tau, the t integral adds up little area elements from the line t = \tau out to t = \infty, and then the \tau integral sums over the horizontal strips from \tau = 0 to \tau = \infty, covering the whole infinite region to the right of the diagonal line. However, we can cover exactly the same region using vertical strips of width dt instead. Changing the order of integration and integrating with respect to \tau first, within a given vertical strip of width dt the \tau integral adds up little area elements from \tau = 0 up to the line \tau = t, and then the t integral sums over the vertical strips from t = 0 to t = \infty. In doing things this way round, we are working with the double integral

G(p)H(p) = \int_{t=0}^{\infty} \int_{\tau=0}^{t} e^{-pt}g(t-\tau)h(\tau) d\tau dt

= \int_{t=0}^{\infty} e^{-pt} \bigg[\int_{\tau=0}^t  g(t-\tau)h(\tau) d\tau \bigg] dt

= L\bigg[\int_0^t  g(t-\tau)h(\tau) d\tau \bigg] \qquad \qquad \qquad (21)

where the last step follows from the definition of a Laplace transform in (1) above. Since both orders of integration cover exactly the same region, the two integrals (20) and (21) must be the same, so the convolution theorem in (17) is proved. The proof is useful in showing how a double infinite integral can be converted into a double integral in which only one of the integrals is improper, and vice versa. I have come across the need for this technique in different contexts.
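As a numerical sanity check of (17) (a minimal sketch assuming numpy and scipy are available; the functions g, h and the value of p are arbitrary illustrative choices), we can compare G(p)H(p) with the Laplace transform of the convolution computed directly.

```python
import numpy as np
from scipy.integrate import quad

# Arbitrary illustrative choices for this sketch
g = lambda t: np.exp(-t)
h = lambda t: np.sin(t)
p = 2.0

G = quad(lambda t: g(t) * np.exp(-p * t), 0, np.inf)[0]
H = quad(lambda t: h(t) * np.exp(-p * t), 0, np.inf)[0]

conv = lambda t: quad(lambda tau: g(t - tau) * h(tau), 0, t)[0]     # (g*h)(t), as in (18)
L_conv = quad(lambda t: conv(t) * np.exp(-p * t), 0, 50)[0]         # upper limit truncated; integrand decays fast

print(G * H)      # ~ 1/((p+1)(p^2+1)) = 1/15 for these choices
print(L_conv)     # should match, confirming (17)
```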

A couple of results involving the beta function

In this note, I want to quickly record a couple of interesting results involving the beta function. First, consider the reciprocal beta function

f(n) = \frac{\Gamma(\gamma)}{B(n, \gamma)}

where

B(n, \gamma) = \int_0^1 dx \ x^{n-1} (1 - x)^{\gamma - 1} = \frac{\Gamma(n) \Gamma(\gamma)}{\Gamma(n + \gamma)}

It is easy to see by using Stirling’s formula

\Gamma(p + 1) \sim p^p e^{-p} \sqrt{2 \pi p}

in the numerator and denominator of f(n) that

f(n) = \frac{\Gamma(\gamma)}{B(n, \gamma)} = \frac{\Gamma(n + \gamma)}{\Gamma(n)} \sim n^{\gamma}

for n \gg \gamma with \gamma fixed. Therefore, the reciprocal beta function f(n) asymptotically follows a simple power law in n.

Second, we can easily show that the beta function converges when summed over n by exchanging the summation and integration operations. We obtain the infinite sum of the beta function as

\sum_{n=1}^{\infty} B(n, \gamma) = \sum_{n=1}^{\infty} \int_0^1 dx \ x^{n-1} (1 - x)^{\gamma - 1}

= \int_0^1 dx \ (1 - x)^{\gamma - 1} \bigg(\sum_{n=1}^{\infty} x^{n-1}\bigg) = \int_0^1 dx \ (1 - x)^{\gamma - 1} \cdot \frac{1}{(1 - x)} = \int_0^1 dx \ (1 - x)^{\gamma - 2} = \frac{1}{\gamma -1}

provided \gamma > 1, so that the final integral converges.
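Both results are easy to check numerically (a minimal sketch assuming numpy and scipy are available; the value of \gamma is an arbitrary illustrative choice greater than 1):

```python
import numpy as np
from scipy.special import beta, gamma

g = 2.5   # illustrative value of gamma, chosen > 1

# Asymptotic power law: Gamma(gamma)/B(n, gamma) = Gamma(n + gamma)/Gamma(n) ~ n^gamma for large n
n = 1000
print(gamma(g) / beta(n, g), n**g)

# Sum of the beta function over n: should approach 1/(gamma - 1)
ns = np.arange(1, 200_000)
print(beta(ns, g).sum(), 1.0 / (g - 1))
```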

Strange notations in mathematical probability

I sometimes see things like E[X] = \int t dF_X(t) or E[X] = \int t F_X(dt) instead of the usual E[X] = \int t f_X(t) dt. Where do these strange notations come from? Suppose we have a random variable X, considered as a function which maps outcomes from a sample space (equipped with a \sigma-field) to the real line. Suppose the random variable takes values in the range [a, b], which we partition into n subintervals M_i = [b_{i-1}, b_i), with M_1 \equiv [b_0, b_1), M_2 \equiv [b_1, b_2), \ldots, M_n \equiv [b_{n-1}, b_n], and with b_0 = a, b_n = b. Let E_i denote the event \{X \in M_i\} in the \sigma-field, i.e., the set of outcomes that X maps into the interval M_i. Pick an arbitrary point t_i in each subinterval M_i. Form a simple random variable X_s from X by assigning to each E_i the value X_s = t_i. Then the expectation of X_s is given by

E[X_s] = \sum_{i=1}^{n} t_i P(X_s = t_i) = \sum_{i=1}^{n} t_i P(X \in M_i)

But if we denote the probability distribution function of X by F_X, we have

P(X \in M_i) = F_X(b_i) - F_X(b_{i-1})

Therefore we can write

E[X_s] = \sum_i t_i (F_X(b_i) - F_X(b_{i-1})) \approx \int t dF_X(t)

Therefore the notation E[X] = \int t dF_X(t) means: the weighted sum of the t values, where the weights are the probabilities given by the difference in the values of the probability distribution function at the endpoints of the interval corresponding to t. But the approximations improve as the subdivisions become finer, so we can suppose E[X] to be given by

E[X] =  \int t dF_X(t) = \int t F_X(dt)

So the notation \int t F_X(dt) means: the weighted sum of the t values, where the weights are the probabilities given by the difference in the values of the probability distribution function at the endpoints of the differential interval dt.

In the case of absolutely continuous random variables, we have a density function f_X, so

P(X \in M_i) \approx f_X(t_i)(b_i - b_{i-1}) \equiv f_X(t_i) \triangle_i t

where \triangle_i t denotes the length of the i-th subinterval, over which the density f_X can be regarded as approximately constant. This implies

E[X_s] \approx \sum_i t_i f_X(t_i) \triangle_i t \approx \int t f_X(t) dt

As the subdivisions become finer, both of the above approximations improve. On the basis of this informal argument, we should then have

E[X] = \int t f_X(t) dt

Therefore we have two equivalent notations in the absolutely continuous case:

E[X] = \int t F_X(dt) = \int t f_X(t) dt
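As a numerical illustration of this equivalence (a minimal Python sketch assuming numpy and scipy are available; the exponential distribution and the partition of [0, 50] are arbitrary illustrative choices), we can compare the Riemann-Stieltjes sum \sum_i t_i (F_X(b_i) - F_X(b_{i-1})) with the density integral \int t f_X(t) dt.

```python
import numpy as np
from scipy.stats import expon

# Illustrative distribution: exponential with mean 1, so E[X] = 1
F = expon.cdf
b = np.linspace(0, 50, 200_001)      # partition of [0, 50]; the tail beyond 50 is negligible
t_mid = 0.5 * (b[:-1] + b[1:])       # a point t_i chosen in each subinterval

stieltjes_sum = np.sum(t_mid * (F(b[1:]) - F(b[:-1])))                 # sum t_i (F(b_i) - F(b_{i-1}))
density_integral = np.sum(t_mid * expon.pdf(t_mid) * np.diff(b))       # Riemann sum for int t f(t) dt

print(stieltjes_sum, density_integral)   # both ~ 1
```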

From random walk to Brownian motion with drift

Suppose we divide a time interval of length \Delta t into n = \frac{\Delta t}{\Delta_n t} steps. Let X be a random variable with initial value X_0. At the first step, the value of the random variable can go up by \Delta h or down by \Delta h with probability p and q = 1 - p respectively. (This is a Bernoulli trial with probability of success p). Then at the first step we have \Delta X_1 = X_1 - X_0 and the expected value and variance are

E[\Delta X_1] = p \Delta h - q \Delta h = (p - q) \Delta h

and

V[\Delta X_1] = E[(\Delta X_1)^2] - E^2[\Delta X_1]

= p(\Delta h)^2 + q(\Delta h)^2 - (p - q)^2 (\Delta h)^2

= (\Delta h)^2[p + q - (p - q)^2]

= 4pq(\Delta h)^2

At the second step we similarly have \Delta X_2 = X_2 - X_1 with the same expected value and variance. And so on. This step-by-step process is a discrete-time random walk. Now let

\Delta X_t = \Delta X_1 + \Delta X_2 + \cdots + \Delta X_n

be the total change in X over the interval \Delta t, i.e., after n steps. Then \Delta X_t is (up to a linear transformation) a binomial random variable with expected value

E[\Delta X_t] = E[\Delta X_1 + \Delta X_2 + \cdots + \Delta X_n] = n(p - q)\Delta h = \Delta t (p - q) \frac{\Delta h}{\Delta_n t}

and variance

V[\Delta X_t] = V[\Delta X_1 + \Delta X_2 + \cdots + \Delta X_n] = n4pq(\Delta h)^2 = 4pq\Delta t \frac{(\Delta h)^2}{\Delta_n t}

We now want to pass to continuous time by letting n \rightarrow \infty or equivalently \Delta_n t \rightarrow 0, and we want to make the expected value and variance independent of \Delta h and \Delta_n t in this limit. For this purpose, we need to define

\Delta h = \sigma \sqrt{\Delta_n t}

p = \frac{1}{2} \bigg[1 + \frac{\alpha}{\sigma} \sqrt{\Delta_n t}\bigg]

q = \frac{1}{2} \bigg[1 - \frac{\alpha}{\sigma} \sqrt{\Delta_n t}\bigg]

where \alpha and \sigma are two parameters whose role will become clear shortly. Substituting these three into the expressions for E[\Delta X_t] and V[\Delta X_t] and passing to the limit \Delta_n t \rightarrow 0 we get the expected value

E[\Delta X_t] = \Delta t (p - q) \frac{\Delta h}{\Delta_n t} = \Delta t \bigg[\frac{\alpha}{\sigma} \sqrt{\Delta_n t} \bigg] \frac{\sigma \sqrt{\Delta_n t}}{\Delta_n t} = \alpha \Delta t

and variance

V[\Delta X_t] = 4pq\Delta t \frac{(\Delta h)^2}{\Delta_n t}

= 4 \times \frac{1}{4} \bigg[1 - \frac{\alpha^2}{\sigma^2} \Delta_n t \bigg] \Delta t \sigma^2 \frac{\Delta_n t}{\Delta_n t}

= \bigg[1 - \frac{\alpha^2}{\sigma^2} \Delta_n t \bigg] \Delta t \sigma^2

\rightarrow \sigma^2 \Delta t

Thus, in the limit as n \rightarrow \infty, both the mean and variance are independent of \Delta h and \Delta_n t. Note in particular that \Delta h must depend on the square root of \Delta_n t for this to be achieved. Were it not for this, the mean and variance would depend on \Delta h and on the number of steps n.

Now, as the number of steps n becomes large, the binomial random variable \Delta X_t becomes normally distributed, with

\Delta X_t \sim \mathcal{N}(\alpha \Delta t, \sigma^2 \Delta t)

Let \epsilon_t \sim \mathcal{N}(0, 1). Then we have

\frac{\Delta X_t - \alpha \Delta t}{\sigma \sqrt{\Delta t}} \overset{d}{=} \epsilon_t

or

\Delta X_t \overset{d}{=} \alpha \Delta t + \sigma \epsilon_t \sqrt{\Delta t}

Passing to differentials we then have the stochastic differential equation of Brownian motion with drift:

dx = \alpha dt + \sigma dz

where

dz = \epsilon_t \sqrt{dt}

is the (infinitesimal) increment of a Wiener process z(t) (also known as Brownian motion), \alpha is a drift parameter, and \sigma is a variance parameter. Note that in the increment dz of the Wiener process z(t) we have \epsilon_t \sim \mathcal{N}(0, 1) and \epsilon_t is serially uncorrelated, i.e., E[\epsilon_s \epsilon_t] = 0 for s \neq t. Thus, the values of dz over any two non-overlapping time intervals are independent, so z(t) follows a Markov process with independent increments. We have E[dz] = 0 and V[dz] = E[dz^2] = dt E[\epsilon_t^2] = dt. Therefore the variance of the change in a Wiener process grows linearly with the time horizon.

Note that the Wiener process does not have a time derivative in the conventional sense because with

\Delta z = \epsilon_t \sqrt{\Delta t}

we have

\frac{\Delta z}{\Delta t} = \epsilon_t (\Delta t)^{-\frac{1}{2}}

and this becomes infinite when we try to pass to the limit \Delta t \rightarrow 0. This is why we need to use Itô calculus to deal with Brownian motion and related processes, rather than being able to use conventional calculus.

Brownian motion with drift is a simple generalisation of the Wiener process. Over any time interval \Delta t, the change in x, \Delta x, is normally distributed with E[\Delta x] = \alpha \Delta t and V[\Delta x] = \sigma^2 \Delta t. From the above development starting with a discrete-time random walk, we can now see why dx must depend on the square root of dt, not just on dt.
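As a numerical illustration of this limiting behaviour, the following minimal Python sketch (the values of \alpha, \sigma, \Delta t, and the numbers of steps and paths are arbitrary illustrative choices, and numpy is assumed to be available) simulates the discrete random walk with the step size and probabilities defined above, and checks that the mean and variance of \Delta X_t approach \alpha \Delta t and \sigma^2 \Delta t.

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary illustrative values for the drift, volatility and time interval
alpha, sigma, dt_total = 0.3, 1.2, 1.0
n = 10_000                      # number of steps per path
paths = 100_000                 # number of simulated paths

dt_n = dt_total / n
dh = sigma * np.sqrt(dt_n)
p = 0.5 * (1 + (alpha / sigma) * np.sqrt(dt_n))

# Each path makes n up/down steps of size dh; count the up-steps with a binomial draw
ups = rng.binomial(n, p, size=paths)
dX = (2 * ups - n) * dh

print(dX.mean(), alpha * dt_total)        # ~ alpha * Delta t
print(dX.var(), sigma**2 * dt_total)      # ~ sigma^2 * Delta t
```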