The role of the Legendre transform in large deviation theory

The Legendre transform is a mechanism for converting a relationship like

t(k) = \frac{d \lambda(k)}{dk} \qquad \qquad \qquad (1)

into a relationship like

k(t) = \frac{dJ(t)}{dt} \qquad \qquad \qquad (2)

In other words, by going from a function \lambda(k) to its Legendre transform J(t) and vice versa, the roles of t and k in (1) and (2) can be reversed.

In moving from (1) to (2), the required Legendre transform is

J(t) = t \cdot k(t) - \lambda(k(t)) \qquad \qquad \qquad (3)

because differentiating both sides of (3) with respect to t gives

\frac{dJ(t)}{dt} = k(t) + t \cdot \frac{d k(t)}{dt} - \frac{d \lambda(k(t))}{dk(t)} \cdot \frac{dk(t)}{dt} \qquad \qquad \qquad (4)

= k(t)

Using (1), we see that the last two terms in (4) cancel, so we get (2). Conversely, the Legendre transform of J(t) in (2) can be written

\lambda(k) = k \cdot t(k) - J(t(k)) \qquad \qquad \qquad (5)

because differentiating both sides with respect to k gives (1).

The Fenchel-Legendre transform in large deviation theory achieves this same kind of relationship between an entropy function J(\bar{T}) and the log of the Laplace transform of a corresponding random variable, which is denoted \lambda(k). The Fenchel-Legendre transform is given by

J(\bar{T}) = \sup_{k \in \mathbb{R}} (k \cdot \bar{T} - \lambda(k)) \qquad \qquad \qquad (6)

We take \lambda(k) to be the log of the Laplace transform of a random variable T, and then approximate the density of T at some far-flung value \bar{T} in the distribution of T as

f(\bar{T}) = \exp(-J(\bar{T})) \qquad \qquad \qquad (7)

Random variables for which we can use (7) as an approximation of the density at tail values \bar{T} in the distribution are said to satisfy the large deviation principle.

One way to understand how (6) arises is in terms of Chernoff’s inequality, which says

P(T \geq \bar{T}) \leq \exp(-J(\bar{T})) \qquad \qquad \qquad (8)

Chernoff’s inequality is a modified form of Markov’s inequality. Minimising the upper bound in this inequality requires J(\bar{T}) to be the supremum defined in (6) above, as I explained in a previous post.

Another way to understand where (6) comes from is through the use of the approximation in (7) to obtain \lambda(k) in terms of J(\bar{T}), followed by an inversion to obtain J(\bar{T}) in terms of \lambda(k) using the fact that \lambda(k) and J(\bar{T}) are Legendre transforms of each other. For a random variable T, the Laplace transform is

E[e^{kT}] = \int_{\mathbb{R}} e^{kT} f(T) dT \qquad \qquad \qquad (9)

(Note that, by the inherent symmetry, the right-hand side can equally be regarded as the Laplace transform of the density). We now approximate f(T)dT by

f(\bar{T})d \bar{T} = P(T \in [\bar{T}, \bar{T}+d\bar{T}]) \qquad \qquad \qquad (10)

with f(\bar{T}) as given in (7) above. Then (9) becomes

E[e^{kT}] \approx \int_{\mathbb{R}} e^{k \bar{T}} e^{-J(\bar{T})} d\bar{T} = \int_{\mathbb{R}} e^{k\bar{T} - J(\bar{T})}d \bar{T} \qquad \qquad \qquad (11)

We can approximate the integral in (11) using the saddle-point approximation, i.e., as the maximum value of the integrand, which gives

E[e^{kT}] \approx \exp(\sup_{\bar{T} \in \mathbb{R}} (k\bar{T} - J(\bar{T}))) \qquad \qquad \qquad (12)

Now taking the log of both sides of this we get

\lambda(k) = \sup_{\bar{T} \in \mathbb{R}} (k\bar{T} - J(\bar{T})) \qquad \qquad \qquad (13)

We can now get from (13) to (6) by invoking the Legendre transform relationship. To do this, we need to carefully understand the role that taking the supremum is playing in (13). For any given value of k, we are finding the value of \bar{T} that maximises the expression k\bar{T} - J(\bar{T}). This then makes the optimised value of \bar{T} a function of k. Thus we can write (13) as

\lambda(k) = k \cdot \bar{T}^{\ast}(k)  - J(\bar{T}^{\ast}(k)) \qquad \qquad \qquad (14)

where \bar{T}^{\ast} denotes the optimal value of \bar{T}. The supremum in (6) is playing a similar role, so we can likewise rewrite (6) as

J(\bar{T}) = \bar{T} \cdot k^{\ast}(\bar{T}) - \lambda(k^{\ast}(\bar{T})) \qquad \qquad \qquad (15)

where k^{\ast} denotes the optimal value of k. Comparing these with (3) and (5) above, we now immediately recognise (14) and (15) as Legendre transforms of each other whenever \lambda(k) is everywhere differentiable. Thus, we can invert (14) to obtain (15) using the Legendre transform relationship between them. We can then use (15) to calculate the entropy function for particular random variables, given their Laplace transforms.

Published by Dr Christian P. H. Salas

Mathematics Lecturer

Leave a comment