The Legendre transform is a mechanism for converting a relationship like
$$\frac{df(x)}{dx} = p \tag{1}$$
into a relationship like
$$\frac{dg(p)}{dp} = x. \tag{2}$$
In other words, by going from a function to its Legendre transform and vice versa, the roles of $x$ and $p$ in (1) and (2) can be reversed.
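In moving from (1) to (2), the required Legendre transform is
$$g(p) = p\,x - f(x), \tag{3}$$
where $x$ is regarded as a function of $p$ through (1). Differentiating both sides of (3) with respect to $p$ gives
$$\frac{dg}{dp} = x + p\,\frac{dx}{dp} - \frac{df}{dx}\,\frac{dx}{dp}. \tag{4}$$
Using (1), we see that the last two terms in (4) cancel, so we get (2). Conversely, the Legendre transform of $g(p)$ in (2) can be written
$$f(x) = x\,p - g(p), \tag{5}$$
because differentiating both sides with respect to $x$ (now regarding $p$ as a function of $x$) gives (1).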
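The relationships (1) to (5) can be checked numerically. The sketch below assumes $f(x) = e^x$, whose Legendre transform is known to be $g(p) = p\ln p - p$ for $p > 0$, and computes $g(p) = \sup_x[p\,x - f(x)]$ by a ternary search (valid because $p\,x - f(x)$ is concave when $f$ is convex). The helper names `sup_concave` and `legendre` are illustrative choices, not functions from any library.

```python
import math


def sup_concave(h, lo, hi, iters=200):
    """Maximise a concave function h on [lo, hi] by ternary search.

    Returns (argmax, max). Concavity guarantees that discarding the
    lower-valued third of the interval never discards the maximiser."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    arg = 0.5 * (lo + hi)
    return arg, h(arg)


def legendre(f, p, lo, hi):
    """Numerical Legendre transform g(p) = sup_x [p*x - f(x)], as in (3).

    Returns (g(p), x(p)), where x(p) is the maximiser, i.e. the x with f'(x) = p."""
    x_opt, g_val = sup_concave(lambda x: p * x - f(x), lo, hi)
    return g_val, x_opt


# Example: f(x) = exp(x), with closed-form transform g(p) = p*ln(p) - p.
p = 2.0
g_val, x_opt = legendre(math.exp, p, -20.0, 20.0)
print(g_val, p * math.log(p) - p)  # numerical vs closed form
print(x_opt, math.log(p))          # maximiser solves f'(x) = e^x = p, i.e. (1)

# Check (2): dg/dp should equal x(p); estimate it by a central difference.
h = 1e-5
dg_dp = (legendre(math.exp, p + h, -20.0, 20.0)[0]
         - legendre(math.exp, p - h, -20.0, 20.0)[0]) / (2 * h)
print(dg_dp)


def g_closed(q):
    """Closed-form transform of exp, defined for q > 0."""
    return q * math.log(q) - q


# Check (5): transforming g back recovers f; here sup_p [1*p - g(p)] = f(1) = e.
f_back, _ = legendre(g_closed, 1.0, 1e-9, 50.0)
print(f_back, math.exp(1.0))
```

Because the transform is applied with the same `legendre` helper in both directions, the last check also illustrates that the Legendre transform is its own inverse for convex functions.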
The Fenchel-Legendre transform in large deviation theory achieves the same kind of relationship between an entropy function $I(x)$ (also called a rate function) and the log of the Laplace transform of a corresponding random variable, which is denoted $\lambda(\theta)$. The Fenchel-Legendre transform is given by
$$I(x) = \sup_{\theta}\,\bigl[\theta x - \lambda(\theta)\bigr]. \tag{6}$$
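We take $\lambda(\theta) = \ln E\bigl[e^{\theta X}\bigr]$ to be the log of the Laplace transform of a random variable $X$, and then approximate the density $f_X$ of $X$ at some far-flung value $x$ in the tail of its distribution as
$$f_X(x) \approx e^{-I(x)}. \tag{7}$$
Random variables for which (7) can be used as an approximation of the density at tail values are said to satisfy the large deviation principle. (Formally, the principle is stated for a sequence of random variables $X_n$, such as sample means of $n$ independent copies, whose densities behave like $e^{-nI(x)}$ on a logarithmic scale as $n \to \infty$.)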
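As a concrete case of (6) and (7), here is a minimal sketch assuming $X$ is Poisson with mean $\mu = 50$ (an illustrative choice), for which $\lambda(\theta) = \mu(e^\theta - 1)$ and the supremum in (6) has the closed form $I(x) = x\ln(x/\mu) - x + \mu$. Since the Poisson variable is discrete, the sketch compares $-\ln P(X = x)$ rather than a density with $I(x)$. The names `lam`, `rate_numeric`, `rate_closed` and `log_pmf` are hypothetical helpers introduced only for this illustration.

```python
import math


def sup_concave(h, lo, hi, iters=200):
    """Maximise a concave function h on [lo, hi] by ternary search; returns (argmax, max)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    arg = 0.5 * (lo + hi)
    return arg, h(arg)


MU = 50.0  # Poisson mean (illustrative assumption)


def lam(theta):
    """Log-Laplace transform of a Poisson(MU) variable: ln E[exp(theta X)]."""
    return MU * (math.exp(theta) - 1.0)


def rate_numeric(x):
    """Entropy function from (6): I(x) = sup_theta [theta*x - lam(theta)].

    theta*x - lam(theta) is concave in theta because lam is convex."""
    _, val = sup_concave(lambda t: t * x - lam(t), -30.0, 30.0)
    return val


def rate_closed(x):
    """Closed form for the Poisson case, valid for x > 0."""
    return x * math.log(x / MU) - x + MU


def log_pmf(k):
    """Exact ln P(X = k) for Poisson(MU), using lgamma(k + 1) = ln k!."""
    return -MU + k * math.log(MU) - math.lgamma(k + 1)


for x in (25.0, 50.0, 100.0, 150.0):
    # -ln P(X = x) tracks I(x) up to a term growing only like ln sqrt(2*pi*x).
    print(x, rate_numeric(x), rate_closed(x), -log_pmf(x))
```

The entropy function vanishes at the mean ($I(\mu) = 0$), and in the tail $-\ln P(X = x)$ agrees with $I(x)$ to leading order, which is the sense in which (7) is an approximation.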
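One way to understand how (6) arises is in terms of Chernoff’s inequality, which says that, for any $\theta \ge 0$,
$$P(X \ge x) \le e^{-\theta x}\,E\bigl[e^{\theta X}\bigr] = e^{-[\theta x - \lambda(\theta)]}. \tag{8}$$
Chernoff’s inequality is Markov’s inequality applied to the non-negative random variable $e^{\theta X}$. Minimising the upper bound in (8) over $\theta$ is the same as maximising $\theta x - \lambda(\theta)$, so the optimal $\theta$ is the one that attains the supremum defined in (6) above, and the tightest bound is $e^{-I(x)}$, as I explained in a previous post. (For $x$ above the mean of $X$, restricting to $\theta \ge 0$ does not change the supremum.)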
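To see the optimisation in (8) at work, the following sketch assumes $X$ is a standard normal variable, for which $\lambda(\theta) = \theta^2/2$ and hence $I(x) = x^2/2$. It scans a grid of $\theta \ge 0$, keeps the smallest Chernoff bound, and compares it with the exact tail probability. The grid search and the function names are illustrative choices, not part of the original argument.

```python
import math


def chernoff_bound(x, theta, lam):
    """Right-hand side of (8): exp(-(theta*x - lam(theta))), valid for theta >= 0."""
    return math.exp(-(theta * x - lam(theta)))


def lam_normal(theta):
    """Log-Laplace transform of a standard normal variable: theta^2 / 2."""
    return 0.5 * theta * theta


def normal_tail(x):
    """Exact P(X >= x) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


x = 3.0
# Scan theta in [0, 10] and keep the tightest (smallest) bound.
thetas = [i * 0.001 for i in range(0, 10001)]
best_theta = min(thetas, key=lambda t: chernoff_bound(x, t, lam_normal))
best_bound = chernoff_bound(x, best_theta, lam_normal)

print(best_theta)                        # optimiser; for the normal case it sits at theta = x
print(best_bound, math.exp(-x * x / 2))  # tightest bound vs exp(-I(x)) with I(x) = x^2/2
print(normal_tail(x))                    # exact tail, which every bound exceeds
```

The tightest bound reproduces $e^{-I(x)}$, while the exact tail is smaller by a polynomial factor, so the bound is sharp only on a logarithmic scale.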
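Another way to understand where (6) comes from is to use the approximation in (7) to obtain $\lambda(\theta)$ in terms of $I(x)$, and then invert this to obtain $I(x)$ in terms of $\lambda(\theta)$, using the fact that $I(x)$ and $\lambda(\theta)$ are Legendre transforms of each other. For a random variable $X$ with density $f_X(x)$, the Laplace transform is
$$e^{\lambda(\theta)} = E\bigl[e^{\theta X}\bigr] = \int e^{\theta x} f_X(x)\,dx. \tag{9}$$
(Note that the right-hand side can equally be regarded as the Laplace transform of the density, with the usual sign convention $s = -\theta$.) We now approximate $f_X(x)$ by
$$f_X(x) \approx e^{-I(x)}, \tag{10}$$
with $I(x)$ as given in (7) above. Then (9) becomes
$$e^{\lambda(\theta)} \approx \int e^{\theta x - I(x)}\,dx. \tag{11}$$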
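We can approximate the integral in (11) using the saddle-point approximation (for a real integral like this one, more precisely Laplace’s method), i.e., by the maximum value of the integrand, which to leading exponential order gives
$$e^{\lambda(\theta)} \approx \sup_x\, e^{\theta x - I(x)}. \tag{12}$$
Now taking the log of both sides of this, we get
$$\lambda(\theta) \approx \sup_x\,\bigl[\theta x - I(x)\bigr]. \tag{13}$$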
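How good is the step from (11) to (13)? The sketch below assumes the illustrative entropy function $I(x) = x\ln x - x + 1$ for $x > 0$ (the one for a Poisson variable with mean 1), for which the supremum in (13) is attained at $x = e^\theta$ and equals $e^\theta - 1$. It evaluates the log of the integral in (11) by a plain Riemann sum and compares it with the supremum. The helper names and the integration cut-off are assumptions made for this illustration.

```python
import math


def rate(x):
    """Illustrative entropy function I(x) = x ln x - x + 1 (x > 0)."""
    return x * math.log(x) - x + 1.0


def log_integral(theta, n=100_000):
    """ln of the integral in (11), ln of the integral over (0, inf) of exp(theta*x - I(x)).

    Uses a Riemann sum evaluated on a log-sum-exp scale so that the large
    exponents near the peak do not overflow."""
    x_peak = math.exp(theta)  # solves I'(x) = ln x = theta
    upper = x_peak + 40.0 * math.sqrt(x_peak) + 40.0  # integrand negligible beyond this
    dx = upper / n
    exps = [theta * (i * dx) - rate(i * dx) for i in range(1, n + 1)]
    m = max(exps)
    return m + math.log(sum(math.exp(e - m) for e in exps) * dx)


def log_sup(theta):
    """Right-hand side of (13): sup_x [theta*x - I(x)], attained at x = e^theta."""
    x_peak = math.exp(theta)
    return theta * x_peak - rate(x_peak)


for theta in (1.0, 2.0, 3.0, 4.0):
    exact, approx = log_integral(theta), log_sup(theta)
    # The absolute gap grows only like theta/2, while both sides grow like e^theta.
    print(theta, exact, approx, (exact - approx) / exact)
```

The gap between the two sides is close to the Gaussian-width correction $\tfrac12\ln(2\pi e^\theta)$ from Laplace’s method, so the relative error of (13) shrinks as the peak of the integrand sharpens.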
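We can now get from (13) to (6) by invoking the Legendre transform relationship. To do this, we need to understand carefully the role that taking the supremum plays in (13). For any given value of $\theta$, we are finding the value of $x$ that maximises the expression $\theta x - I(x)$; where $I$ is differentiable, the maximum satisfies $I'(x) = \theta$, the analogue of (1). This makes the optimised value of $x$ a function of $\theta$. Thus, treating the approximation in (13) as an equality, we can write (13) as
$$\lambda(\theta) = \theta\,x^*(\theta) - I\bigl(x^*(\theta)\bigr), \tag{14}$$
where $x^*(\theta)$ denotes the optimal value of $x$. The supremum in (6) plays a similar role (at the maximum, $\lambda'(\theta) = x$, the analogue of (2)), so we can likewise rewrite (6) as
$$I(x) = \theta^*(x)\,x - \lambda\bigl(\theta^*(x)\bigr), \tag{15}$$
where $\theta^*(x)$ denotes the optimal value of $\theta$. Comparing these with (3) and (5) above, with $\lambda$, $\theta$ and $I$ playing the roles of $g$, $p$ and $f$, we immediately recognise (14) and (15) as Legendre transforms of each other whenever $\lambda(\theta)$ is everywhere differentiable. Thus, we can invert (14) to obtain (15) using the Legendre transform relationship between them. We can then use (15) to calculate the entropy function for particular random variables, given their Laplace transforms.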
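As an example of using (15) in practice, the sketch below assumes a Bernoulli variable with success probability $q = 0.3$ (an illustrative choice), for which $\lambda(\theta) = \ln(1 - q + q e^\theta)$. It finds $\theta^*(x)$ by solving the first-order condition $\lambda'(\theta) = x$ with bisection (possible because $\lambda'$ is increasing), evaluates (15), and compares the result with the known closed form, the relative entropy of Bernoulli($x$) with respect to Bernoulli($q$). It also checks the analogue of (1), $I'(x) = \theta^*(x)$, and the round trip back through (14). All function names are hypothetical.

```python
import math

Q = 0.3  # success probability of the Bernoulli variable (illustrative assumption)


def lam(theta):
    """Log-Laplace transform of Bernoulli(Q): ln(1 - Q + Q e^theta)."""
    return math.log(1.0 - Q + Q * math.exp(theta))


def dlam(theta):
    """lam'(theta) = Q e^theta / (1 - Q + Q e^theta), increasing in theta."""
    e = Q * math.exp(theta)
    return e / (1.0 - Q + e)


def theta_star(x, lo=-50.0, hi=50.0, iters=200):
    """Solve lam'(theta) = x by bisection: the optimal theta in (15), for 0 < x < 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlam(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate(x):
    """Entropy function from (15): I(x) = theta*(x) x - lam(theta*(x))."""
    t = theta_star(x)
    return t * x - lam(t)


def rate_closed(x):
    """Closed form for Bernoulli(Q): relative entropy of Bernoulli(x) w.r.t. Bernoulli(Q)."""
    return x * math.log(x / Q) + (1.0 - x) * math.log((1.0 - x) / (1.0 - Q))


x = 0.6
h = 1e-6
slope = (rate(x + h) - rate(x - h)) / (2 * h)  # I'(x), which should equal theta*(x)
print(rate(x), rate_closed(x))
print(slope, theta_star(x))

# Round trip through (14): with x*(theta) = lam'(theta), theta*x* - I(x*) returns lam(theta).
theta = 1.0
x_opt = dlam(theta)
print(theta * x_opt - rate(x_opt), lam(theta))
```

Solving the first-order condition directly, rather than searching for the supremum, is only safe here because $\lambda$ is differentiable and strictly convex, which is exactly the condition under which (14) and (15) form a Legendre pair.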
