Python simulation of a threshold phenomenon in a general epidemic

For a lecture on deterministic and stochastic general epidemic models, I ran a Python simulation illustrating a threshold phenomenon which determines how an epidemic evolves. I want to record the results of this here.

In a general epidemic model, we assume a closed community of n + 1 individuals and classify those individuals into three groups, those who are susceptible to disease, those who are infective, and those who are removed (i.e., who have had the disease and are no longer infective). The numbers of each type at time t are denoted by X(t), Y(t) and Z(t) respectively. Note that since Z(t) = n + 1 - (X(t) + Y(t)), the time evolution of Z(t) is automatically determined by the evolution of X(t) and Y(t). At any given time t, only two types of transition are possible. The first type of transition is SUSCEPTIBLE \rightarrow INFECTIVE in which an infective meets a susceptible and the disease is passed on, causing the number of susceptibles to decrease by one and the number of infectives to increase by one. The second type of transition is INFECTIVE \rightarrow REMOVED in which an infective recovers from the disease, so the total number of infectives decreases by one and the total number removed increases by one.

Assuming that each member of the community comes into contact with the remaining n others according to a Poisson process with rate \beta, that there is homogeneous mixing so that the probability of one member meeting any other particular member is 1/n, and that a susceptible who becomes infected with the disease remains infective only for a time drawn from an exponential distribution with rate \gamma, then the general epidemic as a stochastic process can be characterised by the following two probability statements corresponding to the two types of transitions outlined above:

P(X(t + \delta t) = x - 1, Y(t + \delta t) = y + 1 | X(t) = x, Y(t) = y)

= \frac{\beta x y}{n} \delta t + o(\delta t) \quad \quad \quad \quad (1)

P(X(t + \delta t) = x, Y(t + \delta t) = y - 1 | X(t) = x, Y(t) = y)

= \gamma y \delta t + o(\delta t) \quad \quad \quad \quad (2)

The first probability statement says that infectives meet susceptibles according to a Poisson process with rate \beta x y/n, while the second probability statement says that infectives recover and become removed according to a Poisson process with rate \gamma y, where x and y denote realised numbers of susceptibles and infectives at time t respectively.

To obtain a deterministic approximation of the above stochastic general epidemic model, we set up ordinary differential equations for the time evolution of x, y and z based on the probability statements (1) and (2). From (1), we see that susceptibles x decrease at the rate \beta x y/n and infectives y correspondingly increase at the same rate, while from (2) we see that infectives also decrease at the rate \gamma y and the removed z correspondingly increase at the same rate. We thus get the following system of three coupled ordinary differential equations for the deterministic general epidemic model:

\frac{dx}{dt} = -\frac{\beta x y}{n} \quad \quad \quad \quad (3)

\frac{dy}{dt} = \frac{\beta x y}{n} - \gamma y \quad \quad \quad \quad (4)

\frac{dz}{dt} = \gamma y \quad \quad \quad \quad (5)

These cannot generally be solved to get explicit expressions for x, y and z as functions of time, but can be solved numerically to get time paths for these variables, as shown below using Python. It is also straightforward to obtain a differential equation for y in terms of x by dividing (4) by (3) to get

\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{\frac{\beta x y}{n} - \gamma y}{-\frac{\beta x y}{n}} = \frac{\rho - x}{x} \quad \quad \quad \quad (6)

where

\rho = \frac{n \gamma}{\beta} \quad \quad \quad \quad (7)

is a key parameter known as the epidemic parameter. The differential equation (6) can easily be solved given initial conditions x(0) = x_0, y(0) = y_0, z(0) = z_0, to get an equation for y in terms of x:

y - y_0 = (x_0 - x) - \rho \ln \bigg( \frac{x_0}{x}\bigg) \quad \quad \quad \quad (8)

This equation describes the evolution of the epidemic in a 'timeless' way in the (x, y)-plane and predicts that the epidemic parameter \rho plays a crucial role. It acts as a threshold parameter in the following sense: if x_0 \leq \rho, the number of infectives never rises above the initial number y_0 and in fact declines steadily from there until it reaches y = 0 at which point the epidemic ends (since there are no more infectives to pass on the disease), whereas if x_0 > \rho, the number of infectives can rise significantly above y_0 until it reaches a maximum value before beginning to decline from there until the epidemic ends again when y = 0. Public health measures therefore aim to reduce the number of susceptibles x_0 (e.g., by immunisation) and to increase the value of the epidemic parameter \rho by increasing the removal rate \gamma and decreasing the contact rate \beta (e.g., by encouraging self-isolation) to try to achieve a scenario in which x_0 < \rho.

To supplement these results, I wanted to use Python to get time paths for x, y and z as direct numerical solutions to the coupled differential equations (3), (4) and (5), ideally also demonstrating the threshold phenomenon with these time paths. The following diagram summarises the deterministic general epidemic model to be implemented in Python:

I used the odeint() function from the scipy.integrate library to solve the differential equation system numerically. This requires the definition of a Python function which takes as inputs \mathbf{A} = (x, y, z), t, \beta, \gamma and n, and returns the vector derivative d \mathbf{A}/dt = (dx/dt, dy/dt, dz/dt).
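The original code is not reproduced here, but a minimal sketch of such a function (the name deriv and the argument ordering are my own choices) is:

```python
import numpy as np
from scipy.integrate import odeint

def deriv(A, t, beta, gamma, n):
    """Right-hand side of the deterministic general epidemic model, equations (3)-(5)."""
    x, y, z = A
    dxdt = -beta * x * y / n               # equation (3): susceptibles becoming infective
    dydt = beta * x * y / n - gamma * y    # equation (4): infectives gained minus removed
    dzdt = gamma * y                       # equation (5): removals
    return dxdt, dydt, dzdt
```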

First, I ran a simulation for the case x_0 > \rho. I solved the model equations for

  • \gamma = 1/10~\mathrm{ days^{-1}}
  • n=1.0 \times 10^6
  • \beta = 0.5~\mathrm{ days^{-1}}
  • (x_0, y_0, z_0) = (0.8 \times n, 0.2 \times n, 0)

Note that the corresponding epidemic parameter is \rho = n \gamma/\beta = 200000.
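A sketch of the numerical solution for this first scenario, using the deriv function from the snippet above (the 100-day time grid is an assumption, not necessarily the range used for the original plots):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

gamma = 1 / 10            # removal rate (days^-1)
beta = 0.5                # contact rate (days^-1)
n = 1.0e6
x0, y0, z0 = 0.8 * n, 0.2 * n, 0.0
rho = n * gamma / beta    # epidemic parameter: 200000, so x0 > rho here

t = np.linspace(0, 100, 1001)                                  # assumed time grid (days)
x, y, z = odeint(deriv, (x0, y0, z0), t, args=(beta, gamma, n)).T

plt.plot(t, x, label='susceptibles x(t)')
plt.plot(t, y, label='infectives y(t)')
plt.plot(t, z, label='removed z(t)')
plt.xlabel('t (days)')
plt.legend()
plt.show()
```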

The results show the classic epidemic pattern for x_0 > \rho, with the number of infectives rising significantly above y_0 before declining to y = 0. Note that almost all the susceptibles ended up being infected in this scenario.

Next, I ran a simulation for the case x_0 < \rho. Suppose that public health measures have managed to increase the value of the epidemic parameter \rho by increasing the removal rate \gamma and reducing the contact rate \beta. I solved the model equations for

  • \gamma = 1/3~\mathrm{ days^{-1}}
  • n=1.0 \times 10^6
  • \beta = 0.4~\mathrm{ days^{-1}}
  • (x_0, y_0, z_0) = (0.8 \times n, 0.2 \times n,0)

Note that the corresponding epidemic parameter is now \rho = n \gamma / \beta = 833333.333....

The results now show the typical results for x_0 < \rho, with the number of infectives never rising above the initial y_0, and declining steadily to y = 0. Note that almost half of the initial number of susceptibles remained unaffected by the disease in this scenario.

On the equilibration of M/M/1 queues

The M/M/1 queue is the simplest Markov queueing model, and yet it is already sufficiently complicated to make it impractical in an introductory queueing theory lecture to derive time-dependent probability distributions for random variables of interest such as queue size and queueing time. A simpler approach is to assume the queue eventually reaches a steady state, and then use the forward Kolmogorov equations under this assumption to obtain equilibrium expected values of relevant random variables.

For the purposes of a lecture on this topic, I wanted to run some simulations of M/M/1 queues using Python to get a feel for how quickly or otherwise they actually reach their equilibrium. I was expecting equilibration after a few tens or hundreds of customers had passed through the queue. To my surprise, however, I ended up doing simulations involving a million customers and it typically took a few hundred thousand before the queue converged to theoretically predicted equilibrium values of things like mean queue size and mean queueing time. I want to record the relevant calculations and Python simulations here.

In M/M/1 queues, arrivals of customers occur according to a Poisson process with rate \lambda, there is a single server, and departures occur according to a Poisson process with rate \epsilon (so, the service time for each customer is exponentially distributed, also with rate \epsilon). Suppose the queue contains x customers at time t + \delta t where \delta t is a small time increment. Given the axioms of a Poisson process, there are only three possibilities for the number of customers at time t:

There could have been x + 1 customers at time t and one departed, or there could have been x - 1 customers at time t and one arrived, or there could have been x customers at time t and there were no departures or arrivals. By the axioms of a Poisson process, these three possibilities have probabilities

p_{x + 1, x}(t, t + \delta t) = \epsilon \delta t + o(\delta t)

p_{x - 1, x}(t, t + \delta t) = \lambda \delta t + o(\delta t)

p_{x, x}(t, t + \delta t) = 1 - (\lambda + \epsilon) \delta t + o(\delta t)

for the cases x = 1, 2, 3, \dots. In the case x = 0, we have

p_{- 1, 0}(t, t + \delta t) = 0

p_{1, 0}(t, t + \delta t) = \epsilon \delta t + o(\delta t)

p_{0, 0}(t, t + \delta t) = 1 - \lambda \delta t + o(\delta t)

since the state -1 does not exist and a departure is impossible when there are no customers in the queue.

The M/M/1 queue is a continuous-time Markov process for which Chapman-Kolmogorov equations can be obtained by applying the Theorem of Total Probability. The Chapman-Kolmogorov equations say that

p_{i, x}(0, t + \delta t) = \sum_k p_{i, k}(0, t) p_{k, x}(t, t + \delta t)

where i is the number of customers at time 0 and the summation is over all relevant states k in the state space. The only relevant states k here are x + 1, x - 1, and x as outlined above, so the Chapman-Kolmogorov equations for the current situation are

p_{i, x}(0, t + \delta t) = p_{i, x}(0, t) p_{x, x}(t, t + \delta t)

+ p_{i, x - 1}(0, t) p_{x - 1, x}(t, t + \delta t) + p_{i, x + 1}(0, t) p_{x + 1, x}(t, t + \delta t)

= p_{i, x}(0, t) (1 - (\lambda + \epsilon) \delta t) + p_{i, x - 1}(0, t) \lambda \delta t + p_{i, x + 1}(0, t) \epsilon \delta t + o(\delta t)

Rearranging, we get

\frac{p_{i, x}(0, t + \delta t) - p_{i, x}(0, t)}{\delta t}

= \lambda p_{i, x - 1}(0, t) + \epsilon p_{i, x + 1}(0, t) - (\lambda + \epsilon) p_{i, x}(0, t) + \frac{o(\delta t)}{\delta t}

Letting \delta t \rightarrow 0, we get the Kolmogorov forward equations for the M/M/1 queue:

\frac{d}{dt} p_{i, x}(0, t) = \lambda p_{i, x - 1}(0, t) + \epsilon p_{i, x + 1}(0, t) - (\lambda + \epsilon) p_{i, x}(0, t)

for the cases x = 1, 2, 3, \dots, and

\frac{d}{dt} p_{i, 0}(0, t) = \epsilon p_{i, 1}(0, t) - \lambda p_{i, 0}(0, t)

for the case x = 0.

These equations can be solved to get a time-dependent probability distribution for the queue size, but the procedure is far from straightforward and the time-dependent probability distribution itself is very complicated. What can be done far more easily is to assume that the process reaches a steady state after many customers have arrived and departed, and then calculate the expected values of random variables of interest such as the expected queue size and the expected queueing time. This is what we will do here, before confirming the calculations using Python simulations.

When the process has reached a steady state, the probabilities p_{i, x}(0, t) are no longer changing over time, so we have \frac{d}{dt} p_{i, x}(0, t) = 0. Setting the left-hand sides of the Kolmogorov forward equations equal to zero and solving the equations recursively for x = 0 and then x = 1, 2, 3, \ldots, we get

p_{i, x} = \big(\frac{\lambda}{\epsilon} \big)^x \cdot p_{i, 0}

for x = 0, 1, 2, \ldots, where \lambda/\epsilon is called the traffic intensity. For the equilibrium distribution to exist, these probabilities must sum to 1, which requires \lambda/\epsilon < 1. In this case, we have

\sum_x \big(\frac{\lambda}{\epsilon} \big)^x \cdot p_{i, 0} = \frac{p_{i, 0}}{1 - \big(\frac{\lambda}{\epsilon} \big)} = 1

so,

p_{i, 0} = 1 - \big(\frac{\lambda}{\epsilon} \big)

Substituting this into the equation for p_{i, x} above we get

p_{i, x} = \big(1 -  \big(\frac{\lambda}{\epsilon} \big) \big) \big(\frac{\lambda}{\epsilon} \big)^x

which is the probability mass function of a geometric distribution starting at 0 with parameter \lambda/\epsilon. The mean queue size in the steady state of this process is the mean of this distribution, which is

\frac{\big(\frac{\lambda}{\epsilon} \big)}{1 - \big(\frac{\lambda}{\epsilon} \big)}

It can also be shown fairly straightforwardly that the mean queueing time for the M/M/1 queue in equilibrium is the mean of an exponential distribution with parameter \epsilon - \lambda, so the mean queueing time is

\frac{1}{\epsilon - \lambda}

To simulate this M/M/1 queue in Python, we begin by defining Python functions for the exponential service times and Poisson arrivals:
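The original code is not shown here; a minimal sketch of two such functions, using numpy's random generator (the function names are my own choices), is:

```python
import numpy as np

rng = np.random.default_rng()

def service_times(num_customers, eps):
    """Exponential service times with rate eps, one per customer."""
    return rng.exponential(1 / eps, num_customers)

def arrival_times(num_customers, lam):
    """Arrival times of a Poisson process with rate lam, obtained as cumulative
    sums of exponential inter-arrival times."""
    return np.cumsum(rng.exponential(1 / lam, num_customers))
```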

Next, we compute departure times for customers in an M/M/1 queue given the arrival times and service times generated by the previous functions. Each customer is either served upon arrival if the server is free, or has to wait for the previous customer to depart before being served.
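A sketch of the departure-time calculation along these lines (FIFO, single server; the function name is assumed):

```python
def departure_times(arrivals, services):
    """Departure times for a FIFO M/M/1 queue: each customer starts service at the
    later of their own arrival time and the previous customer's departure time."""
    departures = np.zeros_like(arrivals)
    departures[0] = arrivals[0] + services[0]
    for i in range(1, len(arrivals)):
        start_of_service = max(arrivals[i], departures[i - 1])
        departures[i] = start_of_service + services[i]
    return departures
```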

We can also compute queue sizes, given the arrival times and departure times of customers. To clarify the algorithm, suppose we want to know the queue size when customer 5 arrives. We set counter = 0 to initialise it and we set j = 4 to look at customer 4’s departure time. If the arrival time of customer 5 is before the departure time of customer 4, then we know there is at least one person in the queue, so we set counter = 1 and change j to 3 to look at customer 3’s departure time. We continue with this process until we find a previous customer whose departure time is before the arrival time of customer 5, or until we have checked customer 0, at which point counter will have reached 5 if the arrival time of customer 5 is before the departure time of customer 0 (i.e., all five previous customers are still in the system). Setting j = -1 then stops the counting process, and the final value of counter is the queue size seen by customer 5.
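A sketch of this backward-scanning count (the loop below applies the procedure just described to every customer, not just customer 5):

```python
def queue_sizes(arrivals, departures):
    """Number of customers still in the system at the moment each customer arrives,
    found by scanning backwards through earlier customers' departure times."""
    sizes = np.zeros(len(arrivals), dtype=int)
    for i in range(1, len(arrivals)):
        counter = 0
        j = i - 1
        while j >= 0 and arrivals[i] < departures[j]:
            counter += 1
            j -= 1
        sizes[i] = counter
    return sizes
```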

We can now run a simulation. The results below are for a simulation with one million customers, and parameter values \lambda = 18 and \epsilon = 20. The predicted equilibrium mean queueing time is 1/(20 - 18) = 0.5 time units, and the predicted equilibrium mean queue size is (18/20)/(1 - (18/20)) = 9.
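Putting the previous snippets together, a sketch of such a run is the following (the exact output and plotting code of the original post is not reproduced):

```python
lam, eps = 18, 20
num_customers = 1_000_000

arrivals = arrival_times(num_customers, lam)
services = service_times(num_customers, eps)
departures = departure_times(arrivals, services)

queueing_times = departures - arrivals      # time each customer spends in the system
sizes = queue_sizes(arrivals, departures)   # system size seen by each arriving customer

print('mean queueing time:', queueing_times.mean())   # theory: 1/(eps - lam) = 0.5
print('mean queue size:', sizes.mean())               # theory: (lam/eps)/(1 - lam/eps) = 9
```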

We now explore how long it takes for the M/M/1 queue to settle down to its steady state. We look at the convergence of both the queueing time means and the queue size means to their theoretical equilibrium values.
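One simple way to visualise the equilibration, continuing from the previous snippet, is to plot the running means of the queueing times and queue sizes against the number of customers and compare them with the theoretical equilibrium values:

```python
import matplotlib.pyplot as plt

customers = np.arange(1, num_customers + 1)
running_mean_time = np.cumsum(queueing_times) / customers
running_mean_size = np.cumsum(sizes) / customers

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(customers, running_mean_time)
ax1.axhline(1 / (eps - lam), linestyle='--')                # theoretical value 0.5
ax1.set_ylabel('running mean queueing time')
ax2.plot(customers, running_mean_size)
ax2.axhline((lam / eps) / (1 - lam / eps), linestyle='--')  # theoretical value 9
ax2.set_ylabel('running mean queue size')
ax2.set_xlabel('number of customers')
plt.show()
```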

The simulation with one million customers eventually settled down to the predicted theoretical values for mean queue size and mean queueing time, but this took a surprisingly long time, around 400 thousand customers.

Why we can omit the modulus symbol in ln(|x|) in the integrating factor method

During a lecture on the solution of standard ODEs, a student asked why we use \ln(x) as the solution of the integral \int \frac{1}{x} dx in the exponent of an integrating factor rather than the formally correct antiderivative \ln(|x|) + c. In particular, the student was concerned about the omission of the modulus symbol around the x. The explanation is a subtle one and is often overlooked or treated lightly when this approach is used, so I thought I would record the discussion here.

The first-order ODE under discussion was of the simple form

\frac{dy}{dx} - \frac{7}{x} y = 2x^2 \quad \quad \quad \quad (1)

This can easily be solved using the integrating factor method, with integrating factor

e^{-\int \frac{7}{x} dx} = e^{-7 \ln(x)} = \frac{1}{x^7} \quad \quad \quad \quad (2)

We multiply BOTH sides of the original ODE by this integrating factor to obtain

\frac{1}{x^7} \frac{dy}{dx} - \frac{7}{x^8} y = \frac{2}{x^5} \quad \quad \quad \quad (3)

and then observe that the left-hand side can be written as the derivative of a product:

\frac{d}{dx} \big( \frac{1}{x^7} y \big) = \frac{2}{x^5} \quad \quad \quad \quad (4)

This is now easy to integrate, and upon integrating both sides with respect to x we obtain

\frac{1}{x^7} y = -\frac{1}{2} x^{-4} + c \quad \quad \quad \quad (5)

which gives the required explicit general solution for y:

y = -\frac{1}{2} x^3 + c x^7 \quad \quad \quad \quad (6)
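As a quick sanity check (not part of the original discussion), SymPy's dsolve recovers the same general solution as (6):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) - 7 * y(x) / x, 2 * x**2)
print(sp.dsolve(ode, y(x)))   # equivalent to y = -x**3/2 + C1*x**7
```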

The question that arose was why we did not use -7\ln(|x|) + c for the integral in (2) above. Is this not the correct antiderivative solution for -\int \frac{7}{x}dx?

It is true that when one is finding the general antiderivative of -\frac{7}{x} one must give the answer as -7\ln(|x|) + c. However, it is NOT necessary to do this when the integral of -\frac{7}{x} appears as the exponent of the integrating factor in a problem like (1). It is also not necessary to include an arbitrary constant in the integral in the exponent of the integrating factor, for the same reason.

This is because, as exemplified above, the integrating factor is something that one multiplies BOTH sides of the differential equation by to enable the left-hand side to be expressed as the derivative of a product. As we saw above, it is then easy to obtain a solution by integrating both sides. Since the integrating factor multiplies both sides of the differential equation, it would make no difference to the final solution if we were to multiply the integrating factor by a constant. Any constant which multiplies the integrating factor would simply cancel out when using the integrating factor method to solve the differential equation.

If we used \exp(-7 \ln(|x|)) = \frac{1}{|x|^7} as the integrating factor in (2), we would obtain the same solution to the differential equation as in (6), irrespective of whether x > 0 or x < 0, because when x < 0 we have

\frac{1}{|x|^7} = \frac{1}{(-x)^7} = (-1) \frac{1}{x^7}

This is just a constant times the integrating factor obtained in (2), so as described above the constant will cancel out and it will be AS IF we had used the integrating factor in (2) in the first place.

That’s why we just ignore the modulus symbol (and the arbitrary constant) when using the integrating factor method in cases like (1). We don’t need to think of the integral in the exponent of the integrating factor in the same way as we would think of the general antiderivative, because this doesn’t affect the final solution of the differential equation.

Using Prüfer sequences to count and classify irreducible labelled 10-node trees

In the cult classic movie Good Will Hunting (Miramax, 1997), a maths-genius janitor played by a young Matt Damon secretly solves challenge problems in graph theory written on a blackboard by an MIT maths professor for his students.

One of these problems, which Will can be seen solving in the above screenshot from the movie, is the following:

Draw all the homeomorphically irreducible trees with n=10.

Homeomorphically irreducible trees are trees which contain no vertices of degree 2, and the problem asks for all such trees on 10 vertices counted up to isomorphism (i.e., the non-isomorphic ones). There are ten such 10-node trees, as follows:

There are many articles and videos online which discuss how these ten trees can be deduced in various ways. However, two questions which have always bothered me since I first saw the movie do not seem to have been discussed anywhere. While preparing an introductory lecture on trees in graph theory, I decided to try to answer these questions once and for all and am recording the results in the present post. (I also presented these results in my lecture!)

By Cayley’s famous formula n^{n-2} for the number of labelled trees with n vertices, there are 100 million labelled 10-node trees. I always wondered how many of these 100 million labelled 10-node trees are irreducible, and how many of the irreducible labelled trees belong to each of the ten homeomorphically irreducible types in the Good Will Hunting problem? Below, I first use combinatorial arguments to calculate the number of irreducible labelled 10-node trees and then confirm this figure using Python. I also use Python to classify the irreducible labelled trees into the ten Good Will Hunting isomorphism classes.

Combinatorial calculations

To calculate the total number of irreducible labelled 10-node trees, we begin by observing that the Prüfer sequences of these trees will all be 8 digits long and such that no number occurs with a frequency of 1 (the degree of a vertex in a tree is one more than the frequency of occurrence of the vertex label in the tree’s Prüfer sequence, so the Prüfer sequences of trees which do not contain vertices of degree 2 cannot contain vertex labels with a frequency of 1). To illustrate this, I randomly assigned labels to the ten non-isomorphic trees from the Good Will Hunting problem above and calculated their Prüfer sequences as follows:

As expected, all the vertex labels in the Prüfer sequences above appear with frequencies of at least 2. Therefore, the combinatorial problem we need to solve is the following:

How many ways are there to form a sequence of 8 numbers chosen from the ten numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, in such a way that each number that appears in the sequence appears at least twice?

We will need to use a standard result from combinatorics which says that if there are n objects, with r_1 of type 1, r_2 of type 2, . . . , and r_m of type m , where r_1 + r_2 + \cdots + r_m = n , then the number of arrangements of these objects, denoted P(n; r_1, r_2, \ldots, r_m), is

P(n; r_1, r_2, \ldots, r_m) = \frac{n!}{r_1! r_2! \cdots r_m!}   \qquad \qquad \qquad \qquad \qquad (1)

To apply this result, we need to identify all the possible patterns of 8-digit arrangements satisfying the above conditions, then apply the formula to each possible pattern to find out how many ways there are to arrange the 8 digits in that pattern. Adding up all the possible ways of arranging the 8 digits in each pattern will then give the total number of irreducible labelled 10-node trees. There are actually only seven cases that need to be considered. Using letters to stand for numbers, only the following seven patterns are possible:

Case 1: (a, a, a, a, a, a, a, a)

Here, only a single number appears in the Prüfer sequence. There are 10 options for choosing which number occurs 8 times in this sequence, so there are 10 ways to arrange a Prüfer sequence in which one number occurs 8 times.

Case 2: (a, a, a, a, a, a, b, b)

Here, only two numbers appear in the Prüfer sequence, one of them 6 times and the other 2 times. There are 10 options for choosing which number occurs 6 times and then 9 options for choosing which number occurs 2 times. Using formula (1) above, there are P(8; 6, 2) = \frac{8!}{6!2!} = 28 ways to arrange 6 of one number and 2 of the other. Thus, there are \frac{10 \times 9 \times 8!}{6!2!} = \frac{10!}{6!2!} = 2520 ways to arrange a Prüfer sequence in which one number occurs 6 times and another occurs twice.

Case 3: (a, a, a, a, a, b, b, b)

Here, only two numbers appear in the Prüfer sequence, one of them 5 times and the other 3 times. There are 10 options for choosing which number occurs 5 times and then 9 options for choosing which number occurs 3 times. Using formula (1) above, there are P(8; 5, 3) = \frac{8!}{5!3!} = 56 ways to arrange 5 of one number and 3 of the other. Thus, there are \frac{10 \times 9 \times 8!}{5!3!} = \frac{10!}{5!3!} = 5040 ways to arrange a Prüfer sequence in which one number occurs 5 times and another occurs 3 times.

Case 4: (a, a, a, a, b, b, b, b)

Here, only two numbers appear in the Prüfer sequence, each of them 4 times. There are 10 options for choosing a first number which occurs 4 times and then 9 options for choosing a second number which occurs 4 times. When considering the possible arrangements, we note that the numbers can be interchanged without affecting each given arrangement because they occur with the same frequency, so there are two equivalent versions of each arrangement. Thus, using formula (1) above, there are \frac{1}{2} P(8; 4, 4) = \frac{1}{2}\frac{8!}{4!4!} = 35 ways to arrange 4 of one number and 4 of the other. Thus, there are \frac{10 \times 9 \times 8!}{2 \times 4!4!} = \frac{10!}{2 \times 4!4!} = 3150 ways to arrange a Prüfer sequence in which one number occurs 4 times and another occurs 4 times.

Case 5: (a, a, b, b, c, c, c, c)

Here, three different numbers appear in the Prüfer sequence, one of them 4 times and the other two twice. There are 10 options for choosing a first number which occurs 4 times, then 9 options for choosing a second number which occurs 2 times, then 8 options for choosing a third number which occurs 2 times. When considering the possible arrangements, we note that the two numbers which occur twice can be interchanged without affecting each given arrangement because they occur with the same frequency, so there are two equivalent versions of each arrangement. Thus, using formula (1) above, there are \frac{1}{2} P(8; 4, 2, 2) = \frac{1}{2}\frac{8!}{2!2!4!} = 210 ways to arrange 4 of one number and 2 of each of the other two numbers. Thus, there are \frac{10 \times 9 \times 8 \times 8!}{2 \times 2!2!4!} = 4 \times \frac{10!}{2!2!4!} = 151200 ways to arrange a Prüfer sequence in which one number occurs 4 times and two other numbers occur 2 times each.

Case 6: (a, a, a, b, b, b, c, c)

Here, three different numbers appear in the Prüfer sequence, one of them 2 times and the other two 3 times. There are 10 options for choosing a first number which occurs 2 times, then 9 options for choosing a second number which occurs 3 times, then 8 options for choosing a third number which occurs 3 times. When considering the possible arrangements, we note that the two numbers which occur 3 times can be interchanged without affecting each given arrangement because they occur with the same frequency, so there are two equivalent versions of each arrangement. Thus, using formula (1) above, there are \frac{1}{2} P(8; 2, 3, 3) = \frac{1}{2}\frac{8!}{2!3!3!} = 280 ways to arrange 2 of one number and 3 of each of the other two numbers. Thus, there are \frac{10 \times 9 \times 8 \times 8!}{2 \times 2!3!3!} = 4 \times \frac{10!}{2!3!3!} = 201600 ways to arrange a Prüfer sequence in which one number occurs 2 times and two other numbers occur 3 times each.

Case 7: (a, a, b, b, c, c, d, d)

Finally, four different numbers appear in this case, each of them 2 times. There are 10 options for choosing a first number which occurs 2 times, then 9 options for choosing a second, then 8 options for choosing a third, then 7 options for choosing the fourth. When considering the possible arrangements, we note that the four numbers which occur 2 times can be interchanged without affecting each given arrangement because they occur with the same frequency, so there are 4 \times 3 \times 2 \times 1 = 24 equivalent versions of each arrangement. Thus, using formula (1) above, there are \frac{1}{24} P(8; 2, 2, 2, 2) = \frac{1}{24}\frac{8!}{2!2!2!2!} = 105 ways to arrange 2 each of four numbers. Thus, there are \frac{10 \times 9 \times 8 \times 7 \times 8!}{24 \times 2!2!2!2!} = \frac{7}{3} \times \frac{10!}{2!2!2!2!} = 529200 ways to arrange a Prüfer sequence in which four different numbers occur 2 times each.

Adding up the totals for the seven cases above, we find

10 + 2520 + 5040 + 3150 + 151200 + 201600 + 529200 = 892720

Thus, according to the combinatorial calculation approach above, there are 892720 irreducible labelled 10-node trees. This will now be confirmed computationally using Python, and the 892720 trees will also be classified into the ten Good Will Hunting isomorphism types.

Python calculations

To confirm the number of irreducible labelled 10-node trees, I wrote the following Python code to pick from the 100 million labelled 10-node trees those whose Prüfer sequences do not contain vertex labels with a frequency of 1. As can be seen from the output, this took nearly 4 hours (on a laptop with 16.0GB of RAM) and confirmed the number 892720 obtained via the combinatorial calculations above.
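The original code (shown as a screenshot in the post) is not reproduced here; a minimal sketch of one way to carry out this count is the following. It iterates over all 10^8 Prüfer sequences, so it is slow (hours rather than minutes), consistent with the run time reported above. The list name irreducible_seqs is my own choice.

```python
from itertools import product
from collections import Counter

irreducible_seqs = []
# Keep the Prüfer sequences of length 8 on the labels 0..9 in which no label
# occurs exactly once (such a label would correspond to a vertex of degree 2).
for seq in product(range(10), repeat=8):
    if 1 not in Counter(seq).values():
        irreducible_seqs.append(seq)

print(len(irreducible_seqs))   # expected: 892720
```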

I then wrote the following Python code to classify the 892720 irreducible trees into the ten Good Will Hunting isomorphism types. This code uses the Prüfer sequences I calculated for the randomly labelled ten isomorphism types above as templates against which each of the 892720 trees can be checked. Each tree is then classified into one of the ten types depending on which of the ten Prüfer sequences it is isomorphic to.
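The original classification code checked each tree against the ten template Prüfer sequences shown earlier in the post (in an image not reproduced here). The sketch below uses a slight variation of that idea: it collects one representative tree per isomorphism class as it goes, which yields the same ten classes without needing the templates. It assumes the list irreducible_seqs from the previous snippet.

```python
import networkx as nx

representatives = []   # one representative tree per isomorphism class found so far
class_counts = []

for seq in irreducible_seqs:
    tree = nx.from_prufer_sequence(list(seq))
    for idx, rep in enumerate(representatives):
        if nx.is_isomorphic(tree, rep):
            class_counts[idx] += 1
            break
    else:
        representatives.append(tree)
        class_counts.append(1)

print(class_counts)        # sizes of the ten Good Will Hunting isomorphism classes
print(sum(class_counts))   # should equal 892720
```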

As can be seen from the output here, this classification process took 22 minutes. The results showing the classification of the 892720 trees into the ten Good Will Hunting isomorphism types are shown below, together with a check that the ten totals do indeed account for all the 892720 irreducible trees.

A max-flow/min-cut network problem solved both manually and with Python

For a lecture on digraphs and network flows, I prepared the following capacitated directed network problem in order to explore its solution both manually via a maximum-flow/minimum-cut algorithm and computationally using the NetworkX library in Python:

The number on each arc represents the flow capacity of that arc and the vertices S and T are the source and the sink for the flow respectively (all arcs containing S are directed away from S and all arcs containing T are directed towards T).

The max-flow/min-cut algorithm involves a labelling procedure which, starting from an initial flow (e.g., a zero flow) enables augmentation of flows along some arcs as well as reduction of flows along others. The idea is to increase the flow from S to T as much as possible subject to the capacity constraints on the arcs by using the algorithm to keep finding flow-augmenting routes through the network. The process terminates when all flow-augmenting routes have been found and a maximum flow value for the network thus becomes known. The algorithm will also result in some of the arcs in the network becoming saturated, i.e., filled to capacity, and a set of arcs containing some of these saturated arcs will constitute a minimum cut for the network. Such a minimum cut is a set of arcs with the smallest possible sum of capacities such that removal of all the arcs in the set divides the network into two disjoint parts X and Y, the part X containing the source S and the part Y containing the sink T. A theorem known as the max-flow min-cut theorem states that the maximum flow value must equal the capacity value of the minimum cut for a basic network such as the one in the diagram above. Finding a minimum cut using the saturated arcs at the end of the algorithm thus provides a useful way to check that the final flow is indeed a maximal one.

Below, I will solve the above problem both manually and using the NetworkX library in Python. For the manual solution, I will use a labelling procedure to record movement from one vertex V to the next vertex W with the notation W(V, \varphi ), where \varphi is the size of flow that is possible along the arc subject to the capacity of the arc and also subject to the possible flow at the previous step. Where there is a choice of arc when moving from one vertex to the next, we choose alphabetically. When moving backwards along an arc, this is indicated with a minus sign so that, for example, V(W-, \varphi) indicates that we moved from W to V along an arc that was actually directed from V to W.

The manual solution is a little burdensome mainly because it requires making repeated copies of the network showing the adjustments at each step of the algorithm. Below I will render these copies by hand. At each step, each arc is labelled with two numbers separated by a comma, the first number being the flow along the arc and the second being the capacity of the arc. I will highlight that an arc is saturated by thickening the line representing the arc. We assume a zero initial flow, represented as follows in a manual copy of the above network:

Starting from vertex S, we choose alphabetically the arc SA. The labels are: A(S, 31) \ C(A, 13) \ D(C, 10) \ F(D, 6) \ T(F, 6). The path is SACDFT. The flow increase is 6. The saturated arc is DF. Making these amendments, the network looks as follows after this first run:

For the next run, we begin again with arc SA. The labels are: A(S, 25) \ C(A, 7) \ D(C, 4) \ T(D, 4). The path is SACDT. The flow increase is 4. The saturated arc is CD. The network looks as follows after this second run:

We begin again with arc SA. The labels are: A(S, 21) \ C(A, 3) \ E(C, 3) \ F(E, 3) \ T(F, 3). The path is SACEFT. The flow increase is 3. The saturated arc is AC. The network looks as follows after this third run:

We begin again with arc SA. The labels are: A(S, 18) \ D(A, 13) \ T(D, 8). The path is SADT. The flow increase is 8. The saturated arc is DT. The network looks as follows after this fourth run:

We begin again with arc SA. The labels are: A(S, 10) \ D(A, 5) \ C(D-, 5) \ E(C, 5) \ F(E, 5) \ T(F, 5). The path is SADCEFT. The flow increase is 5. The saturated arc is AD. The network looks as follows after this fifth run:

Proceeding alphabetically, we now begin with arc SB. The labels are: B(S, 15) \ C(B, 10) \ E(C, 3) \ F(E, 3) \ T(F, 3). The path is SBCEFT. The flow increase is 3. The saturated arcs are CE and EF. The network looks as follows after this sixth run:

We begin again with arc SB. The labels are: B(S, 12) \ C(B, 7) \ F(C, 7) \ T(F, 7). The path is SBCFT. The flow increase is 7. The saturated arc is BC. The network looks as follows after this seventh run:

We begin again with arc SB. The labels are: B(S, 5) \ E(B, 5) \ T(E, 5). The path is SBET. The flow increase is 5. The saturated arcs are SB and BE. The network looks as follows after this eighth run:

Proceeding alphabetically, we now begin with arc SC. The labels are: C(S, 17) \ F(C, 15) \ T(F, 13). The path is SCFT. The flow increase is 13. The saturated arc is FT. The network looks as follows after this ninth run:

We begin again with arc SC. The labels are: C(S, 4) \ F(C, 2) \ E(F-, 2) \ T(E, 2). The path is SCFET. The flow increase is 2. The saturated arc is CF. The network looks as follows after this tenth run:

The algorithm is now complete. The diagram below shows the maximum flow and a minimum cut for this network:

The maximum flow has the value 26 + 15 + 15 = 56. The minimum cut is the set of arcs (DT, DF, CF, CE, BE) which has capacity 12 + 6 + 22 + 11 + 5 = 56.

Having solved the problem manually, it is interesting to now compare this solution with a computational solution produced using the library NetworkX in Python. I did the coding for this problem in a Jupyter Notebook as follows:
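The original notebook and the network diagram are not reproduced here. The sketch below rebuilds the network from the capacities that can be inferred from the labelling runs above (the capacity of arc ET is not pinned down by those runs; it only needs to be at least 7, so a placeholder value is used):

```python
import networkx as nx

# Arc capacities reconstructed from the manual labelling runs above.
arcs = {
    ('S', 'A'): 31, ('S', 'B'): 15, ('S', 'C'): 17,
    ('A', 'C'): 13, ('A', 'D'): 13,
    ('B', 'C'): 10, ('B', 'E'): 5,
    ('C', 'D'): 10, ('C', 'E'): 11, ('C', 'F'): 22,
    ('D', 'F'): 6,  ('D', 'T'): 12,
    ('E', 'F'): 11, ('E', 'T'): 8,   # ET capacity is a placeholder (must be >= 7)
    ('F', 'T'): 37,
}

G = nx.DiGraph()
for (u, v), capacity in arcs.items():
    G.add_edge(u, v, capacity=capacity)

flow_value, flow_dict = nx.maximum_flow(G, 'S', 'T')
cut_value, (X, Y) = nx.minimum_cut(G, 'S', 'T')

print(flow_value, cut_value)   # both should equal 56 with these capacities
print(X, Y)                    # source-side and sink-side vertex sets of a minimum cut
```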

Using the Python library NetworkX to find all Hamiltonian cycles in a graph

For the purposes of a lecture in graph theory, I created the following example of a Hamiltonian graph consisting of the complete graphs K_3, K_4 and K_5 joined pairwise by an edge:

The Python library NetworkX can be used to perform many of the calculations that arise in graph theory and networks and I used it in this case to list and count all the Hamiltonian cycles in the above graph. In the present post, I want to quickly record the Python code I used for this (in a Jupyter Notebook):
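The original notebook is not reproduced here. NetworkX itself does not provide a general Hamiltonian-cycle enumerator, so the sketch below uses NetworkX only to build the graph and then enumerates the cycles with a simple backtracking search. Exactly which vertices carry the three joining edges is not specified above, so the placement chosen below is an assumption.

```python
import networkx as nx
from itertools import combinations

# K3 on vertices 0-2, K4 on vertices 3-6, K5 on vertices 7-11, joined pairwise
# by single edges (the placement of the joining edges is assumed).
G = nx.Graph()
G.add_edges_from(combinations(range(0, 3), 2))    # K3
G.add_edges_from(combinations(range(3, 7), 2))    # K4
G.add_edges_from(combinations(range(7, 12), 2))   # K5
G.add_edges_from([(0, 3), (3, 7), (7, 0)])        # joining edges (assumed)

def hamiltonian_cycles(G, start):
    """Enumerate the Hamiltonian cycles of G by backtracking; each cycle is
    listed once, as a list of vertices beginning and ending at start."""
    n = G.number_of_nodes()
    cycles = []

    def extend(path, visited):
        current = path[-1]
        if len(path) == n:
            # close the cycle, keeping only one orientation of each cycle
            if G.has_edge(current, start) and path[1] < path[-1]:
                cycles.append(path + [start])
            return
        for nbr in G[current]:
            if nbr not in visited:
                extend(path + [nbr], visited | {nbr})

    extend([start], {start})
    return cycles

cycles = hamiltonian_cycles(G, start=0)
print(len(cycles))     # number of distinct Hamiltonian cycles
for c in cycles[:5]:   # show a few of them
    print(c)
```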

Boltzmann distribution, Gibbs-Shannon entropy and Helmholtz free energy

Consider a so-called canonical ensemble consisting of a system A, a heat bath R, and the total closed system T containing A and R, with corresponding energies U_A, U_R and U_T respectively so that

U_R = U_T - U_A \quad \quad \quad \quad (1)

with U_T fixed. For example, A could represent a single 1-D lattice of spins in the Ising Model, R could consist of a heat bath with which A is in thermal contact, and T would be the closed system containing A and R.

With regard to the 1-D lattice of spins, different spin configurations will have different energies and for a given energy U_A there will be a multiplicity of W_A(U_A) spin configurations with that same energy (i.e., if U_A is the macrostate, there will be W_A(U_A) different microstates yielding that same macrostate). Similarly, if the energy U_R is the macrostate of the heat bath, there will be W_R(U_R) different microstates yielding that same macrostate. Since U_T is fixed, we see that the total number of microstates for the combined system, W_T, is a function of U_A:

W_T(U_A) = W_A(U_A) \times W_R(U_T - U_A) \quad \quad \quad \quad (2)

Suppose we pick one particular spin configuration of the 1-D lattice of spins with associated energy E_i. Then U_A = E_i and W_A(E_i) = 1 because we have picked a particular microstate for system A. We will now obtain from this setup the probability of finding system A in this particular state. The total number of microstates for the combined system is now

W_T(E_i) = 1 \times W_R(U_T - E_i) \quad \quad \quad \quad (3)

The entropy of the heat bath (which, by (3), is also the entropy of the combined system in this case) is given by Boltzmann’s entropy equation as

S_R = k_B \ln(W_R) \quad \quad \quad \quad (4)

where k_B is the Boltzmann constant.

To obtain an expression for the temperature of the heat bath, we can use the first law of thermodynamics which in the absence of any work done by the system reduces to

dU = dQ \quad \quad \quad \quad (5)

where dU is the internal energy of the system and dQ is energy added via heat. Using the differential form of the definition of entropy in thermodynamics, namely

dS = \frac{dQ}{T} \quad \quad \quad \quad (6)

we get from (5) that

\frac{1}{T} = \frac{dS}{dU} \quad \quad \quad \quad (7)

Using (7) with (4) we then have

\frac{1}{k_B T} = \frac{d \ln(W_R)}{dU_R} \quad \quad \quad \quad (8)

Integrating gives

W_R = \gamma e^{U_R/(k_B T)} \quad \quad \quad \quad (9)

where \gamma is an integration constant. Therefore, we have

W_T(E_i) = W_R(U_T - E_i) = \gamma e^{(U_T - E_i)/(k_B T)} \quad \quad \quad \quad (10)

This calculation is for one particular energy level, U_A = E_i, of the 1-D lattice of spins. Summing over all possible energy levels, the total number of microstates of the combined system is

\sum_j W_T(E_j) = \gamma e^{U_T/(k_B T)} \sum_j e^{-E_j/(k_B T)} \quad \quad \quad \quad (11)

The probability of finding system A in the particular state with energy E_i is then obtained by dividing (10) by (11):

P(E_i) = \frac{e^{-E_i/(k_B T)}}{\sum_j e^{-E_j/(k_B T)}} \quad \quad \quad \quad (12)

This is the Boltzmann probability distribution, with

Z = \sum_j e^{-E_j/(k_B T)} \quad \quad \quad \quad (13)

being the partition function. Note that all terms involving the heat bath end up dropping out. The only relevance of the heat bath is to define the temperature of the system. Everything else about it is irrelevant.

The 1-D lattice of spins does not have constant energy when it is in contact with a heat bath. The energy of the system fluctuates with probabilities governed by the Boltzmann distribution. We can obtain a formula for the average entropy of the 1-D lattice of spins in the canonical ensemble, called the Gibbs-Shannon entropy, in terms of Boltzmann probabilities. Imagine taking many measurements of the energy of the 1-D lattice of spins. Interpreting P(E_i) as a relative frequency, the multiplicity of microstates giving energy E_i is

W_i = \frac{1}{P(E_i)} \quad \quad \quad \quad (14)

The entropy of the configuration giving rise to energy E_i is then

S(E_i) = k_B \ln(W_i) = -k_B \ln(P(E_i)) \quad \quad \quad \quad (15)

The Gibbs-Shannon average entropy is obtained from (15) as

\langle S \rangle = \sum_i P(E_i) S(E_i) = -k_B \sum_i P(E_i) \ln(P(E_i)) \quad \quad \quad \quad (16)

Note that (16) simplifies to Boltzmann’s formula for the entropy when all the probabilities are equal, say

P(E_i) = \frac{1}{W}

for all i, since then we have

-k_B \sum_i P(E_i) \ln(P(E_i)) = -k_B \sum_i \frac{1}{W} \ln(\frac{1}{W}) = k_B \ln(W)

Finally, we can obtain the Helmholtz free energy by taking logarithms of the Boltzmann probability in (12) to get

\ln(P(E_i)) = -\frac{E_i}{k_B T} - \ln(Z) \quad \quad \quad \quad (17)

Substituting this into the Gibbs-Shannon entropy formula in (16) we get

\langle S \rangle = k_B \sum_i P(E_i) \bigg(\frac{E_i}{k_B T} + \ln(Z)\bigg) = \frac{\langle E \rangle}{T} + k_B \ln(Z)

which can be rewritten as

\langle E \rangle - T \langle S \rangle = -k_B T \ln(Z) \quad \quad \quad \quad (18)

The quantity \langle E \rangle - T \langle S \rangle is the average value of the Helmholtz free energy, F, which is a function of state, i.e., it is a function only of macroscopic thermodynamic variables. The equation

F = -k_B T \ln(Z) \quad \quad \quad \quad (19)

linking the free energy to the partition function is therefore of vital importance since it relates the large-scale properties of the system to its microscopic energy states. The partition function acts like a bridge between these two regimes.

Zero-field singularity of magnetic susceptibility in a 4-D Ising model

For the purposes of a lecture on simulating the Ising model of ferromagnetism using the Metropolis-Hastings algorithm, I explored the behaviour of magnetic susceptibility on a four-dimensional hypercube lattice. In particular, I wanted to test a well-known prediction of theoretical physics that a zero-field singularity should appear at a certain critical temperature. The idea is that there is an essential singularity in the mathematics at zero external magnetic field which disappears with a non-zero magnetic field. See the discussion on physics Stack Exchange about this. I was interested to see if a computer simulation of magnetic susceptibility in an Ising model on a four-dimensional hypercube lattice would produce results similar to the second diagram in this Stack Exchange query, which I reproduce here:

The results do suggest that, in the nearest-neighbour ferromagnetic Ising model on a 4-D hypercube lattice, there is a cusp-like singularity in magnetic susceptibility at a critical temperature around T = 7. This singularity becomes more and more apparent in the plots as the external magnetic field strength is reduced to zero but tends to disappear as the field strength increases. I want to record this experiment and the results in the present post.

I considered a 4-D ferromagnetic Ising model on a hypercubic lattice, \mathcal{L} = \{a \mathbf{i} + b \mathbf{j} + c \mathbf{k} + d \mathbf{l}\} with a, b, c, d = 1, 2, \ldots, L. I assumed periodic boundary conditions and the presence of an external magnetic field with parameter B. The nearest-neighbour Hamiltonian for a particular configuration \mathbf{s} of the N = L^4 spins on the four-dimensional hypercube is

E(\mathbf{s}, J, B) = -J \sum_{\mathbf{d} \in \mathcal{L}} s_{\mathbf{d}}(s_{\mathbf{d}+ \mathbf{i}}+s_{\mathbf{d}+ \mathbf{j}} + s_{\mathbf{d}+ \mathbf{k}}+s_{\mathbf{d}+ \mathbf{l}}) - B \sum_{\mathbf{d} \in \mathcal{L}} s_{\mathbf{d}} \quad \quad \quad \quad (1)

where s_{\mathbf{d}} is the spin at site \mathbf{d} on the lattice, J > 0 is the nearest-neighbour interaction energy and B > 0 is the external magnetic field parameter. The partition function assuming thermal equilibrium at a temperature T is

Z = \sum_{i=1}^{2^N} \exp \big(-\frac{E_i}{k_B T}\big) \quad \quad \quad \quad (2)

where k_B is Boltzmann’s constant and E_i is the Hamiltonian for the i-th spin configuration \mathbf{s}_i. For the computer simulations I used units in which J = 1 and k_B = 1.

I used Monte Carlo simulations in the form of the well-known single-flip Metropolis-Hastings algorithm to produce plots of the magnetic susceptibility, \chi, as a function of the temperature in the range T = 0.5 to T = 15.0, for a lattice of size N = L \times L \times L \times L with L = 5, L = 10 and L = 15. With units in which J = 1 and k_B = 1, the magnetic susceptibility is calculated as

\chi =\frac{1}{T N}\big(\langle S^2 \rangle - \langle S \rangle^2\big) \quad \quad \quad \quad (3)

where

S=\sum_{\mathbf{d} \in \mathcal{L}} s_{\mathbf{d}} \quad \quad \quad \quad (4)

is the total magnetisation given a particular spin configuration \mathbf{s} on the 4-D lattice.
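The original simulation code is not reproduced in this post. The following is a minimal sketch of a single-flip Metropolis-Hastings estimate of \chi for one value of L, T and B, in units with J = 1 and k_B = 1; the sweep counts are placeholders rather than the values used for the plots below, and pure-Python loops like these are slow for the larger lattices.

```python
import numpy as np

rng = np.random.default_rng()

def susceptibility(L, T, B, J=1.0, n_sweeps=400, n_equil=100):
    """Single-flip Metropolis estimate of the magnetic susceptibility (equation (3))
    for the 4-D Ising model on an L^4 lattice with periodic boundary conditions."""
    N = L**4
    spins = np.ones((L, L, L, L), dtype=int)      # completely ordered +1 starting state
    S_samples = []

    for sweep in range(n_sweeps):
        for _ in range(N):
            a, b, c, d = rng.integers(0, L, size=4)   # pick a random site
            s = spins[a, b, c, d]
            # sum of the 8 nearest neighbours (periodic boundaries)
            nn = (spins[(a + 1) % L, b, c, d] + spins[(a - 1) % L, b, c, d]
                  + spins[a, (b + 1) % L, c, d] + spins[a, (b - 1) % L, c, d]
                  + spins[a, b, (c + 1) % L, d] + spins[a, b, (c - 1) % L, d]
                  + spins[a, b, c, (d + 1) % L] + spins[a, b, c, (d - 1) % L])
            dE = 2 * s * (J * nn + B)                 # energy change if this spin flips
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[a, b, c, d] = -s
        if sweep >= n_equil:
            S_samples.append(spins.sum())             # total magnetisation, equation (4)

    S_samples = np.array(S_samples, dtype=float)
    return S_samples.var() / (T * N)                  # equation (3), with k_B = 1

# Example: susceptibility over a coarse temperature grid for L = 5 and B = 0.05
temperatures = np.arange(1.0, 15.5, 0.5)
chi = [susceptibility(5, T, 0.05) for T in temperatures]
```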

I initially chose a range of B-values increasing from 0.05 to 0.8 by successive doubling (0.05, 0.1, 0.2, 0.4, 0.8). I obtained the following results for L = 5, L = 10 and L = 15, starting with a completely ordered configuration in which all the spins in the lattice were in the +1 state:

The results show a clear peak in magnetic susceptibility becoming more pronounced as B is reduced towards zero, with the critical temperature being around T = 7. Notice also a rapid increase in the CPU time with L, the CPU time being around one minute for L = 5, twelve minutes for L = 10, and over two hours for L = 15. The plot did not change much between L = 10 and L = 15 so I used L = 10 for the remaining simulations.

To confirm the zero-field singularity at the critical temperature T = 7, I reduced B even further. I obtained the following plots for B-values going down to B = 0.0001953125.

It is clear that the peak becomes arbitrarily large as B \rightarrow 0 at a critical temperature of around T = 7, indicating that there is indeed a singularity there.

Fourier transform of a function that is scaled and translated in either order

Consider a function f(x) with Fourier transform \tilde{f}(k) where

\tilde{f}(k) = \frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{\infty} f(x) e^{-ikx } dx \quad \quad \quad \quad (1)

Then it is straightforward to show that the Fourier transform of the scaled function f(x/\alpha) is

\frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{\infty} f(x/\alpha) e^{-ikx } dx = |\alpha| \tilde{f}(\alpha k)

and the Fourier transform of the x-translated function f(x - c) is

\frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{\infty} f(x - c) e^{-ikx } dx = e^{-ikc} \tilde{f}(k)

What is a little less straightforward is to deduce from these what the Fourier transform must be of a function that has been both scaled and translated in the form

f\big(\frac{x - c}{\alpha}\big) \quad \quad \quad \quad (2)

In fact, the Fourier transform of this particular type of scaled and translated function is a simple combination of the two individual adjustments as if they had occurred independently of each other:

\frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{\infty} f\big(\frac{x - c}{\alpha}\big) e^{-ikx } dx = e^{-ikc} |\alpha| \tilde{f}(\alpha k) \quad \quad \quad \quad (3)

We should be able to deduce this by performing the calculations for the scaling and translation sequentially, and we should get the same answer irrespective of the order in which the transformations are performed. However, there are subtleties involved and these are what I am interested in recording in the present note.

Consider performing the scaling first and then the translation. Let g(x) = f(x/\alpha) so that \tilde{g}(k) = |\alpha| \tilde{f}(\alpha k) . Then the combined transformation in (2) is obtained as g(x - c) which has Fourier transform

e^{-ikc} \tilde{g}(k) = e^{-ikc} |\alpha| \tilde{f}(\alpha k)

This agrees with (3). However, when going the other way and performing the translation first before the scaling, it is necessary to remember that the translation in (2) is by an amount c/\alpha . Therefore, in this case, we have to let g(x) = f \big(x - \frac{c}{\alpha} \big) so that \tilde{g}(k) = e^{-ik(c/\alpha)} \tilde{f}(k). Then the combined transformation in (2) is obtained as g(x/\alpha) = f \big(\frac{x}{\alpha} - \frac{c}{\alpha} \big) which has Fourier transform

|\alpha| \tilde{g}(\alpha k) = e^{-i(\alpha k)(c/\alpha)} |\alpha| \tilde{f}(\alpha k) = e^{-ikc} |\alpha| \tilde{f}(\alpha k)

This again agrees with (3), but this way round we see that there has had to be a process of cancellation of the \alpha parameter in the complex exponential term. This complication is avoided by performing the scaling first before the translation, making this particular ordering of the transformations a little easier to work with in the case of the combined transformation in (2).

Ornstein-Uhlenbeck process: derivation and simulation of its pdf and moments

The Ornstein-Uhlenbeck process is widely used in the stochastic modelling of mean-reverting processes. In the present note, I want to record a derivation, produced for a lecture, of the pdf and moments of an Ornstein-Uhlenbeck process exhibiting mean-reversion to zero with a stochastic differential equation (SDE) of the form

dx = -\gamma x dt + \sigma dW \quad \quad \quad \quad (1)

where \gamma and \sigma are constants and W(t) is a standard Wiener process. I used the Euler–Maruyama method with a time step of dt = 0.01 in Python to generate and plot a sample path of this process from t = 0 up to t = 10, setting the initial position as x(0) = 0.6 and the values \gamma = 0.5 and \sigma = 0.3. The following picture shows this sample path:
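The plotting code is not reproduced here; a minimal sketch of the Euler–Maruyama simulation of one such sample path, with these parameter values, is:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

gamma, sigma = 0.5, 0.3
dt = 0.01
t = np.arange(0, 10 + dt, dt)

x = np.zeros(len(t))
x[0] = 0.6
for i in range(1, len(t)):
    # Euler-Maruyama step for dx = -gamma*x*dt + sigma*dW, with dW ~ N(0, dt)
    x[i] = x[i - 1] - gamma * x[i - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

plt.plot(t, x)
plt.xlabel('t')
plt.ylabel('x(t)')
plt.show()
```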

The mean-reversion to zero is clear from the plot and arises from the drift term -\gamma x in the SDE, which tends to pull the process back down when x(t) goes above zero and back up when x(t) goes below zero.

The Fokker-Planck equation of the SDE in (1) is easily obtained as

\frac{\partial \rho(x, t)}{\partial t} = \gamma \frac{\partial}{\partial x} [x \rho(x, t)] + \frac{\sigma^2}{2}\frac{\partial^2 \rho(x, t)}{\partial x^2} \quad \quad \quad (2)

and this can be used to derive a stationary pdf for the Ornstein-Uhlenbeck process corresponding to a long-term equilibrium situation in which the pdf is not time-varying, so \partial \rho/\partial t = 0. From (2) we then get

0 = \gamma \frac{\partial}{\partial x} [x \rho(x, t)] + \frac{\sigma^2}{2}\frac{\partial^2 \rho(x, t)}{\partial x^2}

= \frac{d}{d x} \bigg[\gamma x \rho(x) + \frac{\sigma^2}{2} \frac{d \rho(x)}{dx} \bigg] \quad \quad \quad \quad (3)

Equation (3) tells us that the sum of the two terms in the square brackets must be equal to a constant. Setting this constant equal to zero gives us

\frac{\sigma^2}{2} \frac{d \rho(x)}{dx} = - \gamma x \rho(x)

and so

\frac{1}{\rho(x)} \frac{d \rho(x)}{dx} = -\frac{2 \gamma x}{\sigma^2}

from which we deduce the form of the stationary pdf to be

\rho(x) = N \exp\big(-\frac{\gamma x^2}{\sigma^2}\big)

We can find an expression for N from the normalisation condition

N \int_{-\infty}^{\infty} \exp\big(-\frac{\gamma x^2}{\sigma^2}\big) dx = 1 \quad \quad \quad \quad (4)

Making the change of variable y = \sqrt{\gamma} x/ \sigma we can rewrite (4) as

N \frac{\sigma}{\sqrt{\gamma}} \int_{-\infty}^{\infty} \exp(-y^2) dy = 1

from which we get

N = \frac{\sqrt{\gamma}}{\sigma I_0} \quad \quad \quad \quad (5)

where

I_0 = \int_{-\infty}^{\infty} \exp(-y^2)dy \quad \quad \quad \quad (6)

The moments of the Ornstein-Uhlenbeck process are obtained as

\langle x^n \rangle = \int_{-\infty}^{\infty} x^n \rho(x) dx = N \int_{-\infty}^{\infty} x^n \exp\big(-\frac{\gamma x^2}{\sigma^2}\big) dx \quad \quad \quad \quad (7)

Making the change of variable y = (\sqrt{\gamma}/\sigma)x, we have x^n = (\sigma/\sqrt{\gamma})^n y^n and dx = (\sigma /\sqrt{\gamma})dy. Using these substitutions together with the expression for N in (5) above we get

\langle x^n \rangle = \frac{\sqrt{\gamma}}{\sigma I_0} \int_{-\infty}^{\infty} \frac{\sigma^{n+1}}{(\sqrt{\gamma})^{n+1}}y^n \exp(-y^2)dy = \frac{\sigma^n}{(\sqrt{\gamma})^n} \frac{I_n}{I_0} \quad \quad \quad \quad (8)

where

I_n = \int_{-\infty}^{\infty} y^n \exp(-y^2)dy \quad \quad \quad \quad (9)

From (8), we see that the first two moments of the Ornstein-Uhlenbeck process are

\langle x \rangle =  \frac{\sigma}{\sqrt{\gamma}} \frac{I_1}{I_0} \quad \quad \quad \quad (10)

and

\langle x^2 \rangle =  \frac{\sigma^2}{\gamma} \frac{I_2}{I_0} \quad \quad \quad \quad (11)

These can be computed directly in Python using code such as the following (quad is from the scipy.integrate library):
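The original snippet is not shown here; a minimal sketch of the computation using quad, reusing the parameter values \gamma = 0.5 and \sigma = 0.3 from the sample path above, is:

```python
import numpy as np
from scipy.integrate import quad

gamma, sigma = 0.5, 0.3

I0, _ = quad(lambda y: np.exp(-y**2), -np.inf, np.inf)
I1, _ = quad(lambda y: y * np.exp(-y**2), -np.inf, np.inf)
I2, _ = quad(lambda y: y**2 * np.exp(-y**2), -np.inf, np.inf)

mean_x = (sigma / np.sqrt(gamma)) * I1 / I0    # equation (10)
mean_x2 = (sigma**2 / gamma) * I2 / I0         # equation (11)

print(mean_x)    # numerically indistinguishable from 0
print(mean_x2)   # should equal sigma**2/(2*gamma) = 0.09
```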

This is useful to know for more complicated stochastic processes where the integrals I_0, I_1 and I_2 might be difficult to evaluate manually, but in the case of the Ornstein-Uhlenbeck process these three integrals have simple exact values. For I_0 in (6), we use the well-known trick of solving the double integral

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(-(x^2+y^2))dxdy \quad \quad \quad \quad (12)

Changing to polar coordinates x = r \cos(\theta) and y = r \sin(\theta), with limits 0 to \infty for r and 0 to 2 \pi for \theta, and the Cartesian area element dxdy becoming the polar coordinates area element r dr d\theta, the double integral becomes

\int_0^{2\pi} \int_0^{\infty} \exp(-r^2) \, r \, dr \, d\theta = \pi \quad \quad \quad \quad (13)

Since (12) is the square of I_0, we conclude I_0 = \sqrt{\pi}. It is easily shown that I_1 = 0, and we can calculate I_2 by integrating I_0 by parts:

I_0 = \int_{-\infty}^{\infty} 1 \cdot \exp(-y^2) dy

= [y \exp(-y^2)]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} y(-2y \exp(-y^2))dy

= 2\int_{-\infty}^{\infty} y^2 \exp(-y^2) dy

Thus,

I_2 =  \int_{-\infty}^{\infty} y^2 \exp(-y^2) dy = \frac{1}{2} I_0 = \frac{\sqrt{\pi}}{2}

We can therefore write the first two moments of the Ornstein-Uhlenbeck process with the SDE in (1) above in exact form as

\langle x \rangle = 0 \quad \quad \quad \quad (14)

and

\langle x^2 \rangle = \frac{\sigma^2}{2\gamma} \quad \quad \quad \quad (15)

To check these calculations, I used Python to compute 5000 sample paths of the Ornstein-Uhlenbeck process x(t) with the same specifications as the plot above, and plotted the mean \langle x \rangle and mean squared distance \langle x^2 \rangle as functions of time together with the corresponding exact values. The plots below show that for sufficiently long times the numerical calculations do indeed fluctuate around the values predicted on the basis of the stationary pdf derived above:
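A sketch of such an ensemble calculation (the plotting details of the original are not reproduced; the number of paths and the time grid follow the values stated above):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

gamma, sigma = 0.5, 0.3
dt = 0.01
t = np.arange(0, 10 + dt, dt)
n_paths = 5000

# Simulate all 5000 Euler-Maruyama sample paths at once (one column per path)
x = np.zeros((len(t), n_paths))
x[0, :] = 0.6
for i in range(1, len(t)):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    x[i, :] = x[i - 1, :] - gamma * x[i - 1, :] * dt + sigma * dW

mean_x = x.mean(axis=1)          # <x> as a function of time
mean_x2 = (x**2).mean(axis=1)    # <x^2> as a function of time

plt.plot(t, mean_x, label='mean of x')
plt.plot(t, mean_x2, label='mean of x^2')
plt.axhline(0, linestyle='--')                        # exact <x> = 0, equation (14)
plt.axhline(sigma**2 / (2 * gamma), linestyle='--')   # exact <x^2>, equation (15)
plt.xlabel('t')
plt.legend()
plt.show()
```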

I also calculated the histogram of the probability density of the random variable x(t) evaluated at t = 10 using the 5000 simulated sample paths, and plotted it together with the stationary pdf, \rho(x), derived above from the Fokker-Planck equation. The plot below shows good agreement between the two: