Monday 3 May 2021

Gaps Between Primes Using Probabilistic Models - Part 2/2

In this post we continue to look at the gaps between primes, and introduce the idea of using probabilistic models for primes.



The video for this topic is here (youtube), and accompanying slides are here (pdf).

Probabilistic models of primes, despite being built on rather broad simplifications, can match numerical evidence fairly well. Predictions from some of these models can become worthy conjectures about the primes. 

Here we build a particularly simple probabilistic model, and use it to predict the distribution of prime gaps.


Probability of a Prime

The Prime Number Theorem tells us the density of primes around $x$ is approximately $1/\ln(x)$. It isn't a big leap to interpret this density as a probability. So the probability $n$ is prime is simply


$$\Pr(n\text{ prime})=\frac{1}{\ln(n)}$$


Prime Gaps

A prime gap at $n$ of size 4 is a sequence {prime, not prime, not prime, not prime, prime}. The probability of this sequence is


$$\frac{1}{\ln(n)}\cdot\left(1-\frac{1}{\ln(n+1)}\right)\cdot\left(1-\frac{1}{\ln(n+2)}\right)\cdot\left(1-\frac{1}{\ln(n+3)}\right)\cdot\frac{1}{\ln(n+4)}$$


For $n$ much larger than $a$, we can approximate $\ln(n+a)\approx\ln(n)$, simplifying the probability


$$\left(1-\frac{1}{\ln(n)}\right)^{3}\cdot\left(\frac{1}{\ln(n)}\right)^{2}$$


We can generalise this to prime gaps of size $k$ having a sequence {prime, k-1 not prime, prime}. 


$$\left(1-\frac{1}{\ln(n)}\right)^{k-1}\cdot\left(\frac{1}{\ln(n)}\right)^{2}$$


Prime Gap Counts

Now we know the probability of a gap of size $k$, we can say the expected number of gaps of that size amongst the first $N$ numbers is approximately the probability summed over all the $n$ up to $N$.


$$\sum_{n}^{N}\left(1-\frac{1}{\ln(n)}\right)^{k-1}\cdot\left(\frac{1}{\ln(n)}\right)^{2}$$


Using another approximation that for most n between 1 and $N$, $\ln(n)\approx\ln(N)$, this simplifies further.


$$N\cdot\left(1-\frac{1}{\ln(N)}\right)^{k-1}\cdot\left(\frac{1}{\ln(N)}\right)^{2}$$


We can take the logarithm of this count to give us a linear function of the gap size $k$.


$$(k-1)\cdot\ln\left(1-\frac{1}{\ln(N)}\right)+2\ln\left(\frac{1}{\ln(N)}\right)+\ln(N)$$


Plotting a graph of the logarithm of the counts for each prime gap $k$ should give us a straight line. The gradient will be negative because it is the logarithm of $\left(1-\frac{1}{\ln(N)}\right)$, which is always less than 1. 

The following plot shows this probabilistic model works surprisingly well given the simplifications we made. 



Improved Model

Our simple model has many imperfections, and a significant one is that it doesn't take into account that every other number in a given range, the even numbers, are never prime. 

If we want to assert that the probability of an odd number being prime is zero, but also preserve the density of primes being $1/\ln(x)$ in the neighbourhood of $x$, we can double the probability of the odd numbers being prime to $2/\ln(x)$. 

The probability of a prime gap of size $k$ now only includes odd numbers in the sequence {prime, $k/2-1$ not prime, prime }. Following the same steps as before, and noting that we're only applying the probability to half the numbers in the range 1 to $N$, we have a new estimate for the prime gap counts.


$$\frac{N}{2}\cdot\left(1-\frac{2}{\ln(N)}\right)^{\frac{1}{k}-1}\cdot\left(\frac{2}{\ln(N)}\right)^{2}$$


The following plot shows that for smaller gaps this model is indeed an improvement. Modelling the behaviour of larger gaps requires more sophisticated models.



What Have We Learned?

It is useful to compare any probabilistic model with a naive one. Here we can compare our probabilistic primes model to coin tosses. The following compares the predicted prime gaps amongst the first $5\times10^8$ numbers with expected contiguous chains of heads (or tails) amongst $5\times10^8$ throws.



A linear log-count of sequences isn't unique, in fact it is common for chains of independent events.What we can say is:

  • the correct order of magnitude for predicted prime gaps suggests the $1/\ln(x)$ probability is about right.
  • the primes appear fairly randomly as if they were independent events.

You can explore this graph yourself online (link).


Animated Prime Gaps

As a fun aside, the following animates the prime gaps as $N$ grows up to $5\times10^8$, together with the predicted gap counts from the above two models.



Further Reading

Harald Cramér, in the 1930s, developed his probabilistic model for primes and used it to develop conjectures about the primes. His 1936 paper is considered to have kicked off the field: