Thursday, 15 April 2021

Gaps Between Primes - Part 1/2

 We've seen:

  • the primes are not so thinly spread out that the sum of their reciprocals converges
  • the primes appear to have a density of $1/\ln(x)$ around $x$

Intuitively, this suggests that there must be a constraint on the gaps between primes. 



This post explores the question of gaps between primes.

A video accompanying this post is here: [youtube], and slides here [pdf].


Factorials

Let's remind ourselves of factorials. The factorial of a counting number $n$ is the product of all the whole numbers between 1 and that number, and is written $n!$.

$$ n! = n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 2 \cdot 1 $$

For example, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$.    

It is clear that factorials are not primes. In fact they are almost the antithesis of primes, because $n!$ contains every number below $n$ as a factor. The only exceptions are $2!=2$, $1!=1$ and $0!=1$ which are small enough not to have factors.


Prime Gaps

Let's consider a sequence of numbers from $5!+2$ to $5!+5$.


$$ 5! + 2 = (5 \cdot 4 \cdot 3 \cdot 1 +1)\cdot 2 $$

$$ 5! +3 = (5 \cdot 4 \cdot 2 \cdot 1 +1)\cdot 3 $$

$$ 5! +4 = (5 \cdot 3 \cdot 2 \cdot 1 +1)\cdot 4 $$

$$ 5! +5 = (4 \cdot 3\cdot 2 \cdot 1 +1)\cdot 5 $$


We can see all the numbers from $(5!+2)$ to $(5!+5)$ are not prime. We can generalise this to say that the entire sequence from $n!$ to $(n!+n)$ is not prime, a sequence of length $n-1$. 

Because $n$ can be as large as desired, we've just proved there is no upper limit on prime gaps.

The primes continue to surprise us. We started with the fact that the primes can't be too sparse, and yet we have proved there is no upper limit on the gaps between primes. Both facts can be true if large prime gaps are rare. 


Distribution of Prime Gaps

If we counted all the prime gaps in a moderately large range, we'd expect large gaps to be rare. It isn't obvious whether medium sized gaps would occur more or less often than the smallest gaps.

The following plot shows the logarithm of frequency of prime gaps in the range 1 to 500,000,000. We can see there is only one prime gap of size 1, that between $p=2$ and $p=3$.

Note: the convention for prime gaps sizes is $p_{n+1}-p_n$, so the gap between 2 and 3 is 1, not 0.



It is surprising to see the distribution so tightly constrained. The overall shape appears to be very linear, and there are clear regions above and below the line where no prime gap counts seem to occur. 


Twin Prime Conjecture

Looking again at the distribution, gaps of size 2 seem to occur more often than almost all other sizes. You may have noticed these twin primes pop up more often than expected along the number line:

$$(3,5),(5,7),(11,13),(17,19)\ldots(101,103),(107,109)\ldots$$

The Twin Prime Conjecture says that we'll keep finding twin primes along the number line, that they don't run out. This is yet another simple statement about primes that remains unproven.


Next Time..

In part 2 we'll see how simplified probabilistic models of the primes can help explain the prime gap distribution.


Tuesday, 6 April 2021

Chebyshev's Estimates For $\pi(x)$

Pafnuty Chebyshev proved Bertrand's Postulate that there is at least one prime between a number and its double, $n \leq p \leq 2n$, and came close to proving the Prime Number Theorem. 



In this post we'll recreate his estimates for the prime counting function $\pi(x)$ which came close to the actual PNT. The approach is inspired by a note by Franz Lemmermeyer [pdf].


A Bit Of History

We've previously discussed how Gauss and Legendre used experimental evidence to suggest that the prime counting function $\pi(x)$ is of the form $ \frac{x}{\ln(x)} $, and that this approximation gets better for larger $x$ - in the sense that the ratio between the true count and this approximation gets ever closer to 1.

$$  \pi(x) \sim \frac{x}{\ln(x)} $$

This is the Prime Number Theorem (PNT), which took another 100 years to be proved by Hadamard and de la Vallee-Poussin in 1896.  

Before that final proof, around 1850, Chebyshev got  close to proving the PNT by showing that $\pi(x)$ can be bounded as follows, where $c_1, c_2$ are constants $>0$.

$$ c_1\frac{x}{\ln(x)} < \pi(x) < c_2\frac{x}{\ln(x)} $$

This is a remarkable result, and in fact was strong enough to prove Bertrand's Postulate that there is a prime between a number and its double, $n < p < 2n$.


Starting With Combinations

We start with an observation about binomial coefficients:

$$ 2^{2n} = (1+1)^{2n} = {2n \choose 0}  +  \ldots + \ldots {2n \choose n} + \ldots + {2n \choose 2n}$$

Because ${2n \choose n}$ is only one of those terms we can say:

$$ \begin{align} 2^{2n} &> {2n \choose n} = \frac{(2n)!}{n!n!} \\ \\ &= \frac{(2n)\cdot(2n-1)\cdot \ldots \cdot (n+1)}{n\cdot(n-1)\cdot(n-2)\ldots \cdot 2 \cdot 1} \end{align}$$

Looking at that fraction, we can say that the numerator contains primes $n<p \leq 2n$, and these primes don't appear in the denominator. This means each of those primes divides the  whole fraction without remainder, and so does the product of those primes.

$$ \prod_{n<p \leq 2n} p \text{ divides } {2n \choose n} $$

How many primes are there $n<p \leq 2n$? The answer is of course $\pi(2n) - \pi(n)$.

If we raised $n$ to this number, it would be smaller than the actual product of those primes $ \prod_{n<p \leq 2n}$. This is because each of those primes is greater than $n$.

$$ n^{\pi(2n) - \pi(n)} < \prod_{n<p \leq 2n} p  \leq  {2n \choose n} < 2^{2n}$$

We can take logs and simplify:

$$ \boxed{ \pi(2n) - \pi(n) < 2\ln(2) \cdot \frac{n}{\ln(n)} } $$

We now have a relation relating $\pi(n)$ to $n$. Intuitively it hints at $\pi(n)$ being of the form $n/\ln(n)$.

To find a form for $\pi(n)$ not involving $\pi(2n)$ we use induction.


Upper Bound

Let's try the following form for $\pi(n)$:

$$ \pi(2^k) \leq 3 \cdot \frac{2^k}{k} $$

We can examine our inequality with $n=2^k$:

$$ \begin{align} \pi(2^{k+1}) & \leq \pi(2^k) + 2\ln(2)\cdot \frac{2^k}{\ln(2^k)} \\ &= \pi(2^k) + 2\cancel{\ln(2)}\cdot \frac{2^k}{k\cancel{\ln(2)}} \\ &= \pi(2^k) + \frac{2^{k+1}}{k} \end{align}$$

Our proposed form for $\pi(n)$ gives us

$$ \begin{align} \pi(2^{k+1}) & \leq  3 \cdot \frac{2^k}{k} + 2 \cdot \frac{2^{k}}{k} \\ &= 5\cdot\frac{2^k}{k} \\ \pi(2^{k+1}) &\leq 3\cdot\frac{2^{k+1}}{k+1} \end{align}$$

That last inequality appears to be only true for $k>5$ because $5\cdot\frac{2^k}{k} < 3\cdot\frac{2^{k+1}}{k+1}$ is only true for $k>5$. However, numerically checking $\pi(2^{k+1})$ for $k\leq5$ confirms it is true for all $k$.

We've shown that if $\pi(2^k) \leq 3 \cdot \frac{2^k}{k}$ is true for $k$ it is also true for $k+1$. We've also shown it is true for some $k$, for example, $k=1$ because $\pi(2^1) = 1 \leq 3\cdot\frac{2^1}{1} = 6$. This completes the induction proof for this form of $\pi(2^k)$.


Now, if we set $2^k < x \leq 2^{k+1}$ we can say

$$\begin{align} \pi(x) &\leq \pi(2^{k+1}) \\ &\leq 3\cdot\frac{2^{k+1}}{k+1} = 6\cdot\frac{2^{k}}{k+1} \\ &\leq 6\cdot\frac{2^{k}}{k} = 6\ln(2)\cdot\frac{2^k}{\ln(2^k)} \\ \pi(x) &\leq 6\ln(2)\cdot\frac{x}{\ln(x)}\end{align}$$

That least step requires $x \geq e$, but we can numerically verify that $\pi(x) \leq 6\ln(2)\frac{x}{\ln(x)}$ is true for $x<e$.

And so finally we have an upper bound for the prime counting function $\pi(x)$, 

$$\boxed{ \pi(x) \leq 6\ln(2)\frac{x}{\ln(x)} }$$

On its own, the upper bound isn't enough to tell us about the true form of $\pi(x)$. We need the lower bound to be of a similar form, and fairly tight too.


Lower Bound

Going back to the binomial coefficients

$$ 2^{2n} = (1+1)^{2n} = {2n \choose 0}  +  \ldots  + {2n \choose n} + \ldots + {2n \choose 2n}$$

we can see that ${2n \choose n}$ is the largest element, which means it must be $\geq$ than the average of these $2n$ terms (we have $2n+1$ terms but we can combine the first and last which each have a value of 1), so

$$ \frac{2^{2n}}{2n} \leq {2n \choose n}$$

Taking logs gives us

$$ 2n\ln2 - \ln(2n) \leq \ln {2n \choose n} $$

We now need to think about the prime factorisation of ${2n \choose n}$. To do this we need to take a short diversion.


We know that  ${2n \choose n}$ is $\frac{(2n)!}{n!n!}$, so we need to think about the prime factorisation of $n!$. Luckily this is not too difficult. If $v_p(n)$ is the highest power of a prime $p$ dividing $n$, then

$$ \boxed{v_p(n!) = \sum_m \left \lfloor \frac{n}{p^m} \right \rfloor }$$

This is because in the set of numbers $\{ 1,2,3,\ldots,n \}$ there are $\lfloor \frac{n}{p} \rfloor$ multiples of $p$, $\lfloor \frac{n}{p^2} \rfloor$ multiples of $p^2$, $\lfloor \frac{n}{p^3} \rfloor$ multiples of $p^3$, and so on, each contributing 1 to $v_p(n!)$.

Now, because ${2n \choose n}$ is $\frac{(2n)!}{n!n!}$, and because $v_p(a/b) = v_p(a)-v_p(b)$, we can say

$$ v_p{2n \choose n} = \sum_m \left( \left \lfloor \frac{2n}{p^m} \right \rfloor - 2\left \lfloor \frac{n}{p^m} \right \rfloor \right)$$

Let's focus on that form $\lfloor 2x \rfloor - 2 \lfloor x \rfloor$. It only takes values 0 or 1. This is easy to see if we write $x=a+b$ where $a$ is the integer part of $x$, and $b$ is the fraction.

$$ \begin{align} \lfloor 2x \rfloor - 2 \lfloor x \rfloor &= \lfloor 2a + 2b \rfloor - 2 \lfloor a + b \rfloor \\ \\ &= \cancel{2a} + \lfloor 2b \rfloor \cancel{- 2a} -2 \lfloor b \rfloor \\ \\ &=\begin{cases} 0 \text { for } b<\frac12 \\ 1 \text{ for } b>\frac12 \end{cases} \end{align}$$

The largest $m$ is $\lfloor \frac{\ln(2n)}{\ln p} \rfloor$ because a larger $m$ would give $p^m>2n$ which isn't possible. So $v_p$ reduces as follows:

$$ \begin{align} v_p{2n \choose n} &= \sum_{1\leq m \leq \lfloor \frac{\ln(2n)}{\ln p} \rfloor} \left( \left \lfloor \frac{2n}{p^m} \right \rfloor - 2\left \lfloor \frac{n}{p^m} \right \rfloor \right) \\ \\ &\leq \sum_{1\leq m \leq \lfloor \frac{\ln(2n)}{\ln p} \rfloor} 1 \\ \\ v_p{2n \choose n} &\leq  \left \lfloor \frac{\ln(2n)}{\ln p} \right \rfloor \end{align} $$

Knowing how ${2n \choose n}$ factorises into primes $p$ will be helpful.


We can now proceed:

$$\begin{align} 2n\ln2 - \ln(2n) &\leq \ln {2n \choose n} \\ \\ &\leq \sum_{p\leq 2n} \left \lfloor \frac{\ln(2n)}{\ln p}\right \rfloor \ln p  \\ \\ &\leq \sum_{p \leq 2n} \ln(2n)\\ \\  &= \ln(2n)\cdot \pi(2n)  \end{align}$$

Re-arranging gives us a lower bound for $\pi(2n)$:

$$ \pi(2n) \geq \ln(2) \cdot \frac{2n}{\ln(2n)} -1 $$

This is almost of the form we want $\pi(n) \geq c_1\cdot \frac{x}{\ln x}$. 


We can use induction again to prove that

$$ \pi(x) \geq \frac{\ln (2)}{2}\cdot \frac{x}{\ln x}$$

Let's start with $2n < x \leq 2n+2$, so

$$ \pi(x) \geq \pi(2n) \geq \ln(2) \cdot \frac{2n}{\ln(2n)} -1 $$

We're going to replace that $2n/\ln(2n)$ with an expression for $n$, and by assuming $x>16$:

$$ \frac{2n}{\ln(2n)} - \frac{n+1}{\ln(2n)} = \frac{n-1}{\ln(2n)} \geq \frac{7}{4\ln2} \geq \frac{1}{\ln2} $$

We can manually show that the desired expression $ \pi(x) \geq \frac{\ln 2}{2}\cdot \frac{x}{\ln x}$ is valid for $x \leq 16$

This gives us:

$$\begin{align} \pi(x) \geq \pi(2n) & \geq \ln(2) \cdot \frac{2n}{\ln(2n)} -1 \\ \\  &\geq \ln(2)\left(\frac{1}{\ln2} + \frac{n+1}{\ln(2n)}\right) -1 \\ \\ &= \cancel{\ln(2)} \left( \frac{\ln(2n) + (n+1)\ln(2)}{ \cancel{\ln(2)}\ln(2n)} \right) -1 \\ \\  &= 1 + \frac{(n+1)\ln(2)}{\ln(2n)} -1 \\ \\ &\geq \frac{(n+1)\ln(2)}{\ln(x)} \text{ because } x > 2n \\ \\ &\geq \frac{(x/2)\ln(2)}{\ln(x)} \text{ because } x \leq 2n+2 \\ \\ \end{align}$$

So we finally have 

$$ \boxed{ \pi(x) \geq \frac{\ln (2)}{2}\cdot \frac{x}{\ln x} }$$


The following illustrates these lower and upper bounds for $x\leq 500$.


Bertrand's Postulate

Can Chebyshev's estimates be used to prove Bertrand's Postulate that there is a prime $p$ between $n$ and $2n$? Let's find out. 

Bertrand's postulate is the same as saying $\pi(2x) - \pi(x) > 0$.

The smallest difference is when we take the lower bound for $\pi(2x)$, and the upper bound for $\pi(x)$:

$$ \begin{align} c_1\frac{2x}{\ln(2x)} - c_2\frac{x}{\ln(x)} &> 0 \\ \\ 2c_1 \cdot \ln(x) &> c_2 \cdot (\ln(x) + \ln (2)) \\ \\ (2c_1 - c_2)\cdot \ln(x) &> c_2 \cdot \ln(2) \end{align} $$

We can see that $(2c_1 - c_2)$ must be positive, and so 

$$ \boxed{ 2c_1 > c_2 }$$

In our case, the bounds $c_1 = \frac12 \ln(2)$ and $c_2 = 6\ln(2)$ aren't tight enough to prove Bertrand's Postulate. 

Chebyshev and others actually developed tighter bounds for $\pi(x)$ which do meet this requirement, $c_1=0.92$ and $c_2=1.11$, but those tighter bounds require much more work.


Thoughts

It is interesting that 

  • Chebyshev's estimates for the prime counting function $\pi(x)$ are derived from the binomial coefficients of the expansion $(1+1)^{2n}$, a rather surprising source of useful inequalities. 
  • the estimates are intimately linked to the prime factorisation of factorials. 
  • the trail, albeit with some algebra work, can lead to a form for $\pi(x)$ that is $x/\ln(x)$, a form which hints at the Prime Number Theorem.

Additional work by Chebyshev showed that $c_1$ and $c_2$ can be as close to 1 as desired for large enough $x$.