Prime partners exist: progress toward the twin prime problem

David W. Farmer

Number theorists view the integers as a largely unexplored landscape populated by interesting and friendly inhabitants. The prime numbers 2, 3, 5, 7, 11, 13, ... occur like irregular mileposts stretching off to infinity. The squares 1, 4, 9, 16, 25, ..., and the cubes 1, 8, 27, 64, 125, ..., along with the higher powers, are more sparse, but they also demand attention because of their special form.
These intriguing features exist because of the interaction between addition and multiplication. If we knew only about addition then the integers would be quite boring: start with 0, then repeatedly add or subtract 1 to get everything else. End of story.
Things become interesting when you add multiplication to the mix. To construct every integer from addition, all you need is 1. For multiplication, the prime numbers are the building blocks and there are infinitely many of them. Examining the list of prime numbers reveals both structure and apparent randomness. Fairly often (8 times before 100, and 35 times before 1000) primes occur as "twins," meaning that both $p$ and $p+2$ are prime. Examples are 101 and 103, or 8675309 and 8675311. Somewhat less often, a prime is of the form $n^2+1$, such as $17=4^2+1$ or $90001 = 300^2+1$.
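The counts quoted above are easy to check by computer. The short Python sketch below counts twin prime pairs and primes of the form $n^2+1$ below a given bound; the function names and the trial-division primality test are choices made here for illustration, not part of any of the methods discussed in this article.

\begin{verbatim}
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def twin_pairs(X):
    """Pairs (p, p+2) with both entries prime and p < X."""
    return [(p, p + 2) for p in range(2, X) if is_prime(p) and is_prime(p + 2)]

def squares_plus_one(X):
    """Primes of the form n^2 + 1 below X."""
    return [n * n + 1 for n in range(1, int(X ** 0.5) + 1)
            if n * n + 1 < X and is_prime(n * n + 1)]

print(len(twin_pairs(100)), len(twin_pairs(1000)))   # 8 35
print(squares_plus_one(200))                         # [2, 5, 17, 37, 101, 197]
\end{verbatim}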
It is an unsolved problem to determine whether there are infinitely many primes of the form $p+2$ or $n^2+1$, where we use $p$ to represent a prime number and $n$ to represent a positive integer (which may or may not be prime). Other special forms which also are unresolved are Mersenne primes ($2^p-1$) and Sophie Germain primes (primes $p$ for which $2p+1$ is also prime). Such problems have been the subject of thousands of research papers over the past several centuries. And while none have been solved (nor are they likely to be solved in the near future), a large number of tools have been developed which have led to partial results. Recently Yitang Zhang of the University of New Hampshire has combined several of those tools to make impressive progress on the twin prime problem. Below we discuss some variations of these difficult problems and then describe Zhang's result.

1. Relaxing the conditions: measuring progress on difficult problems

How can one measure progress toward an unsolved problem? One view when dealing with primes of a special form is to make the form less restrictive. For example, instead of looking for primes of the form $n^2+1$, one could ask for primes of the form $n^2+a$, where $a$ is as small as possible. Or one could look for primes of the form $n^2+m^2$, or more ambitiously $n^2+m^4$.
These problems vary widely in difficulty. Fermat proved that there are infinitely many primes of the form $p=n^2+m^2$; in fact, apart from 2, those are exactly the primes $p\equiv 1 \bmod 4$. In 1997 John Friedlander and Henryk Iwaniec proved that $n^2+m^4$ is prime infinitely often, using sieve techniques originally developed by Enrico Bombieri. A measure of the difficulty of that result is that only about $X^{\frac34}$ of the numbers less than $X$ have that form. Shortly after, Roger Heath-Brown proved that the even thinner set of numbers $n^3+2m^3$ is prime infinitely often. One might have suspected that those results would be more difficult than the twin prime problem, because, by the prime number theorem, there are asymptotically $X/\log(X)$ numbers of the form $p+2$ less than $X$, and the naive guess would be that having more candidates would make the problem easier.
One can generalize the twin prime problem to ask whether $p$ and $p+4$, or $p+6$, or $p+8$, etc., are prime. This is sometimes called the "prime neighbors" problem: show that there is some fixed number $2n$ so that $p$ and $p+2n$ are prime infinitely often. That is exactly the problem which Zhang recently solved. We describe some of the heuristics behind twin primes and related problems, and then discuss the techniques behind Zhang's proof.

2. Prime heuristics: the frequency of special primes

Why should we expect to have infinitely many twin primes?
The prime number theorem, proved in 1896 by Hadamard and de la Vallée Poussin, says that the number of primes up to $X$ is asymptotically $X/\log(X)$. Equivalently, the $n$th prime is asymptotic to $n\log(n)$. Here, and throughout number theory, "$\log$" means natural logarithm.
By the prime number theorem, among numbers of size $n$ the primes have density $1/\log(n)$. This suggests what is known as the "Cramér model" of the primes: the probability that the large number $n$ is prime is $1/\log(n)$, and distinct numbers have independent probabilities of being prime. Thus, the Cramér model says that the probability that $p$ and $p+2$ are both prime is $1/\log(p) \cdot 1/\log(p+2) \approx 1/\log^2(p)$, so up to $X$ the Cramér model predicts approximately $X/\log^2(X)$ twin primes. In particular, there should be infinitely many of them.
Of course that argument has to be wrong, because it predicts exactly the same answer for $p$ and $p+1$ being prime -- but that almost never happens because if $p$ is odd then $p+1$ is even. Hardy and Littlewood, using their famous circle method, developed precise conjectures for the number of twin primes as well as a wide variety of other interesting sets of primes. The result is that (in most cases) the Cramér model predicts the correct order of magnitude, but the constant is wrong. Using the twin prime case as an example, the idea is that the likelihood of $p+2$ being prime has to be adjusted based on the fact that $p$ is prime. If $p$ is odd then $p+2$ also is odd, so there is a $\frac12$ chance that $p$ and $p+2$ are both odd, which is twice as likely as the $\frac14$ chance the Cramér model would predict. Similarly, we need neither $p$ nor $p+2$ to be a multiple of 3, which happens $\frac13$ of the time -- exactly when $p\equiv 2\bmod 3$. The Cramér model would have that happen $(\frac{2}{3})^2$ of the time, so we must adjust the probability to make it slightly less likely that $p$ and $p+2$ are prime. Making a similar adjustment for the primes 5, 7, 11, etc., we arrive at the Hardy-Littlewood twin prime conjecture: $$ \#\{p < X \ :\ p \ \mathrm{ and } \ p+2 \ \text{ are prime } \} \sim 2 \prod_{p>2} \left( 1- \frac{2}{p}\right) \left(1- \frac{1}{p}\right)^{-2} \cdot \frac{X}{\log^2(X)} . $$ The twin primes have been tabulated up to $10^{16}$ and so far the Hardy-Littlewood prediction has been found to be very accurate.
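To get a feel for these predictions, the following sketch counts the twin primes up to $X=10^6$ and compares the count with the Cramér-model guess $X/\log^2(X)$ and with the Hardy-Littlewood prediction. The truncation of the product at $10^5$ and the use of the smoother integral $\int_2^X dt/\log^2(t)$ in place of $X/\log^2(X)$ are choices made here for illustration.

\begin{verbatim}
import math

def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

X = 10 ** 6
primes = prime_sieve(X + 2)
prime_set = set(primes)
actual = sum(1 for p in primes if p < X and p + 2 in prime_set)

cramer = X / math.log(X) ** 2

# Twin prime constant 2 * prod_{p > 2} (1 - 2/p) * (1 - 1/p)^(-2),
# truncated at 10^5 (the tail contributes very little).
C = 2.0
for p in prime_sieve(10 ** 5):
    if p > 2:
        C *= (1 - 2 / p) / (1 - 1 / p) ** 2

# Hardy-Littlewood prediction, using the smoother integral of dt/log^2(t)
# (midpoint rule) in place of X/log^2(X).
steps = 10 ** 5
h = (X - 2) / steps
integral = sum(h / math.log(2 + (i + 0.5) * h) ** 2 for i in range(steps))

print(actual, round(cramer), round(C * integral))
# roughly 8169 actual, 5239 for the Cramer guess, 8248 for Hardy-Littlewood
\end{verbatim}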
Other prime neighbors, $p$ and $p+2n$, can actually occur more frequently than the twin primes. For example, if $p$ is a large prime then $p+6$ is not divisible by 2 or 3, so it is more likely than $p+2$ to be prime. Specifically, in the case of $p$ and $p+2$, we must have $p\equiv 2 \bmod 3$, while in the case of $p$ and $p+6$ we can have $p\equiv 1 \text{ or } 2 \bmod 3$. Thus, a prime gap of 6 is twice as likely as a prime gap of 2. The same argument shows that a prime gap of 30 is $\frac83$ times as likely as a prime gap of 2. The Hardy-Littlewood method also makes predictions for the frequency that $n^2+1$, or other polynomials, represent primes. All those conjectures are also supported by numerical observations.
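The correction factors mentioned above are easy to compute: for an even number $h$, the predicted frequency of prime pairs $p$, $p+h$, relative to the twin primes, is the product of $(p-1)/(p-2)$ over the odd primes $p$ dividing $h$. Here is a short sketch of that computation; the function names are chosen for illustration.

\begin{verbatim}
from fractions import Fraction

def odd_prime_divisors(h):
    """The odd primes dividing h."""
    while h % 2 == 0:
        h //= 2
    divisors, d = set(), 3
    while d * d <= h:
        while h % d == 0:
            divisors.add(d)
            h //= d
        d += 2
    if h > 1:
        divisors.add(h)
    return divisors

def relative_weight(h):
    """Predicted frequency of prime pairs p, p+h relative to twin primes."""
    w = Fraction(1)
    for p in odd_prime_divisors(h):
        w *= Fraction(p - 1, p - 2)
    return w

for h in (2, 4, 6, 8, 12, 30):
    print(h, relative_weight(h))
# a difference of 6 is twice as likely as 2, and 30 is 8/3 times as likely
\end{verbatim}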
Precise predictions supported by data are nice, but it is better to have an actual proof. The methods for proving results about primes of a special form involve the ancient idea of a prime sieve. The appendix has information about the archetypal sieve of Eratosthenes.

3. Sieving for primes

For twin primes, the first nontrivial result was due to Viggo Brun in 1915, using what is now known as the Brun sieve. Brun proved that there are not too many twin primes: specifically, that $$ \sum_{p, p+2, \text{ prime}} \frac{1}{p} < \infty. $$ This is consistent with the Hardy-Littlewood conjecture because $\sum 1/(n\log^2(n))$ converges. Since $\sum_p 1/p$ diverges, which was proven by Euler and also follows from the prime number theorem, we see that most prime numbers are not twin primes. This is a negative result, in terms of our goal of showing that there are infinitely many twin primes.
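The contrast between the two sums can be seen numerically. The sketch below computes partial sums of $1/p$ over all primes and over the members of twin prime pairs, up to several cutoffs; the bound $10^6$ is an arbitrary choice, and of course no finite computation proves convergence or divergence.

\begin{verbatim}
def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

primes = prime_sieve(10 ** 6 + 2)
prime_set = set(primes)

for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    all_primes = sum(1 / p for p in primes if p < x)
    twins = sum(1 / p + 1 / (p + 2)
                for p in primes if p < x and p + 2 in prime_set)
    print(x, round(all_primes, 4), round(twins, 4))
# the first column of sums keeps growing (like log log x);
# the second appears to level off, consistent with Brun's theorem
\end{verbatim}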
To state things positively, we wish to find a parameter $A$ such that for infinitely many $p$ there exists $a\le A$ with $p+a$ prime. By the prime number theorem, the average distance between a prime $p$ and the next prime is $\log(p)$, so we can take $A=\log(p)$. It is nontrivial to show $A=C\log(p)$ for some fixed number $C< 1$. The existence of such a number $C$ was first proven by Hardy and Littlewood in 1927, assuming the Generalized Riemann Hypothesis. An unconditional proof was given by Erdős in 1940, using the Brun sieve. Erdős' result did not provide a specific value for $C$; this was remedied by Ricci in 1954, who found $C\le \frac{15}{16}$. Bombieri and Davenport modified the Hardy-Littlewood method to make it unconditional (and better!), obtaining $C\le \frac12$. Various subsequent improvements in the years up to 1988 led to $C\le 0.2484...$.
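The quantity being bounded here is the gap to the next prime measured in units of $\log(p)$. As a rough numerical illustration (not part of any proof), the sketch below tabulates how often that normalized gap falls below a few values of $C$ for primes up to $10^6$; the cutoffs are arbitrary choices.

\begin{verbatim}
import math

def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

primes = prime_sieve(10 ** 6)
# gap to the next prime, measured in units of log(p), ignoring tiny primes
gaps = [(q - p) / math.log(p) for p, q in zip(primes, primes[1:]) if p > 100]

for C in (1.0, 0.5, 0.25):
    frac = sum(1 for g in gaps if g < C) / len(gaps)
    print(C, round(frac, 3))
\end{verbatim}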
The next big development, which is one of the main ingredients in Zhang's recent work, is due to Goldston, Pintz, and Yıldırım, in 2005. For the rest of this article we will refer to the method of those three authors as "GPY." It is GPY, combined with some techniques from the 1980's, which made Zhang's result possible. All the GPY papers are in the AIM preprint series, available at {\tt http://aimath.org/preprints.html}.
The GPY method uses a variant of the Selberg sieve to prove that $A=C \log(p)$ works for any $C>0$, and later improved this to $$ A=\sqrt{\mathstrut \log(p)}\, (\log\log(p))^2. \ \ \ \ \ \ \ \ \ \ (1) $$ The method also shows that if the primes have "level of distribution $\vartheta>\frac12$," the precise meaning of which is described below, then one could find an absolute bound for $A$, independent of $p$. The GPY method was the topic of the November 2005 AIM workshop Gaps between primes. One highlight of the workshop was Soundararajan presenting a proof that $\vartheta=\frac12$ is the true obstruction for the GPY method to produce bounded gaps between primes. An interesting sidelight is that the workshop was the first occasion that Goldston, Pintz, and Yıldırım were all in the same place at the same time.

4. Primes in progressions: a tool for optimizing the sieve

Key to the GPY method is knowledge about the distribution of primes in arithmetic progressions. If $a$ and $q$ are integers with no common factor, then Dirichlet proved that there are infinitely many primes $p\equiv a \bmod q$. For example, 4 and 15 have no common factor, so the sequence $$ 4, 19, 34, 49, 64, 79, ... $$ contains infinitely many primes. In 1899, de la Vallée Poussin went further, showing that all such progressions contain the expected number of primes. Here "expected" means that each of the progressions $\bmod\ q$ which can contain infinitely many primes asymptotically contains the same number of primes. To continue our previous example of progressions $\bmod 15$, there are 8 possible choices for the number $a$: 1, 2, 4, 7, 8, 11, 13, and 14. Every large prime has one of those numbers as its remainder when divided by 15, and all 8 remainders occur equally often. The number of possible remainders $\bmod q$ is denoted by the Euler phi-function, $\varphi(q)$, so $\varphi(15)=8$. The general case is that if $a$ and $q$ have no common factors, then $$ \sum_{\ontop{p< x}{p\equiv a \bmod q}} \log(p) \ \ \ \ \ \ \ \text{ is asymptotic to } \ \ \ \ \ \ \ \frac{x}{\varphi(q)}. $$ de la Vallée Poussin actually did a bit better, providing a main term and an explicit error term.
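The equidistribution in the example above is easy to observe numerically. The sketch below counts the primes below $10^6$ in each residue class modulo 15; the bound and the modulus are chosen here for illustration.

\begin{verbatim}
def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

q = 15
counts = {a: 0 for a in range(q)}
for p in prime_sieve(10 ** 6):
    counts[p % q] += 1

for a in sorted(counts):
    if counts[a] > 1:       # the phi(15) = 8 classes coprime to 15
        print(a, counts[a])
# classes not coprime to 15 contain at most one prime each (namely 3 and 5);
# the other 8 classes each get roughly 78498/8 of the primes below 10^6
\end{verbatim}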

5. Level of distribution: how well we can control the primes

A shortcoming of the above results is that they only hold for fixed $q$ as $x\to\infty$. For the GPY method, and the subsequent work of Zhang, one requires $q$ to grow with $x$, and allowing $q$ to grow faster gives better results. To describe the issue more precisely, we introduce the "error" term $$ E(x; q, a) = \left|\sum_{\ontop{p< x}{p\equiv a \bmod q}} \log(p) - \frac{x}{\varphi(q)}\right|. $$ For applications, we would like $E(x; q, a)$ to be small compared to the main term $x/\varphi(q)$ when $q=x^\alpha$ for $\alpha$ as large as possible. Unfortunately, such a bound is not known for any $\alpha>0$. The Generalized Riemann Hypothesis (GRH) implies the bound $E(x; q, a)< x^{\frac12+\varepsilon}$ for all sufficiently large $x$, for any $\varepsilon>0$. Thus, assuming GRH we have that $E(x; q, a)$ is small when $q=x^\alpha$ with $\alpha< \frac12$.
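For small moduli the error term can be computed directly. The sketch below evaluates $E(x; q, a)$ for $x=10^6$, $q=101$ and a few residues $a$, and prints the main term $x/\varphi(q)$ and $\sqrt{x}$ for comparison; the particular values of $x$, $q$, and $a$ are arbitrary choices made for illustration.

\begin{verbatim}
import math

def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

def phi(q):
    """Euler phi-function, by direct count."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

x, q = 10 ** 6, 101
primes = prime_sieve(x)

for a in (1, 2, 50):
    weighted = sum(math.log(p) for p in primes if p % q == a)
    print(a, round(abs(weighted - x / phi(q)), 1))   # E(x; q, a)

print("main term x/phi(q):", round(x / phi(q), 1))
print("sqrt(x):", round(math.sqrt(x), 1))
\end{verbatim}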
In practice it is sufficient to have $E(x; q, a)$ be small on average. Specifically, one seeks the largest value of $\vartheta$ such that $$ \sum_{q< x^\vartheta}\max_{(a,q)=1} E(x; q, a) \ll \frac{x}{\log^B(x)} \ \ \ \ \ \ \ \ \ \ (2) $$ for any $B>0$. The consequence of GRH mentioned above implies this inequality for $\vartheta< \frac12$. Bombieri and Vinogradov, independently in the 1960's, established the inequality for $\vartheta< \frac12$ unconditionally. Note that this does not prove GRH, but in some sense it is like "GRH on average".
If equation (2) holds for all $\vartheta< \vartheta_0$ then we say "the primes have level of distribution $\vartheta_0$." Thus, the Bombieri-Vinogradov theorem says that the primes have level of distribution $\frac12$. This is what was required for the GPY bound for $A$ in equation (1). GPY also show that if the primes have level of distribution $\vartheta_0>\frac12$, then there is a bound for $A$ independent of $p$. Zhang did not quite prove that (2) holds for some $\vartheta_0>\frac12$, but he proved a similar-looking result, using techniques of Bombieri, Friedlander, and Iwaniec from the 1980's.

6. Modifying and combining

Zhang modified the GPY method by restricting the Selberg sieve to only use numbers having no large prime factors. Such numbers are called "smooth." Restricting to smooth numbers made the sieve less effective, but it turned out that the loss was quite small and the added flexibility was a great benefit. This flexibility enabled Zhang to use techniques developed by Bombieri, Friedlander, and Iwaniec (whom we will refer to as BFI for the remainder of this article), who in a series of three papers in the late 1980's almost managed to go beyond level of distribution $\frac12$. The "almost" refers to the fact that while they did go beyond $\frac12$ (in fact they went up to $\frac47$) the expressions they considered were not exactly the same as those in equation (2). Fortunately, the BFI methods apply in Zhang's case, and he was able to obtain a level of distribution $\frac12+\frac1{584}$. That is barely larger than $\frac12$, but in this game anything positive is a win.
The BFI results were well-known to the experts, and combining the GPY and BFI methods to obtain bounded gaps was an obvious thing to try in the 8 years since the GPY result. Zhang's insight was to modify the sieve in a way that allowed the use of the BFI techniques.

7. What bound?

Zhang's original paper gives the bound $A< 70,000,000$, a number which he made no effort to optimize, and which has since been improved significantly. To understand where that number comes from, we look a little more closely at how the GPY method uses the Selberg sieve.
Some patterns cannot appear in the primes. For example, one of the three numbers $p, p+2, p+4$ has to be a multiple of 3, and so there cannot be infinitely many "triplets" of primes of that form. On the other hand, there is no obvious reason why the numbers $p, p+2, p+6$ cannot all be prime, so one conjectures that there are infinitely many triples of primes of that form. Patterns which allow the possibility of infinitely many primes are called admissible. Every admissible tuple is conjectured to consist entirely of primes infinitely often, and the Hardy-Littlewood circle method makes a precise conjecture for how often this should occur.
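Admissibility is a finite check: a pattern of offsets is admissible exactly when, for each prime $p$ up to the length of the pattern, the offsets miss at least one residue class mod $p$ (for larger primes there are too few offsets to cover every class). The short sketch below carries out this check; the function names are chosen here for illustration.

\begin{verbatim}
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_admissible(offsets):
    """True if, for every prime p, the offsets miss some residue class mod p."""
    k = len(offsets)
    for p in (n for n in range(2, k + 1) if is_prime(n)):
        if len({h % p for h in offsets}) == p:   # every class mod p is hit
            return False
    return True                                  # primes p > k are automatic

print(is_admissible((0, 2, 4)))   # False: 0, 2, 4 hit every class mod 3
print(is_admissible((0, 2, 6)))   # True
print(is_admissible((0, 4, 6)))   # True
\end{verbatim}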
The GPY method uses the Selberg sieve to count how many prime factors occur in an admissible tuple. Suppose a tuple with 100 terms is shown to have at most 198 prime factors among the numbers in the tuple. Then we could conclude that the tuple contains at least 2 prime numbers: a composite number has at least two prime factors, so if at most one of the 100 numbers were prime, the tuple would contain at least $2\cdot 99 + 1 = 199$ prime factors, which is more than 198. That is what Zhang proved, except that he showed that infinitely often an admissible 3,500,000-tuple has at most 6,999,998 prime factors.
So where does the number 70,000,000 come from? If we can find an admissible 3,500,000-tuple $p, p+2, p+6, ..., p+N$, then we have shown that there exist infinitely many pairs of primes which differ by at most $N$. Zhang notes that there is an admissible 3,500,000-tuple with $N=70,000,000$, but that is not optimal. The best known value is $N=57,554,086$ -- but that is probably not best possible.
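One standard way to write down an admissible $k$-tuple, used here only as an illustration, is to take $k$ primes that are all larger than $k$: no prime $p\le k$ divides any of them, so the residue class 0 mod $p$ is always missed, and primes larger than $k$ have too many classes to cover. The diameter this produces is far from optimal, which is why finding short admissible tuples is a separate optimization problem. The sketch below carries out the construction for $k=100$.

\begin{verbatim}
def prime_sieve(N):
    """Sieve of Eratosthenes: list of primes up to N."""
    flags = bytearray([1]) * (N + 1)
    flags[0] = flags[1] = 0
    for n in range(2, int(N ** 0.5) + 1):
        if flags[n]:
            flags[n * n :: n] = bytearray(len(flags[n * n :: n]))
    return [n for n in range(2, N + 1) if flags[n]]

k = 100
chosen = [p for p in prime_sieve(10 ** 4) if p > k][:k]   # k primes, all > k
offsets = [p - chosen[0] for p in chosen]                 # shift to start at 0

print(len(offsets), offsets[:5], "diameter:", offsets[-1])
\end{verbatim}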
Since the appearance of Zhang's paper there have been numerous improvements, coming from larger values of the level of distribution, better ways of estimating various error terms, and better admissible tuples. As of this writing, the best confirmed improvement is a prime gap of at most 5,414, and a not-yet-confirmed improvement to 4,680; the latter coming from a level of distribution of $\frac12+\frac{7}{600}$. At the rate things are going, there will probably be a better result by the time you read this.



Appendix: The archetypal sieve

The familiar sieve of Eratosthenes is an efficient way to generate the prime numbers up to a pre-selected bound. Start with the numbers from 2 to $N$. Circle 2 and cross out all larger multiples of 2. At each stage circle the smallest number which has not been crossed out or circled, and then cross out all larger multiples of that number. Stop when the smallest number which has not been crossed out or circled is larger than $\sqrt{N}$. The numbers which have not been crossed out are the prime numbers up to $N$.
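Here is the sieve written out as a short Python sketch, with one true/false flag playing the role of "crossed out" for each number; it is a direct transcription of the description above, for illustration.

\begin{verbatim}
def eratosthenes(N):
    """Return the list of primes up to N."""
    remains = [True] * (N + 1)        # True means "not crossed out"
    remains[0:2] = [False, False]     # 0 and 1 play no role
    n = 2
    while n * n <= N:
        if remains[n]:                           # n is the next circled number
            for m in range(2 * n, N + 1, n):     # cross out its larger multiples
                remains[m] = False
        n += 1
    return [n for n in range(2, N + 1) if remains[n]]

print(eratosthenes(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
\end{verbatim}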
Using the sieve of Eratosthenes to find all the primes up to $N$ only requires keeping track of $N$ bits of information, and uses around $N \log\log(N)$ operations. To find the primes up to 1 million, it is faster to run the sieve of Eratosthenes on a computer than to read those primes from the hard drive.
The sieve of Eratosthenes is efficient as a practical method for generating the primes, but it is relatively useless as a theoretical tool. Suppose one wanted to estimate the number of primes up to $N$. Since the primes are those numbers which are not crossed out by the sieve, all we need to do is count how many numbers are crossed out, and subtract that from $N$. Because some numbers are crossed out multiple times, we have to use the inclusion-exclusion principle: \begin{align*} \text{ number of non-primes up to } N =\mathstrut & \text{number crossed out once} \\ &-\text{number crossed out twice}\\ &+\text{number crossed out three times}\\ &- \text{etc.} \end{align*} Unfortunately, when one estimates the above quantities, the accumulated error turns out to be larger than $N$, leaving one with no information at all. Thus, one must use a more sophisticated sieve to prove results about the primes.