Introduction to multiplying huge integers

Main story: A Trillion Triangles


Most people know that addition of whole numbers (integers) is much easier than multiplication. Asked for the sum of 34 and 62, some people may even be able to work this out without pen and paper. But most of us would struggle to find the product (unless we know our 34 times tables).

Indeed one reason for this is that the numbers get much larger when multiplying them. Indeed the sum of the two numbers above is 96 whereas the product is 2108.

Another, perhaps more important reason for the difference in difficulty is the number of operations which must be performed. To do the addition, I can break the problem down into two small additions, 4+2 = 6 and 3+6 = 9. Of course it's also possible that there is a carry I have to take care of, but we'll ignore these in our analysis. Now consider the multiplication. I have to do 2 × 4 = 8, 30 × 2 = 60, 60 × 4 = 240 and 30 × 60 = 1800. Now I need to add all these together. So, in total I have four easier multiplications to do and three additions.

This process is typical for multiplication using schoolbook algorithms. To add numbers with n digits I need to perform n additions. But to multiply them I need to do n2 digit multiplication and then n2-1 additions. If n was a trillion say, then n2 is a trillion times a trillion, a number so large we don't have time to perform that many operations before the next ice age, even if we used the world's fastest computer.

Enter the FFT

The basic principles of the Fast Fourier Theorem were actually known to Carl Friedrich Gauss in 1805. He was one of the greatest mathematicians who ever lived and numerous mathematical innovations bear his name such as the Gaussian integers, Gaussian elimination, and the Gaussian distribution. Some of his greatest achievements were proving the law of Quadratic Reciprocity and inscribing a 17-gon with a ruler and compass, both extraordinary achievements for his time. It's a testament to how advanced Gauss was that he recognised the application of the FFT so many years ago, using it to compute trajectories of the asteroids Pallas and Juno!

Remarkably the principles of the FFT lay forgotten until much more recently when Cooley and Tukey reinvented them in 1965. As often happens with mathematicians and computer scientists, the idea occurred to Tukey at a very unusual time, during a Presidential Advisory meeting which he was attending during the cold war. The topic was how to detect foreign nuclear tests.

An FFT Example

Let's see roughly how the FFT works. Imagine I wish to multiply two two digit numbers (here we'll interpret the word digit loosely - any method of breaking our number up into equal sized pieces will do, they needn't be single decimal digits).

Let's imagine we have two sets of four wheels, one for each of our numbers. Each of the wheels is divided into four parts and can be rotated. On the first set of wheels let's suppose that the first wheel has the first digit of our first number in its first quarter and zeroes on all the other quarters of that wheel. Similarly the second digit of our first number will be on the first quarter of our second wheel with zeroes on the other quarters of that wheel. The third and fourth wheels will have all zeroes on them. We'll do the same thing with the digits of the second number and the second set of wheels.

Let's work through an explicit example. We'll perform the ambitious multiplication 43 × 21. So far our wheels will have the following numbers on them:

(1, 0, 0, 0) (2, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0)

(3, 0, 0, 0) (4, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0)

The first step in our manipulation of these values is to divide each set of wheels into half sized groups. As we have four wheels that will mean two groups of two. We are then going to progress through the wheels in the first group and match them with wheels in the second group. So the first wheel will be matched with the third wheel and the second wheel will be matched with the fourth wheel in each set.

We are going to replace the first wheel with the first plus third wheel and we'll replace the third wheel with the original first wheel plus a rotated version of the original third wheel. The amount of rotation will be 180 degrees.

Now we are going to do precisely the same thing with the second and fourth wheels but with a slight twist (literally). Before we compute the new fourth wheel from the original second and fourth wheel, we'll give them both a twist by 90 degrees.

Here is what our wheels now look like:

(1, 0, 0, 0) (2, 0, 0, 0) (1, 0, 0, 0) (0, 2, 0, 0)

(3, 0, 0, 0) (4, 0, 0, 0) (3, 0, 0, 0) (0, 4, 0, 0)

Of course not a whole lot has actually happened, as our original third and fourth wheels in each case were all zero. But we established a pattern. If we had started with eight wheels, the pattern would have been similar. We'd break into two groups of four and pair them off. The first and fourth wheels would be added and then added with the 180 degree rotation. The second and fifth wheels would have been added and added with a 180 degree rotation, but there'd be the extra twist, this time with a twist of both wheels by one eighth of a turn before computing the new fifth wheel values. Then we'd move on to the third and sixth wheels and do the same except the extra twist would be by two eighths of a turn, and so on.

Back to our four wheels. The next stage is similar to the first except we now break into groups of one instead of two and pair up. So this time the first and second wheel will be paired and the third and fourth wheel paired. Now the first wheel will be replaced with the first plus the second wheel and the second by the first wheel plus the second rotated by 180 degrees. A similar thing is done for the third and fourth wheels. The result is as follows:

(3, 0, 0, 0) (1, 0, 2, 0) (1, 2, 0, 0) (1, 0, 0, 2)

(7, 0, 0, 0) (3, 0, 4, 0) (3, 4, 0, 0) (3, 0, 0, 4)

Note there were no extra twists this time.

It's a bizarre set of operations, but you'll agree we didn't do anything strenuous. We've only been rotating wheels and adding and subtracting small numbers.

Actually, what we have done so far constitutes one full FFT. We haven't finished yet, there is more work to do. We have a middle phase called "pointwise multiplication" and then we perform a kind of backwards FFT called an inverse FFT, or IFFT for short.

Pointwise Multiplications

So, now comes the real work in the algorithm. We are going to multiply each of the four wheels in the first set with the corresponding wheel in the second set, using an unusal multiplication procedure.

Firstly we are going to think of the numbers on our wheels as directions on a map. Let's suppose the four numbers refer to directions on a compass, east, north, west, south in that order. Thus the first wheel in the first set says to go three units east. The next wheel says one unit east, two units west, and so on.

Now multiplication is going to occur like this. Once we've plotted our position on the map after following the directions on a wheel, we're at a particular position that is a certain distance from the origin and at a certain compass bearing counterclockwise from east. To multiply we are going to multiply the total distance in each case and add the compass bearings.

The first wheel in the first set says go a total of three units east and that is it. So the total distance is three and the bearing zero degrees from east. The first wheel in the second set also says to go east, so the bearing is zero degrees, but the distance is 7. So multiplying these means multiplying the distances, 3 × 7 = 21 and adding the bearings 0+0 = 0.

The second wheels multiply in a straightforward manner. However, multiplying the third wheels is slightly tricky as the bearings are strange angles as are the total distances. But it turns out there is an easy way to get a combination of four numbers which when put on a wheel gets us to the product position without us doing any complicated drawings and angle computations.

The trick is to multiply the two wheels out by multiplying each of the numbers in the first wheel being multiplied with each number of the second wheel being multiplied, taking directions into account. Let's do this for the third wheel in each set, i.e. (1, 2, 0, 0) by (3, 4, 0, 0). We first have 1E by 3E which gives 3E. Then 1E by 4N which gives 4N, then 2N by 3E which gives 6N, then finally 2N by 4N which gives 8W. The total is 3E, 10N and 8W or (3, 10, 8, 0). You can check by drawing it out on a map that this product gives a set of directions which puts us at exactly the product position as given by the first definition of product.

This all seems like a lot of work, and it is. In fact it is more work than just multiplying 21 and 43 in the first place. However, in practice we don't do all this work. Instead of using four wheels with four numbers (or eight wheels with eight numbers each, or even more), we use complex numbers. Each complex number has just two components, an amount east and an amount north. You can see that no matter how many wheels we have and no matter how many numbers there are on our wheels, following the directions will get us to some grid reference, which we can describe by an amount east and an amount north. Complex numbers can be multiplied much more straightforwardly, (3E, 4N) × (2E, 1N) = (3 × 2-4 × 1E, 3 × 1+4 × 2N) = (-2E, 11N). Of course mathematicians write this complex number as -2 + 11i, but it is plotted on the page by going -2 east (i.e. 2 west) and 11 north.

Let's stick to our wheels and our peculiar method of multiplying, instead of using complex numbers, because it more accurately gives a feel for what the FFT is about when multiplying large numbers. In fact in practice, when multiplying large numbers we don't actually use complex numbers or rotating wheels. Instead we use modular arithmetic and a version of the FFT for such numbers due to Schoenhage and Strassen. Instead of rotating wheels we use binary numbers and shifts of the binary digits to the left and right. Binary digits which get shifted out the top get put back in at the bottom of the number (after changing their sign). You can see where the analogy with wheels comes in. Binary digits are essentially rotated around up to a change in sign.

Anyhow, back to the wheels, and after applying the peculiar multiplication algorithm described above we get the following values on our wheels:

(21, 0, 0, 0) (11, 0, 10, 0) (3, 10, 8, 0) (3, 0, 8, 10)

The Inverse FFT

To finish off, as mentioned above, we need to perform an inverse FFT. In fact, we apply a very similar set of steps as we did on the way in. But there are two big differences. Instead of starting off with groups of two wheels with groups becoming half the size at each stage, we start with groups of one and make the groups twice the size at each stage. The other difference is our twists will be applied to the second of the original values in each pair before computing either of the wheels we are replacing, and the extra twists will be in the opposite direction to those above.

The first stage of the IFFT will end up as follows:

(32, 0, 10, 0) (31, 0, 11, 0) (6, 10, 16, 10) (11, 20, 11, 0)

There was only one pairing of wheels per group so no extra twists. Only the rotations by 180 degrees were needed for the second wheel of each pair.

The second as follows:

(38, 10, 26, 10) (51, 11, 11, 11) (48, 10, 16, 10) (31, 11, 31, 11)

Here we had two groups of two wheels, with wheels in the first group paired with those in the second. This time the second pair needed the extra twist by a quarter turn. As mentioned above, the extra twist is performed before anything is computed and in the opposite direction to what we did on the way in.

Here's how that extra twist worked out. It occurred when dealing with the second and fourth wheels. First we apply the twist of 90 degrees to the fourth wheel, i.e. (11, 20, 11, 0) -> (20, 11, 0, 11). Now we do the usual addition of the second and fourth wheels, to get the new second wheel and the addition with a rotation of 180 degrees to get the new fourth wheel.

Finally, treating the numbers on the four wheels as though they were directions, the first wheel tells us to go 38E, 10N, 26W and 10S, which overall takes us 12E. The second wheel tells us to go overall 40E, the third wheel 32E and the fourth wheel 0E.

Notice how everything has cancelled except the easterly directions. The final thing to do is to divide everything by four (the number of wheels). We end up with 3, 10, 8, 0. Believe it or not, we are now done. It's easy to check that 3 + 10 × 10 + 8 × 100 + 0 × 1000 = 903 which just happens to be 21 × 43, the multiplication we set out to do.

Well, it looks like we did rather a lot of work, and indeed the FFT is not useful for multiplying such small numbers. But notice that we only did four multiplications, and at least if we used complex numbers throughout, instead of wheels with numbers, those four multiplications weren't that hard to do.

Well, as we hinted, the same thing can be done with eight wheels with eight values on each (or more efficiently with eight complex numbers). Then we'll have eight multiplications to do in the middle of the algorithm. And again the same thing could be done for 16 wheels, 32 wheels or more generally any power of 2.

Now imagine that we had 32 wheels in each set and that these represented the digits of 16 digit numbers followed by 16 zeroes. The total number of multiplications at the middle of the algorithm would be 32. That's pretty spectactular because if we multiplied using the schoolboy routine we'd need to do 16 × 16 = 256 digit multiplications. So (assuming we used complex number to make things efficient) we've cut our workload by a factor of (almost) 4.

That's the basic idea of the convolution by FFT and IFFT. Instead of doing n2 multiplications, we do 2n multiplications, and a whole load of additions, subtractions and rotations, which are much cheaper. Of course we are skipping over a whole pile of technicalities such as the fact that if we use complex numbers we have to approximate values using decimals because not everything works our exactly when you have more than four wheels. But as we said above, the Schoenhage-Strassen technique allows for the use of a number system where everything is exact, so this approximation problem doesn't exist there.

Multiplying Congruent Number Series with the FFT

Over the years many hundreds of papers have been written on the FFT. One of the methods used to multiply our theta series for the congruent number problem treated the series involved as if they were very large integers (where the "digits" are the terms of the series), then used a special version of the FFT to multiply those integers. The other method also used an FFT at its core, but other tricks were used to break the problem up first so that lots of smaller FFT's could be used, instead of one giant FFT.

The biggest issue for both approaches was figuring out how to do the computation in small steps that would fit into memory, whilst storing the rest of the computation on disk. You can see from the above that each stage of the computation needs access to all the data, and there are multiple stages. In fact for a length 32 FFT there would be 10 stages not including the pointwise multiplications. For length 64 there would be 12 stages, and so on. It's easy to see that a length 1 trillion FFT was out of the question. The data would have to be read from and written to disk so many times that it wouldn't finish before the computer wore out.

Both approaches essentially broke the computation up into smaller FFT's. One method used a multimodular approach, breaking the problem up by reducing modulo a whole load of primes and doing lots of small FFT's, one for each prime. The other approach did one giant FFT which was broken down into lots of smaller FFT's. Essentially instead of having many values in a single row as we did above, the values were written out in a two dimensional table and small FFTs were first performed across the rows, then small FFTs were performed down the columns. Although there were a lot of FFTs, each one only needed access to the data from its column or row, and so overall the entire data set was only read and written a couple of times.