“It is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used.” These words, which sound a bit elitist today, were pronounced by Gottfried Leibniz, the German mathematician and philosopher who dedicated his life to founding modern mathematical analysis (together with Isaac Newton) and to the search for automatic computation. It was in 1672 that he invented the Stepped Reckoner, a digital mechanical calculator based on a stepped wheel, the first machine capable of automatically executing all four arithmetic operations. A replica of this machine is on display today at the Deutsches Museum in Munich.
In recent years Fujitsu has produced in Kobe (Japan) a computer named “K”, the fastest computer in the world, with a calculation speed of 8.12 petaflops, or about 8 quadrillion (1 followed by 15 zeroes) operations per second. For reference, a hand-held calculator runs at a few flops. But what is a flop, or floating point operation? It is an operation involving floating point numbers, the number representation used in computing for scientific calculations. A floating point number represents a real number with three components: the significant digits (the significand), a base (2, 10 or 16) and an exponent for the base. This representation hugely expands the range of numbers a computer can treat compared to fixed point and integer representations.
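The three components can be made concrete with a short Python sketch. This is a hypothetical base-10 decomposition, written only to illustrate the idea (real hardware works in base 2):

```python
import math

def decompose(x, digits=6):
    """Split a real number into a significand and a base-10 exponent,
    mimicking the components of a floating point representation.
    Illustrative only: actual floating point formats use base 2."""
    if x == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(abs(x)))
    significand = x / 10 ** exponent
    return round(significand, digits), exponent

# 0.000123 is stored as 1.23 * 10^-4: significand 1.23, exponent -4
print(decompose(0.000123))  # (1.23, -4)
print(decompose(8.12e15))   # (8.12, 15), the speed of "K" in flops
```

Shifting the exponent moves the “floating” point, which is why the same few significant digits can describe both tiny and astronomically large numbers.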
The “K” computer, which takes its name from “kei”, the Japanese word for ten quadrillion (10^16), belongs to a class of machines called supercomputers, which are dedicated to dealing as fast as possible with huge numerical problems involving enormous amounts of data. It consists of 672 cabinets stuffed with circuit boards, a number that will grow to 800 in the coming months. The computer is not yet fully operational and is scheduled to enter service by November 2012.
The race for exceptionally fast supercomputers is a history within the history of computation, which is very old and started with the abacus, used in Babylonia around 2400 BC. The history of supercomputers is more recent and dates back to the 1960s, with the work of Seymour Cray at Control Data Corporation (CDC). In the 1970s Cray left the company to found his own firm, Cray Research, which from 1985 to 1990 continued to design the fastest and most advanced supercomputers. Nowadays companies such as IBM, Fujitsu, Toshiba, Intel and Hewlett-Packard are also dedicated to building modern supercomputers.
What exactly is a supercomputer? The term is quite general and vague, because the computational power of today’s supercomputers tends to become that of tomorrow’s ordinary computers. CDC’s early supercomputers were simply very fast scalar processors, single central processing units (CPUs) which could deal with one piece of data (a number or an instruction) at a time. In the 1970s most supercomputers moved to architectures based on vector processors: single CPUs executing instructions that operate on many data items at once, “piled” into arrays called vectors, typically holding four to sixteen elements. These techniques are also found in video game consoles such as the PlayStation 3, which uses the Cell processor, developed from 2000 onwards by a collaboration between IBM, Toshiba and Sony.
Later, in the 1980s and 1990s, supercomputer architecture moved from vector processors to so-called massively parallel processing (MPP): a single computer with thousands of ordinary CPUs or 32-bit microprocessors interconnected so as to perform many tasks and instructions in parallel. Furthermore, many such computers can be coupled to work closely together as a single new machine, and the different units can be connected to each other through very fast networks. These form clusters, which are common among the most powerful supercomputers in operation over the last decade.
The concept of parallel computation is at the core of how supercomputers work. It is based on the idea that a big, complex problem can be divided into smaller independent parts, or tasks, which can be solved separately by different processors at the same time (in parallel). The processing elements can vary: a single computer with multiple processors, several networked computers, or a combination of the two. For decades the computing community sped up calculations through frequency scaling: a program consists of instructions executed over a number of clock cycles, so increasing the number of cycles per second (the frequency) makes the program run faster. Parallelism offers another lever: increasing the number of processors working on the program simultaneously. Moore’s law supports this strategy, observing that the density of transistors on a microprocessor doubles roughly every two years, so that more and more processors can be devoted to future parallelization.
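As a toy illustration of this divide-and-combine idea (not how a real supercomputer is programmed, which involves message passing between nodes), here is a Python sketch that splits a sum of squares across worker processes:

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """One independent task: sum the squares of its own slice of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Divide the big problem into independent parts...
    chunks = [data[i::workers] for i in range(workers)]
    # ...solve each part on a separate processor at the same time...
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # ...and combine the partial results.
    return sum(partials)

if __name__ == "__main__":
    data = list(range(1000))
    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```

The key property is that each chunk can be processed without knowing anything about the others; that independence is exactly what makes a problem easy to parallelize.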
Nevertheless, it is not all that simple and linear. Two important factors limit parallelization. The first is the program itself, in particular how much of it can be parallelized: if a part of the program cannot be divided into independent sub-parts, then no matter how many processors are used, that part will limit the speed of the entire program. This limit is captured by Amdahl’s law, formulated by Gene Amdahl in the 1960s. The second factor is communication: some results obtained by one processor are needed by another processor to continue its calculations, and a minimum time is required to exchange this information between the two. The larger the physical distance in terms of interconnections, the longer the time needed for communication. Typical latencies in currently operating supercomputers are of the order of microseconds.
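Amdahl’s law can be written as S(N) = 1 / ((1 - p) + p/N), where p is the fraction of the program that can be parallelized and N is the number of processors. A few lines of Python make its consequence vivid:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: the overall speedup when only a fraction
    of the program can be run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 1000 processors, a program that is 95% parallel
# speeds up by less than a factor of 20:
print(round(amdahl_speedup(0.95, 1000), 1))    # 19.6
print(round(amdahl_speedup(0.95, 10**6), 1))   # 20.0: the serial 5% dominates
```

No matter how many processors are thrown at the 95%-parallel program, the speedup can never exceed 1/0.05 = 20, because the serial part always takes its fixed share of the time.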
To illustrate the two problems, consider an example: we want to simulate the motion of hundreds of salt ions in water. This can be done using Newton’s equations, which relate the force on a particle to its acceleration. All the ions can be treated as independent parts: applying Newton’s equation to each ion, once its acceleration at a certain time is known we can calculate its new position at the next time-step. This part of the calculation can be executed in parallel, using for instance one processor per ion: hundreds of ions handled by hundreds of processors simultaneously. A problem occurs at the next time-step, though: no processor can calculate the new acceleration of its ion until all the other processors have calculated the new positions of their ions, because the acceleration of each ion depends on the forces exerted by all the others. The force evaluation is thus a part of the program that cannot be parallelized in the same way. Moreover, the processors must exchange the ions’ positions in order to evaluate the overall forces, so that latency becomes crucial.
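The time-stepping scheme just described can be sketched in a few lines of Python. This is a toy one-dimensional system with made-up units and an invented inverse-square repulsion between ions, not a real salt-and-water simulation; the comments mark which step could run one ion per processor and which step forces everyone to wait:

```python
def pairwise_forces(positions):
    """The force on each ion depends on the positions of ALL the others:
    this is the synchronization point where every processor must wait.
    (Toy 1D inverse-square repulsion, unit charges and masses.)"""
    n = len(positions)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                r = positions[i] - positions[j]
                forces[i] += (1.0 if r > 0 else -1.0) / r ** 2
    return forces

def step(positions, velocities, dt=0.01):
    forces = pairwise_forces(positions)  # needs everyone's position
    # The per-ion updates below are independent: on a supercomputer
    # each ion's update could run on its own processor.
    velocities = [v + f * dt for v, f in zip(velocities, forces)]
    positions = [x + v * dt for x, v in zip(positions, velocities)]
    return positions, velocities

pos, vel = [0.0, 1.0, 2.5], [0.0, 0.0, 0.0]
for _ in range(100):
    pos, vel = step(pos, vel)
# mutual repulsion pushes the three ions apart
assert pos[2] - pos[0] > 2.5
```

The position and velocity updates parallelize cleanly, but every call to `pairwise_forces` is a barrier: it cannot begin until all the new positions have been computed and communicated.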
Because of these limitations not all types of parallelization result in speeding up the calculations. Sometimes, the use of too many processors makes the time of communication among processors larger than the time necessary for solving the problem. In this case the parallelization might even decrease the overall speed of calculations.
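A crude cost model, with entirely made-up numbers, shows this effect. Suppose a fixed amount of work is divided evenly among N processors, while the communication overhead grows with N:

```python
def run_time(work, n_proc, latency):
    """Toy cost model: compute time shrinks as 1/N while
    communication time grows proportionally to N."""
    return work / n_proc + latency * n_proc

# hypothetical problem: 1e6 units of work, 1 unit of latency per processor
for n in (10, 100, 1000, 10_000, 100_000):
    print(n, run_time(1e6, n, 1.0))
# In this model 1000 processors is the sweet spot; beyond it,
# adding processors makes the calculation slower, not faster.
```

With these numbers the total time falls from 100,010 units at 10 processors to 2,000 at 1,000 processors, then rises again to 100,010 at 100,000 processors: exactly the slowdown described above.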
Another crucial problem is the electrical power consumption of a supercomputer, which is directly related to frequency scaling: the faster the supercomputer runs, the more power it consumes. The “K” supercomputer will draw about 10 MW, comparable to the consumption of about 10,000 homes, at an annual cost of around 10 million US dollars.
Despite the great economic costs, the race for the fastest supercomputer started in the 1980s and is now in full swing. Since 1993 it has been under the spotlight of an official “referee”, the TOP500 project, directed by Prof. Jack Dongarra at the University of Tennessee. The project ranks the fastest supercomputers using a benchmark (LINPACK) consisting of a dense system of 1000 linear equations, and measures the number of floating point operations per second achieved in solving it.
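The idea behind the benchmark can be mimicked on an ordinary PC. The sketch below (plain Python, with a much smaller random system than the real benchmark uses) solves a dense linear system by Gaussian elimination, whose cost is roughly (2/3)n^3 floating point operations, and divides that operation count by the elapsed time:

```python
import random
import time

def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting:
    roughly (2/3) n^3 floating point operations, the operation count
    the LINPACK benchmark is built around."""
    n = len(b)
    for k in range(n):
        # pivot: bring the largest entry in column k to the diagonal
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

n = 200
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [random.random() for _ in range(n)]
t0 = time.perf_counter()
x = solve([row[:] for row in a], b[:])
elapsed = time.perf_counter() - t0
print(f"{(2 / 3) * n ** 3 / elapsed:.2e} flops")  # many orders below a supercomputer
```

Interpreted Python on a laptop manages only a tiny fraction of the machine’s nominal speed, which is precisely why the official benchmark uses heavily optimized compiled code.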
The Connection Machine CM-5 and the Japanese Numerical Wind Tunnel were the first machines to top the TOP500 list in 1993, with measured speeds of tens of Gflops (1 Gflop is 1 followed by 9 zeroes operations per second). In 2007 Intel Corporation unveiled the multicore POLARIS chip running at 1 Tflops (1 followed by 12 zeroes), but a few months later the IBM Blue Gene/L supercomputer hit the top with a peak of 596 Tflops. The race then focused on petaflops. IBM’s “Roadrunner” supercomputer, located at Los Alamos National Laboratory (US), reached the top spot with 1 Pflops in May 2008, but in 2009 the Cray XT “Jaguar” at Oak Ridge National Laboratory (US) beat “Roadrunner” by reaching 1.75 Pflops, remaining the most powerful supercomputer until November 2010, when the Tianhe-1A supercomputer at the National Supercomputing Centre in Tianjin, China took the number one spot with 2.6 Pflops. It stayed on top until last June, when Japan’s Fujitsu returned to first place with “K”.
Is this tightly competitive race really useful for something? It is, indeed.
Supercomputers allow scientists to simulate very complex phenomena in fields such as climate science, geophysics, heart dynamics and protein folding. These phenomena are still not fully understood, and simulations are extremely important either to reveal aspects that experiments cannot fully explain (as in the case of protein folding) or to make accurate predictions of new events and their consequences, as in climate modelling (see video below).
Of course, scientists are already looking ahead: efforts are now focused on pushing supercomputers to exaflops speeds (1 followed by 18 zeroes), and some engineers are questioning whether unconventional new computer hardware will be needed to reach this horizon. Erik P. DeBenedictis of Sandia National Laboratories claims that a zettaflops computer, more than 100,000 times faster than “K”, would allow a full, accurate weather forecast spanning two weeks. Zettaflops supercomputers are predicted to appear around 2030: still a relatively distant future, but Leibniz’s dream looks well on the way to full realization.