Branch prediction
As with all current microprocessors, great care is taken to guess the likely outcome of each branch instruction. Such instructions usually offer two alternative routes through the program. They answer questions like ‘is the result zero?’ and the answer determines what happens next. If we always wait until the question is answered and only then load the instructions for the next part of the program, much time is wasted, as we saw in the earlier microprocessor designs. If we guess correctly, we can pre-load the next part of the program and get started on it. The branch prediction circuitry does the guessing. If it guesses wrongly, the pre-loaded instructions and any partial results are ditched and replaced by the correct ones.
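The guessing can be illustrated with a simple scheme often used in textbooks: a two-bit saturating counter that remembers whether a branch has recently been taken. The Python model below is a minimal sketch under that assumption; the Athlon's actual prediction circuitry is more sophisticated and is not reproduced here, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Counter states 0-1 predict 'not taken', states 2-3 predict 'taken'."""

    def __init__(self, table_size: int = 16) -> None:
        # One counter per table entry, each starting at 'weakly not taken' (1).
        self.table_size = table_size
        self.counters = [1] * table_size

    def _index(self, branch_address: int) -> int:
        # Hypothetical indexing scheme: use the low part of the branch address.
        return branch_address % self.table_size

    def predict(self, branch_address: int) -> bool:
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        # Move one step towards the actual outcome, saturating at 0 and 3.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run(outcomes: list[bool], branch_address: int = 0x40) -> int:
    """Return how many of the branch outcomes were predicted correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1  # right guess: the pre-loaded work is kept
        # on a wrong guess the pre-loaded work would be ditched instead
        predictor.update(branch_address, taken)
    return correct


# A loop branch taken 9 times and then falling through once, repeated 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
print(run(loop_pattern), "of", len(loop_pattern), "correct")  # 26 of 30 correct
```

Because a single wrong guess does not flip the prediction, once the counter has settled the loop branch is mispredicted only once per pass, when the loop finally exits.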
Hardware data prefetch
This is a further form of prediction, in which the incoming instructions are monitored and, while they are still arriving, the data that they will need is guessed at and loaded into the data cache. In this way the Athlon loads data before it knows that it will be needed. As with branch prediction, incorrect data has to be overwritten but, on balance, the technique speeds up the data flow.
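One common way of making the guess, assumed here purely for illustration, is a stride prefetcher: it watches the addresses that load instructions ask for and, once it sees the same step (stride) twice in a row, fetches the next address early. The Python model below uses an idealised cache that never evicts anything; the function name and the stride rule are hypothetical and do not describe the Athlon's actual circuit.

```python
def count_misses(addresses: list[int], prefetch: bool) -> int:
    """Return the number of cache misses for a sequence of data addresses."""
    cache: set[int] = set()  # idealised cache: nothing is ever evicted
    misses = 0
    last_addr = None
    last_stride = None
    for addr in addresses:
        if addr not in cache:
            misses += 1  # data had to be fetched on demand
            cache.add(addr)
        if prefetch and last_addr is not None:
            stride = addr - last_addr
            if stride == last_stride:
                # Same step seen twice running: guess the next address and
                # load it early. A wrong guess just wastes a cache slot.
                cache.add(addr + stride)
            last_stride = stride
        last_addr = addr
    return misses


# Reading an array of 100 eight-byte values, one after another.
array_walk = list(range(0, 800, 8))
print("without prefetch:", count_misses(array_walk, prefetch=False))  # 100
print("with prefetch:   ", count_misses(array_walk, prefetch=True))   # 3
```

After the first three accesses every address in this regular pattern has already been brought into the cache. An irregular pattern with no repeated stride gains nothing, which is why the benefit is only "on balance".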
Instruction decoders
To get the most out of its slower clock speed, the Athlon has three instruction decoders that can run independently. Each of these can handle three operations per clock cycle, giving an overall throughput of nine operations per clock cycle, which is significantly greater than the six operations per clock cycle of the Pentium.
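Operations per clock cycle tell only half the story: the rate at which work gets done is the operations per cycle multiplied by the clock frequency. The short calculation below uses 2 GHz and 2.8 GHz, the clock figures quoted in Table 14.1, as assumed example speeds; the results are theoretical peaks, not measured performance.

```python
def ops_per_second(ops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak: operations per clock cycle x clock cycles per second."""
    return ops_per_cycle * clock_hz


athlon = ops_per_second(3 * 3, 2.0e9)  # 3 decoders x 3 ops each, assumed 2 GHz
pentium = ops_per_second(6, 2.8e9)     # 6 ops per cycle, assumed 2.8 GHz
print(f"Athlon:  {athlon / 1e9:.1f} billion operations per second")   # 18.0
print(f"Pentium: {pentium / 1e9:.1f} billion operations per second")  # 16.8
```

In this idealised comparison the wider decoding more than makes up for a clock that is 0.8 GHz slower.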
Pipelines and instructions
The Athlon has three independent integer pipelines and also three similar floating-point pipelines, whereas the Pentium has four pipelines for integers but only two for floating-point operations. The three floating-point execution units simultaneously handle:
(a) store and load functions
(b) add functions
(c) multiply functions such as all the Intel MMX (multimedia extensions) instructions plus AMD’s own SIMD (single instruction multiple data) instructions, to provide full support for SSE (streaming SIMD extensions) and more lifelike 3D imaging and graphics. AMD’s name for these new instructions is ‘3D NOW!’ technology. (MMX is an Intel trademark; 3D NOW! is an AMD trademark.)
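A rough feel for why three separate floating-point units help comes from an idealised model in which each unit accepts one operation per cycle and the operations do not depend on one another. Under those assumptions (and with made-up operation names), the busiest unit decides how many cycles a batch takes:

```python
from collections import Counter

# Hypothetical operation names mapped to the three floating-point units:
# (a) store and load, (b) add, (c) multiply.
UNIT_FOR_OP = {
    "load": "store/load",
    "store": "store/load",
    "add": "add",
    "mul": "multiply",
}


def cycles_three_units(ops: list[str]) -> int:
    """Ideal cycles when each unit accepts one independent op per cycle."""
    busy = Counter(UNIT_FOR_OP[op] for op in ops)
    return max(busy.values(), default=0)


def cycles_one_unit(ops: list[str]) -> int:
    """Ideal cycles if a single unit had to handle every op in turn."""
    return len(ops)


batch = ["load", "load", "mul", "add", "mul", "add", "store"]
print("one unit:   ", cycles_one_unit(batch))     # 7
print("three units:", cycles_three_units(batch))  # 3
```

Real programs have dependencies between operations (an add may need the result of a multiply), so actual gains are smaller than this ideal.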
The state of the competition
The Pentium 4 had a ‘rapid execution engine’ with two ALUs (arithmetic and logic units) for the integer instructions, each clocked at twice the core processor speed. It also ran its front side bus (FSB) at 533 MHz, whereas the Athlon XP had only a 333 MHz FSB. This continues the pattern of the Pentium claiming the headline figure for speed. However, on balance, the Athlon is, by most tests, slightly faster than the Pentium.
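The effect of the FSB figures on data flow can be estimated as bus width multiplied by transfer rate. The sketch below assumes that both buses are 64 bits (8 bytes) wide and treats the quoted ‘MHz’ figures as millions of data transfers per second; the function name is hypothetical.

```python
# Assumption: 64-bit (8-byte) data bus; "MHz" taken as millions of transfers/s.
BUS_WIDTH_BYTES = 8


def peak_fsb_bandwidth_gb_s(transfers_mhz: float) -> float:
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return BUS_WIDTH_BYTES * transfers_mhz * 1e6 / 1e9


for chip, mhz in [("Pentium 4", 533), ("Athlon XP", 333)]:
    print(f"{chip}: {peak_fsb_bandwidth_gb_s(mhz):.2f} GB/s")
# Pentium 4: 4.26 GB/s
# Athlon XP: 2.66 GB/s
```

On these assumptions the Pentium 4 bus offers about 60% more peak bandwidth, yet the Athlon still holds its own in most tests, a reminder that bus speed is only one factor.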
An update…
That was written yesterday. This morning came the news that Intel has burst through the 3 GHz barrier (just) with a 3.06 GHz device. This, they say, includes hyper-threading, a technique that involves splitting a program into units (threads) that can be run simultaneously. It allows the micro to run multiple applications at the same time, with the processor appearing to be two processors. Such multitasking is available in Windows XP and Linux and probably all their successors. So where does this leave the future? Are we going to go for greater and greater speeds, or will we develop multi-tasking so that we effectively have greater and greater numbers of micros sharing the work? I have a feeling that task sharing will be the answer.
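From the software side, hyper-threading is used by splitting a job into threads that the operating system can place on the ‘two processors’ the chip presents. The Python sketch below shows that splitting, with a prime-counting job chosen arbitrarily as the work. Note that in CPython the global interpreter lock stops these particular threads from running Python code truly in parallel, so treat this as a picture of how work is divided, not as a speed demonstration; the function names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds: tuple[int, int]) -> int:
    """One unit of work: count the primes in [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


# os.cpu_count() reports logical processors, so a hyper-threaded core
# that appears as two processors is counted twice.
print("Logical processors:", os.cpu_count())

# Split one job into four independent units and hand them to two threads,
# which the operating system is free to schedule on different processors.
units = [(0, 2500), (2500, 5000), (5000, 7500), (7500, 10000)]
with ThreadPoolExecutor(max_workers=2) as pool:
    total = sum(pool.map(count_primes, units))
print("Primes below 10000:", total)  # 1229
```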
It seems likely that Intel is now back out in front.
Exciting times ahead…
Another update…
Almost immediately, AMD has replied with what appears to be another significant step forward: 64-bit computing.
The microprocessor, which until now has been known by the codename ‘Hammer’, will be sold under the more user-friendly name of ‘AMD Athlon 64’. It will be available in mid-2003 and will join the PowerPC 970 in the ‘64’ club. It will be able to run 64-bit, 32-bit and 16-bit applications without any speed penalty, and so avoid the cost of buying new software.
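One practical gain from 64-bit computing is the size of memory a single address can reach. Assuming byte-addressed memory, an n-bit address can select 2^n different bytes, as this small sketch (function name hypothetical) shows:

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a single address of the given width."""
    return 2 ** address_bits


for bits in (32, 64):
    print(f"{bits}-bit: {addressable_bytes(bits):,} bytes")
# 32-bit: 4,294,967,296 bytes (4 GiB)
# 64-bit: 18,446,744,073,709,551,616 bytes
```

Early 64-bit chips implement fewer than 64 physical address lines, so their practical limit is measured in terabytes rather than the full theoretical figure.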
The only technical information included in the initial announcement is a new bus system using ‘HyperTransport’ technology, which AMD claims increases throughput by 50% over existing designs. Intel will have something to say about that claim, I expect. The clock speed of the first batch will be little different from that of the XP, at around 2.8 GHz, but the design will provide more scope for development and will be able to run programs at a higher speed.
Really exciting times ahead… over 3 GHz clock speeds, 64-bit computing and multiple instructions being carried out simultaneously. Sounds good.
The desktop speed Olympics is shared between the PowerPC 970, the Pentium 4 and the Athlon 64, whereas the computer market is dominated by the IBM PC clones, leaving just a minor role for the PowerPC 970 in the Apple Mac. As we saw earlier, the result of any speed test depends on the nature of the test. Having said that, and at the risk of irritating the fans of each, in the race for overall speed the Athlon 64, when it is available, will appear to be the winner, with the other two virtually shoulder to shoulder a pace behind. But it depends on the test chosen, and we know that any speed king will very quickly be dethroned.
A (very) approximate comparison based on the currently available information is shown in Table 14.1.
Table 14.1
Feature | PowerPC 970 | Pentium 4 | Athlon 64 |
---|---|---|---|
Clock speed | 2 GHz | 2.8 GHz | 2 GHz |
Bus speed | 900 MHz | 533 MHz | 533 MHz |
Bits | 64 | 32 | 64 |
Process size | 0.13/0.09 microns | 0.13 microns | 0.13 microns |
Operating systems | OS X, IBM Linux | Windows | Windows |
Comparative speed | 1988 | 1984 | 2372 |
Max memory | Terabytes | 40 GB | Terabytes |
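Raw scores are easier to compare when expressed relative to one chip. The sketch below simply rescales the ‘Comparative speed’ row of Table 14.1 so that the Pentium 4 scores 100; it adds no new data, and the function name is hypothetical.

```python
# "Comparative speed" figures copied from Table 14.1 (units as given there).
scores = {"PowerPC 970": 1988, "Pentium 4": 1984, "Athlon 64": 2372}


def relative_to(scores: dict[str, int], baseline: str) -> dict[str, float]:
    """Express each score as a percentage of the baseline chip's score."""
    base = scores[baseline]
    return {chip: 100 * s / base for chip, s in scores.items()}


for chip, pct in relative_to(scores, "Pentium 4").items():
    print(f"{chip}: {pct:.0f}")
# PowerPC 970: 100
# Pentium 4: 100
# Athlon 64: 120
```

On these figures the PowerPC 970 and Pentium 4 are within about 0.2% of each other, while the Athlon 64 is roughly 20% ahead, matching the ‘shoulder to shoulder a pace behind’ description above.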
In each case, choose the best option.
1 Compared with the Pentium 4, the Athlon XP design has:
(a) faster FSB, running at 533 MHz.
(b) the same speed of FSB.
(c) slower FSB, running at 333 MHz.
(d) faster FSB, running at 2.8 GHz.
2 As the Pentium 4 and the Athlon XP are both using 0.13 micron technology:
(a) it does NOT imply any other similarities between the designs.
(b) they will both run at the same clock speed.
(c) they will have the same number of pins.
(d) the cache sizes are equal.
3 The three floating-point execution units in the Athlon XP simultaneously handle store and load, multiply functions and:
(a) SIMD functions.
(b) add functions.
(c) divide functions.
(d) 3D NOW! functions.
4 When branch prediction is correct it:
(a) increases the overall speed of running the program.
(b) increases the length of the pipeline.
(c) decreases the clock speed.
(d) prevents overheating of the microprocessor.
5 The Athlon instruction cache has a capacity of:
(a) 256 kB.
(b) 32 bits.
(c) 64 kB.
(d) 384 MB.