Instruction Decoder, Level 1 Execution Trace Cache and Branch Predictor
The data that is selected by the predictor is loaded into a buffer and then passed onto the Instruction Decoder.
At this stage, the incoming instructions are analysed and converted into an internal code sequence which can be accessed from the Micro code as we saw when we looked at the Z80180 microprocessor. Once the instructions have been decoded, up to about 12 000
instructions called ‘Micro-Operation/Operand’ or µOP are stored in order of use, all ready to go. The correct order is much assisted by the Branch prediction – known by Intel as the Branch Target Buffer (BTB). This stores previous experience to guess what is likely to happen next.
Hyper pipeline
As we saw in Chapter 11, the pipeline is the organization of the microprocessor and not a separate device within the design, so we don’t get a ‘pipeline’ block shown in Figure 12.3. The predictor designs are now very much improved, having had the experience gained with earlier versions of the Pentium. The better the prediction, the longer and faster we can risk make the pipeline. So pleased were Intel with their predictions that they called the new longer pipeline a ‘hyper pipeline’. For maximum speed we would like a long pipeline so that many simple steps can be carried out at greater speed but the overall outcome depends on the predictor circuits making the right guess. A wrong guess means that the pipeline is loaded with incorrect data and has to be refilled, or ‘flushed’, which takes valuable time. The Pentium 4 now has a pipeline of 20 stages which allow 126 instructions to be in use at a single time which can include up to 48 load and 24 store instructions.
Figure 12.3 The Pentium 4 processor
Micro-OP and Memory usage
The µOps that pour out of the Execution Trace Cache are arranged in order and they will be a mixture of information to be stored in memory locations and arithmetic operations. The arithmetic operations are divided in floating-point operations and integer operations. The floating-point register deals with moving and storing while the ALU (Arithmetic and Logic Unit) deals with the more complex operations such as multiplication of 128-bit numbers and MMX (multimedia instructions) as we met a little earlier. The SIMD (Single Instruction Multiple Data) that was applied to the earlier Pentiums have been extended by an extra 144 instructions. This facility is now called SSE2 (Streaming SIMD Extensions 2 instructions). The general idea is that if we have to perform an action on many bits of data, it is simpler and faster to collect them all together and perform the function on all of them at the same time.
Rapid Execution Engine
For the integer instructions there are two ALUs clocked at twice the core processor speed which is a four-fold improvement over the basic function and provides a data transfer rate of 48 GB/s.
A level 1 data cache handles the data outputs from the ALUs and the AGUs (Address Generation Units).
The new Pentium design with speeds over 1.8 GHz and 0.13 µm technology is given the codename ‘Northwood’ that replaces the previous ‘Williamette’. The Williamette had reached the end of its development whereas the Northwood is just starting and since it is already running at 2.8 GHz, the magic 3 GHz chip is imminent, then we can probably look forward to the even more magic 4 GHz before the Northwood design is obsolete.
The Celeron
The bold type in computer adverts always shouts about ‘price and speed’ and many people fall into the trap of assuming that a 2.8 GHz microprocessor is obviously faster than a 2.5 GHz microprocessor. This is a false assumption but still well established so for this section of the market there is a demand for a very cheap microprocessor with a high clock speed.
The solution is to use the Pentium design and cheapen it by taking out some of the non-essential areas. There have been twelve such versions to track the Pentium releases during its development. In the 2 GHz Celeron, the price reduction has been achieved at the expense of reducing the L2 cache from 512 kB down to 128 kB and the FSB down from 533 MHz to 400 MHz.
Quiz time 12
In each case, choose the best option.
1 SIMD is:
(a) used in standard Pentiums but not in the MMX versions.
(b) a way of preventing wraparound.
(c) single in-line multimedia data.
(d) single instruction multiple data.
2 Branch prediction logic:
(a) is another name for the prefetch register.
(b) is only used in MMX versions.
(c) saves memory in 85% of occasions.
(d) attempts to guess the future steps to be taken by a program.
3 An exception:
(a) will be ignored if the I flag is set to a high level.
(b) is an unusual branching of the program.
(c) is an interrupt signal generated by the microprocessor.
(d) occurs whenever the Pentium is surprised by an arithmetic result.
4 The initials SIMD stand for:
(a) SIM card type D.
(b) Single Instruction Multiple Data.
(c) Superscalar Instruction Mode for Data.
(d) Streaming Instructions Modular Data.
5 In its construction, the Pentium 4 uses:
(a) 0.13 µm technology.
(b) 1.8 µm technology.
(c) 1.3 µm technology.
(d) 0.18 µm technology.
13. The PowerPC
Intel was producing a series of CISC microprocessors and, together with Microsoft, was in a position to dominate the market. Being increasingly squeezed out was the traditional king of computers, IBM, which, at one time, produced more computers that all other manufacturers combined. Big Blue as IBM was called, on account of their logo and the blue suits worn by their army of salesmen, laid down the standard design for the computer that now eclipses all others designs in the world.
As long ago as the mid-1970s IBM had developed a RISC microprocessor but it didn’t really make it in the market place. RISC did not ‘come of age’ until Acorn produced the ARM 2 and 3 microprocessors for their Archimedes microcomputer, but this too, failed to muscle its way into the market as it made little attempt to make it compatible with Intel code. Acorn, at that time, was introducing the Archimedes as a replacement for the much-loved BBC microcomputer. By 1990, it was apparent that the terrible twins, Microsoft and Intel, would take over the world if no one fought back.
As it happens, this was the very year in which a fledgling company called ‘AMD’ was hatched to grow over the years to become a persistent irritant to Intel. As yet, Microsoft still rules the world but there is a system called Linux that may, one day, become troublesome.
Meanwhile, an alliance was formed between IBM, Motorola and Apple Computers. To this alliance IBM brought their POWER microprocessor (Performance Optimized With Enhanced RISC). This was the successor to the earlier 801 RISC microprocessor and was chosen because it was a RISC microprocessor and already had software developed. Motorola would build the chip and Apple would bring its computer operating system, which was light years ahead of the Microsoft equivalent at that time. The new family of microprocessors was to be called the PowerPC series.