Quiz time 11

In each case, choose the best option.

1 The ‘size’ of a microprocessor is determined by the:

(a) width of its data registers.

(b) number of lines in its external data bus.

(c) number of digits in its type number.

(d) width of its address registers.

2 An integrated circuit having 15 000 transistors is classed as a:

(a) LSI device.

(b) SLSI device.

(c) VLSI device.

(d) SSI device.

3 An untextured polygon:

(a) looks like a dinosaur.

(b) is just a wire-frame shape.

(c) has no shape.

(d) is a cube.

4 An L1 cache is usually:

(a) onboard the microprocessor.

(b) constructed from DRAM for maximum speed.

(c) slower than Level 2 cache.

(d) external to the microprocessor.

5 RISC:

(a) means ‘radical instruction set computer’.

(b) has longer instructions and is therefore slower than a CISC chip.

(c) is part of everyday life.

(d) chips employ a smaller instruction set.

12. The Pentium family

The Pentium is a 32-bit microprocessor just like the previous Intel 80386 and 80486 but has been considerably enhanced to improve its speed of operation. Even the 132 pins of the 80386 have increased to 296 on the Pentium.

Full RISC chips from other manufacturers were being well received when the CISC Pentium was launched in 1993. Intel took these new designs into account but was boxed into a corner by its own success: it had to maintain absolute compatibility with the previous 8086, 80286, 80386 and 80486, together with their numerical co-processors. The compromise was to use RISC techniques wherever possible while maintaining the CISC codes. The Pentium has over 400 instruction codes, some performed by hardware and some by microcode. Its three million or so transistors have been incorporated into a superscalar structure. This means that it has duplicated arithmetic and logic units that allow it to carry out two instructions at the same time under favourable conditions.

It was launched at 66 MHz and in its first year became famous as the microprocessor that couldn’t count. There was a flurry of letters in the computer magazines and a host of ‘How many Pentiums does it take to change a light bulb?’ type jokes. At first, Intel denied there was a problem even though they must have known about it. ‘And, no, you can’t have your money back.’ More letters. ‘Alright, there is a very, very small matter of a few division sums.’ The error actually produced inaccuracies in the sixth or ninth decimal place in some particular division sums. This error was too small to affect more than a small minority of users but it started to undermine confidence in the Pentium. The real problem was that two errors occurred during its design at the same time. Either one, on its own, would have been spotted but the two mistakes served to hide each other. Anyway, it’s been fixed. It only affected the early versions and is no longer significant.

Over time the speed has increased to 200 MHz, with the inevitable rumours of a Pentium II running at 400 MHz and supporting a 100 MHz system clock.

An outline of Pentium operation

See Figure 12.1.

Figure 12.1 The Pentium processor

Data and code caches

Connections to the outside world are via a 64-bit external data bus and a 32-bit address bus. The incoming information, which consists of numerical data and instruction codes, is loaded very quickly into two internal caches: an 8 kbyte data cache and an 8 kbyte code cache. These caches shift data very rapidly on internal pathways that are 128 and 256 bits wide.

Whenever possible, the Pentium uses burst mode to read and write data. Burst mode loads a cache, for example, with more data than the data bus can carry in a single transfer. If a cache line is 128 bits wide and it is fed from a 64-bit data bus, then we can completely fill the line by transferring 64 bits and then another 64 bits. Burst mode loads all 128 bits very rapidly without further intervention from the microprocessor. Putting more new data into the cache increases the chances of the cache already holding the information that is needed next. Finding the required information in the cache is called a cache ‘hit’.
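The arithmetic behind the burst example can be sketched in a few lines of Python. This is only an illustration of the counting involved; the function name transfers_to_fill_line is made up for this sketch and does not correspond to anything inside the Pentium.

```python
def transfers_to_fill_line(line_bits: int, bus_bits: int) -> int:
    """Count the back-to-back bus transfers needed to fill one cache line.

    Hypothetical helper for illustration only.
    """
    if line_bits <= 0 or bus_bits <= 0:
        raise ValueError("widths must be positive")
    # Round up: a partly used final transfer still occupies the bus once.
    return -(-line_bits // bus_bits)


# The example from the text: a 128-bit line fed from a 64-bit data bus.
print(transfers_to_fill_line(128, 64))  # 2
```

A wider line on the same bus simply needs a longer burst, so a 256-bit line would take four 64-bit transfers.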

Prefetch buffer

The prefetch buffer is a small internal memory that holds a list of instructions that are waiting to be executed. This ensures that the instruction decoder is never waiting for a new instruction from the external (slow) memory and it makes more efficient use of the external data bus since the new instructions can be loaded whenever the opportunity arises. When it gets a moment, the Pentium shifts an instruction from the external program into the cache and transfers one instruction from the cache into the prefetch buffer and also sends a signal to the microcode circuit to prepare the code for the next instruction. So, with all the housekeeping done, the instruction decoder can be fed with instructions and data at its maximum rate. The prefetch buffer is actually two independent 32-bit buffers, each providing input to one of the ALUs.

The instruction decoder

The instruction decoder performs much the same function as in other microprocessors. It has two outputs that are fed to the two ALUs called ‘u’ and ‘v’.

Arithmetic and logic units

These units are under the control of the aptly named control unit. The blocks shown in the diagram as ALU ‘u’ and ALU ‘v’ are actually five-stage pipelines that can operate in parallel. All commands other than floating-point arithmetic can be executed in the ‘u’ pipeline and a more limited range can be carried out in the ‘v’ pipeline. A single five-stage pipeline can raise the throughput to one instruction per clock cycle. In the correct conditions, both pipelines can be used simultaneously to handle two instructions in a single clock cycle. Sometimes this is not possible: perhaps both instructions need access to the same piece of hardware, or perhaps the result of one instruction is needed before the next instruction can be started. As a rather simplistic example, if we wished to add two numbers and then divide the result by 10, we cannot start dividing anything until the first answer is available. One minor drawback is that instructions cannot overtake each other, even if the second one could have been finished very rapidly and the two are not dependent on each other.
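The pairing idea can be illustrated with a toy model in Python. It is a deliberately simplified sketch, not a description of Intel's actual pairing rules: the Instr record, its v_ok flag and the two checks in can_pair are assumptions made for this example, and the model only counts issue slots, ignoring how long each instruction takes to execute.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: str               # register that receives the result
    srcs: Tuple[str, ...]   # registers that are read
    v_ok: bool = True       # assumed flag: may run in the 'v' pipeline


def can_pair(first: Instr, second: Instr) -> bool:
    """Decide whether 'second' may issue in 'v' alongside 'first' in 'u'."""
    if not second.v_ok:
        return False                 # outside the limited 'v' range
    if first.dest in second.srcs:
        return False                 # second needs the result of first
    if first.dest == second.dest:
        return False                 # both write the same register
    return True


def count_issue_cycles(program: Sequence[Instr]) -> int:
    """Issue strictly in program order, two at a time when pairing allows."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2                   # both pipelines used this cycle
        else:
            i += 1                   # only the 'u' pipeline used
        cycles += 1
    return cycles


# Add two numbers, then divide the result by 10: the divide must wait.
dependent = [Instr("add", "a", ("a", "b")), Instr("div", "a", ("a", "ten"))]
# Two unrelated additions can share a clock cycle.
independent = [Instr("add", "a", ("a", "b")), Instr("add", "c", ("c", "d"))]

print(count_issue_cycles(dependent))    # 2
print(count_issue_cycles(independent))  # 1
```

Because the loop never looks further ahead than the next instruction, the model also reflects the drawback noted above: a later, independent instruction cannot overtake one that is held up.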

Floating-point unit (FPU)

For floating-point arithmetic, the FPU has an eight-stage pipeline that is further enhanced by a hardware multiplier and divider. This is a significant advance over the 80486, whose FPU was not pipelined. Thanks to the pipeline and the dedicated hardware, the FPU runs about ten times faster than that of the 80486 at equivalent clock speeds. You may remember from earlier discussions that one of the benefits of the RISC designs was the use of hardware for the execution of arithmetic operations.

The ‘u’ pipeline has some overlap with the floating-point pipeline so there are restrictions on the occasions when two instructions can be executed at the same time.

There are eight FPU registers, each 80 bits wide, arranged as a stack. Bits 0 to 63 hold a 64-bit mantissa, bits 64 to 78 hold a 15-bit exponent and the last bit, bit 79, holds the sign.
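The register layout can be made concrete with a short Python sketch that splits an 80-bit value into its three fields. The function name unpack_extended is invented for this illustration, and the register contents are represented here as a plain Python integer.

```python
from typing import Tuple


def unpack_extended(raw: int) -> Tuple[int, int, int]:
    """Split an 80-bit FPU register image into (sign, exponent, mantissa).

    Hypothetical helper for illustration only.
    """
    if not 0 <= raw < (1 << 80):
        raise ValueError("value does not fit in 80 bits")
    mantissa = raw & ((1 << 64) - 1)   # bits 0 to 63
    exponent = (raw >> 64) & 0x7FFF    # bits 64 to 78
    sign = (raw >> 79) & 1             # bit 79
    return sign, exponent, mantissa


# 0x3FFF8000000000000000 is the 80-bit pattern for the value 1.0.
sign, exponent, mantissa = unpack_extended(0x3FFF8000000000000000)
print(sign, hex(exponent), hex(mantissa))  # 0 0x3fff 0x8000000000000000
```

Setting bit 79 of the same pattern gives -1.0, with the exponent and mantissa fields unchanged.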

Notice how the layout of the floating-point number differs from the example that we saw in Chapter 4.

Branch prediction

When the program reaches a ‘branch’ or ‘jump’ instruction, the microprocessor is sent to another part of the program. These instructions are usually ‘conditional’ as in ‘jump to address xxxx if the value in the accumulator is not zero’. When this jump happens, the next few instructions that are loaded into the pipelines are all incorrect and the pipeline has to be emptied and restocked with the new information. This is called ‘flushing’ the pipeline and causes an irritating delay of four or five clock cycles.
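To get a feel for what flushing costs, the following Python sketch adds a fixed penalty for every jump that empties the pipeline. It assumes an idealised pipeline that otherwise completes one instruction per clock cycle; the function name and the example figures are made up for illustration.

```python
def cycles_with_flushes(instructions: int, flushes: int, penalty: int = 4) -> int:
    """Estimate clock cycles for a run of instructions with pipeline flushes.

    Assumes one instruction per cycle when nothing goes wrong, plus a
    fixed delay (four cycles by default) each time the pipeline is flushed.
    """
    if instructions < 0 or flushes < 0 or penalty < 0:
        raise ValueError("arguments must not be negative")
    return instructions + flushes * penalty


# Hypothetical workload: 1000 instructions, 100 of them jumps that flush.
print(cycles_with_flushes(1000, 100))     # 1400
print(cycles_with_flushes(1000, 100, 5))  # 1500
```

In this made-up case the flushes add 40 to 50 per cent to the running time, which shows why the delay is worth avoiding.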