RISC I
The first attempt to implement the RISC concept was originally known as Gold. Work on the design started in 1980 as part of a VLSI design course, but the then-complicated design crashed almost all existing design tools. The team had to spend considerable amounts of time improving or re-writing the tools, and even with these new tools it took just under an hour to extract the design on a VAX 11/780.
The final design, known as RISC I, was published in ACM ISCA in 1981. It had 44,500 transistors implementing 31 instructions and a register file containing 78 32-bit registers. This allowed for six register windows containing 14 registers each, with an additional 18 globals. The control and instruction decode section occupied only 6% of the die, whereas the typical design of the era used about 50% for the same role. The register file took up most of the space this freed.
RISC I also featured a two-stage instruction pipeline for additional speed, but without the complex instruction re-ordering of more modern designs. This makes conditional branches a problem, because the compiler has to fill the instruction following a conditional branch (the so-called "branch delay slot") with something selected to be "safe" (i.e., not dependent on the outcome of the conditional). Sometimes the only suitable instruction in this case is NOP. A notable number of later RISC-style designs still require the consideration of branch delay.
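The effect of a branch delay slot can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions: the instruction names (`set`, `add`, `beqz`, `nop`) and the `run` function are hypothetical and are not the RISC I instruction set. It only shows that the instruction placed after a branch executes whether or not the branch is taken.

```python
def run(program):
    """Run a toy program on a machine with one branch delay slot.

    Hypothetical instruction set, for illustration only:
      ("set", reg, value)    reg = value
      ("add", reg, value)    reg = reg + value
      ("beqz", reg, target)  branch to index `target` if reg == 0
      ("nop",)               do nothing
    """
    regs = {}
    executed = []   # indices of instructions that actually ran
    pc = 0
    pending = None  # target of a taken branch, applied after the delay slot

    while pc < len(program):
        op, *args = program[pc]
        executed.append(pc)

        # If the previous instruction was a taken branch, this instruction
        # is its delay slot: it still runs, and only then does control jump.
        next_pc = pc + 1
        if pending is not None:
            next_pc = pending
            pending = None

        if op == "set":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = regs.get(args[0], 0) + args[1]
        elif op == "beqz":
            if regs.get(args[0], 0) == 0:
                pending = args[1]
        elif op != "nop":
            raise ValueError(f"unknown instruction: {op}")

        pc = next_pc

    return regs, executed


program = [
    ("set", "x", 0),     # 0
    ("beqz", "x", 4),    # 1: taken, because x == 0
    ("add", "y", 1),     # 2: delay slot, runs even though the branch is taken
    ("add", "y", 100),   # 3: skipped
    ("add", "y", 10),    # 4: branch target
]

regs, executed = run(program)
print(regs["y"], executed)
```

In this sketch the instruction at index 2 sits in the delay slot, so it must be one whose effect is wanted on both paths. When the compiler cannot find such an instruction, it would place `("nop",)` there instead.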
After a month of validation and debugging, the design was sent to the innovative MOSIS service for production on June 22, 1981, using a 2 μm (2,000 nm) process. A variety of delays forced them to abandon their masks four separate times, and wafers with working examples did not arrive back at Berkeley until May 1982. The first working RISC I "computer" (actually a checkout board) ran on June 11. In testing, the chips proved to have lower performance than expected. In general, an instruction would take 2 μs to complete, while the original design allowed for about 400 ns (five times as fast). The precise reasons for this problem were never fully explained. However, throughout testing it was clear that certain instructions did run at the expected speed, suggesting the problem was physical, not logical.
Had the design worked at full speed, performance would have been excellent. Simulations using a variety of small programs, comparing the 4 MHz RISC I to the 5 MHz 32-bit VAX 11/780 and the 5 MHz 16-bit Zilog Z8000, showed this clearly. Program size was about 30% larger than on the VAX but very close to that on the Z8000, validating the argument that the higher code density of CISC designs was not actually all that impressive in reality. In terms of overall performance, the RISC I was twice as fast as the VAX, and about four times as fast as the Z8000. More interestingly, the programs ended up performing about the same overall amount of memory access because the large register file dramatically improved the odds the needed operand was already on-chip.
It is important to put this performance in context. Even though the RISC design ran slower than the VAX, this made no difference to the importance of the design. RISC allowed for the production of a true 32-bit processor on a real chip die using what was already an older fab. Traditional designs simply could not do this; with so much of the chip surface dedicated to decoder logic, a true 32-bit design like the Motorola 68020 required newer fabs before becoming practical. Using the same fabs, RISC I could have largely outperformed the competition.