The AMD 29000, often simply 29k, was a popular family of RISC-based 32-bit microprocessors and microcontrollers from Advanced Micro Devices. They were, for a time, the most popular RISC chips on the market, widely used in laser printers from a variety of manufacturers. In late 1995 AMD dropped development of the 29k because the design team was transferred to support the PC side of the business. What remained of AMD's embedded business was realigned towards the embedded 186 family of 80186 derivatives. The majority of AMD's resources were then concentrated on its high-performance desktop x86 clones, using many of the ideas and individual parts of the latest 29k to produce the AMD K5.
The 29000 evolved from the same Berkeley RISC design that also led to the Sun SPARC and Intel i960. One "trick" used in all of the Berkeley-derived designs is the concept of register windows, a technique used to speed up procedure calls significantly. The basic idea is to use a large set of registers as a stack, loading local data into a set of registers during a call, and marking them "dead" when the procedure returns. Values being returned from the routines would be placed in the "global page", the top eight registers in the SPARC (for instance). The competing early RISC design from Stanford University looked at this concept, but decided that improved compilers could make more efficient use of general-purpose registers than a hard-wired window, something that has proven true over the years.
In the original Berkeley design, SPARC, and i960, the windows were fixed in size. A routine using only one local variable would still use up eight registers on the SPARC, wasting this expensive resource. It was here that the 29000 differed from these earlier designs, in that it used a variable window size to improve usage. In this example only two registers would be used, one for the local variable and another for the return address. It also added more registers, keeping the same 128 registers for the procedure stack but adding another 64 for global access. In comparison, the SPARC had 128 registers in total, and the global set was a standard window of eight. These changes, combined with a "halfway smart" compiler, resulted in the best of both worlds: high performance for procedure calls, while still having plenty of global registers for general-purpose work. The 29000 also "extended" the register window stack with an in-memory (and in theory, in-cache) stack. When the window filled, the oldest calls would be pushed off the end of the register stack into memory and restored as required when the routine returned. Generally, the 29000's register usage was considerably more advanced than that of competing designs based on the Berkeley concepts.
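The difference between fixed and variable window sizes can be illustrated with a toy model. The sketch below is not the 29000's actual mechanism; the class `RegisterStack` and the helper `calls_before_spill` are hypothetical names, and the model only tracks how many registers each call frame occupies, using the figures quoted above (128 stack registers, eight registers per fixed window, two registers for the one-variable routine).

```python
class RegisterStack:
    """Toy model of a register stack with variable-size frames.

    Illustrative only: it tracks frame sizes, not register contents,
    and it does not model any real processor's spill logic.
    """

    def __init__(self, size=128):
        self.size = size      # registers available for call frames
        self.frames = []      # sizes of frames held in registers, oldest first
        self.spilled = []     # sizes of frames pushed out to memory, oldest first
        self.used = 0         # registers currently occupied

    def call(self, nregs):
        """Allocate a frame of nregs registers, spilling old frames if needed."""
        if nregs > self.size:
            raise ValueError("frame larger than the register stack")
        # Push the oldest frames out to memory until the new frame fits.
        while self.used + nregs > self.size:
            oldest = self.frames.pop(0)
            self.spilled.append(oldest)
            self.used -= oldest
        self.frames.append(nregs)
        self.used += nregs

    def ret(self):
        """Release the newest frame and restore the caller's frame if it was spilled."""
        self.used -= self.frames.pop()
        if not self.frames and self.spilled:
            restored = self.spilled.pop()
            self.frames.append(restored)
            self.used += restored


def calls_before_spill(stack_size, frame_size):
    """Count how many nested calls fit before the first spill to memory."""
    stack = RegisterStack(stack_size)
    depth = 0
    while not stack.spilled:
        stack.call(frame_size)
        depth += 1
    return depth - 1  # the last call was the one that forced a spill


fixed_depth = calls_before_spill(128, 8)      # fixed eight-register windows
variable_depth = calls_before_spill(128, 2)   # one local plus a return address
```

Under these assumptions the same 128 registers hold four times as many nested calls when each call takes only the two registers it needs, which is the saving the variable window size is meant to capture.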
Another difference is that the 29000 included no special-purpose condition code register. Any register could be used for this purpose, allowing conditions to be saved easily at the expense of complicating some code. An instruction prefetch buffer that stored up to 16 instructions was used to improve performance during branches. The 29000 did not include any branch prediction system, so there was a delay if a branch was taken (nor was it originally superscalar, so it could not "do both sides" as is common in some designs). The buffer mitigated this by storing four instructions from the "other side" of the branch, which could be run instantly while the buffer was refilled with new instructions from memory.
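The idea of keeping comparison results in ordinary registers can be sketched as follows. This is a simplified illustration rather than 29000 code: the register names, the helpers `compare_lt` and `compare_eq`, and the 1/0 encoding of true and false are all assumptions made for the example.

```python
def compare_lt(regs, dst, a, b):
    """Write the result of regs[a] < regs[b] into general register dst."""
    regs[dst] = 1 if regs[a] < regs[b] else 0


def compare_eq(regs, dst, a, b):
    """Write the result of regs[a] == regs[b] into general register dst."""
    regs[dst] = 1 if regs[a] == regs[b] else 0


def clamp_to_limit(value, limit):
    """Return (min(value, limit), value == limit) using two saved conditions."""
    regs = {"r1": value, "r2": limit, "r3": 0, "r4": 0}
    compare_lt(regs, "r3", "r1", "r2")  # first condition is kept in r3
    compare_eq(regs, "r4", "r1", "r2")  # second comparison leaves r3 untouched
    # The "branch" tests r3 even though another comparison has run since.
    result = regs["r1"] if regs["r3"] else regs["r2"]
    return result, bool(regs["r4"])
```

With a single shared condition code register, the second comparison would overwrite the first result unless it were saved explicitly. Here both results stay available, at the cost of every comparison naming a destination register.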
The first 29000 was released in 1988 and included a built-in MMU, but floating point support was offloaded to the 29027 FPU. The 29005 was a cut-down version. The line was upgraded with the 29030/29035, which included 8 KB/4 KB of instruction cache. Another update included the FPU on-die and added 4 KB of data cache to produce the 29040. The final general-purpose version was the 29050, a superscalar design that could issue four instructions per clock and included out-of-order and speculative execution, as well as a much faster FPU.
Several portions of the 29050 design were used as the basis for the K5 series of x86-compatible processors. The FPU was used without change, while the rest of the core design was used along with complex microcode to translate x86 instructions to 29k-like code on the fly.