## New MIPS Chip Targets Windows NT Boxes

QED's "Orion" Will Combine Good Performance and Low Cost



By Linley Gwennap

QED, a small design firm founded by former MIPS employees, is developing a low-cost MIPS processor for inexpensive desktop systems, particularly those running Windows NT. The project, code-named "Orion," is being funded by Integrated Device Technology. QED's Earl Killian, who presented the part at the Microprocessor Forum, said that IDT has licensed the design to some of the other MIPS semiconductor vendors to provide multiple sources for the part. The chip is currently in design and is expected to sample in late 1993, with volume production in early 1994.

Orion is intended to be logically equivalent to the R4000PC, both in its external pins and in all software-visible areas. One caveat is that Orion is intended to be built in a 3.3V process, making it difficult to simply plug it into a 5V R4000PC socket, but it should still be fairly easy for system designers to use the QED part, since it could use the same memory control and I/O interface chips as an R4000. From an internal standpoint, however, the chip is completely redesigned from the R4000.

Figure 1 shows the die plan for Orion. The 32K of cache takes about half of the useful area of the chip, while the CPU and FPU are relegated to small corners at the bottom. As Killian said, "We're building an SRAM with an attached processor, not a processor that has an attached SRAM." With this philosophy, the company is using the 4T (four-transistor) cells favored by most SRAM vendors rather than the 6T cells used for on-chip cache by most microprocessor vendors. The smaller cells allow greater density, although they require a slightly different manufacturing process. Both caches are two-way associative and use 32-byte lines. A writeback protocol reduces system bus traffic, although writethrough is supported on a per-page basis.
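As a rough illustration of the cache organization described above, the following Python sketch computes the number of sets in a two-way cache with 32-byte lines and splits an address into tag, index, and offset. The 16K size per cache is an assumption (the text gives only the 32K total, taken here to be split evenly between the two caches), and the function names are hypothetical.

```python
# Sketch of the geometry of a two-way set-associative cache with 32-byte lines.
# ASSUMPTION: the 32K total is split evenly, giving 16K per cache.
CACHE_BYTES = 16 * 1024   # assumed size of one cache
WAYS = 2                  # two-way associative (from the article)
LINE_BYTES = 32           # 32-byte lines (from the article)


def num_sets(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Number of sets = capacity / (ways * line size)."""
    return cache_bytes // (ways * line_bytes)


def split_address(addr, cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Split an address into (tag, set index, byte offset within the line)."""
    sets = num_sets(cache_bytes, ways, line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(num_sets())               # sets per cache under the 16K assumption
    print(split_address(0x12345))   # (tag, index, offset)
```

Under these assumptions each cache has 256 sets, so an address uses 5 offset bits and 8 index bits, with the remaining upper bits forming the tag.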

The lack of external cache support, unusual among current microprocessors, eliminates the cost, board space, and power of an external cache. It also simplifies the cache control logic, reducing both die size and design time. Finally, it removes a large number of signals used to connect to the external cache, lowering the package cost and preventing the die from being pad-limited. Killian's simulation data showed that, at the chip's minimum target frequency of 100 MHz, there is virtually no performance difference between the current design and a design with a 256K external cache and 16K on-chip cache. Even at the maximum frequency of 167 MHz, the difference is less than 10%. The design team felt that this small performance increase did not justify the added complexity of supporting an external cache.

The CPU uses a five-stage pipeline similar to the R3000. Like HP's PA-RISC processors, Orion uses a simple pipeline with minimal penalties (one cycle each for branch or load-use) and makes it run as fast as possible. No superscalar or superpipelined techniques are used. This simplicity shortens the design time as well as reducing the die size. In one concession to complexity, the system interface is designed to return the requested word first on a cache miss; the CPU is able to continue as soon as that word is received.
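The effect of the one-cycle penalties can be sketched with a simple cycles-per-instruction estimate. The model below is a simplification that ignores cache misses and all other stalls, and the instruction-mix fractions in the example are hypothetical values chosen only for illustration; they do not come from QED's simulations.

```python
# Simplified CPI model for a single-issue pipeline with one-cycle penalties.
# ASSUMPTION: cache misses and other stalls are ignored; the fractions used
# in the example are illustrative, not measured.
BRANCH_PENALTY = 1     # cycles (from the article)
LOAD_USE_PENALTY = 1   # cycles (from the article)


def estimated_cpi(branch_stall_frac, load_use_stall_frac,
                  branch_penalty=BRANCH_PENALTY,
                  load_use_penalty=LOAD_USE_PENALTY):
    """Base CPI of 1.0 plus the average stall cycles per instruction.

    branch_stall_frac:   fraction of instructions that incur the branch penalty
    load_use_stall_frac: fraction of instructions that incur the load-use penalty
    """
    return (1.0
            + branch_stall_frac * branch_penalty
            + load_use_stall_frac * load_use_penalty)


if __name__ == "__main__":
    # Hypothetical mix: 10% of instructions stall on a branch, 5% on a load-use.
    print(estimated_cpi(0.10, 0.05))
```

With penalties this small, even a pessimistic mix keeps the estimate close to one cycle per instruction, which is why the design relies on clock rate rather than multiple issue.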

The team considered leaving out the floating-point unit, but it takes up only 10% of the die and provides a significant performance boost for graphics and other FP applications. The FPU handles all functions through a single non-pipelined unit, except for a separate 64-bit multiplier that can operate in parallel. Adds and similar functions take 4 cycles; multiplies take a total of 8 cycles, 6 in the multiplier and 2 in the main unit for rounding. Divides and square roots are calculated in the main unit at one bit per cycle. At the peak rate, a new multiply/add combination can be launched every 6 cycles. This is somewhat slower than an R4000, which can launch a multiply/add every 4 cycles.
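The FPU timings quoted above translate into peak rates with a little arithmetic, sketched below. The cycle counts come from the text; counting each multiply/add pair as two floating-point operations and comparing both chips at the same 100-MHz clock are assumptions made for illustration, and the function names are hypothetical.

```python
# Peak floating-point rates implied by the cycle counts in the article.
# ASSUMPTION: a multiply/add pair counts as two floating-point operations,
# and both chips are compared at the same clock rate.
MULTIPLIER_CYCLES = 6      # cycles spent in the separate multiplier
ROUNDING_CYCLES = 2        # cycles spent in the main unit for rounding
ADD_CYCLES = 4             # adds and similar functions
ORION_MADD_INTERVAL = 6    # cycles between multiply/add launches on Orion
R4000_MADD_INTERVAL = 4    # cycles between multiply/add launches on the R4000


def multiply_latency():
    """Total multiply latency: multiplier time plus rounding time."""
    return MULTIPLIER_CYCLES + ROUNDING_CYCLES


def peak_mflops(clock_hz, interval_cycles):
    """Peak MFLOPS when one multiply/add pair launches every interval_cycles."""
    pairs_per_second = clock_hz / interval_cycles
    return 2 * pairs_per_second / 1e6


if __name__ == "__main__":
    print(multiply_latency())
    print(peak_mflops(100e6, ORION_MADD_INTERVAL))
    print(peak_mflops(100e6, R4000_MADD_INTERVAL))
```

At 100 MHz this gives Orion a peak of roughly 33 MFLOPS on multiply/add sequences, against 50 MFLOPS for a design that can launch a pair every 4 cycles at the same clock.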

Multiprocessor support is left out to reduce cost and design complexity. The system bus is clocked at half of the CPU clock frequency. Like the R4000, Orion uses an on-chip phase-locked loop (PLL) to create the 2× CPU clock from the system clock.



Figure 1. Die plan for Orion chip. Circuit areas shown in white.

## Power, Performance, and Cost

The chip is designed to use little power, with a goal of 2.5 Watts (worst-case, but with no load) at 100 MHz. This figure is kept low by the 3.3V design and the relatively simple pipeline. The design is fully static, allowing the clock speed to be reduced to save power, although the PLL makes it difficult to dynamically adjust the clock. To overcome this, the chip provides an external control signal that disconnects the PLL output from the CPU clock, freezing the internal state and reducing power consumption to less than a milliwatt. Releasing this signal restarts the processor with no loss of state.

Like the 68060, Orion powers down functional units that are not being used on a cycle-by-cycle basis. This dynamic power management significantly reduces power; for example, on the SPECint92 suite (which includes no floating-point), the design is expected to save about 35% from the worst-case power.
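Combining the 2.5-watt worst-case goal with the expected 35% savings gives a rough idea of power on integer code. The sketch below simply applies the percentage to the worst-case figure; treating the two numbers as directly combinable at 100 MHz is an assumption, and the result is an estimate rather than a QED specification.

```python
# Rough power estimate with dynamic power management enabled.
# ASSUMPTION: the 35% SPECint92 savings applies directly to the 2.5-watt
# worst-case goal at 100 MHz.
WORST_CASE_WATTS = 2.5     # design goal at 100 MHz (from the article)
SPECINT92_SAVINGS = 0.35   # expected savings on SPECint92 (from the article)


def managed_power(worst_case_watts=WORST_CASE_WATTS, savings=SPECINT92_SAVINGS):
    """Power after subtracting the fraction saved by powering down idle units."""
    return worst_case_watts * (1.0 - savings)


if __name__ == "__main__":
    print(managed_power())   # estimated watts on SPECint92-like integer code
```

Under that assumption, integer workloads would draw a little over 1.6 watts at 100 MHz.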

Depending on design tuning, manufacturing process, and binning, Killian hopes to see some Orion parts running as fast as 167 MHz, but he believes that 100 MHz is "really guaranteed" for volume production. Although the original R4000 required superpipelining to reach 100 MHz, it was designed for vanilla 1-micron and 0.8-micron processes at multiple vendors. To achieve high clock rates with a simpler pipeline, Orion takes advantage of IDT's 0.6-micron process with three metal layers. Killian, who worked on the R4000, also believes that the original R4000 design did not take full advantage of its superpipelining, and that future R4000 chips will have higher clock rates than Orion chips using the same IC process.

Based on its simulations, QED expects Orion to reach about 60 SPECint92 and about 55 SPECfp92 at 100 MHz. These figures are comparable to a 50-MHz R4000 with a large secondary cache and to those expected from Intel's forthcoming Pentium chip. With these numbers, Orion would be twice the performance of the fastest 486 on integer code and three times better on floating-point.

It is premature to discuss the price of Orion, which will be set by IDT and other vendors, not QED. Even the die size is not yet final, but Killian said that it would be smaller than the 486DX, which is about 81 mm<sup>2</sup>. This would make Orion less than half the size of current high-end RISC chips. QED expects the chip to be used in systems that sell for under \$3000.



"This talk is not about a microprocessor that does eight instructions per cycle, or that does massively parallel speculative execution....[Orion] is remarkable more for what can be achieved by taking a minimalist approach."

Earl Killian, QED

## Competitive Comparisons

By the time Orion is available, its competition will come from chips such as IBM's PowerPC 601, DEC's Low-Cost Alpha (LCA) chip, and HP's PA7100LC. In Windows NT systems in particular, Orion will be head-to-head with LCA and Pentium. Compared to Pentium, Orion should be much less expensive to manufacture, as its die size target is about 1/3 of Pentium's current size, although Pentium will have higher volumes. Orion's performance is about the same as Pentium's when the Intel chip uses an external cache; Orion eliminates this cost as well. Thus, Orion is likely to offer Pentium-class performance at a much lower system cost.

The 601 is very similar to Orion in that it integrates a RISC CPU and FPU with 32K of cache on-chip. At 120 mm<sup>2</sup>, the 601 comes closer than most to Orion's small die size, but requires an expensive half-micron process with five layers of metal to get there. Based on IBM's figures, Orion could have a slight edge in integer performance and use just 20% as much power. The 601, however, should get to market 3–6 months sooner.

The LCA and the 7100LC use a different strategy to achieve low system cost. Instead of integrating large amounts of cache, these chips include a memory controller and, on LCA, a PCI interface as well. The 7100LC, at 196 mm<sup>2</sup>, will be much larger than Orion, and LCA is likely to be as well, making them much more expensive to build. Orion could not add these features because of its R4000PC compatibility goal, but its small die size allows them to be easily added in the future. Detailed data for the DEC and HP chips is not available, but their integer performance should be similar to Orion's and floating-point performance should be much higher. They will also probably use much more power than Orion, but could beat Orion to market.

One advantage that Orion has over its competition is that it is multiple-sourced. Some companies prefer the option of more than one vendor, and the competition will keep prices low; look what it has done for the x86 market! Orion also uses much less power than any of these chips; for notebook systems, there is little competition other than the forthcoming 486SL, which will have much lower performance. For these portable products, Orion could really shine. ◆