# Mips Plays Hardball With Soft Cores

# MIPS64 5Kc Is First 64-Bit Synthesizable Processor Core



by Tom R. Halfhill

Mips Technologies is getting softer all the time, but that's not good news for competitors. At this month's Microprocessor Forum, Mips announced the first implementation of its MIPS64 instruction-set architecture. It is also the first 64-bit soft core from any microprocessor vendor.

Engineering manager Darren Jones described how the new MIPS64 5Kc adds 64-bit performance to Mips's growing line of synthesizable embedded cores, which started with the MIPS32 4Kc and 4Kp, introduced at Embedded Processor Forum last May (see MPR 5/31/99, p. 18). In September, Mips rolled out the MIPS32 4Km, another 32-bit soft core. The new 5Kc, code-named Opal, is an R5000-class core that maintains backward compatibility with R4000/R5000 software.

MIPS64 doesn't add any new instructions beyond the 19 already introduced in MIPS32, but it does extend the architecture to 64 bits across the board, including the system I/O bus, memory-address bus, general-purpose registers (GPRs), and internal datapaths. This should improve performance in embedded applications that manipulate data wider than 32 bits, such as ATM switches, Internet packet routers, and encryption software. Mips says the 5Kc is also intended for printers and disk controllers that need 64-bit processing.

But Mips isn't billing the 5Kc as the ultimate solution for the most demanding embedded applications. The company had to make some compromises to accommodate the limitations of synthesis tools. For customers that need higher performance, Mips plans to introduce another 64-bit embedded core next year. Known as the 20K, or Ruby, that core targets 1,000 Dhrystone MIPS (nearly three times the performance of the 5Kc) and will be a full-custom design instead of a synthesizable core.

**Figure 1.** The new pipeline has an extra dispatch stage for future superscalar and MIPS16 execution.

## Going Beyond MIPS32

Mips's target frequency for the 5Kc is 250-300 MHz in a 0.18-micron IC process, which would yield up to 360 Dhrystone 2.1 MIPS. The 5Kc's first licensee, Texas Instruments, plans to sample the first chips in 1Q00.

Excluding caches, the 5Kc core requires about 180,000 gates and occupies about 3 mm<sup>2</sup> of silicon. The estimated power consumption is 2 mW/MHz. When the core moves to a 0.15-micron process next year, Mips projects a nominal clock speed of 375 MHz, which translates into 450 Dhrystone MIPS. The die size will shrink to about 2 mm<sup>2</sup>, and power consumption will drop to less than 1 mW/MHz.
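The projections above follow from simple arithmetic. The sketch below assumes that the 1.2 Dhrystone MIPS/MHz ratio Mips quotes for the 5Kc holds at every clock speed and that power scales linearly with frequency at the quoted mW/MHz figures. The resulting wattages (about 600 mW at 300 MHz, under 375 mW at 375 MHz) are derived estimates, not numbers Mips has published.

```python
# Back-of-the-envelope check of the 5Kc projections.
# Assumption: Dhrystone MIPS scale linearly with clock at 1.2 DMIPS/MHz,
# and power scales linearly with clock at the quoted mW/MHz figure.

DMIPS_PER_MHZ = 1.2  # ratio quoted for the 5Kc (and the 4Kc)


def dhrystone_mips(clock_mhz: float) -> float:
    """Estimated Dhrystone 2.1 MIPS at a given clock frequency."""
    return clock_mhz * DMIPS_PER_MHZ


def core_power_mw(clock_mhz: float, mw_per_mhz: float) -> float:
    """Estimated core power in milliwatts (linear-scaling assumption)."""
    return clock_mhz * mw_per_mhz


if __name__ == "__main__":
    # 0.18-micron process: top of the 250-300 MHz target at 2 mW/MHz
    print(f"0.18 um: {dhrystone_mips(300):.0f} DMIPS, {core_power_mw(300, 2.0):.0f} mW")
    # 0.15-micron process: 375 MHz nominal at under 1 mW/MHz (upper bound)
    print(f"0.15 um: {dhrystone_mips(375):.0f} DMIPS, under {core_power_mw(375, 1.0):.0f} mW")
```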

Future enhancements to the microarchitecture could significantly boost the 5Kc's performance. Right now it's a uniscalar core, but it has some features that will make it easier for Mips to add superscalar capability. As Figure 1 shows, Mips added an extra dispatch stage between the instruction-fetch and register-access/instruction-decode stages. The new pipeline is six stages long, compared with five stages in current MIPS32 cores.

Dividing dispatch and decode operations into separate stages should make it easier for a superscalar version of the core to dispatch instructions in parallel to multiple function units. It also simplifies the clocking for synthesis tools and should allow the 5Kc to run at slightly higher frequencies.

Mips says the extra stage could also support MIPS16 in future cores. MIPS16 (see MPR 10/28/96, p. 40) is a compressed-instruction format for embedded applications that need maximum code density. Although none of the MIPS32 or MIPS64 cores can execute MIPS16 instructions today, that's strictly an implementation issue; both architectures allow it.

To compensate for the stiffer penalty that a longer pipeline pays when a branch changes the instruction flow, Mips added a static branch predictor and instruction prefetching to the 5Kc. The predictor assumes branches will always be taken. The 5Kc can speculatively fetch as many as six instructions, and the penalty for mispredicting a branch is only one cycle.
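A minimal model shows why an always-taken policy suits loop-heavy embedded code. This is a sketch, not Mips's implementation: it assumes every conditional branch is predicted taken, that each misprediction costs exactly the one cycle Mips quotes, and that nothing else stalls the pipeline. The branch trace is hypothetical.

```python
from typing import Sequence

# Assumption: each mispredicted branch costs exactly one cycle, as Mips
# states for the 5Kc; all other pipeline stalls are ignored.
MISPREDICT_PENALTY_CYCLES = 1


def predict(_pc: int) -> bool:
    """Static predictor: ignores the branch address and always predicts taken."""
    return True


def branch_penalty(trace: Sequence[tuple[int, bool]]) -> int:
    """Stall cycles for a trace of (branch PC, actually taken) pairs."""
    mispredicts = sum(1 for pc, taken in trace if predict(pc) != taken)
    return mispredicts * MISPREDICT_PENALTY_CYCLES


# Hypothetical trace: a loop's backward branch at PC 0x100 that is taken
# nine times and falls through once when the loop exits.
loop_trace = [(0x100, True)] * 9 + [(0x100, False)]
print(branch_penalty(loop_trace))
```

For a ten-iteration loop, only the final fall-through is mispredicted, so the whole loop pays a single cycle of branch penalty.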

Another significant difference between the 5Kc and current MIPS32 cores is that the 5Kc has a coprocessor interface like the one found in most standalone MIPS chips. This allows ASIC and system-on-a-chip (SOC) designers to integrate a floating-point unit, a graphics coprocessor, or virtually any other type of execution engine. For this reason alone, the 5Kc will be the preferred synthesizable MIPS core for ASICs and SOCs that need special functionality.

Darren Jones of Mips Technologies unveils the new MIPS64 5Kc core at Microprocessor Forum.

### Price & Availability

An alpha version of the RTL code for the MIPS64 5Kc core is available for licensing now. Mips Technologies expects to deliver the final code in November. Texas Instruments, the first licensee, is adding the 5Kc to its ASIC library and plans to sample chips in 1Q00. Licensing fees are subject to negotiation. For more information, go to <a href="https://www.mips.com/products">www.mips.com/products</a>.

The 5Kc has the same fast multiplier as the MIPS32 4Kc and 4Km. It can perform $32 \times 16$-bit multiplies with single-cycle throughput, or $32 \times 32$-bit multiplies every other cycle. A $64 \times 64$-bit multiply has a repeat rate of nine cycles, but because multiplies run in a separate multiply/divide pipeline, the main pipeline doesn't have to stall while a long calculation is in progress.
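The repeat rates translate directly into how long a burst of multiplies ties up the multiply/divide unit. The sketch below assumes operations issue back-to-back with no gaps and that each one occupies the unit for its full repeat rate; result latency is not modeled, and the operation labels are hypothetical names for the three cases described above.

```python
# Repeat rates (cycles between successive issues) quoted for the 5Kc multiplier.
REPEAT_RATE = {
    "32x16": 1,  # single-cycle throughput
    "32x32": 2,  # one every other cycle
    "64x64": 9,  # nine-cycle repeat rate
}


def multiplier_busy_cycles(ops: list[str]) -> int:
    """Cycles the multiply/divide unit is occupied by a back-to-back stream.

    Assumption: each operation holds the unit for its full repeat rate and
    operations issue with no idle gaps; latency is not modeled.
    """
    return sum(REPEAT_RATE[op] for op in ops)


print(multiplier_busy_cycles(["32x16"] * 8))  # eight 32x16 multiplies
print(multiplier_busy_cycles(["64x64"] * 8))  # eight 64x64 multiplies
```

Eight full 64-bit multiplies keep the unit busy for 72 cycles versus 8 for the narrowest case, which is why the separate pipeline matters: independent integer instructions can keep flowing during those cycles.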

## Morphing With MIPS

As Figure 2 shows, the 5Kc offers designers several configuration options, just as the MIPS32 cores do. But the 5Kc offers more options, and the options are more powerful.

While the cache controller in current MIPS32 cores limits the instruction and data caches to 16K each, the 5Kc allows them to be as large as 64K. The caches can be direct-mapped or two-, three-, or four-way set-associative. Each way can be 2K, 4K, 8K, or 16K. This gives designers considerable flexibility. For example, 16K caches can be direct-mapped (16K per way), two-way (8K per way), or four-way (4K per way). The data cache supports write-back or write-through protocols; no write-back option is available for current MIPS32 cores.
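The cache options can be enumerated mechanically. This sketch assumes any associativity can be paired with any per-way size, as the 16K example above implies; the actual configuration tools may restrict some combinations.

```python
# Cache geometry options described for the 5Kc (sizes in KB).
WAYS = (1, 2, 3, 4)            # direct-mapped through four-way set-associative
WAY_SIZES_KB = (2, 4, 8, 16)   # allowed size of each way


def geometries_for(total_kb: int) -> list[tuple[int, int]]:
    """All (ways, KB per way) pairs that yield exactly total_kb."""
    return [(w, s) for w in WAYS for s in WAY_SIZES_KB if w * s == total_kb]


def all_sizes() -> list[int]:
    """Every total cache size reachable with the options above."""
    return sorted({w * s for w in WAYS for s in WAY_SIZES_KB})


print(geometries_for(16))  # the three 16K layouts from the text
print(all_sizes())
```

Under these assumptions, a 16K cache has exactly the three layouts the article lists, and the largest reachable size is 64K (four ways of 16K).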

For memory management, the 5Kc has an MMU with a 32-, 48-, or 96-entry translation-lookaside buffer (TLB)—another configuration option. In contrast, only the MIPS32 4Kc has an MMU with a TLB, and it's limited to 32 entries. The MIPS32 4Kp and 4Km use fixed block-address translation. The MMU is important because Windows CE won't run on processors without it.
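TLB size matters because it bounds how much memory can be mapped without a TLB refill. As a rough sketch, assume 4K pages and the R4000-style organization in which each TLB entry maps an even/odd pair of virtual pages. The page size is an assumption: MIPS TLBs can map larger pages, which would raise these figures proportionally.

```python
PAGE_KB = 4          # assumed page size; larger pages scale reach linearly
PAGES_PER_ENTRY = 2  # R4000-style entries map an even/odd page pair


def tlb_reach_kb(entries: int, page_kb: int = PAGE_KB) -> int:
    """Memory (KB) mapped by a fully loaded TLB."""
    return entries * PAGES_PER_ENTRY * page_kb


for n in (32, 48, 96):
    print(f"{n:2d} entries -> {tlb_reach_kb(n)} KB")
```

With 4K pages, the largest 96-entry option maps three times as much memory as the 4Kc's 32-entry TLB (768K versus 256K).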

The 5Kc allows designers to implement registers as latches or as register cells when using synthesis tools, or as full-custom logic without synthesis. These options are also available with the MIPS32 cores.

Another similarity with MIPS32 is the enhanced JTAG (EJTAG) debug interface. Designers can omit EJTAG altogether or include it with three different levels of breakpoint support. The trade-off is that more breakpoints require more gates, enlarging the die. Mips believes that most designers would rather have EJTAG, because ASICs and SOCs (which tend to have more on-chip peripherals) are more difficult to debug.

**Figure 2.** The MIPS64 5Kc offers designers a few configuration options, as do previous MIPS32 cores.

## Is It Really Faster?

Surprisingly, the 64-bit 5Kc doesn't appear to be any faster than the 32-bit 4Kc. Both cores deliver 1.2 Dhrystone MIPS/MHz. Are the engineers at Mips just a pink slip away from an exciting new career in telemarketing?

Not yet. The fallacy lies in assuming that Dhrystone MIPS is an adequate benchmark of CPU performance. The Dhrystone program is so small that primary caches larger than 4K make virtually no difference, so it can't measure the 5Kc's 4× advantage in maximum cache size (64K vs. 16K per cache). More important, Dhrystone doesn't measure the 5Kc's 2× advantage in data width (64 bits vs. 32 bits), its larger TLB (up to 96 entries vs. 32), the greater efficiency of its optional write-back cache, or the versatility of its coprocessor interface.
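A small example of what Dhrystone misses: on a 32-bit core, one 64-bit addition must be split into two 32-bit additions plus carry handling (on MIPS32, typically an addu/sltu/addu/addu sequence), while a 64-bit core can use a single daddu. The sketch below models the 32-bit approach with explicit 32-bit masking. It illustrates the extra work involved and is not any particular compiler's output.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def add64_with_32bit_ops(a: int, b: int) -> tuple[int, int]:
    """Add two unsigned 64-bit values using only 32-bit operations.

    Returns (sum mod 2**64, number of 32-bit ALU operations used),
    mirroring the MIPS32 idiom addu / sltu / addu / addu.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = (a_lo + b_lo) & MASK32      # addu: low words, carry discarded
    carry = 1 if lo < a_lo else 0    # sltu: wrapped result means carry out
    hi = (a_hi + b_hi) & MASK32      # addu: high words
    hi = (hi + carry) & MASK32       # addu: fold in the carry
    return (hi << 32) | lo, 4


def add64_native(a: int, b: int) -> tuple[int, int]:
    """The same addition as one 64-bit operation (daddu on a MIPS64 core)."""
    return (a + b) & MASK64, 1


print(add64_with_32bit_ops(0xFFFF_FFFF, 1))  # carry ripples into the high word
print(add64_native(0xFFFF_FFFF, 1))
```

Both functions produce the same sum; the difference is four instructions versus one, a cost that compounds in packet-processing and encryption code but never shows up in Dhrystone.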

Nevertheless, Dhrystone is useful for rough approximations. Compared with 64-bit cores and processors from other vendors, the 5Kc fares poorly. IDT's new RISCore 64600—which is based on an enhanced version of the MIPS-IV architecture—is projected to hit 400–500 MHz in a 0.18-micron process and deliver more than 800 MIPS. The 64600 also has a floating-point unit and vector-FP instructions that the 5Kc lacks. The new 64-bit SH-5 architecture from Hitachi and STMicroelectronics leaves the 5Kc in the dust too (see MPR 10/6/99, p. 20). Mips will battle those competitors next year with the 20K hard core.

The 5Kc's big advantage is that it's synthesizable. Synthesis tools can't match the efficiency of custom layouts, but the versatility that only a soft core can provide means everything to some ASIC and SOC designers. If those designers also need 64-bit data wrangling and memory addressing, for now Mips is the only game in town.