# MICROPROCESSOR REPORT: THE INSIDERS' GUIDE TO MICROPROCESSOR HARDWARE

#### VOLUME 8 NUMBER 6

MAY 9, 1994

## IBM, Motorola Preview Embedded PowerPCs

#### 403 and 505 Processors Combine Strong Performance with Low Cost

#### by Curtis P. Feigel

Extending the PowerPC architecture beyond the desktop, IBM and Motorola have each designed embedded PowerPC processors. IBM's PPC 403GA and Motorola's RMCU 505 integrate more system logic and peripheral circuitry than the 600-series PowerPC processors. They are also smaller and less expensive than the desktop chips and are projected to turn in performance ratings from 40 to 47 Dhrystone MIPS, significantly better than competing processors in the same cost range.

The embedded PowerPC processor cores are cost-reduced versions of those found in the 600-series chips that IBM and Motorola codesigned at their jointly owned Somerset facility (see MPR 7/24/91, p. 1). But even though both new devices use the PowerPC architecture, the 403 and 505 represent separate efforts by the two companies, each of which has created its own processor implementation.

The two teams achieved first silicon within the past six weeks, but Motorola was less forthcoming in its announcement and did not reveal details such as transistor count or die size. IBM's release may have pressured Motorola into disclosing more of its plans than it wanted to as the two companies made head-to-head announcements at the Embedded Systems Conference.

The two companies have already introduced several versions of the PowerPC architecture intended for desktop and portable systems. The first, the 601 (see **061401.PDF**), fits 2.8 million transistors onto a 120-mm<sup>2</sup> die and consumes up to 9 W, while the PPC 603 (see **071402.PDF**) fits 1.6 million transistors onto an 85-mm<sup>2</sup> die that consumes up to 2.5 W. The new IBM processor continues to reduce these three factors, fitting 585,000 transistors onto a die of less than 40 mm<sup>2</sup> and keeping power consumption down to about 1 W. Both new chips will be characterized at 3.3 V and below. The IBM and Motorola chips have maximum speeds of 33 MHz and 40 MHz, respectively, much slower than the 66- to 100-MHz clocks of the 600-series devices.
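The transistor, die-size, and power figures quoted above can be compared directly. The following is a minimal sketch using only the numbers in this article; the 403GA's die is given as "less than 40 mm<sup>2</sup>", so the 39-mm<sup>2</sup> value from Table 1 is used here, and the helper name `density` is purely illustrative.

```python
# Figures quoted in the article: (transistors, die area in mm^2, max power in W).
# The 403GA area uses the 39 mm^2 value from Table 1; its power is "about 1 W".
chips = {
    "PPC 601": (2_800_000, 120.0, 9.0),
    "PPC 603": (1_600_000, 85.0, 2.5),
    "PPC 403GA": (585_000, 39.0, 1.0),
}


def density(transistors, area_mm2):
    """Return transistors per square millimeter of die area."""
    return transistors / area_mm2


for name, (transistors, area, watts) in chips.items():
    # Power per unit area is reported in milliwatts per mm^2.
    print(f"{name}: {density(transistors, area):,.0f} transistors/mm^2, "
          f"{watts / area * 1000:.0f} mW/mm^2")
```

By these figures, the 403GA reduces transistor count, die area, and power relative to both desktop chips, although its transistors are packed less densely than the 601's.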

#### Single-Issue PowerPC Cores

The new devices use execution units that are similar to, but fewer in number than, those of their 600-series big brothers. Each of these embedded processors incorporates a branch unit and a single integer-execution unit but uses smaller caches than the larger devices. Neither processor has an MMU, and only the 505 has an FPU. They are superscalar only in that they can execute integer code and branches simultaneously: both are single-issue machines. Still, these new devices are PowerPC compliant: they are binary-compatible with general-purpose 600-series chips for user-mode code.

The first inkling that an embedded PowerPC processor was in the works came when Ford dropped its plans to use a variant of Motorola's 88000 (see **0604MSB.PDF**) in favor of PowerPC. The core of Motorola's RMCU 505 is the same as that in the PowerPC processor the company is codeveloping with Ford Electronics Division. Details of the Ford chip have not been disclosed, but its peripherals may be substantially different than the 505's because it is intended for automotive powertrain control.

The primary roles for these new processors will be in consumer electronics, computer peripherals, and communications devices. Neither company mentioned using either of these embedded processors in PDAs; at close to one watt, their power dissipation is a bit too high. Also, the lack of an MMU precludes running the Newton OS, which Apple has said it will port to PowerPC. We expect that PowerPC PDAs will use some future variant with an MMU and lower power consumption.

The IBM chip provides a generally useful processor that requires minimal glue logic, as Figure 1 shows. Its bus-interface unit can directly control DRAM, SRAM, ROM, and memory-mapped I/O. It even has a built-in, full-duplex serial port that can run at speeds up to 1/16 of the processor clock.

As Figure 2 shows, the 403's caches, at 2K for instructions and 1K for data, may seem small, but both are two-way set-associative, and the data cache employs a write-back policy. The cache line size is 16 bytes. The chip includes a four-channel DMA controller that can transfer data directly between ROM, DRAM, SRAM, and I/O while the processor is executing code from cache.

The 403's integral memory controller assigns the first 256M of the address space to DRAM and the second 256M to a combined space for ROM, SRAM, and I/O. The bus interface operates in big-endian mode only. To simplify marking parts of the address space as cachable or noncachable, the device uses the high-order address bit. Thus, each 16-byte line of memory appears at two addresses exactly 2G apart. The cache control register determines, on a line-by-line basis, whether a 1 or a 0 in the high-order address bit indicates cachability.
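The aliasing scheme can be illustrated with a little address arithmetic. This is a minimal sketch, assuming 32-bit addresses in which the high-order bit is bit 31 (consistent with the 2G spacing); the function names `alias` and `line_base` are hypothetical, and the behavior of the cache control register is not modeled.

```python
HIGH_BIT = 1 << 31          # high-order bit of a 32-bit address (2G); an assumption
ADDRESS_MASK = 0xFFFF_FFFF  # keep every result within 32 bits
LINE_BYTES = 16             # cache line size given in the article


def alias(address):
    """Return the other address that refers to the same memory location."""
    return (address ^ HIGH_BIT) & ADDRESS_MASK


def line_base(address):
    """Return the address of the first byte of the 16-byte line holding `address`."""
    return address & ~(LINE_BYTES - 1) & ADDRESS_MASK


low = 0x0000_1234
high = alias(low)
print(hex(low), hex(high), high - low == 2 ** 31)
print(hex(line_base(low)), hex(line_base(high)))
```

Flipping the high-order bit twice returns the original address, so the two views of a line are always exactly 2G apart.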

The 403 can trap and emulate FPU instructions. It is a requirement of the PowerPC architecture that, from the software's point of view, emulated floating-point operations be indistinguishable from operations performed on a hardware FPU.

Figure 3 shows that, in addition to its FPU, Motorola's chip has a large 4K instruction cache and uses 4K of SRAM instead of a data cache. Its interface to the outside world is provided by Motorola's SIU (System Integration Unit), which incorporates the bus interface along with typical system logic. The SIU is also found on 68300-family processors and provides an interrupt controller, a JTAG port, and 12 chip selects that are each programmable for 0–7 wait states. Its protection block contains watchdog timers that detect a problem on the external bus or in the processor itself. The processor clock is based on a 4-MHz external clock that can be scaled up or down under software control via a built-in PLL. Software can change the scaling factor dynamically.



Figure 1. IBM's 403GA is developed from the PowerPC architecture. It can execute branches in parallel with integer operations, but unlike the 600-series PPC chips, it is a single-issue machine.

#### Special Modes Conserve Power

Each chip uses a fully static CMOS design that saves power by not clocking function units that are idle. Each can also run at any arbitrary frequency below the maximum, which saves power but reduces performance in proportion. Production versions of the 505 will be capable of running at 40 MHz with a 3.3-V supply and can run at 25 MHz at 2.2 V; initial samples will be limited to 25 MHz. Motorola plans versions of the 505 that operate at even lower voltages.
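For a rough sense of what the lower operating points could save, the usual first-order rule for CMOS dynamic power (proportional to clock frequency times the square of the supply voltage) can be applied. This is a sketch under stated assumptions, not vendor data: it takes the 0.7 W that Table 1 lists for the 505 at its maximum clock as the reference, ignores leakage and I/O power, and uses the hypothetical function name `scaled_dynamic_power`.

```python
def scaled_dynamic_power(p_ref_w, f_ref_mhz, v_ref, f_mhz, v):
    """Estimate dynamic power at a new clock and supply, assuming P ~ f * V^2."""
    return p_ref_w * (f_mhz / f_ref_mhz) * (v / v_ref) ** 2


# Reference point from the article: 505 at 40 MHz and 3.3 V, listed at 0.7 W.
reference = dict(p_ref_w=0.7, f_ref_mhz=40.0, v_ref=3.3)

# Lowering only the clock scales power linearly with frequency.
at_25mhz_3v3 = scaled_dynamic_power(f_mhz=25.0, v=3.3, **reference)

# Lowering the supply as well adds a quadratic saving.
at_25mhz_2v2 = scaled_dynamic_power(f_mhz=25.0, v=2.2, **reference)

print(f"25 MHz at 3.3 V: about {at_25mhz_3v3:.2f} W")
print(f"25 MHz at 2.2 V: about {at_25mhz_2v2:.2f} W")
```

Under this approximation, the 2.2-V operating point would draw well under half the power of the same clock rate at 3.3 V.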

The chips also have several power-saving modes. The 403, for example, has commands for wait and sleep, while the 505 adds a doze mode. In wait mode, much of the processor is shut down, with only some of the clock circuitry and timer facilities operating (including DRAM refresh on the 403). The processor can be reawakened by a timer or external interrupt, or by a reset.

The 505's doze mode conserves more power by shutting off everything but the chip's clock; waking the processor requires just a few cycles. Sleep mode shuts off the device completely so the external clock can be stopped, leaving the chip to draw only a few microamps of leakage current. Awakening from this state requires applying the clock for hundreds of cycles to ensure that the internal PLL circuitry is synchronized and locked.

Figure 2. This plot shows the 403's 585,000 transistors on its 39-mm<sup>2</sup> die. The fully static device will be built in a 0.5-micron process with three metal layers.

Interrupt latency is another important facet in embedded systems. Software-related factors can increase a processor's interrupt-response time by orders of magnitude, but clever designers will identify critical interrupt code and optimize it to minimize response time, letting the processor's hardware limits come into play. The 403 and 505 both have a minimum interrupt latency of 3 clocks and a worst-case latency of less than 39 clocks. Both manufacturers estimate that the typical interrupt latency (from the time the IRQ line is asserted to the time the interrupt's first instruction is executed) should be between three and eight clocks.
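Latencies quoted in clocks translate into time only once a clock rate is chosen. The sketch below converts the figures above at each chip's maximum frequency; the function name `clocks_to_ns` is illustrative, and 39 clocks is treated as an upper bound because the article says the worst case is less than that.

```python
def clocks_to_ns(clocks, clock_mhz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return clocks * 1000.0 / clock_mhz


# Maximum clock frequencies given in the article.
max_clock_mhz = {"403GA": 33.0, "505": 40.0}

for chip, mhz in max_clock_mhz.items():
    best = clocks_to_ns(3, mhz)     # minimum latency: 3 clocks
    typical = clocks_to_ns(8, mhz)  # upper end of the typical range: 8 clocks
    bound = clocks_to_ns(39, mhz)   # worst case is stated as less than 39 clocks
    print(f"{chip} at {mhz:.0f} MHz: best {best:.0f} ns, "
          f"typical up to {typical:.0f} ns, worst case under {bound:.0f} ns")
```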

#### Third Parties Help with Tools

It will be interesting to watch the evolution of development systems and tools for PPC processors. Not only are Motorola and IBM individually working to provide development tools, but the two have collaborated with Apple to bring PPC processors to the desktop market. The combination of the three companies promises a strong infrastructure of third-party support. Having a desktop system that runs the same native code as the embedded target system is a major attraction to some developers; being able to buy it at the corner computer store is a bonus. There is immense opportunity for synergy among third-party tool makers.

Both new chips sport JTAG ports with extended capabilities that allow developers to trace code in real time. Extra pins indicate address changes to an external analyzer, so users can follow the execution of code even if no external cycles occur and without affecting the performance of the processor. This method overcomes the speed limitations of the standard JTAG serial interface. Other options include control of the processor (halt, start, step, etc.), access to register and memory contents, and a source-level debugger.

IBM will offer a development system hosted on RS/6000 machines under AIX, with Sun and PC-compatible tools slated to be available in 1Q95. Motorola will offer similar tools beginning 4Q94. IBM is also pushing its OS/Open software, the first real-time operating system for the PowerPC architecture, which conforms to the POSIX standard. It may seem surprising but, initially, neither company plans to introduce tools that run on PowerPC-based Macintosh systems; those efforts are being made by third parties.

#### Embedded Designs Seek Low Cost

It is hard to accurately model manufacturing costs for devices this small, but the MPR Cost Model (see **071004.PDF**) estimates that IBM's PPC 403GA costs about \$15 to make. Of this, only about \$5 is due to the die; the rest is due to the package, test, and assembly, costs that could be smaller for such a tiny chip.

Figure 3. Even though it issues a single instruction per cycle, Motorola's RMCU 505 can concurrently execute integer, floating-point, branch, and load/store operations.

Without an accurate die size for the 505, the crystal ball is even less clear. But we can make a rough estimate by using the 85-mm<sup>2</sup> PPC 603 as a starting point, since its function blocks are similar and it is built in a similar process. By eliminating the function blocks not present in the 505, we estimate that its die size is about 50 mm<sup>2</sup>, making its manufacturing cost roughly \$20. The 505's cost is much lower than the 603's, mainly because the former uses a small plastic package while the latter uses a 240-pin ceramic package. The smaller die also helps, but even if the 505 die is as large as 65 mm<sup>2</sup>, its cost would be only about \$25.

Pricing policies for these new chips will also differ somewhat from those for the 600 series because embedded systems are more price-sensitive. Where volume prices are four to five times the estimated manufacturing cost for the desktop chips, a factor of two or less is more common for embedded processors. Motorola has already quoted \$75 for samples of the 505 in 100-unit quantities. Given our cost estimates, the 505's production price could fall by half and still maintain a healthy margin for the company.
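The arithmetic behind that margin claim is simple to check. This sketch uses only the article's numbers (the \$75 sample price and the rough \$20 manufacturing-cost estimate for the 505); the function name `price_to_cost_ratio` is illustrative, and the halved price is a hypothetical scenario, not an announced one.

```python
def price_to_cost_ratio(price, mfg_cost):
    """Return the selling price as a multiple of estimated manufacturing cost."""
    return price / mfg_cost


SAMPLE_PRICE = 75.0   # quoted sample price for the 505, in dollars
EST_MFG_COST = 20.0   # the article's rough manufacturing-cost estimate, in dollars

sample_ratio = price_to_cost_ratio(SAMPLE_PRICE, EST_MFG_COST)
halved_ratio = price_to_cost_ratio(SAMPLE_PRICE / 2, EST_MFG_COST)

print(f"Sample price: {sample_ratio:.2f}x estimated cost")
print(f"Half the sample price: {halved_ratio:.2f}x estimated cost")
```

At half the sample price, the multiple falls just under two, which is in line with the factor the article describes as common for embedded processors.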

The two chips are different enough to find separate niches. The 403's lower performance and higher integration aim it at low-cost systems, while the 505's FPU targets higher-performance embedded applications.

#### Price/Performance Is Tough to Beat

Comparing these new processors to likely competitors poses a challenge: there are no direct comparisons. The embedded PPC chips have better performance and should cost less than other high-end embedded processors on the market today. To put things in perspective, the 505 would be placed high in Motorola's 68000 line, delivering 20% more Dhrystones per second than a 68EC040 at the same clock speed, but the 'EC040 is priced at over \$70 in 1,000s. By the same measure, a 33-MHz 403 has about the same performance as a 40-MHz 'EC040, but the 403 integrates a memory controller and several useful interfaces, again at a lower cost.

|                    | IBM<br>403GA       | Motorola<br>505 | Intel<br>i960CA     | AMD<br>Am29240    |
|--------------------|--------------------|-----------------|---------------------|-------------------|
| Clock (MHz)        | 25, 33             | 25, 40          | 16, 25, 33          | 20, 25, 33        |
| Cache              | 2K I, 1K D         | 4K I, 4K D      | 1K I                | 4K I, 2K D        |
| MMU                | no                 | no              | no                  | yes               |
| FPU                | no                 | yes             | no                  | no                |
| Fully Static       | yes                | yes             | no                  | yes               |
| DMA Channels       | 4                  | none            | 4                   | 4                 |
| Interrupts         | 6 IRQs             | N/A             | 4 IRQs              | 4 IRQs            |
| Bus Sizing         | 8, 16, 32          | 16, 32          | ?                   | 8, 16, 32         |
| Supply Voltage     | 3.3 V              | 3.3 V           | 5 V                 | ? V               |
| Power Dissipation* | 1 W                | 0.7 W           | 2.2 W               | ? W               |
| Dhrystone MIPS*    | 39.0               | 46.4            | 37.0                | 26.4              |
| Transistors        | 585,000            | N/A             | 675,000             | ?                 |
| Die Size           | 39 mm<sup>2</sup>  | N/A             | 147 mm<sup>2</sup>  | ? mm<sup>2</sup>  |
| IC Process         | 0.5 μ              | 0.5 μ           | 0.8 μ               | ? μ               |
| Metal Layers       | 3                  | 3               | 2                   | ?                 |
| Package            | 160 PQFP           | 144 PQFP        | ? PQFP              | 196 PQFP          |
| Price (1,000s)     | N/A                | N/A             | \$92                | \$92              |
| Mfg. Cost (est.)   | \$15               | \$20            | \$?                 | \$?               |

Table 1. The two embedded PPC processors should have lower prices than their competitors and similar or better performance. \*At maximum clock frequency. (Source: vendor data)

When compared with Intel's popular i960 line, the embedded PPC devices fall into a large price gap between the under-\$25 K- and S-series chips and the above-\$90 C-series. Table 1 shows that the i960CA has a function set similar to the 505's, but a 33-MHz i960CA has about 20% less performance than a 40-MHz 505 and costs more than \$100 in 1,000s. The fastest i960, the superscalar 40-MHz CF, has about 30% greater performance than the 505 but costs \$160.

Figure 4. The 403 is just the first of a family of embedded processors planned on IBM's PPC roadmap.

Although the i960CA contains a DMA controller, its transfers cannot occur simultaneously with processor execution. Also, the i960 cache uses a write-through policy, so writes have a greater chance of stalling the processor when running with a slow memory system. The 403 integrates more on-chip peripherals, which would save tens of dollars in parts costs, but the i960 is well established and has a wider range of variants. IBM makes the point that third-party companion chips designed for the i960 can be connected to the 403 with less glue logic.

Similar points can be made for AMD's 29000 family. Some versions integrate a set of functions similar to the 403 and some have lower manufacturing costs, but only the more expensive can match the performance of the embedded PPC chips. The 29240's set of features is similar to the 403's, but at its maximum clock frequency (33 MHz), a 29240 produces about two-thirds the 403's performance of 39 MIPS.
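The relative-performance figures cited in this section follow from the Dhrystone ratings in Table 1. The sketch below recomputes them and adds a per-megahertz view; the dictionary and function names are illustrative, and the data are the vendor-supplied ratings at each chip's maximum clock.

```python
# Dhrystone MIPS at maximum clock frequency, as listed in Table 1.
# Each entry is (maximum clock in MHz, Dhrystone MIPS).
table1 = {
    "403GA": (33, 39.0),
    "505": (40, 46.4),
    "i960CA": (33, 37.0),
    "Am29240": (33, 26.4),
}


def mips_per_mhz(chip):
    """Return Dhrystone MIPS delivered per megahertz of clock."""
    clock_mhz, mips = table1[chip]
    return mips / clock_mhz


def relative_performance(chip, baseline):
    """Return the chip's Dhrystone rating as a fraction of the baseline's."""
    return table1[chip][1] / table1[baseline][1]


for chip in table1:
    print(f"{chip}: {mips_per_mhz(chip):.2f} Dhrystone MIPS/MHz")

print(f"Am29240 vs. 403GA: {relative_performance('Am29240', '403GA'):.0%}")
print(f"i960CA vs. 505: {relative_performance('i960CA', '505'):.0%}")
```

These ratios match the text: the 29240 delivers roughly two-thirds of the 403's rating, and the 33-MHz i960CA roughly 20% less than the 40-MHz 505.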

It takes a processor like NEC's R4200 running at 80 MHz to reach the cost/performance levels of the embedded PowerPC chips. The R4200 has better integer performance than the 505 but costs about \$40 to manufacture and uses twice the 505's 0.7 W. But both Intel and AMD are likely to announce new embedded processors before the PPC chips begin shipping, and it remains to be seen how they stack up.

#### Roadmap Goes in Both Directions

Both Motorola and IBM plan to extend their families of embedded PPC processors. The companies have roadmaps that show both lower-cost processors and higher-performance processors, and each points out that these initial PPC cores can be shrunk by moving to more advanced IC processes. They also indicate a willingness to provide PPC cores as megacells for use in ASICs.

Figure 4 shows that, in addition to the 403, IBM has planned at least two more processors, each with its own core. The 401 is intended to bring the PowerPC architecture to the ultra-low-cost realm, while the 405 will have more execution units and will therefore offer better performance than the 403.

Motorola has a large library of peripherals that it can combine with its core in future versions—everything from simple blocks of ROM or flash EEPROM to TPUs (time processing units) or the communications modules from the popular 68360 processor. Motorola has already redesigned and resized its proprietary IMB (intermodule bus), developed for the 68300 family, to work with its embedded PowerPC core.

Leveraging existing peripherals is an advantage for customers as well: I/O drivers written in C for peripherals on a 68300-series chip should be easy to port for the same peripherals on an embedded PPC processor. Motorola is currently working with a customer to employ the IMB and a custom mix of peripherals with the 505 core.

#### A Potent New Embedded Competitor

In the five years since RISC processor vendors first began attacking embedded applications, they have made major strides in the high end of that market. Intel's 960 family and AMD's 29000 family dominate the embedded RISC market share, although both trail Motorola's 68K line in the overall 32-bit market. Despite many technical and marketing differences, the Intel and AMD architectures have one key factor in common: a total focus on the embedded market. Both companies have invested tremendous resources in sponsoring the creation of development tools and soliciting design wins.

To expand the volumes of their architectures, vendors of desktop RISC chips have also sought embedded customers. None has matched the volumes achieved by Intel or AMD, although MIPS has come the closest. This lack of success is probably due more to a narrower range of development tools and more modest marketing efforts than to any technical differences in the architectures.

PowerPC is yet another desktop RISC seeking to enter the embedded market. Initially, it will face some of the same constraints that have limited the success of MIPS and others in such applications. In terms of the architecture itself, PowerPC has no particular advantages, and its complexity is somewhat of a hindrance.

PowerPC has some significant marketing advantages, however. Perhaps the biggest is that Motorola, the dominant supplier of midrange embedded processors, is one of its backers. PowerPC will serve as a new high-end core for Motorola's integrated processor family, filling a role originally planned for the 88K. The PPC core can be combined with the peripherals already developed for the 68300 family, potentially yielding a broad range of implementations fairly quickly.

Even more important may be Motorola's relationships with hundreds of embedded processor users. The Ford design win for PPC, inherited from the 88K, is one example of a relationship-based design win, and it guarantees PPC significant volume. It remains to be seen, however, how aggressively Motorola chooses to push its customers toward PowerPC. Once customers decide to consider changing architectures, they are going to look at all the alternatives—not just those from Motorola—so the transition can be dangerous.

The 68000- and CPU32-based embedded processors will continue to rule the low end; PPC-based chips will be alternatives to devices based on the 68040 and 68060 cores. Given the information revealed so far, it appears that the PPC chips will overlap the performance of these chips at a lower manufacturing cost. Motorola must continue to provide an upgrade path for those customers who have an investment in 68K code and don't want to switch, while offering a RISC alternative for customers seeking the best price/performance at the higher end and ultimately the highest performance.

IBM lacks the experience in selling to a wide range of embedded CPU customers that is Motorola's greatest asset. But its resources are substantial, and its determination to be a major force in the merchant microprocessor market should not be underestimated.

With only two chips revealed so far, no volume pricing information, and limited technical details, it is too early to gauge how competitive the embedded PPC chips will be. By the time these chips are in production, the competitors will all have another generation of chips as well. Indications are, however, that the chips will be very competitively priced, and that their performance will be at the head of the pack. It will take time to build the required support infrastructure, develop a full range of chips, and capture design wins, but PowerPC will clearly be a force to be reckoned with in the high-end embedded processor market. ♦

### Price & Availability

PPC 403GA samples will be released to selected beta sites in June, with general sampling planned for 3Q94. Production volumes of both 25- and 33-MHz processors will follow in 4Q94. The chip's price has not yet been announced. The evaluation circuit board will be available on the same schedule. For more information, contact IBM at 800.769.3772 or 708.296.6767.

Motorola indicates that it should begin sampling a 25-MHz version of the RMCU 505 in 4Q94. Samples will be priced at \$75. The chip's production date has not been set, but the company says its production price should be significantly lower than the sample price. For more information, contact Motorola at 512.891.3260.