## TI Previews High-Performance Video Engine

*Multiprocessor-on-a-Chip Enables Two-Way Video Conferencing*

By Linley Gwennap

Texas Instruments (TI) has begun disclosing information about its next-generation video processor, publishing a paper describing a new video-processing architecture. The design combines multiple processors and memories on a single chip, with a goal of achieving up to two billion operations per second. This performance will enable desktop video conferencing and other multimedia applications requiring fast JPEG, MPEG, and P×64 decoding and encoding.

The company is not yet ready to announce the first chip that implements the new architecture and would not discuss price or availability of that part. A formal announcement is expected in the summer, with parts shipping by the end of the year. Specific details about the chip remain unavailable, but the paper (see *IEEE Computer Graphics & Applications*, Vol. 12, No. 6) presents a detailed description of the chip's architecture.

## Five Processors Included on One Chip

Figure 1 shows a block diagram of the new chip. The design is built around a "master" processor (MP), described as a general-purpose 32-bit RISC CPU. The MP includes an IEEE floating-point unit with a single-cycle, single-precision multiplier and a 64-bit ALU. Rather than reuse an existing CPU core, TI has defined a new instruction set for the MP. Besides the standard three-operand RISC instructions, memory operations, and delayed branches, the MP includes a set of vector floating-point instructions and instructions that find the first or last "one" bit in a word. Internally, it uses a Harvard architecture, with a 32-bit path to the instruction cache and a 64-bit data-cache bus.
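
The paper does not spell out how the MP numbers bits or what these instructions return for an all-zero word. The Python sketch below assumes bit 0 is the least significant bit and returns -1 for zero; which end counts as "first" and which as "last" is also an assumption, so the functions are named by direction instead. Instructions like these are useful for normalizing values or picking the next pending item from a bitmask.

```python
MASK32 = 0xFFFFFFFF

def find_leftmost_one(word: int) -> int:
    """Index of the highest set bit in a 32-bit word (bit 0 = LSB, an assumption).
    Returns -1 for a zero word (also an assumed convention)."""
    word &= MASK32
    return word.bit_length() - 1

def find_rightmost_one(word: int) -> int:
    """Index of the lowest set bit in a 32-bit word, or -1 for a zero word."""
    word &= MASK32
    if word == 0:
        return -1
    # word & -word isolates the lowest set bit; its bit_length gives the position.
    return (word & -word).bit_length() - 1

print(find_leftmost_one(0x00F00010), find_rightmost_one(0x00F00010))  # 23 4
```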



Figure 1. TI video processor architecture with four parallel processors. Only data paths are shown.

The architecture also allows for up to eight parallel processors (PPs) that are designed specifically for intensive signal-processing tasks. The first implementation will have four PPs on the chip, each of which uses a 64-bit, DSP-like instruction word that can launch 3 to 15 operations per cycle. The PPs contain special hardware for pixel expansion, rotation, and extraction, along with a 32-bit ALU and a $16 \times 16$-bit integer multiplier.
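
The article does not define the PPs' pixel operations precisely. As a hypothetical illustration, the Python functions below model one plausible reading on 32-bit words holding four 8-bit pixels: "expansion" turns individual mask bits into whole-pixel masks, "rotation" is a 32-bit bit rotate, and "extraction" pulls out a bit field. The names and semantics are assumptions, not TI's instruction definitions, and each function takes several Python steps for what the PP hardware presumably does in a single operation.

```python
MASK32 = 0xFFFFFFFF

def expand_bits(bits: int) -> int:
    """Hypothetical pixel expansion: each of the low 4 bits of `bits` becomes
    an 8-bit pixel mask (0xFF if the bit is 1, 0x00 if it is 0)."""
    result = 0
    for i in range(4):
        if (bits >> i) & 1:
            result |= 0xFF << (8 * i)
    return result

def rotate_left(word: int, amount: int) -> int:
    """Rotate a 32-bit word left by `amount` bits."""
    amount %= 32
    word &= MASK32
    return ((word << amount) | (word >> (32 - amount))) & MASK32

def extract_field(word: int, lsb: int, width: int) -> int:
    """Return the `width`-bit field of `word` that starts at bit `lsb`."""
    return (word >> lsb) & ((1 << width) - 1)

def merge_pixels(fg: int, bg: int, bits: int) -> int:
    """Pick foreground pixels where the mask bit is 1, background pixels elsewhere."""
    mask = expand_bits(bits)
    return ((fg & mask) | (bg & ~mask)) & MASK32

print(hex(merge_pixels(0x11223344, 0xAABBCCDD, 0b0101)))  # 0xaa22cc44
```

The merge example shows why expansion is useful: a 1-bit-per-pixel mask (from a text glyph or a chroma-key test, say) becomes a full-width mask that can select between two sets of pixels with plain logical operations.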

Each processor (the MP and each PP) has a private instruction cache and its own data memory; through a crossbar switch, it can also access the other processors' data memories. Each data memory is highly interleaved to minimize bank conflicts. The crossbar can support up to five independent transactions simultaneously, giving a total memory bandwidth of up to 84 bytes per clock cycle with four PPs. Finally, the chip includes a bus interface that can perform DMA transfers to any of the on-chip memories through the crossbar.
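
To see why interleaving matters for a crossbar, consider a toy model in which each processor's data memory is split into banks selected by low-order address bits, so consecutive words land in different banks. Only one access per bank is granted each cycle; the others stall. The sizes below (8-byte words, four banks, 2-Kbyte memories) and the fixed-priority arbitration are hypothetical, chosen only for illustration.

```python
# Hypothetical parameters, not TI's published figures.
WORD_BYTES = 8        # bytes per bank access
NUM_BANKS = 4         # interleaved banks per data memory
MEMORY_BYTES = 2048   # size of each processor's data memory

def route(address: int) -> tuple[int, int]:
    """Map a global address to (memory index, bank index).
    Low-order interleaving puts consecutive words in consecutive banks."""
    memory = address // MEMORY_BYTES
    bank = (address // WORD_BYTES) % NUM_BANKS
    return memory, bank

def schedule_cycle(requests: dict[str, int]) -> tuple[list[str], list[str]]:
    """Grant at most one request per (memory, bank) this cycle.
    Priority follows dict insertion order (a simplifying assumption)."""
    busy: set[tuple[int, int]] = set()
    granted: list[str] = []
    stalled: list[str] = []
    for requester, address in requests.items():
        target = route(address)
        if target in busy:
            stalled.append(requester)
        else:
            busy.add(target)
            granted.append(requester)
    return granted, stalled

requests = {"PP0": 0x000, "PP1": 0x008, "PP2": 0x020, "MP": 0x800}
print(schedule_cycle(requests))  # (['PP0', 'PP1', 'MP'], ['PP2'])
```

PP0 and PP1 touch adjacent words, which fall in different banks, so both proceed. PP2's address maps to the same bank of the same memory as PP0's and must wait, while the MP's request goes to a different memory entirely.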

The master processor is intended to supervise and distribute tasks to the PPs. It can also be used for audio and graphics algorithms that require 32-bit precision or floating-point math. The PPs will efficiently handle pixel operations as well as DCT-based video algorithms such as JPEG, MPEG, and P×64. TI expects the chip to perform real-time P×64 encoding and decoding at 30 frames per second (at $352 \times 288$ resolution), which it estimates will require over 1.2 billion operations per second.
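
A quick back-of-the-envelope check puts TI's estimate in per-pixel terms. The $352 \times 288$ frame is the CIF format used by P×64, which samples chroma at half resolution in each direction (4:2:0). Because TI says "over" 1.2 billion operations per second for combined encoding and decoding, the results below are lower bounds.

```python
WIDTH, HEIGHT, FPS = 352, 288, 30
OPS_PER_SECOND = 1.2e9  # TI's estimate for real-time P×64 encode + decode

luma_samples_per_second = WIDTH * HEIGHT * FPS
# 4:2:0 sampling: two chroma planes, each a quarter the size of the luma plane.
all_samples_per_second = luma_samples_per_second * 3 // 2

ops_per_luma_pixel = OPS_PER_SECOND / luma_samples_per_second
ops_per_sample = OPS_PER_SECOND / all_samples_per_second
print(f"{ops_per_luma_pixel:.0f} ops per luma pixel, {ops_per_sample:.0f} ops per sample")
# 395 ops per luma pixel, 263 ops per sample
```

Roughly 400 operations for every displayed pixel, every frame, is far beyond what a general-purpose CPU of this era could sustain, which is why TI splits the work across several DSP-like processors.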

Although TI would not provide specifics, it says the new chip will be comparable in size to SuperSPARC, implying a die of around 250 mm² and about 3 million transistors. We would guess that the total on-chip memory will be close to SuperSPARC's 36 Kbytes and that the clock rate will be similar, around 40–50 MHz. The chip will be built in TI's 0.6-micron, 4M-bit DRAM process.
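
Combining TI's two-billion-operation goal and the 84-byte-per-cycle crossbar figure with the 40–50-MHz clock guess gives a feel for how much parallelism the chip needs. The clock rate is our estimate, not a TI figure, and the sketch ignores the MP's contribution.

```python
PEAK_OPS_PER_SECOND = 2e9      # TI's stated goal
CROSSBAR_BYTES_PER_CYCLE = 84  # four-PP configuration
NUM_PPS = 4
MAX_OPS_PER_PP_INSTRUCTION = 15

def per_cycle_figures(clock_mhz: float) -> tuple[float, float]:
    """Return (ops per cycle needed to reach the goal, crossbar bandwidth in GB/s,
    with 1 GB = 1e9 bytes) at an assumed clock rate."""
    clock_hz = clock_mhz * 1e6
    ops_per_cycle = PEAK_OPS_PER_SECOND / clock_hz
    bandwidth_gb_s = CROSSBAR_BYTES_PER_CYCLE * clock_hz / 1e9
    return ops_per_cycle, bandwidth_gb_s

for clock_mhz in (40, 50):  # guessed clock rates
    ops, bw = per_cycle_figures(clock_mhz)
    print(f"{clock_mhz} MHz: {ops:.0f} ops/cycle needed, {bw:.2f} GB/s crossbar")
# 40 MHz: 50 ops/cycle needed, 3.36 GB/s crossbar
# 50 MHz: 40 ops/cycle needed, 4.20 GB/s crossbar

pp_peak = NUM_PPS * MAX_OPS_PER_PP_INSTRUCTION
print("PP peak:", pp_peak, "ops/cycle")  # PP peak: 60 ops/cycle
```

The four PPs' combined peak of 60 operations per cycle covers either requirement, but only if most PP instructions are nearly fully packed, which puts a heavy burden on compilers and hand-tuned code.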

OEMs may find it difficult to write software for such a complex design. Although TI will presumably provide compilers and basic software, some knowledge of the arrangement and abilities of the different processors may be needed to achieve maximum performance with the new chip. OEMs that want to add value by writing their own software modules will be challenged.
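
As a hypothetical example of the partitioning decisions involved, the sketch below shows one way software on the MP might hand out P×64 macroblocks (16 × 16 luma pixels each) to four PPs. The static round-robin policy and the function name are assumptions for illustration, not TI's software.

```python
MB_SIZE = 16          # P×64 macroblock: 16 × 16 luma pixels
WIDTH, HEIGHT = 352, 288
NUM_PPS = 4

def assign_macroblocks(width: int, height: int,
                       num_workers: int) -> list[list[tuple[int, int]]]:
    """Deal (row, col) macroblock coordinates round-robin across workers."""
    rows, cols = height // MB_SIZE, width // MB_SIZE
    work: list[list[tuple[int, int]]] = [[] for _ in range(num_workers)]
    for index in range(rows * cols):
        row, col = divmod(index, cols)
        work[index % num_workers].append((row, col))
    return work

queues = assign_macroblocks(WIDTH, HEIGHT, NUM_PPS)
print([len(q) for q in queues])  # [99, 99, 99, 99]
print(queues[1][:3])             # [(0, 1), (0, 5), (0, 9)]
```

Round-robin balances the 396 macroblocks evenly, but it scatters neighboring blocks across PPs, so motion estimation would pull reference pixels through the crossbar from other processors' memories. Giving each PP a contiguous strip instead keeps neighbors local but splits 18 macroblock rows unevenly across four PPs. Trade-offs like this are where knowledge of the chip's internals pays off.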

One competitor will be IIT's next video processor, a single-chip design also due late this year. IIT, however, may not be able to match TI's performance. Intel's cancellation of its i750 successor leaves it out of this market. While TI should do well at the high end of the market, its recent partnership with C-Cube suggests that TI does not plan to compete with low-cost products such as C-Cube's dedicated video-compression chips.◆