Source: Understanding IBM RS/6000 Performance and Sizing (SG24-4810-00)
The early POWER2 processor complex consists of eight semi-custom
chips partitioned in the same way as the POWER: the instruction cache unit
(ICU) which also processes branches, the FXU, FPU, four data cache units
(DCUs), and a storage control unit (SCU). The early models of the POWER2
were announced in September 1993 and are the basis of the POWER2 Super
Chip (P2SC) version. The POWER2 architecture expands the cache capacity,
doubles the number of
functional units, doubles the bandwidth of most buses, and quadruples
the cache to floating-point bandwidth.
The POWER2 architecture offers numerous performance gains:
· New instructions enhance floating-point bandwidth, square
root, and data conversion.
· User-level instruction set is a superset of the original POWER
set. Existing POWER binaries can
run unmodified on the faster systems.
· Existing POWER binaries gain the performance benefits of the
larger data cache, higher clock
rates, improved bandwidth, and additional
functional units.
· Recompiling can maintain portability while providing gains
from added compiler transformations.
When portability is not required, a recompiled application
can also exploit new instructions.
POWER implements one floating-point, one integer, and one
branch unit. It can execute four instructions per cycle (five operations). The
POWER2 multichip implements dual integer, floating-point, and branch units
that can execute six instructions per cycle (eight operations).
The architecture features high-performance, floating-point
storage access instructions, load quad word (128 bits) and store quad word,
which support all of the addressing forms of double-precision storage references.
The load quad word moves two adjacent double-precision storage operands
into two adjacent floating-point registers.
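The load quad word behaviour described above can be sketched in a few lines of Python. This is only a toy model: the function name `load_quad`, the byte-buffer memory, the big-endian layout, and the wrap-around of the second register number are assumptions made for illustration, not a description of the hardware.

```python
import struct

NUM_FPRS = 32  # assumed size of the floating-point register file


def load_quad(memory: bytes, addr: int, fprs: list, frt: int) -> None:
    """Model a 128-bit load: two adjacent doubles into two adjacent registers."""
    # Read 16 bytes at addr as two big-endian IEEE double-precision values.
    first, second = struct.unpack_from(">2d", memory, addr)
    fprs[frt] = first
    fprs[(frt + 1) % NUM_FPRS] = second  # wrap-around is an assumption


# Toy memory image holding four doubles (32 bytes).
memory = struct.pack(">4d", 1.5, 2.5, 3.5, 4.5)
fprs = [0.0] * NUM_FPRS

# One quad load at byte offset 16 fills registers 4 and 5.
load_quad(memory, 16, fprs, 4)
```

One such load does the work of two double-precision loads, which is where the gain in cache to floating-point bandwidth comes from.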
The instruction cache size is 32 KB, two-way set associative,
and the data cache is either 128 KB or 256 KB, four-way set associative
single line, depending on the RS/6000 model. The following are some other
differences in the POWER2 multichip compared to the POWER architecture:
· Eleven new instructions (giving a total of 195 instructions)
· Different page frame table format (hash instead of inverted
page frame tables)
· Page aliasing
· Support for floating-point imprecise mode
· Different interrupt mechanism
· Faster clock speed
· Double the bandwidth from memory to cache (eight-word bus, 128 bytes)
· Double the bandwidth from cache to integer units (two single words, 16 bit)
and FPUs (two quad words, 32 bit)
· Different alignment requirements for the quad word data
· A special data address break-point register
· A performance monitoring facility
Figure 13. POWER2 Eight-Word System
ICU Instruction Cache Unit
FXU Fixed Point Unit
FPU Floating Point Unit
DCU Data Cache Unit
SCU Storage Control Unit
POWER2 Super Chip
The POWER2 Super Chip (P2SC) is a compression of the POWER2
eight-chip architecture into a single chip with increased processor speed
and performance. It retains the design of its predecessor, the POWER2.
The initial models have clock speeds of 120 MHz and 135 MHz.
High-density CMOS-6S technology allows each to incorporate 15 million transistors.
The most significant change is a halving of the size of the data cache
and the data TLB, which are now 128 KB and 256 entries, respectively. These
changes were required to fit the eight-chip processor onto a single chip.
The P2SC delivers the processing and dual floating-point
power needed for large, numeric-intensive tasks as well as the integer
and transaction performance for commercial applications. The P2SC contains
on-chip 32 KB instruction and 128 KB data caches and is fully binary compatible
with the POWER2 architecture.
Figure 14. POWER2 Super Chip Module
ICU Instruction Cache Unit
FXU Fixed Point Unit
FPU Floating Point Unit
DCU Data Cache Unit
SCU Storage Control Unit
The P2SC can issue and execute six instructions per cycle,
two of which can be floating-point multiply-add (FMA) instructions. It
supports register renaming and out-of-order execution, although only for
floating-point instructions. With dual branch units, the P2SC can also
execute two branches per cycle, although only one can be taken, while the
other is put on hold. The data cache is triple-clocked to handle two CPU
accesses (load or store) plus a cache refill (write to cache from main
memory). Thus, this part of the chip operates at 500 MHz.
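The two FMA instructions per cycle translate into a peak floating-point rate that is easy to work out. The sketch below assumes each FMA counts as two floating-point operations (one multiply, one add) and uses the 120 MHz and 135 MHz clock speeds quoted earlier; the function name is illustrative only, and sustained rates will be lower.

```python
def peak_mflops(clock_mhz: float, fma_per_cycle: int = 2,
                flops_per_fma: int = 2) -> float:
    """Peak MFLOPS = clock (MHz) x FMAs per cycle x flops per FMA."""
    return clock_mhz * fma_per_cycle * flops_per_fma


# Initial P2SC clock speeds from the text.
for clock in (120, 135):
    print(f"{clock} MHz: {peak_mflops(clock):.0f} MFLOPS peak")
```

Under these assumptions the two models peak at 480 and 540 MFLOPS.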
The P2SC's great performance is directly associated with
the inclusion of up to 2 GB of DRAM across a 256-bit-wide interface. This
interface plugs the processor directly into a byte stream bus of up to
2.2 GB per second. The chip also integrates a 64-bit I/O bus for peripheral
interconnection.
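As a back-of-the-envelope check on the memory figures above, the following sketch derives the transfer rate implied by a 256-bit interface delivering 2.2 GB per second. It assumes 1 GB = 10^9 bytes and one full-width transfer per bus cycle; both are assumptions, and the helper name is hypothetical.

```python
def implied_transfer_rate_mhz(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Millions of full-width transfers per second needed for the bandwidth."""
    bytes_per_transfer = bus_width_bits // 8      # 256 bits -> 32 bytes
    transfers_per_s = bandwidth_gb_s * 1e9 / bytes_per_transfer
    return transfers_per_s / 1e6


rate = implied_transfer_rate_mhz(2.2, 256)
print(f"Implied transfer rate: about {rate:.2f} million transfers/s")
```

The result, a little under 70 million transfers per second, is only what the quoted numbers imply, not a documented bus clock.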
>Can anyone tell me when/where the MicroChannel Bus bottlenecks on
>the RS/6000 line?
Milton Miller
For the Combo boxes (520-560 except 550L, 320-340 and 350, 920, 930),
the Micro Channel and IOCC max out at about 17 MB/s (writes) and 19 MB/s (reads).
The newer XIO IOCC machines (POWER w/32k I-cache and POWER2 machines) max
out at 77 MB/s (read or write, per bus) sustained. These figures are based
on long (4k) block transfers by Bus Master devices; smaller blocks, slower
adapters, and PIO to initiate transfers will lower achieved throughput.
Also, in Combo boxes PIO (loads and stores vs DMA) had last priority;
in XIO they are hidden under grant (highest priority).
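To get a feel for what those sustained rates mean in practice, here is a rough sketch that estimates best-case transfer times. The dictionary keys and the function name are made up for the example, and the 100 MB payload is arbitrary; real adapters with smaller blocks or PIO setup will take longer.

```python
# Sustained Micro Channel rates quoted above, in MB/s.
SUSTAINED_MB_S = {
    "combo_write": 17,
    "combo_read": 19,
    "xio": 77,
}


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Best-case time to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_s


# Compare the three cases for an arbitrary 100 MB transfer.
for name, rate in SUSTAINED_MB_S.items():
    print(f"{name:12s} {transfer_seconds(100, rate):6.2f} s")
```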
> (which models use the bus for memory access?)?
All the RS/6000 series (including the PowerPC boxes) have a bridge
from the IO bus to a system bus which gives access to memory for DMA (none
use the Micro Channel for system memory).
In systems with dual Micro Channel buses, both buses can operate at
full bandwidth at the same time.
Faster XIO instead of I/O Channel Controller IOCC (Micro Channel
controller)
Frank Kraemer
>IBM ALSO has a link which IS proprietary called Serial Optical. This
>link hangs DIRECTLY OFF the CPU ( as opposed to all of the other network
>types which run via the Micro-channel ). It is this serial optical link
>that IBM itself uses to create parallel computing arrangements of RS/6000s.
>Serial Optical is only available on certain models of RS/6000. I think
>it is: 930,970,980,560, and 580 - maybe 540 & 550 also have it - I
>can't remember.
Serial Optical Link (called SOCC) IS proprietary - YES, but it's driven
from the IOCC (I/O Channel Controller). The IOCC is attached to the CPU/RAM
channel and drives the MicroChannel bus and the SOCC ports. It offers about
twice the FDDI speed and is very very cheap, yes I know IBM and cheap are
two different words ;-), but it's true. One SOCC adapter has two ports.
You can have 1 SOCC adapter in all 5xx systems (2 ports) - all 9xx systems
support 2 SOCC adapters (4 ports). The SOCC adapter does not need a MicroChannel
slot.
+--------+                          +--------+
!  CPU   ! =========================!  RAM   !
+--------+            !!            +--------+
                      !!
                      !!
 40 MB/s MC or        !!
 80 MB/s MC       +--------+                   !--- SOCC Port 1
                  !  IOCC  ! =======+  SOCC    !
                  +--------+     (220 Mbit/s)  !--- SOCC Port 2
                      !
                      !
                      +------ Micro Channel ......
                      !
                      +------ Micro Channel ......
                      !
                      +------ Micro Channel Adapter for FDDI (100 Mbit/s)
                      !
                      +------ .....
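The "about twice the FDDI speed" claim follows directly from the two link rates in the diagram. A quick sketch, using raw line rates only and ignoring encoding and protocol overhead (the names are illustrative):

```python
SOCC_MBIT_S = 220  # per-port SOCC rate from the diagram
FDDI_MBIT_S = 100  # FDDI rate from the diagram


def mbit_to_mbyte(rate_mbit_s: float) -> float:
    """Convert a raw line rate in Mbit/s to MB/s (8 bits per byte)."""
    return rate_mbit_s / 8


speedup = SOCC_MBIT_S / FDDI_MBIT_S
print(f"SOCC: {mbit_to_mbyte(SOCC_MBIT_S)} MB/s, "
      f"FDDI: {mbit_to_mbyte(FDDI_MBIT_S)} MB/s, ratio {speedup:.1f}x")
```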
The main disadvantage of SOCC is that it is a point-to-point connection. Without
a special switch you can connect only 3 boxes (5xx) together. This is
a good solution for a small compute cluster,
but if you want to cluster more boxes, the NSC switch is big money.
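The 3-box limit is just the arithmetic of point-to-point links: with no switch, every box needs one port for each peer. A minimal sketch, assuming a fully connected cluster and hypothetical helper names:

```python
def max_full_mesh(ports_per_box: int) -> int:
    """Largest cluster where every box links directly to every other box."""
    return ports_per_box + 1


def links_needed(boxes: int) -> int:
    """Number of point-to-point links in a fully connected cluster."""
    return boxes * (boxes - 1) // 2


ports = 2  # a 5xx box with one two-port SOCC adapter
boxes = max_full_mesh(ports)
print(f"{ports} ports per box -> {boxes} boxes, {links_needed(boxes)} links")
```

With two ports per box this gives three boxes joined by three links, i.e. a triangle.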
So the best solution is the combination of SOCC and FDDI:
....===================================....FDDI/Ethernet/TR
       !            !            !
       X            X            X
      / \          / \          / \
     / 1 \        / 2 \        / 3 \      SOCC clusters 1,2,3
    X-----X      X-----X      X-----X
    !     !      !     !      !     !
....===================================....FDDI/Ethernet/TR
Before I stop: SOCC runs TCP/IP.