Source: IBM RISC System/6000 Technology: Volume II, Sept. 23, 1993 (SA23-2619-00)
Authors: Charles R. Johns and Taggart Robertson
The POWER GXT100 and GXT150 are 2D graphics adapters based
on the Pixel Accelerator for X (PAX) 2D rendering engine and the IBM RGB
530 Palette Digital to Analog Converter (Palette-DAC). The PAX engine draws
or renders the graphical data sent from the 601 processor complex and the
RGB 530 displays the data on the screen.
The GXT100 has enough memory to support a 1024 x 768 screen. The GXT150
has added memory to support up to a 1280 x 1024 screen and to utilize the
multiple color palettes in the IBM RGB 530 Palette-DAC. Both the GXT100
and the GXT150 are 8-bit adapters which require AIXwindows Environment/6000
Version 1.2.5 2D and AIX Version 3.2.5.
Design Goals and Considerations
The PAX 2D rendering engine is aimed at entry workstations, and its
architecture is dedicated to accelerating X Window applications.
The architecture of a graphics subsystem can influence X Window performance
and time to market in several areas. The PAX architecture targeted three
areas: efficient interfaces, simple programming model, and rendering speed.
Graphics Subsystem Organization
The POWER GXT100 and GXT150 consist of the PAX 2D rendering
engine, the IBM RGB 530 Palette-DAC, 1M to 3M bytes of frame buffer memory,
and the initialization Read Only Memory (ROM). Figure 1 shows the complete
block diagram and the major interfaces of the graphics subsystem. The PAX
chip serves as the graphics accelerator as well as the system interface
for the adapter. It contains the control for accessing the initialization
ROM, the Palette-DAC control port, and the video RAM (VRAM) parallel port.
Figure 1 Graphics Subsystem Block Diagram
PAX attaches directly to the PowerPC 601 Microprocessor
bus. Attaching directly to the
PowerPC 601 bus improves performance by avoiding the latency and synchronization
overhead created when converting from one bus to another. The 601 bus is
a split address and data bus with 32 bits of address and 64 bits of data
[1]. The processor uses byte, half-word, or word operations to access the
graphics subsystem. Based on the address of the operation, PAX directs
the access to one of four locations: the internal registers, the frame
buffer (i.e. the VRAM),
the RGB 530 Palette-DAC, or the initialization ROM.
Figure 2 PAX Block Diagram
The frame buffer is constructed of specialized memory called video RAM
(VRAM). These
memory devices contain key features for increasing graphics performance
including:
- Serial output port
- Block Write
- Write per bit
The serial output port is a dedicated high speed interface
used by the Palette-DAC for scanning pixels out of the frame buffer VRAM.
This interface eliminates the overhead of refreshing the screen from the
parallel port which is often required with more conventional DRAM interfaces.
Block Write is a unique feature of VRAM which allows a
constant color to be written to multiple locations within a single write
cycle. This feature allows PAX to render up to 32 pixels in a single write
cycle.
The Write per bit feature provides a write mask which the
rendering engine uses to select which bits of the pixel are to be updated.
This feature eliminates costly read / modify / write operations.
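The effect of a write mask can be sketched in a few lines. This is only
an illustration of the masking idea, not the VRAM or PAX interface; the
function name and the 8-bit pixel width are assumptions made for the example.

```python
def write_per_bit(old_pixel: int, new_pixel: int, write_mask: int) -> int:
    """Model a masked write to one 8-bit pixel (hypothetical helper).

    Bits set in write_mask take their value from new_pixel; bits that are
    clear keep the value already stored in old_pixel. In VRAM this selection
    happens inside the memory device, so no separate read is needed.
    """
    kept = old_pixel & ~write_mask      # planes protected by the mask
    updated = new_pixel & write_mask    # planes enabled for writing
    return (kept | updated) & 0xFF
```

For example, a mask of 0x0F updates only the low four planes of the pixel
and leaves the high four planes untouched.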
PAX's 32-bit frame buffer architecture exploits these VRAM
features and also employs advanced interleaving techniques, referred
to as Pixel Interleaving [2] and Load Clock Interleaving, to enhance performance.
The architecture supports 1M to 5M bytes of memory which provides the capability
to support screen resolutions ranging from 1024 x 768 to 1280 x 1024. With
2M bytes or more of memory, PAX can support a double buffer display.
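The relationship between memory size and resolution can be checked with
simple arithmetic. The sketch below counts only the visible 8-bit pixels;
it ignores any additional planes or off-screen memory, so it is a lower
bound rather than a statement of how the adapters partition their VRAM.
The function name is an assumption made for the example.

```python
def frame_buffer_bytes(width: int, height: int,
                       bits_per_pixel: int = 8, buffers: int = 1) -> int:
    """Bytes needed for the visible pixels only (hypothetical helper)."""
    return width * height * bits_per_pixel // 8 * buffers


MEGABYTE = 1024 * 1024

single_1024 = frame_buffer_bytes(1024, 768)              # 0.75M bytes
single_1280 = frame_buffer_bytes(1280, 1024)             # 1.25M bytes
double_1024 = frame_buffer_bytes(1024, 768, buffers=2)   # 1.5M bytes
```

Under these assumptions a 1024 x 768 screen needs 786,432 bytes and fits
in 1M bytes, a 1280 x 1024 screen needs 1,310,720 bytes, and a double
buffered 1024 x 768 screen needs 1.5M bytes, which fits in 2M bytes.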
The IBM RGB 530 Palette-DAC serves as the video controller.
It provides four color palettes, video output (red, green, blue), display
timings, VRAM serial port control, and two hardware cursors. This device
also provides an on-chip programmable Phase Locked Loop (PLL) which allows
the graphics subsystem to support a wide range of monitors with varying
timing requirements.
As the block diagram and descriptions illustrate, most
of the logic required for the graphics subsystem is integrated into the
PAX and the IBM RGB 530 custom chips. This level of integration is key
to keeping the cost of the subsystem to a minimum while enhancing the performance
and function.
Programming Models
The PAX chip supports two different programming models: Direct Frame
Buffer Access (DFA) and "Poly" commands.
Direct Frame Buffer Access allows the Frame Buffer to be
accessed by the PowerPC 601 processor as if it were part of system memory.
This simplified interface is very effective in reducing the X Window software
development time. The X Window software developers use DFA to map
the X Window System code received from the Massachusetts Institute of
Technology (MIT) directly into the initial device driver for the adapter.
They then optimize the device driver to exploit the hardware accelerated
functions.
DFA is also key to the performance of certain X Window commands such as
points, circles, and complex area fills because there is no additional
hardware acceleration provided for these primitives.
The "Poly" command interface provides a rich set
of rendering instructions which complement the DFA programming model. The
word "Poly" refers to the ability to render multiple primitives of the
same type with only one command. The ability to render multiple primitives
with only one command eliminates the overhead of sending a new command
with each primitive. The command set is designed to map closely to the
X Window protocol. The performance of accelerated functions using the "Poly"
command interface is significantly faster than DFA. Some of these accelerated
functions include line draw, area fill, and bit block transfer.
Rendering Engine Architecture
The PAX rendering engine architecture includes rendering functions,
rendering attributes, and 3D Application Programming Interface (API) assist
functions.
Rendering Functions
The PAX architecture consists of several processing units for accelerating
X Window System rendering commands. These processing units are used to
accelerate:
- Line Draw
- Points
- Area Fill
- Bit Block Transfer
See Figure 2 for a block diagram of the internal architecture. These
processing units provide the X Window server (X server) with dedicated
hardware to accelerate the rendering of lines, points, and area fills.
In addition to rendering, PAX also provides assistance for moving pixels
to and from system memory or to and from another screen location. This
function is referred to as Bit Block Transfer (Blits).
Vertices define the boundary of the region to be rendered.
These vertices are included in "Poly" commands. All vertices for these
drawing commands can be sent as 16-bit, two's complement, window relative
coordinates. This 16-bit vertex provides the X server with a 64 K x 64
K virtual screen. The origin of this virtual screen space is at the center,
thus allowing for both positive and negative X and Y addresses.
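As an illustration of the coordinate format, the following sketch decodes
a raw 16-bit two's complement value into a signed coordinate. The function
name is an assumption; the source does not describe how the value is
packed into a command word.

```python
def decode_coord(raw16: int) -> int:
    """Interpret the low 16 bits of raw16 as a two's complement integer."""
    raw16 &= 0xFFFF
    # If the sign bit (bit 15) is set, the value is negative.
    return raw16 - 0x10000 if raw16 & 0x8000 else raw16
```

The representable range is -32768 to 32767 on each axis, which is the
64 K x 64 K virtual screen with its origin at the center.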
Line Draw
The line draw engine provides the rasterization of a line
between two points provided by an application. A technique known as Bresenham's
line algorithm [3] is used to render the line. PAX supports two commands
for rendering lines: Poly Line and Poly Segment.
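For readers unfamiliar with the technique, here is a minimal software
sketch of the integer form of Bresenham's algorithm. It shows the
general method only; the source does not describe the PAX hardware
implementation, and the function name is an assumption.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int) -> list:
    """Return the pixels on a line from (x0, y0) to (x1, y1), inclusive.

    Uses only integer additions and comparisons, which is what makes the
    algorithm attractive for hardware.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy  # running error term

    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * err
        if doubled >= dy:   # error says: step in X
            err += dy
            x0 += step_x
        if doubled <= dx:   # error says: step in Y
            err += dx
            y0 += step_y
    return points
```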
Every vertex sent with the Poly Line command defines a
new line that continues from the previous vertex. This is useful for
drawing connected lines. The Poly Segment command requires two vertices
to define each line. This command is used to render multiple
non-connecting lines.
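The difference between the two commands can be sketched by showing how a
vertex list turns into lines. The function names are illustrative and the
behavior follows the corresponding X Window requests; the actual command
encoding is not described in the source.

```python
def poly_line(vertices: list) -> list:
    """Connected lines: each vertex after the first ends one line
    and starts the next."""
    return list(zip(vertices, vertices[1:]))


def poly_segment(vertices: list) -> list:
    """Independent lines: vertices are consumed two at a time."""
    if len(vertices) % 2:
        raise ValueError("Poly Segment needs an even number of vertices")
    return [(vertices[i], vertices[i + 1])
            for i in range(0, len(vertices), 2)]
```

Four vertices therefore produce three connected lines with Poly Line but
two separate lines with Poly Segment.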
The X Window protocol supports styled lines. These are
referred to as OnOffDashed and
DoubleDashed lines [4]. An OnOffDashed line appears as a line with
sections, or dashes, rendered in the foreground color and sections not
rendered. DoubleDashed lines appear as a line with sections rendered in
the foreground color and sections rendered in the background color.
PAX supports a set of dash counters which allow the server
to define a line style with up to 8 unique segments or dashes. The X server
selects between OnOffDashed and DoubleDashed by enabling or disabling
transparent rendering.
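The following sketch models how a dash list and a transparency switch
could produce the two line styles. It is a software illustration under
stated assumptions: the function name, the use of None for "not
rendered", and the rule that the pattern starts with an "on" dash are
choices made for the example, not details taken from the PAX design.

```python
def dash_colors(num_pixels: int, dashes: list, fg, bg,
                transparent: bool) -> list:
    """Return the color written at each pixel along a line.

    dashes holds 1 to 8 positive dash lengths. Dashes alternate between
    "on" and "off". "On" dashes use fg. "Off" dashes are skipped (None)
    when transparent is True (OnOffDashed) or use bg when it is False
    (DoubleDashed).
    """
    if not 1 <= len(dashes) <= 8 or min(dashes) <= 0:
        raise ValueError("expected 1 to 8 positive dash lengths")

    colors = []
    index, remaining, on = 0, dashes[0], True
    for _ in range(num_pixels):
        if on:
            colors.append(fg)
        else:
            colors.append(None if transparent else bg)
        remaining -= 1
        if remaining == 0:          # dash counter expired: next dash
            index = (index + 1) % len(dashes)
            remaining = dashes[index]
            on = not on
    return colors
```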
The major performance bottleneck for the line generation
logic is the frame buffer. As mentioned earlier, PAX employs Pixel Interleaving
and Load Clock Interleaving to reduce this bottleneck.
Points
Points are rendered using either DFA or "Poly" point commands.
Points are not accelerated since there is no significant processing required
to generate them. Instead, points are provided a fast path through the
hardware so that they are rendered into the frame buffer at the maximum
frame buffer bandwidth.
Area Fill
PAX supports four types of area fills: Spans, Triangles,
Rectangles, and Quadrilaterals. A
unique command is dedicated to each type of area fill operation to
reduce the number of stores required. The Block Write VRAM feature is used
to increase the fill rate for these objects. The Block Write feature allows
PAX to write up to 32 pixels in one memory cycle - four times faster than
normal writes.
Spans are a continuous row of pixels. These are the basic
area fill primitives. All other types of area fills are broken down into
spans by the internal area fill logic.
The triangle and quadrilateral fill logic uses the line
logic and another simpler line generator to find the edges of the area.
Since only two line generators (edge walkers) are available, PAX can only
support quadrilaterals which are convex in the Y direction. See Figure
3 for examples of the
supported and unsupported areas.
Rectangles are a special case of a quadrilateral and are
handled separately. Special casing rectangles provides additional performance
since the overhead of walking the edges is eliminated. This extra performance
is beneficial to window management and clearing areas on the screen.
Quadrilaterals, rectangles, and triangles can
be drawn in one of two modes: X Window compliant or Full Fill. The X Window
mode draws the fill area such that when an object is connected to other
objects along an edge, no pixel is written twice. This is accomplished
by rendering pixels whose centers lie inside the area or on its left or
top edge, but not pixels on the right and bottom edges. Full Fill mode
draws all pixels including the edges.
Figure 3 Area Fill Examples
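The two fill modes can be illustrated for the rectangle case, which also
shows how an area is broken down into spans. This sketch assumes integer
vertex coordinates that coincide with pixel centers, as in the X Window
System; the function names and the span representation are assumptions
made for the example.

```python
def rect_spans(x0: int, y0: int, x1: int, y1: int,
               full_fill: bool = False) -> list:
    """Break a rectangle into spans of (y, x_start, x_end_exclusive).

    X Window mode keeps the left and top edges and drops the right and
    bottom edges. Full Fill mode keeps all four edges.
    """
    extra = 1 if full_fill else 0
    return [(y, x0, x1 + extra) for y in range(y0, y1 + extra)]


def span_pixels(spans: list) -> set:
    """Expand spans into the set of (x, y) pixels they cover."""
    return {(x, y) for (y, start, end) in spans for x in range(start, end)}


# Two rectangles that share the vertical edge x = 2.
left = span_pixels(rect_spans(0, 0, 2, 2))
right = span_pixels(rect_spans(2, 0, 4, 2))
shared_x_mode = left & right        # empty: no pixel is written twice

left_full = span_pixels(rect_spans(0, 0, 2, 2, full_fill=True))
right_full = span_pixels(rect_spans(2, 0, 4, 2, full_fill=True))
shared_full = left_full & right_full  # the shared edge is drawn twice
```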
Bit Block Transfer
Bit Block Transfers (Blits) are a hardware assist for
the movement of blocks of pixel data. PAX supports three types of Blits:
Screen to Screen, System to Screen, and Screen to System. They are used
extensively by the X server to accelerate the movement of windows.
Screen to Screen Blits are used to copy a block of pixels
from one location on the screen to another. PAX automatically handles overlapping
source and destination blocks so that the source block appears correctly
at the destination.
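The source does not say how PAX handles overlap. A conventional approach,
sketched here as an assumption, is to choose the copy direction so that
no source pixel is overwritten before it has been read. The frame buffer
is modeled as a flat row-major list, and all names are illustrative.

```python
def screen_to_screen_blit(fb: list, pitch: int, sx: int, sy: int,
                          dx: int, dy: int, w: int, h: int) -> None:
    """Copy a w x h block from (sx, sy) to (dx, dy) inside fb, in place.

    pitch is the number of pixels per scan line. Rows are walked bottom-up
    when the destination is below the source, and columns right-to-left
    when it is to the right, so overlapping blocks copy correctly.
    """
    rows = range(h) if dy <= sy else range(h - 1, -1, -1)
    cols = range(w) if dx <= sx else range(w - 1, -1, -1)
    for r in rows:
        for c in cols:
            fb[(dy + r) * pitch + dx + c] = fb[(sy + r) * pitch + sx + c]
```

Moving a block one pixel to the right, for instance, must copy from right
to left; a left-to-right copy would smear the first pixel across the row.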
Screen to Screen Blits continuously switch between frame
buffer reads and writes. These transitions drastically reduce the usable
frame buffer bandwidth. PAX has an internal buffer to reduce the number
of transitions which increases the utilization of the frame buffer's bandwidth.
System to Screen and Screen to System Blits are used to
copy pixels between system memory and the frame buffer. There are two modes
of operation for these commands: Direct and Indirect. In Direct mode, system
software controls the pixel transfer from system memory and PAX only controls
the frame buffer address. In Indirect mode, the PAX chip becomes a master
of the PowerPC bus and completes the transfer of data with no processor
intervention. The software in
this case only supplies the source and destination addresses.
PAX supports a special System to Screen Blit mode which
accelerates character performance. When operating in this mode, each bit
of the data sent is interpreted as a pixel. The hardware renders the foreground
color for all the 1s in the data word. The background color is rendered
for all the 0s in the data word if transparency is disabled and nothing
is rendered if transparency is
enabled. With transparency enabled, this function uses the Block Write
VRAM feature.
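The bit expansion described above can be modeled in software as follows.
This is a sketch only: the function name is hypothetical, and the
most-significant-bit-first ordering is an assumption, since the source
does not state the bit order.

```python
def expand_bits(word: int, nbits: int, fg: int, bg: int,
                transparent: bool, dest: list) -> list:
    """Expand nbits of word into pixels, returning the updated row.

    A 1 bit writes fg. A 0 bit writes bg when transparency is disabled
    and leaves the destination pixel alone when it is enabled.
    Assumption: the most significant bit maps to the leftmost pixel.
    """
    out = list(dest)
    for i in range(nbits):
        bit = (word >> (nbits - 1 - i)) & 1
        if bit:
            out[i] = fg
        elif not transparent:
            out[i] = bg
    return out
```

With transparency enabled only one color is ever written, which is why
that case can use Block Write.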
Rendering Attributes
PAX also supports a variety of rendering attributes. These
attributes can be applied to both the DFA and "Poly" command programming
models. Some of these attributes are listed below.
- Boolean operations
- Window Management
- Stipple
These attributes modify and control the rendering functions by modifying
vertices and pixel generation. Different attributes apply at different
stages in the rendering process.
Boolean Operations
The X Window protocol specifies
the rendered pixel to be a combination of the pixel's current color (destination)
and the source color. One of 16 logical functions can be selected, all
of which are supported by the PAX architecture. Below is a list of the
logical functions.
- Clear (0)
- Set (1)
- Destination (D) (NoOp)
- Source (S) (Copy)
- !S
- !D
- S & D
- S | D
- S & !D
- S | !D
- !S & D
- !S | D
- !S & !D
- !S | !D
- S^D
- !(S^D)
By implementing Boolean operations in hardware, the X server
is relieved of the slow task of reading the frame buffer, modifying the
color, and writing the new color to the frame buffer. This function is
extremely important to applications which require Boolean operations.
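The sixteen functions listed above are exactly the sixteen possible
bitwise combinations of two inputs. The table below expresses them for
8-bit pixels; the dictionary keys and the 8-bit width are choices made
for this sketch, not names from the PAX register set.

```python
MASK = 0xFF  # 8-bit pixels

# Each entry maps (source, destination) to the value written.
ROPS = {
    "Clear":    lambda s, d: 0,
    "Set":      lambda s, d: MASK,
    "NoOp":     lambda s, d: d,
    "Copy":     lambda s, d: s,
    "!S":       lambda s, d: ~s & MASK,
    "!D":       lambda s, d: ~d & MASK,
    "S & D":    lambda s, d: s & d,
    "S | D":    lambda s, d: s | d,
    "S & !D":   lambda s, d: s & ~d & MASK,
    "S | !D":   lambda s, d: (s | ~d) & MASK,
    "!S & D":   lambda s, d: ~s & d & MASK,
    "!S | D":   lambda s, d: (~s | d) & MASK,
    "!S & !D":  lambda s, d: ~s & ~d & MASK,
    "!S | !D":  lambda s, d: (~s | ~d) & MASK,
    "S ^ D":    lambda s, d: s ^ d,
    "!(S ^ D)": lambda s, d: ~(s ^ d) & MASK,
}


def apply_rop(name: str, source: int, destination: int) -> int:
    """Combine a source color with the current pixel value."""
    return ROPS[name](source & MASK, destination & MASK)
```

In software each of these, other than Clear, Set, Copy, and !S, needs a
frame buffer read before the write; that read is the cost the hardware
implementation removes from the X server.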
Window Management
Applications running under the X Window System request
a window from the X server. The user is free to resize and move these windows
at any time. Since multiple windows may be open on the screen, there is
a possibility a window may be partially obscured by others. It is the X
server's responsibility to manage these windows so that pixels rendered
to the obscured sections are not visible. PAX supports the following functions
which assist the X server:
- Rectangular Clippers
- Clipping Planes
- Window Origin Offset
- Window ID planes
The Rectangular Clipping logic consists of four pairs of
extent registers. Each pair defines a rectangular region with either an
inclusive (all pixels inside the region are rendered) or exclusive (all
pixels inside the region are NOT rendered) attribute. The X server uses
these registers to define the window's geometry to the PAX chip so that
pixels in the obscured sections of the window
are clipped.
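A software model of the clipping test might look like the sketch below.
The source does not say how inclusive and exclusive regions combine, so
the rule used here is an assumption: a pixel is drawn if it lies in at
least one inclusive region (when any are defined) and in no exclusive
region. The tuple layout and function name are also illustrative.

```python
def passes_clippers(x: int, y: int, regions: list) -> bool:
    """Decide whether pixel (x, y) survives rectangular clipping.

    regions holds up to four tuples (xmin, ymin, xmax, ymax, inclusive)
    with inclusive bounds. The combination rule is an assumption made
    for this example, not documented PAX behavior.
    """
    if len(regions) > 4:
        raise ValueError("only four clipping regions are available")

    def inside(region) -> bool:
        xmin, ymin, xmax, ymax, _ = region
        return xmin <= x <= xmax and ymin <= y <= ymax

    inclusive = [r for r in regions if r[4]]
    exclusive = [r for r in regions if not r[4]]
    if inclusive and not any(inside(r) for r in inclusive):
        return False
    return not any(inside(r) for r in exclusive)
```

A window covering (0, 0) to (99, 99) whose lower right quarter is hidden
by another window could then be described with one inclusive region for
the window and one exclusive region for the hidden part.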
Four regions are not always enough to define a window's
geometry. Such is the case when the window is obscured in four or more
unique areas or when shaped (i.e. non rectangular) windows are used. For
these cases, the X server may render the window's geometry to off screen
memory, referred to as clipping planes, with a unique ID. PAX then reads
the pixel's corresponding clip plane value and compares it with the clipping
ID to determine if the pixel should be written to the
frame buffer.
Applications draw using a window relative coordinate system.
Vertices in this system must be converted to screen coordinates before
they can be used to render an object. This conversion is accomplished,
in hardware, by adding the Window Origin Offset to every pixel before it
is rendered. Providing this capability in the PAX chip eliminates
the task of converting the coordinates in software which increases the
rate at which the vertices are sent to PAX. Overall, adding the window
origin offset increases the performance of processor bound primitives such
as points.
Some applications require a different color palette than
the default. If only one palette is available, the colors of other windows
will change when such an application has the focus. The PAX chip supports
an additional four planes of memory, referred to as Window ID planes, which
allow the X server to select a unique palette on a per pixel basis. These
planes also identify other attributes of the pixels such as: frame buffer
select, pixel type, and overlay plane enable.
Stipple
The X Window protocol allows a fill pattern, such as a
checkerboard, to be applied to the
objects rendered [4]. This pattern is referred to as the Stipple pattern.
The pattern can be transparent or opaque. Transparent patterns result in
the foreground color being rendered where there are 1s in the pattern and
nothing rendered where there are 0s in the pattern. The only difference
for an opaque pattern is that the background color is rendered where there
are 0s in the pattern.
PAX supports a fixed 16 x 16 Stipple pattern. This pattern
is addressed by the four least significant bits of the window coordinate
of the pixel to be rendered. The value at that location is the stipple
value for that pixel. PAX supports stippling for all rendering operations.
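The addressing scheme amounts to masking each window coordinate with
0xF. In the sketch below the pattern is stored as 16 rows of 16-bit
integers with the most significant bit as the leftmost column; that
storage layout and the function name are assumptions made for the
example.

```python
def stipple_bit(pattern: list, x: int, y: int) -> int:
    """Look up the stipple value for window coordinate (x, y).

    Only the four least significant bits of each coordinate are used,
    so the 16 x 16 pattern repeats across the whole window.
    """
    row = pattern[y & 0xF]
    return (row >> (15 - (x & 0xF))) & 1


# A checkerboard: alternate rows of 1010... and 0101...
CHECKERBOARD = [0xAAAA if row % 2 == 0 else 0x5555 for row in range(16)]
```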
Applying a transparent stipple pattern to an area fill
operation does not affect the ability to use the Block Write function.
However, applying an opaque stipple prevents the use of Block Write since
two colors must be rendered (foreground and background). This normally
reduces the performance of large opaque stippled objects by a factor of
approximately four. To prevent such a drastic drop in performance, the
PAX architecture employs a unique feature called Stipple
Invert. This feature allows an opaque stippled object to be rendered
twice using Block Write. The second time it is rendered, the stipple pattern
is inverted and the foreground and background colors are swapped. This
simple feature almost doubles the performance for opaque stippled objects.
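The equivalence between one opaque pass and two transparent passes can
be demonstrated with a small model. The functions below operate on a
single row of pixels and a matching list of stipple bits; the names are
hypothetical and the model says nothing about the actual hardware timing.

```python
def transparent_stipple_fill(dest: list, bits: list, color: int) -> list:
    """Write color where the stipple bit is 1; leave other pixels alone.

    Only one color is written, so this pass could use Block Write.
    """
    return [color if bit else old for old, bit in zip(dest, bits)]


def opaque_via_stipple_invert(dest: list, bits: list,
                              fg: int, bg: int) -> list:
    """Render an opaque stipple as two single-color passes."""
    first_pass = transparent_stipple_fill(dest, bits, fg)
    inverted = [1 - bit for bit in bits]          # Stipple Invert
    return transparent_stipple_fill(first_pass, inverted, bg)
```

If a Block Write pass is roughly four times faster than a normal pass,
two such passes take about half the time of one normal pass, which is
consistent with the "almost doubles" figure above.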
3D API Assist Functions
PAX supports a few functions which are intended to enhance
the performance of 3D
Application Programming Interfaces (APIs). These additional functions
are listed below:
- Anti-Aliased Lines
- Sub pixel positioning of lines
- 24-bit RGB to 8-bit RGB Dither
The Anti-Aliased line draw function provides the system
software with the ability to render lines without the familiar problem
of "stair steps" or "jaggies" (i.e. aliasing). This function uses a proprietary
two pixel approximation technique to visually remove the aliasing caused
by the discrete pixels on a raster display.
Sub pixel positioning of lines allows the system software
to more precisely place the anti-aliased lines on the display.
Dithering is a technique which trades spatial resolution
for more color resolution. Essentially a 24-bit color value is converted
to an 8-bit value and then slightly modified based on the pixel's position
in the window. The overall appearance is that the graphics subsystem has
more than 256 colors.
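One common way to implement position-dependent dithering is an ordered
dither. The sketch below is an assumption for illustration: the source
specifies neither the 8-bit layout nor the dither matrix, so a 3-3-2
red-green-blue packing and a standard 4 x 4 Bayer matrix are used here.

```python
# Standard 4 x 4 Bayer threshold matrix (an assumption, see above).
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def quantize(value: int, bits: int, threshold: float) -> int:
    """Reduce an 8-bit component to `bits` bits, rounding up when the
    fractional part exceeds the position-dependent threshold."""
    levels = (1 << bits) - 1
    scaled = value * levels / 255.0
    base = int(scaled)
    if scaled - base > threshold and base < levels:
        base += 1
    return base


def dither_332(r: int, g: int, b: int, x: int, y: int) -> int:
    """Convert a 24-bit color to an 8-bit 3-3-2 value at pixel (x, y)."""
    threshold = (BAYER4[y & 3][x & 3] + 0.5) / 16.0
    return ((quantize(r, 3, threshold) << 5)
            | (quantize(g, 3, threshold) << 2)
            | quantize(b, 2, threshold))
```

A color that falls between two representable levels is rounded up at some
pixel positions and down at others, so a filled area averages out to the
intended shade.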