1.9. Coordinate Transforms
The purpose of the OpenGL graphics processing pipeline is to convert three-dimensional
descriptions of objects into a two-dimensional image that can be displayed. In many ways, this
process is similar to using a camera to convert a real-world scene into a two-dimensional print.
To accomplish the transformation from three dimensions to two, OpenGL defines several
coordinate spaces and transformations between those spaces. Each coordinate space has some
properties that make it useful for some part of the rendering process. The transformations
defined by OpenGL afford applications a great deal of flexibility in defining the 3D-to-2D
mapping. For success at writing shaders in the OpenGL Shading Language, understanding the
various transformations and coordinate spaces used by OpenGL is essential.
In computer graphics, MODELING is the process of defining a numerical representation of an
object that is to be rendered. For OpenGL, this usually means creating a polygonal
representation of an object so that it can be drawn with the polygon primitives built into
OpenGL. At a minimum, a polygonal representation of an object needs to include the
coordinates of each vertex in each polygon and the connectivity information that defines the
polygons. Additional data might include the color of each vertex, the surface normal at each
vertex, one or more texture coordinates at each vertex, and so on.
In the past, modeling an object was a painstaking effort, requiring precise physical
measurement and data entry. (This is one of the reasons the Utah teapot, modeled by Martin
Newell in 1975, has been used in so many graphics images. It is an interesting object, and the
numerical data is freely available. Several of the shaders presented in this book are illustrated
with this object; see, for example, Color Plate 24.) More recently, a variety of modeling tools
have become available, both hardware and software, and this has made it relatively easy to
create numerical representations of three-dimensional objects that are to be rendered.
Three-dimensional object attributes, such as vertex positions and surface normals, are defined
in OBJECT SPACE. This coordinate space is one that is convenient for describing the object that is
being modeled. Coordinates are specified in units that are convenient to that particular object.
Microscopic objects may be modeled in units of angstroms, everyday objects may be modeled
in inches or centimeters, buildings might be modeled in feet or meters, planets could be
modeled in miles or kilometers, and galaxies might be modeled in light years or parsecs. The
origin of this coordinate system (i.e., the point (0, 0, 0)) is also something that is convenient
for the object being modeled. For some objects, the origin might be placed at one corner of the
object's three-dimensional bounding box. For other objects, it might be more convenient to
define the origin at the centroid of the object. Because of its intimate connection with the task
of modeling, this coordinate space is also often referred to as MODEL SPACE or the MODELING
COORDINATE SYSTEM. Coordinates are referred to equivalently as object coordinates or modeling
coordinates.
To compose a scene that contains a variety of three-dimensional objects, each of which might
be defined in its own unique object space, we need a common coordinate system. This common
coordinate system is called WORLD SPACE or the WORLD COORDINATE SYSTEM, and it provides a
common frame of reference for all objects in the scene. Once all the objects in the scene are
transformed into a single coordinate system, the spatial relationships between all the objects,
the light sources, and the viewer are known. The units of this coordinate system are chosen in a
way that is convenient for describing a scene. You might choose feet or meters if you are
composing a scene that represents one of the rooms in your house, but you might choose city
blocks as your units if you are composing a scene that represents a city skyline. The choice for
the origin of this coordinate system is also arbitrary. You might define a three-dimensional
bounding box for your scene and set the origin at the corner of the bounding box such that all
of the other coordinates of the bounding box have positive values. Or you may want to pick an
important point in your scene (the corner of a building, the location of a key character, etc.)
and make that the origin.
After world space is defined, all the objects in the scene must be transformed from their own
unique object coordinates into world coordinates. The transformation that takes coordinates
from object space to world space is called the MODELING TRANSFORMATION. If the object's modeling
coordinates are in feet but the world coordinate system is defined in terms of inches, the object
coordinates must be scaled by a factor of 12 to produce world coordinates. If the object is
defined to be facing forward but in the scene it needs to be facing backwards, a rotation must
be applied to the object coordinates. A translation is also typically required to position the
object at its desired location in world coordinates. All of these individual transformations can be
put together into a single matrix, the MODEL TRANSFORMATION MATRIX, that represents the
transformation from object coordinates to world coordinates.
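In OpenGL terms (using the fixed-function calls discussed later in this section), such a composite model transformation might be built as the following sketch shows. The specific scale factor, angle, and offset are hypothetical, chosen only to match the feet-to-inches example above; because OpenGL post-multiplies the current matrix, the transformation specified last is applied to vertices first, so the calls appear in the order translate, rotate, scale.

#include <GL/gl.h>

void setModelTransform(void)
{
    glMatrixMode(GL_MODELVIEW);          /* the calls below affect the current matrix */
    glTranslatef(25.0f, 0.0f, -8.0f);    /* position the object in world coordinates */
    glRotatef(180.0f, 0.0f, 1.0f, 0.0f); /* turn the object to face backwards */
    glScalef(12.0f, 12.0f, 12.0f);       /* convert modeling units of feet to world units of inches */
}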
After the scene has been composed, the viewing parameters must be specified. One aspect of
the view is the vantage point (i.e., the eye or camera position) from which the scene will be
viewed. Viewing parameters also include the focus point (also called the lookat point or the
direction in which the camera is pointed) and the up direction (e.g., the camera may be held
sideways or upside down).
The viewing parameters collectively define the VIEWING TRANSFORMATION, and they can be
combined into a matrix called the VIEWING MATRIX. A coordinate multiplied by this matrix is
transformed from world space into EYE SPACE, also called the EYE COORDINATE SYSTEM. By definition,
the origin of this coordinate system is at the viewing (or eye) position. Coordinates in this space
are called eye coordinates. The spatial relationships in the scene remain unchanged, but
orienting the coordinate system in this way makes it easy to determine the distance from the
viewpoint to various objects in the scene.
Although some 3D graphics APIs allow applications to separately specify the modeling matrix
and the viewing matrix, OpenGL combines them into a single matrix called the MODELVIEW MATRIX.
This matrix is defined to transform coordinates from object space into eye space (see Figure
1.2).
Figure 1.2. Coordinate spaces and transforms in OpenGL
You can manipulate a number of matrices in OpenGL. Call the glMatrixMode function to select the
modelview matrix or one of OpenGL's other matrices. Load the current matrix with the identity
matrix by calling glLoadIdentity, or replace it with an arbitrary matrix by calling glLoadMatrix. Be
sure you know what you're doing if you specify an arbitrary matrix; the transformation might
give you a completely incomprehensible image! You can also multiply the current matrix by an
arbitrary matrix by calling glMultMatrix.
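The following sketch shows these calls in use. The translation matrix is purely illustrative; note that OpenGL expects the 16 matrix elements in column-major order, so the translation components occupy elements 12 through 14.

#include <GL/gl.h>

/* An illustrative matrix that translates by 2 units along x. */
static const GLfloat translate[16] = {
    1.0f, 0.0f, 0.0f, 0.0f,   /* column 0 */
    0.0f, 1.0f, 0.0f, 0.0f,   /* column 1 */
    0.0f, 0.0f, 1.0f, 0.0f,   /* column 2 */
    2.0f, 0.0f, 0.0f, 1.0f    /* column 3: translation */
};

void setModelviewMatrix(void)
{
    glMatrixMode(GL_MODELVIEW);   /* select the modelview matrix */
    glLoadIdentity();             /* replace it with the identity matrix */
    glMultMatrixf(translate);     /* multiply the current matrix by an arbitrary matrix */
    /* glLoadMatrixf(translate) would instead replace the current matrix outright */
}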
Applications often start by setting the current modelview matrix to the view matrix and then
add on the necessary modeling matrices. You can set the modelview matrix to a reasonable
viewing transformation with the gluLookAt function. (This function is not part of OpenGL proper
but is part of the OpenGL utility library that is provided with every OpenGL implementation.)
OpenGL actually supports a stack of modelview matrices, and you can duplicate the topmost
matrix and copy it onto the top of the stack with glPushMatrix. When this is done, you can
concatenate other transformations to the topmost matrix with the functions glScale, glTranslate,
and glRotate to define the modeling transformation for a particular three-dimensional object in the
scene. Then, pop this topmost matrix off the stack with glPopMatrix to get back to the original
view transformation matrix. Repeat the process for each object in the scene.
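A sketch of this pattern follows. The eye position, the per-object transformations, and the drawHouse and drawTree routines are hypothetical stand-ins for an application's own values and drawing code.

#include <GL/gl.h>
#include <GL/glu.h>

extern void drawHouse(void);   /* hypothetical application drawing routines */
extern void drawTree(void);

void drawScene(void)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0.0, 2.0, 10.0,   /* eye position */
              0.0, 0.0,  0.0,   /* look-at point */
              0.0, 1.0,  0.0);  /* up direction */

    glPushMatrix();                       /* duplicate the viewing matrix */
    glTranslatef(-3.0f, 0.0f, 0.0f);      /* modeling transformation for the first object */
    glRotatef(45.0f, 0.0f, 1.0f, 0.0f);
    drawHouse();
    glPopMatrix();                        /* back to the original viewing matrix */

    glPushMatrix();                       /* repeat the process for the next object */
    glTranslatef(4.0f, 0.0f, -2.0f);
    glScalef(2.0f, 2.0f, 2.0f);
    drawTree();
    glPopMatrix();
}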
At the time light source positions are specified with the glLight function, they are transformed by
the current modelview matrix. Therefore, light positions are stored within OpenGL as eye
coordinates. You must set up the modelview matrix to perform the proper transformation
before light positions are specified or you won't get the lighting effects that you expect. The
lighting calculations that occur in OpenGL are defined to happen on a per-vertex basis in the
eye coordinate system. For the necessary reflection computations, light positions and surface
normals must be in the same coordinate system. OpenGL implementations often choose to do
lighting calculations in eye space; therefore, the incoming surface normals have to be
transformed into eye space as well. You accomplish this by transforming surface normals by the
inverse transpose of the upper leftmost 3 x 3 matrix taken from the modelview matrix. At that
point, you can apply the per-vertex lighting formulas defined by OpenGL to determine the lit
color at each vertex.
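The following sketch illustrates why this ordering matters: the viewing transformation is loaded before glLightfv is called, so the light remains fixed at its intended world-space position. The eye and light positions used here are hypothetical.

#include <GL/gl.h>
#include <GL/glu.h>

void placeLight(void)
{
    GLfloat lightPos[4] = { 10.0f, 10.0f, 10.0f, 1.0f };  /* desired world-space position */

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0.0, 2.0, 10.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);  /* viewing transform first */
    glLightfv(GL_LIGHT0, GL_POSITION, lightPos);  /* transformed now, stored in eye coordinates */
}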
After coordinates have been transformed into eye space, the next thing is to define a viewing
volume. This is the region of the three-dimensional scene that is visible in the final image. The
transformation that takes the objects in the viewing volume into CLIP SPACE (also known as the
CLIPPING COORDINATE SYSTEM, a coordinate space that is suitable for clipping) is called the PROJECTION
TRANSFORMATION. In OpenGL, you establish the projection transformation by calling glMatrixMode to
select the projection matrix and then setting this matrix appropriately. Parameters that may go
into creating an appropriate projection matrix are the field of view (how much of the scene is
visible), the aspect ratio (the horizontal field of view may differ from the vertical field of view),
and near and far clipping planes to eliminate things that are too far away or too close (for
perspective projections, weirdness will occur if you try to draw things that are at or behind the
viewing position). Three functions are commonly used to set the projection matrix: glOrtho, glFrustum, and
gluPerspective. The difference between these functions is that glOrtho defines a parallel projection
(i.e., parallel lines in the scene are projected to parallel lines in the final two-dimensional
image), whereas glFrustum and gluPerspective define perspective projections (i.e., parallel lines in
the scene are foreshortened to produce a vanishing point in the image, such as railroad tracks
converging to a point in the distance).
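A sketch of establishing a perspective projection might look like the following; the field of view, aspect ratio, and clipping distances are hypothetical values.

#include <GL/gl.h>
#include <GL/glu.h>

void setProjection(int windowWidth, int windowHeight)
{
    glMatrixMode(GL_PROJECTION);                    /* select the projection matrix */
    glLoadIdentity();
    gluPerspective(60.0,                            /* vertical field of view in degrees */
                   (GLdouble)windowWidth / (GLdouble)windowHeight,  /* aspect ratio */
                   1.0, 100.0);                     /* near and far clipping planes */
    /* glOrtho(-10.0, 10.0, -10.0, 10.0, 1.0, 100.0) would define a parallel
       projection instead. */
    glMatrixMode(GL_MODELVIEW);                     /* leave the modelview matrix selected */
}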
FRUSTUM CLIPPING is the process of eliminating any graphics primitives that lie outside an axis-aligned
cube in clip space. This cube is defined such that the x, y, and z components of the clip
space coordinate are less than or equal to the w component for the coordinate, and greater
than or equal to -w (i.e., -w ≤ x ≤ w, -w ≤ y ≤ w, and -w ≤ z ≤ w). Graphics primitives (or
portions thereof) that lie outside this cube are discarded. Frustum clipping is always performed
on all incoming primitives in OpenGL. USER CLIPPING, on the other hand, is a feature that can be
enabled or disabled by the application. Applications can call glClipPlane to specify one or more
clipping planes that further restrict the size of the viewing volume, and each clipping plane can
be individually enabled with glEnable. At the time user clipping planes are specified, OpenGL
transforms them into eye space using the inverse of the current modelview matrix. Each plane
specified in this manner defines a half-space, and only the portions of primitives that lie within
the intersection of the view volume and all of the enabled half-spaces defined by user clipping
planes are drawn.
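A sketch of enabling a single user clipping plane follows. The plane equation Ax + By + Cz + D ≥ 0 selects the half-space that is kept; the hypothetical plane shown here discards everything below y = 0.

#include <GL/gl.h>

void enableFloorClip(void)
{
    GLdouble plane[4] = { 0.0, 1.0, 0.0, 0.0 };  /* A, B, C, D for the half-space y >= 0 */

    glClipPlane(GL_CLIP_PLANE0, plane);  /* transformed into eye space using the inverse
                                            of the current modelview matrix */
    glEnable(GL_CLIP_PLANE0);            /* each clipping plane is enabled individually */
}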
The next step in the transformation of vertex positions is the perspective divide. This operation
divides each component of the clip space coordinate by the homogeneous coordinate w. The
resulting x, y, and z components fall within the range [-1, 1], and the resulting w coordinate is always 1,
so it is no longer needed. In other words, all the visible graphics primitives are transformed into
a cubic region between the point (-1, -1, -1) and the point (1, 1, 1). This is the NORMALIZED DEVICE
COORDINATE SPACE, which is an intermediate space that allows the viewing area to be properly
mapped onto a viewport of arbitrary size and depth.
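OpenGL performs the perspective divide automatically, but conceptually it amounts to the following sketch (this is not part of the OpenGL API, and the Vec4 type is only illustrative).

typedef struct { float x, y, z, w; } Vec4;

Vec4 clipToNDC(Vec4 clip)
{
    Vec4 ndc;
    ndc.x = clip.x / clip.w;   /* each visible component ends up in [-1, 1] */
    ndc.y = clip.y / clip.w;
    ndc.z = clip.z / clip.w;
    ndc.w = 1.0f;              /* w is always 1 after the divide */
    return ndc;
}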
Pixels within a window on the display device aren't referred to with floating-point coordinates
from -1 to 1; they are usually referred to with coordinates defined in the WINDOW COORDINATE
SYSTEM, where x values range from 0 to the width of the window minus 1, and y values range
from 0 to the height of the window minus 1. Therefore, one more transformation step is
required. The VIEWPORT TRANSFORMATION specifies the mapping from normalized device coordinates
into window coordinates. You specify this mapping by calling the OpenGL functions glViewport,
which specifies the mapping of the x and y coordinates, and glDepthRange, which specifies the
mapping of the z coordinate. Graphics primitives are rasterized in the window coordinate
system.
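A sketch of setting up the viewport transformation, typically performed when a window is created or resized, follows; the assumption that the viewport covers the entire window is illustrative.

#include <GL/gl.h>

void setViewport(int windowWidth, int windowHeight)
{
    glViewport(0, 0, windowWidth, windowHeight);  /* map NDC x and y onto the window */
    glDepthRange(0.0, 1.0);                       /* map NDC z into the depth range;
                                                     0.0 to 1.0 is the default */
}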