CG Programming
Getting Started
• 3D Geometry
- 3D model coordinate systems
- 3D world coordinate system
- 3D eye coordinate system
- Clipping
- Projections
- 2D eye coordinates
- 2D screen coordinates
• Appearance
- Color
- Texture mapping
- Depth buffering
• The viewing process
- Different implementation, same result
- Summary of viewing advantages
• A basic OpenGL program
- The structure of the main() program using OpenGL
- Model space
- Modeling transformation
- 3D world space
- Viewing transformation
- 3D eye space
- Projections
- 2D eye space
- 2D screen space
- Appearance
- Another way to see the program
• OpenGL extensions
• Summary
• Questions
• Exercises
• Experiments
• Some aspects of managing the view
- Hidden surfaces
- Double buffering
- Clipping planes
• Stereo viewing
• Implementation of viewing and projection in OpenGL
- Defining a window and viewport
- Reshaping the window
- Defining a viewing environment
- Defining perspective projection
- Defining an orthogonal projection
- Managing hidden surface viewing
- Setting double buffering
- Defining clipping planes
• Implementing a stereo view
• Summary
• Questions
• Exercises
• Experiments
• Using the modeling graph for coding
- Example
- Using standard objects to create more complex scenes
• Summary
• Questions
• Exercises
• Experiments
• Projects
• Polyhedra
• Collision detection
• Polar, cylindrical, and spherical coordinates
• Higher dimensions?
• Questions
• Exercises
• Experiments
• Color
- Emphasis colors
- Background colors
- Color deficiencies in audience
- Naturalistic color
- Pseudocolor and color ramps
- Implementing color ramps
- Using color ramps
- To light or not to light
- Higher dimensions
• Dimensions
• Image context
- Choosing an appropriate view
- Legends to help communicate your encodings
- Labels to help communicate your problem
• Motion
- Leaving traces of motion
- Motion blurring
• Interactions
• Cultural context of the audience
• Accuracy
• Output media
• Implementing some of these ideas in OpenGL
- Using color ramps
- Legends and labels
- Creating traces
- Using the accumulation buffer
• A word to the wise
- Scalar fields
- Representation of objects and behaviors
- Molecular display
- Monte Carlo modeling
- 4D graphing
- Higher dimensional graphing
- Data-driven graphics
• Summary
• Credits
• Questions
• Exercises
• Experiments
• Projects
Chapter 9: Texture Mapping
• Introduction
• Definitions
- 1D texture maps
- 2D texture maps
- 3D texture maps
- Associating a vertex with a texture point
- The relation between the color of the object and the color of the texture map
- Other meanings for texture maps
- Texture mapping in the scene graph
• Creating a texture map
- Getting an image as a texture map
- Generating a synthetic texture map
• Texture mapping and billboards
- Including multiple textures in one texture map
• Interpolation for texture maps
• Antialiasing in texturing
• MIP mapping
• Multitexturing
• Using billboards
• Texture mapping in OpenGL
- Associating vertices and texture points
- Capturing a texture from the screen
- Texture environment
- Texture parameters
- Getting and defining a texture map
- Texture coordinate control
- Texture interpolation
- Texture mapping and GLU quadrics
• Some examples
- The Chromadepth™ process
- Environment maps
• A word to the wise
• Code examples
- A 1D color ramp
- An image on a surface
- An environment map
- Multitexturing code
• Summary
• Questions
• Exercises
• Experiments
• Projects
• Some 3D Viewing Operations with Graphics Cards
• Summary
• Questions
• Exercises
• Experiments
• Projects
• Some examples
- Spline curves
- Spline surfaces
• A word to the wise
• Summary
• Questions
• Exercises
• Experiments
• Projects
Appendices
• Appendix I: PDB file format
• Appendix II: CTL file format
• Appendix III: STL file format
Index
Evaluation
• Instructor’s evaluation
• Student’s evaluation
Because this is a draft of a textbook for an introductory, API-based computer graphics course, the
author recognizes that there may be some inaccuracies, incompleteness, or clumsiness in the
presentation and apologizes for these in advance. Further development of these materials, as well
as source code for many projects and additional examples, is ongoing. All such
materials will be posted as they are ready on the author’s Web site:
http://www.cs.csustan.edu/~rsc/NSF/
Your comments and suggestions will be very helpful in making these materials as useful as
possible and are solicited; please contact
Steve Cunningham
California State University Stanislaus
[email protected]
While the author retains copyright and other associated rights, permission is given to use this
manuscript and associated materials in both electronic and printed form as a resource for a
beginning computer graphics course.
This work was supported by National Science Foundation grant DUE-9950121. All
opinions, findings, conclusions, and recommendations in this work are those of the author
and do not necessarily reflect the views of the National Science Foundation. The author
also gratefully acknowledges sabbatical support from California State University Stanislaus
and thanks the San Diego Supercomputer Center, most particularly Dr. Michael J. Bailey,
for hosting this work and for providing significant assistance with both visualization and
science content. Ken Brown, a student of the author’s, provided invaluable and much-
appreciated assistance with several figures and concepts in this manuscript. The author
also thanks students Ben Eadington, Jordan Maynard, and Virginia Muncy for their
contributions through examples, and a number of others for valuable conversations and
suggestions on these notes.
Preface
Computer graphics is one of the most exciting ways that computing has made an impact on the
world. From the simple ways that spreadsheets allow you to create charts to see data, to the
ways graphics has enhanced entertainment by providing new kinds of cartoons and special effects,
to the ways graphics has enabled us to see and understand scientific principles, computer
graphics is everywhere we turn. This important presence has come from the greatly improved
graphics hardware and software that is found in current computing systems. With these
advances, computer graphics has evolved from a highly technical field, which needed very
expensive computers and frame buffers and required programmers to master all the mathematics
and algorithms needed to create an image, into a field that allows the graphics programmer to
think and work at a much higher level of modeling and to create images that communicate
effectively with the user. We believe that the beginning computer graphics
course should focus on how the student can learn to create effective communications with
computer graphics, including motion and interaction, and that the more technical details of
algorithms and mathematics for graphics should be saved for more advanced courses.
Computer graphics is involved in any work that uses computation to create or modify images,
whether those images are still or moving; interactive or fixed; on film, video, screen, or print. It
can also be part of creating objects that are manufactured from the processes that also create
images. This makes it a very broad field, encompassing many kinds of uses in the creative,
commercial, and scientific worlds. This breadth means that there are many kinds of tools to
create and manipulate images for these different areas. The large number of tools and
applications means that there are many different things one could learn about computer graphics.
In this book, we do not try to cover the full scope of work that can be called computer graphics.
Rather, we view computer graphics as the art and science of creating synthetic images by
programming the geometry and appearance of the contents of the images, and by displaying the
results of that programming on appropriate devices that support graphical output and interaction.
This focus on creating images by programming means that we must learn how to think about
how to represent graphical and interaction concepts in ways that can be used by the computer,
which both limits and empowers the graphics programmer.
The work of the programmer is to develop appropriate representations for the geometric objects
that are to make up the images, to assemble these objects into an appropriate geometric space
where they can have the proper relationships with each other as needed for the image, to define
and present the look of each of the objects as part of that scene, to specify how the scene is to be
viewed, and to specify how the scene as viewed is to be displayed on the graphic device. The
programming may be done in many ways, but in current practice it usually uses a graphics API
that supports the necessary modeling and does most of the detailed work of rendering the scene
that is defined through the programming. There are a number of graphics APIs available, but the
OpenGL API is probably most commonly used currently.
In addition to the creation of the modeling, viewing, and look of the scene, the programmer has
two other important tasks. Because a static image does not present as much information as a
moving image, the programmer may want to design some motion into the scene, that is, may
want to define some animation for the image. And because the programmer may want the user
to have the opportunity to control the nature of the image or the way the image is seen, the
programmer may want to design ways for the user to interact with the scene as it is presented.
These additional tasks are also supported by the graphics API.
Besides covering the basic ideas of interactive computer graphics, this book will introduce you to
the OpenGL graphics API and give you a number of examples that will help you understand
the capabilities that OpenGL provides and will allow you to learn how to integrate graphics
programming into your other work.
Computer graphics has many faces, so there are many reasons why one might want to use
computer graphics in his or her work. Many of the most visible uses of computer graphics are to
create images for the sciences (scientific visualization, explanations to the public), for entertainment
(movies, video games, special effects), for creative or aesthetic work (art, interactive
installations), for commercial purposes (advertising, communication, product design), or for
general communication (animated weather displays, information graphics). The processes
described in this book are all fundamental to each of these applications, although some
applications demand a degree of sophistication or realism in their images that is not possible
through simple API programming.
In all of these application areas, and more, there is a fundamental role for computer graphics in
solving problems. Problem solving is a basic process in all human activity, so computer graphics
can play a fundamental role in almost any area, as shown in Figure 1. This figure describes what
occurs as someone:
• identifies a problem
• addresses the problem by building a model that represents it and allows it to be considered
more abstractly
• identifies a way to represent the problem geometrically
• creates an image from that geometry so that the problem can be seen visually
• uses the image to understand the problem or the model and to try to understand a possible
solution.
Figure 1: the sequence from Problem to Model to Geometry to Image
The image that represents a problem can be made in many ways. One of the classical uses of
images in problem solving is simply to sketch an image—a diagram or picture—to communicate
the problem to a colleague so it can be discussed informally. (In the sciences, it is assumed that
restaurants are not happy to see a group of scientists or mathematicians come to dinner because
they write diagrams on the tablecloth!) But an image can also be made with computer graphics,
and this is especially useful when it is important to share the idea with a larger audience. If the
model permits it, this image may be an animation or an interactive display so that the problem
can be examined more generally than a single image would permit. That image, then, can be
used by the problem-solver or the audience to gain a deeper understanding of the model and
hence of the problem, and the problem can be refined iteratively and a more sophisticated model
created, and the process can continue.
This process is the basis for all of the discussions in a later chapter on graphical problem solving
in the sciences, but it may be applied to more general application areas. In allowing us to bring
the visual parts of our brain and our intelligence to a problem, it gives us a powerful tool to think
about the world. In the words of Mike Bailey of the San Diego Supercomputer Center, computer
graphics gives us a “brain wrench” that magnifies the power of our mind, just as a physical
wrench magnifies the power of our hands.
This book is a textbook for a beginning computer graphics course for students who have a good
programming background, equivalent to a full year of programming courses. We use C as the
programming language in our examples because it is the most common language for developing
applications with OpenGL. The book can be used by students with no previous computer
graphics experience and less mathematics and advanced computer science studies than the
traditional computer graphics course. Because we focus on graphics programming rather than
algorithms and techniques, we have fewer instances of data structures and other computer
science techniques. This means that this text can be used for a computer graphics course that can
be taken earlier in a student’s computer science studies than the traditional graphics course, or
for self-study by anyone with a sound programming background. In particular, this book can be
used as a text for a computer graphics course at the community college level.
Many, if not most, of the examples in this book are taken from sources in the sciences, and we
include a chapter that discusses several kinds of scientific and mathematical applications of
computer graphics. This emphasis makes this book appropriate for courses in computational
science or in computer science programs that want to develop ties with other programs on
campus, particularly programs that want to provide science students with a background that will
support development of computational science or scientific visualization work. It is tempting to
use the word “visualization” somewhere in the title of this book, but we would reserve that word
for material that is primarily focused on the science with only a sidelight on the graphics;
because we reverse that emphasis, we treat scientific visualization as one application of the
computer graphics presented here.
The book is organized along fairly traditional lines, treating projection, viewing, modeling,
rendering, lighting, shading, and many other aspects of the field. It also includes an emphasis on
using computer graphics to address real problems and to communicate results effectively to the
viewer. As we move through this material, we describe some general principles in computer
graphics and show how the OpenGL API provides the graphics programming tools that
implement these principles. We do not spend time describing in depth the algorithms behind the
techniques or the way the techniques are implemented; your instructor will provide these if he or
she finds it necessary. Instead, the book focuses on describing the concepts behind the graphics
and on using a graphics API (application programming interface) to carry out graphics
operations and create images.
We have tried to match the sequence of chapters in the book to the sequence we would expect to
be used in a beginning computer graphics course, and in some cases the presentation of one
chapter will depend on your knowing the content of an earlier chapter. However, in other cases
it will not be critical that earlier chapters have been covered. It should be pretty obvious if other
chapters are assumed, and we may make that assumption explicit in those chapters.
The book focuses on computer graphics programming with a graphics API, and in particular uses
the OpenGL API to implement the basic concepts that it presents. Each chapter includes a
general discussion of a topic in graphics as well as a discussion of the way the topic is handled in
OpenGL. However, another graphics API might also be used, with the OpenGL discussion
serving as an example of the way an API could work. Many of the fundamental algorithms and
techniques that are at the root of computer graphics are covered only at the level they are needed
to understand questions of graphics programming. This differs from most computer graphics
textbooks that place a great deal of emphasis on understanding these algorithms and techniques.
We recognize the importance of these for persons who want to develop a deep knowledge of the
subject and suggest that a second graphics course can provide that knowledge. We believe that
the experience provided by API-based graphics programming will help you understand the
importance of these algorithms and techniques as they are developed and will equip you to work
with them more fluently than if you met them with no previous background.
This book includes several features that are not found in most beginning textbooks. These
features support a course that fits the current programming practice in computer graphics. The
discussions in this book will focus on 3D graphics and will almost completely omit uniquely 2D
techniques. It has been traditional for computer graphics courses to start with 2D graphics and
move up to 3D because some of the algorithms and techniques have been easier to grasp at the
2D level, but without that concern it is easier to begin by covering 3D concepts and discuss 2D
graphics as the special case where all the modeling happens in the X-Y plane.
Modeling is a very fundamental topic in computer graphics, and there are many different ways
that one can model objects for graphical display. This book uses the standard beginning
approach of focusing on polygon-based modeling because that approach is supported by
OpenGL and most other graphics APIs. The discussion on modeling in this book places an
important emphasis on the scene graph as a fundamental tool in organizing the work needed to
create a graphics scene. The concept of the scene graph allows the student to design the
transformations, geometry, and appearance of a number of complex components in a way that
they can be implemented quite readily in code, even if the graphics API itself does not support
the scene graph directly. This is particularly important for hierarchical modeling, but it also
provides a unified design approach to modeling and has some very useful applications for
placing the eye point in the scene and for managing motion and animation.
A key feature of this book is an emphasis on using computer graphics to create effective visual
communication. This recognizes the key role that computer graphics has taken in developing an
understanding of complex problems and in communicating this understanding to others, from
small groups of working scientists to the general public. This emphasis is usually missing from
computer graphics textbooks, although we expect that most instructors include this somehow in
their courses. The discussion of effective communication is integrated throughout several of the
basic chapters in the book, because it is an important consideration in graphics modeling,
viewing, color, and interaction. We believe that a systematic discussion of this subject will help
prepare students for more effective use of computer graphics in their future professional lives,
whether this is in technical areas in computing or is in areas where there are significant
applications of computer graphics.
This book also places a good deal of emphasis on creating interactive displays. Most computer
graphics textbooks cover interaction and the creation of interactive graphics. Historically this
was a difficult area to implement because it involved writing or using specialized device drivers,
but with the growing importance of OpenGL and other graphics APIs this area has become much
more common. Because we are concerned with effective communication, we believe it is
critically important to understand the role of interaction in communicating information with
graphics. Our discussion of interaction includes a general treatment of event-driven
programming and covers the events and callbacks used in OpenGL, but it also discusses the role
of interaction in creating effective communications. This views interaction in the context of the
task that is to be supported, not just the technology being studied, and thus integrates it into the
overall context of the book.
This book’s approach, discussing computer graphics principles without covering the details of
the algorithms and mathematics that implement them, differs from most computer graphics
textbooks that place a much larger emphasis on understanding these graphics algorithms and
techniques. We recognize the importance of these ideas for persons who want to develop a deep
knowledge of the subject and suggest that a second graphics course can provide that knowledge.
We believe that the experience provided by API-based graphics programming will help the
student understand the importance of these algorithms and techniques as they are developed and
will equip someone to work with them more fluently than if they were covered with no previous
computer graphics background.
Chapter 0: Getting Started
This chapter is intended to give you a basic overview of the concepts of computer graphics so that
you can move forward into the rest of the book with some idea of what the field is about. It gives
some general discussion of the basis of the field, and then has two key content areas.
The first key area is the discussion of three-dimensional geometry, managed by the 3D geometry
pipeline, and the concept of appearance for computer graphics objects, managed by the rendering
pipeline. The geometry pipeline shows you the key information that you must specify to create an
image and the kind of computation a graphics system must do in order to present that image. We
will also discuss some of the ways appearance can be specified, but we will wait until a later
chapter to discuss the rendering pipeline.
The second key area is a presentation of the way a graphics program is laid out for the OpenGL
graphics API, the key API we will use in this book. In this presentation you will see both the
general structure of an OpenGL program and a complete example of a program that models a
particular problem and produces a particular animated image. In that example you will see how the
information for the geometry pipeline and the appearance information are defined for the program
and will be able to try out various changes to the program as part of the chapter exercises.
In order to create an image, we must define the geometry that represents each part of the image.
The process of creating and defining this geometry is called modeling, and is described in the
chapters below on principles of modeling and on modeling in OpenGL. This is usually done by
defining each object in terms of a coordinate system that makes sense for that particular object, and
then using a set of modeling transformations that places that object in a single world coordinate
system that represents the common space in which all the objects will live. Modeling then creates
the 3D model coordinates for each object, and the modeling transformations place the objects in the
world coordinate system that contains the entire scene.
3D world coordinate system
The 3D coordinate system that is shared by all the objects in the scene is called the world
coordinate system. By placing every component of the scene in this single shared world, we can
treat the scene uniformly as we develop the presentation of the scene through the graphics display
device to the user. The scene is a master design element that contains both the geometry of the
objects placed in it and the geometry of lights that illuminate it. Note that the world coordinate
system often is considered to represent the actual dimensions of a scene because it may be used to
model some real-world environment. This coordinate system exists without any reference to a
viewer, as is the case with any real-world scene. In order to create an image from the scene, the
viewer is added at the next stage.
3D eye coordinate system
Once the 3D world has been created, an application programmer would like the freedom to allow
an audience to view it from any location. But graphics viewing models typically require a specific
orientation and/or position for the eye at this stage. For example, the system might require that the
eye position be at the origin, looking in –Z (or sometimes +Z). So the next step in the geometry
pipeline is the viewing transformation, in which the coordinate system for the scene is changed to
satisfy this requirement. The result is the 3D eye coordinate system. We can think of this process
as grabbing the arbitrary eye location and all the 3D world objects and sliding them around to
realign the spaces so that the eye ends up at the proper place and looking in the proper direction.
The relative positions between the eye and the other objects have not been changed; all the parts of
the scene are simply anchored in a different spot in 3D space. Because standard viewing models
may also specify a standard distance from the eyepoint to some fixed “look-at” point in the scene,
there may also be some scaling involved in the viewing transformation. The viewing
transformation is just a transformation in the same sense as modeling transformations, although it
can be specified in a variety of ways depending on the graphics API. Because the viewing
transformation changes the coordinates of the entire world space in order to move the eye to the
standard position and orientation, we can consider the viewing transformation to be the inverse of
whatever transformation placed the eye point in the position and orientation defined for the view.
We will take advantage of this observation in the modeling chapter when we consider how to place
the eye in the scene’s geometry.
Clipping
At this point, we are ready to clip the object against the 3D viewing volume. The viewing volume
is the 3D volume that is determined by the projection to be used (see below) and that declares what
portion of the 3D universe the viewer wants to be able to see. This happens by defining how much
of the scene should be visible, and includes defining the left, right, bottom, top, near, and far
boundaries of that space. Any portions of the scene that are outside the defined viewing volume
are clipped and discarded. All portions that are inside are retained and passed along to the
projection step. In Figure 0.2, it is clear that some of the world and some of the helicopter lie
outside the viewable space to the left, right, top, or bottom, but note how the front of the image of
the ground in the figure is clipped—is made invisible in the scene—because it is too close to the
viewer’s eye. This is a bit difficult to see, but if you look at the cliffs at the upper left of the scene
you will see a clipped edge.
Clipping is done as the scene is projected to the 2D eye coordinates in projections, as described
next. Besides ensuring that the view includes only the things that should be visible, clipping also
increases the efficiency of image creation because it eliminates some parts of the geometry from the
rest of the display process.
Projections
The 3D eye coordinate system still must be converted into a 2D coordinate system before it can be
mapped onto a graphics display device. The next stage of the geometry pipeline performs this
operation, called a projection. Before discussing the actual projection, we must think about what
we will actually see in the graphic device. Imagine your eye placed somewhere in the scene,
looking in a particular direction. You do not see the entire scene; you only see what lies in front of
your eye and within your field of view. This space is called the viewing volume for your scene,
and it includes a bit more than the eye point, direction, and field of view; it also includes a front
plane, with the concept that you cannot see anything closer than this plane, and a back plane, with
the concept that you cannot see anything farther than that plane. In Figure 0.3 we see two viewing
volumes for the two kinds of projections that we will discuss in a moment.
There are two kinds of projections commonly used in computer graphics. One maps all the points
in the eye space to the viewing plane by simply ignoring the value of the z-coordinate; as a
result, objects keep the same size in the image no matter how far they are from the eye, and this
is called a parallel (or orthographic) projection. The other maps points toward the eye point, so
that objects farther from the eye appear smaller; this is called a perspective projection.
While the viewing volume describes the region in space that is included in the view, the actual view
is what is displayed on the front clipping plane of the viewing volume. This is a 2D space and is
essentially the 2D eye space discussed below. Figure 0.4 presents a scene with both parallel and
perspective projections; in this example, you will have to look carefully to see the differences!
2D eye coordinates
The space that projection maps to is a two-dimensional real-coordinate space that contains the
geometry of the original scene after the projection is applied. Because a single point in 2D eye
coordinates corresponds to an entire line segment in the 3D eye space, depth information is lost in
the projection and it can be difficult to perceive depth, particularly if a parallel projection was used.
Even in that case, however, if we display the scene with a hidden-surface technique, object
occlusion will help us order the content in the scene. Hidden-surface techniques are discussed in a
later chapter.
2D screen coordinates
The final step in the geometry pipeline is to change the coordinates of objects in the 2D eye space
so that the object is in a coordinate system appropriate for the 2D display device. Because the
screen is a digital device, this requires that the real numbers in the 2D eye coordinate system be
converted to integer numbers that represent screen coordinates. This is done with a proportional
mapping followed by a truncation of the coordinate values. It is called the window-to-viewport
mapping, and the new coordinate space is referred to as screen coordinates, or display coordinates.
When this step is done, the entire scene is now represented by integer screen coordinates and can
be drawn on the 2D display device.
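As a rough sketch of the idea (the function names here are placeholders, and we assume for illustration that the 2D eye coordinates run from -1 to 1 in each direction and that screen y grows downward), the proportional mapping and truncation might look like this:
int screenX(float xEye, int width) {
    return (int)((xEye + 1.0f) * 0.5f * (float)(width - 1));    /* proportional map, then truncate */
}
int screenY(float yEye, int height) {
    return (int)((1.0f - yEye) * 0.5f * (float)(height - 1));   /* flip y so that 1.0 maps to the top row */
}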
Note that this entire pipeline process converts vertices, or geometry, from one form to another by
means of several different transformations. These transformations ensure that the vertex geometry
of the scene is consistent among the different representations as the scene is developed, but
computer graphics also assumes that the topology of the scene stays the same. For instance, if two
points are connected by a line in 3D model space, then those converted points are assumed to
likewise be connected by a line in 2D screen space. Thus the geometric relationships (points,
lines, polygons, ...) that were specified in the original model space are all maintained until we get
to screen space, and are only actually implemented there.
Appearance
Along with geometry, computer graphics is built on the ability to define the appearance of objects,
so you can make them appear naturalistic or can give them colors that can communicate something
to the user.
In the discussion so far, we have only talked about the coordinates of the vertices of a model.
There are many other properties of vertices, though, that are used in rendering the scene, that is, in
creating the actual image defined by the scene. These are discussed in many of the later chapters,
but it is worth noting here that these properties are present when the vertex is defined and are
preserved as the vertex is processed through the geometry pipeline. Some of these properties
involve concepts that we have not yet covered, but these will be defined below. These properties
include:
• a depth value for the vertex, defined as the distance of the vertex from the eye point in the
direction of the view reference point,
• a color for the vertex,
• a normal vector at the vertex,
• material properties for the vertex, and
• texture coordinates for the vertex.
These properties are used in the development of the appearance of each of the objects in the image.
They allow the graphics system to calculate the color of each pixel in the screen representation of
the image after the vertices are converted to 2D screen coordinates. For the details of the process,
see the chapter below on the rendering pipeline.
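As a small illustration (the particular values here are arbitrary), several of these properties can be attached to a vertex in OpenGL simply by making the corresponding calls before the glVertex call that defines the vertex:
glBegin(GL_TRIANGLES);
    glColor3f(1.0f, 0.0f, 0.0f);     /* a color for the vertex */
    glNormal3f(0.0f, 0.0f, 1.0f);    /* a normal vector at the vertex */
    glTexCoord2f(0.0f, 0.0f);        /* texture coordinates for the vertex */
    glVertex3f(0.0f, 0.0f, 0.0f);    /* the vertex itself */
    glTexCoord2f(1.0f, 0.0f);
    glVertex3f(1.0f, 0.0f, 0.0f);    /* the color and normal carry over */
    glTexCoord2f(0.0f, 1.0f);
    glVertex3f(0.0f, 1.0f, 0.0f);
glEnd();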
Appearance is handled by operations that are applied after the geometry is mapped to screen space.
In order to do this, the geometric primitives described above are broken down into very simple
primitives and these are processed by identifying the parts of the window raster that make up each
one. This is done by processing the vertex information described in the previous paragraph into
scanline information, as described in a later chapter. Appearance information is associated with
each vertex, and as the vertex information is processed into scanlines, and as the pixels on each
scanline are processed, appearance information is also processed to create the colors that are used
in filling each primitive. Processes such as depth buffering are also handled at this stage, creating
the appropriate visible surface view of a scene. So the appearance information follows the
geometry through the entire process, from the original vertices down to the pixels of the final image.
Color
Color can be set directly by the program or can be computed from a lighting model in case your
scene is defined in terms of lights and materials. Most graphics APIs now support what is called
RGBA color: color defined in terms of the emissive primaries red, green, and blue, and with an
alpha channel that allows you to blend items with the background when they are drawn. These
systems also allow a very large number of colors, typically on the order of 16 million. So there are
a large number of possibilities for color use, as described in later chapters on color and on lighting.
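As a minimal sketch (the values are arbitrary), a direct RGBA color is set with a single call; the alpha component only has a visible effect if blending has been enabled:
glEnable(GL_BLEND);                                  /* needed before alpha has any effect */
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);   /* a common blending choice */
glColor4f(1.0f, 0.0f, 0.0f, 0.5f);                   /* half-transparent red */
glColor3f(0.0f, 1.0f, 0.0f);                         /* opaque green; alpha defaults to 1 */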
Texture mapping
Among the most powerful ways to add visual interest to a scene is texture mapping, a capability
that allows you to add information to objects in a scene from either natural or synthetic complex
images. With texture mapping you can achieve photographic surface effects or other kinds of
images that will make your images much more interesting and realistic. This is discussed in a later
chapter and should be an important facility for you.
Depth buffering
As your scene is developed, you want only the objects nearest the eye to be seen; anything that is
behind these will be hidden by the nearer objects. This is managed in the rendering stage by
keeping track of the distance of each vertex from the eye. If an object is nearer than the previously
drawn part of the scene for the same pixels, then the object will replace the previous part; otherwise
the previous part is retained. This is a straightforward computation that is supported by essentially
all modern graphics systems.
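In OpenGL the essence of this is three calls, all of which also appear in the sample program later in this chapter: request a depth buffer when you set the display mode, enable the depth test, and clear the depth buffer along with the color buffer for each new image.
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);   /* request a depth buffer */
glEnable(GL_DEPTH_TEST);                                    /* turn on depth testing */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);         /* clear both buffers for each frame */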
The viewing process
Let’s look at the overall operations on the geometry you define for a scene as the graphics system
works on that scene and eventually displays it to your user. Referring again to Figure 0.1 and
omitting the clipping and window-to-viewport process, we see that we start with geometry, apply
the modeling transformation(s), apply the viewing transformation, and finally apply the projection
to the screen. This can be expressed in terms of function composition as the sequence
projection(viewing(transformation(geometry)))
or, with the associative law for functions and writing function composition as multiplication,
(projection * viewing * transformation) (geometry).
Because the operations nearest the geometry are performed before the operations further from
it, we will want to define the projection first, the viewing next, and
the transformations last, before we define the geometry they are to operate on. This is independent
of whether we want to use a perspective or parallel projection. We will see this sequence as a key
factor in the way we structure a scene through the scene graph in the modeling chapter later in these
notes.
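As a sketch of what this ordering looks like in OpenGL code (the numeric values and the drawObject() helper are only placeholders, not part of the example program), the calls appear in exactly this order:
glMatrixMode(GL_PROJECTION);            /* the projection is defined first */
glLoadIdentity();
gluPerspective(60.0, 1.0, 1.0, 30.0);
glMatrixMode(GL_MODELVIEW);             /* the viewing transformation next */
glLoadIdentity();
gluLookAt(0.0, 5.0, 10.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
glTranslatef(1.0f, 0.0f, 0.0f);         /* modeling transformations last ... */
drawObject();                           /* ... and then the geometry they operate on */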
Warning! To this point, our discussion has only shown the concept of how a vertex travels
through the geometry pipeline, but we have not given any details of how this is actually done. There
are several ways of implementing this process, any of which will produce a correct display. Do not
be surprised if you find that a graphics system does not manage the overall geometry pipeline
process exactly as shown here. The basic principles and stages of the operation are still the same.
In many cases, we simply will not be concerned about the details of how the stages are carried out.
Our goal will be to represent the geometry correctly at the modeling and world coordinate stages, to
specify the eye position appropriately so the transformation to eye coordinates will be correct, and
to define our window and projections correctly so the transformations down to 2D and to screen
space will be correct. Other details will be left to a more advanced graphics course.
One of the classic questions beginners have about viewing a computer graphics image is whether to
use perspective or orthographic projections. Each of these has its strengths and its weaknesses.
As a quick guide to start with: a perspective projection shows a scene much as we see the world,
with more distant objects appearing smaller, while an orthographic (parallel) projection preserves
relative sizes and parallel lines, which makes it better suited to technical or engineering views.
In fact, when you have some experience with each, and when you know the expectations of the
audience for which you’re preparing your images, you will find that the choice is quite natural and
will have no problem knowing which is better for a given image.
Our example programs that use OpenGL have some strong similarities. Each is based on the
GLUT utility toolkit that usually accompanies OpenGL systems, so all the sample codes have this
fundamental similarity. (If your version of OpenGL does not include GLUT, its source code is
available online; check the page at
http://www.reality.sgi.com/opengl/glut3/glut3.h
and you can find out where to get it. You will need to download the code, compile it, and install it
in your system.) Similarly, when we get to the section on event handling, we will use the MUI
(micro user interface) toolkit, although this is not yet developed or included in this first draft
release.
Like most worthwhile APIs, OpenGL is complex and offers you many different ways to express a
solution to a graphical problem in code. Our examples use a rather limited approach that works
well for interactive programs, because we believe strongly that graphics and interaction should be
learned together. When you want to focus on making highly realistic graphics, of the sort that
takes a long time to create a single image, then you can readily give up the notion of interactive
work.
In the code below, you will see that the main function involves mostly setting up the system. This
is done in two ways: first, setting up GLUT to create and place the system window in which your
work will be displayed, and second, setting up the event-handling system by defining the callbacks
to be used when events occur. After this is done, main calls the main event loop that will drive all
the program operations, as described in the chapter below on event handling.
The full code example that follows this outline also discusses many of the details of these functions
and of the callbacks, so we will not go into much detail here. For now, the things to note are that
the reshape callback sets up the window parameters for the system, including the size, shape, and
location of the window, and defines the projection to be used in the view. This is called first when
the main event loop is entered as well as when any window activity happens (such as resizing or
dragging). The reshape callback requests a redisplay when it finishes, which calls the display
callback function, whose task is to set up the view and define the geometry for the scene. When
this is finished, control returns to the event loop, which checks whether any other graphics-related
event has occurred. If one has, your program should have a callback to
manage it; if none has, the idle event is generated and the idle callback function is called;
this may change some of the geometry parameters and then request another redisplay.
// initialization function
void doMyInit(void) {
    // set up basic OpenGL parameters and environment
    // set up projection transformation (ortho or perspective)
}

// reshape function
void reshape(int w, int h) {
    // set up projection transformation with new window
    //    dimensions w and h
    // post redisplay
}

// idle function
void idle(void) {
    // update anything that changes between steps of the program
    // post redisplay
}
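The display callback and the main() function described above fit into the same outline; a sketch of them in the same style (not part of the outline above) might read:
// display function
void display(void) {
    // set up the viewing transformation
    // define the geometry of the scene, with its appearance
    // swap buffers to show the new image
}

// main program
int main(int argc, char** argv) {
    // initialize GLUT, set the display mode, and create the window
    // call the initialization function
    // register the display, reshape, and idle callbacks
    // enter the main event loop
}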
Now that we have seen a basic structure for an OpenGL program, we will present a complete,
working program and will analyze the way it represents the geometry pipeline described earlier in
this chapter, while describing the details of OpenGL that it uses. The program is the simple
simulation of temperatures in a uniform metal bar that is described in the later chapter on graphical
problem-solving in science, and we will only analyze the program structure, not its function. It
creates the image shown in Figure 0.5.
void myinit(void);
void cube(void);
void display(void);
void setColor(float);
void reshape(int, int);
void animate(void);
void iterationStep(void);
void myinit(void) {
int i,j;
glEnable (GL_DEPTH_TEST);
glClearColor(0.6, 0.6, 0.6, 1.0);
/* ... (the remainder of the initialization is omitted in this excerpt) ... */
}

// cube() draws a unit cube as two quad strips; "point" is a three-component
// vertex type defined earlier in the program (not shown in this excerpt).
void cube(void) {
point v[8] = {
{0.0, 0.0, 0.0}, {0.0, 0.0, 1.0},
{0.0, 1.0, 0.0}, {0.0, 1.0, 1.0},
{1.0, 0.0, 0.0}, {1.0, 0.0, 1.0},
{1.0, 1.0, 0.0}, {1.0, 1.0, 1.0} };
glBegin (GL_QUAD_STRIP);
glVertex3fv(v[4]);
glVertex3fv(v[5]);
glVertex3fv(v[0]);
glVertex3fv(v[1]);
glVertex3fv(v[2]);
glVertex3fv(v[3]);
glVertex3fv(v[6]);
glVertex3fv(v[7]);
glEnd();
glBegin (GL_QUAD_STRIP);
glVertex3fv(v[1]);
glVertex3fv(v[3]);
glVertex3fv(v[5]);
glVertex3fv(v[7]);
glVertex3fv(v[4]);
glVertex3fv(v[6]);
glVertex3fv(v[0]);
glVertex3fv(v[2]);
glEnd();
}
// The buffer-clearing call below is part of the display() function; the rest
// of display() is not reproduced in this excerpt.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
void setColor(float t) {
float r, g, b;
r = t/HOT; g = 0.0; b = 1.0 - t/HOT;
glColor3f(r, g, b);
}
void animate(void) {
// This function is called whenever the system is idle; it calls
// iterationStep() to change the data so the next image is changed
iterationStep();
glutPostRedisplay();
}
void iterationStep(void) {
int i, j, m, n;
/* ... the diffusion computation that updates the temperatures is not
   reproduced here; it is discussed in the chapter on science applications ... */
}
The main() program in an OpenGL-based application looks somewhat different from the
programs you have probably seen before. This function has several key operations: it sets up the
display mode, defines the window in which the display will be presented, and does whatever
initialization is needed by the program. It then does something that may not be familiar to you: it
defines a set of event callbacks, which are functions that are called by the system when an event
occurs.
When you set up the display mode, you indicate to the system all the special features that your
program will use at some point. In the example here,
glutInitDisplayMode (GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
you tell the system that you will be working in double-buffered mode, will use the RGB color
model, and will be using depth testing. Some of these have to be enabled before they are actually
used, as the depth testing is enabled in the myinit() function with
glEnable(GL_DEPTH_TEST).
Details on depth testing and a discussion of how this is managed in OpenGL are found in the next
chapter.
Setting up the window (or windows—OpenGL will let you have multiple windows open and
active) is handled by a set of GLUT function calls that position the window, define the size of the
window, and give a title to the window. As the program runs, an active window may be reshaped
by the user using the standard techniques of whatever window system is being used, and this is
handled by the reshape() function.
The way OpenGL handles event-driven programming is described in much more detail in a later
chapter, but for now you need to realize that GLUT-based OpenGL (which is all we will describe
in this book) operates entirely from events. For each event that the program is to handle, you need
to define a callback function here in main(). When the main event loop is started, a reshape
event is generated first, so the reshape callback, and then the display callback that it requests,
are called before any other events are handled.
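To make this concrete, a main() function for a program like this one could be sketched as follows; the window size and position match the values discussed later in this chapter (a 500x500 window at (50, 50)), but the window title is only illustrative and the details may differ from the actual example code.
int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(500, 500);
    glutInitWindowPosition(50, 50);
    glutCreateWindow("Temperature in a bar");   /* the title is illustrative */
    myinit();                                   /* program initialization */
    glutDisplayFunc(display);                   /* register the event callbacks */
    glutReshapeFunc(reshape);
    glutIdleFunc(animate);
    glutMainLoop();                             /* enter the main event loop */
    return 0;
}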
Model space
The function cube() above defines a unit cube with sides parallel to the coordinate axes, one
vertex at the origin, and one vertex at (1,1,1). This cube is created by defining an array of points
that are the eight vertices of such a cube, and then using the glBegin()...glEnd()
construction to draw the six squares that make up the cube through two quad strips. This is
discussed in the chapter on modeling with OpenGL; for now, note that the cube uses its own set of
coordinates that may or may not have anything to do with the space in which we will define our
metallic strip to simulate heat transfer.
Modeling transformation
Modeling transformations are found in the display() function or functions called from it, and
are quite simple: they define the fundamental transformations that are to be applied to the basic
geometry units as they are placed into the world. In our example, the basic geometry unit is a unit
cube, and the cube is scaled in Z (but not in X or Y) to define the height of each cell and is then
translated in X and Y (but not Z) to place the cell in the right place. The order of the
transformations, the way each is defined, and the glPushMatrix()/glPopMatrix() operations
you see in the code are described in the later chapter on modeling in OpenGL. For now it suffices
to see that the transformations are defined in order to make a rectangular object of the proper height
to represent the temperature.
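As a sketch of what such code might look like inside display() (the array name temps and the exact scale factor here are placeholders, not the program's actual code), placing a single cell of the bar could be written:
void drawCell(int i, int j) {
    setColor(temps[i][j]);                 /* the appearance of the cell, discussed below */
    glPushMatrix();
    glTranslatef((float)i - ROWS/2.0f,     /* place the cell in X and Y */
                 (float)j - COLS/2.0f, 0.0f);
    glScalef(1.0f, 1.0f,                   /* the height in Z encodes the temperature */
             4.0f * temps[i][j] / HOT);
    cube();
    glPopMatrix();
}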
3D world space
The 3D world space for this program is the space in which the graphical objects live after they have
been placed by the modeling transformations. The translations give us one hint as to this space; we
see that the x-coordinates of the translated cubes will lie between -ROWS/2 and ROWS/2, while
the y-coordinates of these cubes will lie between -COLS/2 and COLS/2. Because ROWS and
COLS are 30 and 10, respectively, the x-coordinates will lie between -15 and 15 and the y-
coordinates will lie between -5 and 5. The low z-coordinate is 0 because that is never changed
when the cubes are scaled, while the high z-coordinate is never larger than 4. Thus the entire bar
lies in the region between -15 and 15 in x, -5 and 5 in y, and 0 and 4 in z. (Actually, this is not
quite correct, but it is adequate for now; you are encouraged to find the small error.)
Viewing transformation
The viewing transformation is defined at the beginning of the display() function above. The
code identifies that it is setting up the modelview matrix, sets that matrix to the identity (a
transformation that makes no changes to the world), and then specifies the view. A view is
specified in OpenGL with the gluLookAt() call:
gluLookAt( ex, ey, ez, lx, ly, lz, ux, uy, uz );
with parameters that include the coordinates of eye position (ex, ey, ez), the coordinates of the
point at which the eye is looking (lx, ly, lz), and the coordinates of a vector that defines the
“up” direction for the view (ux, uy, uz). This is discussed in more detail in the chapter below
on viewing.
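Put together, the beginning of display() therefore contains something like the following; the eye point, look-at point, and up direction shown here are illustrative rather than the example's actual values.
glMatrixMode(GL_MODELVIEW);         /* work with the modelview matrix */
glLoadIdentity();                   /* start from the identity transformation */
gluLookAt(10.0, 10.0, 15.0,         /* eye position, above and to the side */
           0.0,  0.0,  0.0,         /* look-at point: the origin */
           0.0,  0.0,  1.0);        /* the "up" direction */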
There is no specific representation of the 3D eye space in the program, because this is simply an
intermediate stage in the production of the image. We can see, however, that we had set the center
of view to the origin, which is the center of our image, and we had set our eye point to look at the
origin from a point somewhat above and to the right of the center, so after the viewing
transformation the object seems to be tilted up and to the side. This is the representation in the final
3D space that will be used to project the scene to the eye.
Projections
The projection operation is defined here in the reshape() function. It may be done in other
places, but this is a good location and clearly separates the operation of projection from the
operation of viewing.
Projections are specified fairly easily in the OpenGL system. An orthographic (or parallel)
projection is defined with the function call:
glOrtho( left, right, bottom, top, near, far );
where left and right are the x-coordinates of the left and right sides of the orthographic view
volume, bottom and top are the y-coordinates of the bottom and top of the view volume, and
near and far are the z-coordinates of the front and back of the view volume. A perspective
projection can be defined with the function call:
gluPerspective( fovy, aspect, near, far );
In this function, the first parameter is the field of view in degrees, the second is the aspect ratio for
the window, and the near and far parameters are as above. In this projection, it is assumed that
your eye is at the origin so there is no need to specify the other four clipping planes; they are
determined by the field of view and the aspect ratio.
When the window is reshaped, it is useful to take the width and height from the reshape event and
define your projection to have the same aspect ratio (ratio of width to height) that the window has.
That way there is no distortion introduced into the scene as it is seen through the newly-shaped
window. If you use a fixed aspect ratio and change the window’s shape, the original scene will be
distorted to fit the new window, which can be confusing to the user.
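A sketch of a reshape callback that follows this advice (the field of view and depth range here are illustrative) is:
void reshape(int w, int h) {
    glViewport(0, 0, w, h);                          /* draw into the whole window */
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, (GLdouble)w/(GLdouble)h,    /* keep the window's aspect ratio */
                   1.0, 30.0);
    glMatrixMode(GL_MODELVIEW);
    glutPostRedisplay();                             /* ask for the scene to be redrawn */
}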
2D eye space
This is the real 2D space to which the 3D world is projected, and it corresponds to the forward
plane of the view volume. In order to provide uniform dimensions for mapping to the screen, the
eye space is scaled so it has dimension -1 to 1 in each coordinate.
2D screen space
When the system was initialized, the window for this program was defined to be 500x500 pixels in
size with a top corner at (50, 50), or 50 pixels down and 50 pixels over from the upper-left corner
of the screen. Thus the screen space for the window is the set of pixels in that area of the screen.
In fact, though, the window maintains its coordinate system independently of its location, so the
point that had been (0, 0, 0) in 3D eye space is now (249, 249) in screen space. Note that screen
space has integer coordinates that represent individual pixels and is discrete, not continuous, and
its coordinates start at 0.
Appearance
The appearance of the objects in this program is defined by the function setColor(), called
from the display() function. If you recall that display() is also the place where modeling is
defined, you will see that appearance is really part of modeling—you model both the geometry of
an object and its appearance. The value of the temperature in each cell is used to compute a color
for the cell’s object as it is displayed, using the OpenGL glColor3f() function. This is about
the simplest way to define the color for an object’s appearance, but it is quite effective.
Another way to see the program
Another way to see how this program works is to consider the code function-by-function instead
of by the properties of the geometry pipeline. We will do this briefly here.
The task of the function myinit() is to set up the fundamental environment for the program
and its scene. This is a good place to compute values for arrays that
define the geometry, to define specific named colors, and the like. At the end of this function you
should set up the initial projection specifications.
The task of the function display() is to do everything needed to create the image. This can
involve manipulating a significant amount of data, but the function does not take any parameters.
Here is the first place where the data for graphics problems must be managed through global
variables. As we noted above, we treat the global data as a programmer-created environment, with
some functions manipulating the data and the graphical functions using that data (the graphics
environment) to define and present the display. In most cases, the global data is changed only
through well-documented side effects, so this use of the data is reasonably clean. (Note that this
argues strongly for a great deal of emphasis on documentation in your projects, which most people
believe is not a bad thing.) Of course, some functions can create or receive control parameters, and
it is up to you to decide whether these parameters should be managed globally or locally, but even
in this case the declarations are likely to be global because of the number of functions that
may use them. You will also find that your graphics API maintains its own environment, called its
system state, and that some of your functions will also manipulate that environment, so it is
important to consider the overall environment effect of your work.
The task of the function reshape() is to respond to user manipulation of the window in which
the graphics are displayed. The function takes two parameters, which are the width and height of
the window in screen space (or in pixels) as it is resized by the user’s manipulation, and should be
used to reset the projection information for the scene. GLUT interacts with the window manager
of the system and allows a window to be moved or resized very flexibly without the programmer
having to manage any system-dependent operations directly. Surely this kind of system
independence is one of the very good reasons to use the GLUT toolkit!
The task of the function animate() is to respond to the “idle” event — the event that nothing has
happened. This function defines what the program is to do without any user activity, and is the
way we can get animation in our programs. Without going into detail that should wait for our
general discussion of events, the process is that the idle() function makes any desired changes
in the global environment, and then requests that the program make a new display (with these
changes) by invoking the function glutPostRedisplay(), which simply posts a “redisplay” event
so that the display function will be called when the system can next do it.
The execution sequence of a simple program with no other events would then look something like
what is shown in Figure 0.7. Note that main() does not call the display() function directly;
instead main() calls the event-handling function glutMainLoop(), which in turn makes the
calls to display() and idle() as the corresponding events occur.
Figure 0.7: the execution sequence for the example program: main() enters the event loop,
which then calls display() and idle()
So we see that in the absence of any other event activity, the program will continue to apply the
activity of the idle() function as time progresses, leading to an image that changes over
time—that is, to an animated image.
A few words on the details of the idle() function might help in seeing what it does. The whole
program presents the behavior of heat in a bar, and the transfer of heat from one place to another is
described by the heat equation. In this program we model heat transfer by a diffusion process.
This uses a filter that sets the current heat at a position to a weighted average of the heat of the
cell’s neighbors, modeled by the filter array in this function. A full description of this is in the
chapter on science applications. At each time step—that is, at each time when the program
becomes idle—this diffusion process is applied to compute a new set of temperatures, and the
angle of rotation of the display is updated. The call to glutPostRedisplay() at the end of
this function then generates a call to the display() function that draws the image with the new
temperatures and new angle.
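To give a sense of what that might look like in code (the filter weights, the temps and newTemps arrays, and the angle variable here are illustrative stand-ins, not the program's actual values), the heart of iterationStep() could be written:
void iterationStep(void) {
    int i, j, m, n;
    static float newTemps[ROWS][COLS];
    float filter[3][3] = { {0.0625f, 0.125f, 0.0625f},     /* the weights sum to 1 */
                           {0.125f,  0.25f,  0.125f },
                           {0.0625f, 0.125f, 0.0625f} };
    for (i = 1; i < ROWS-1; i++)              /* interior cells only; the boundary is held fixed */
        for (j = 1; j < COLS-1; j++) {
            newTemps[i][j] = 0.0f;
            for (m = -1; m <= 1; m++)
                for (n = -1; n <= 1; n++)
                    newTemps[i][j] += filter[m+1][n+1] * temps[i+m][j+n];
        }
    for (i = 1; i < ROWS-1; i++)              /* copy the new temperatures back */
        for (j = 1; j < COLS-1; j++)
            temps[i][j] = newTemps[i][j];
    angle += 1.0f;                            /* update the rotation of the display */
}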
In looking at the execution sequence for the functions in this simple program, it can be useful to
consider a graph that shows which functions are called by which other functions. Bearing in mind
that the program is event-driven and so the event callback functions (animate(), display(),
and reshape()) are not called directly by the program, we have the function caller-callee graph
in Figure 0.8.
Note that the graph is really a tree: functions are called only by event callbacks or the myinit()
function, the myinit() function is called only once from main(), and all the event callbacks are
called from the event handler. For most OpenGL programs, this is the general shape of the graph:
a callback function may use several functions, but any function except a callback will only be called
as part of program initialization or from an event callback.
Figure 0.8: the function caller/callee graph for the example program
OpenGL extensions
In this chapter, and throughout these notes, we take a fairly limited view of the OpenGL graphics
API. Because this is an introductory text, we focus on the basic features of computer graphics and
of the graphics API, so we do not work with most of the advanced features of the system and we
only consider the more straightforward uses of the parts we cover. But OpenGL is capable of very
sophisticated kinds of graphics, both in its original version and in versions that are available for
specific kinds of graphics, and you should know of these because as you develop your graphics
skills, you may find that the original “vanilla” OpenGL that we cover here will not do everything
you want.
In addition to standard OpenGL, there are a number of extensions that support more specialized
kinds of operations. These include the ARB imaging subset extension for image processing, the
ARB multitexturing extension, vertex shader extensions, and many others. Some of these might
have just the tools you need to do the very special things you want, so it would be useful for you
to keep up to date on them. You can get information on extensions at the standard OpenGL Web
site, http://www.opengl.org.
Summary
In this chapter we have discussed the geometry pipeline and have indicated what each step involves
and how it contributes to creating the final image you are programming. We have also shown how
appearance fits into the geometry pipeline, although it is actually implemented in a separate
pipeline, and how all of this is implemented through a complete sample OpenGL program. In fact,
you actually have a significant tool in this sample program, because it can be modified and adapted
to serve as a basis for a great deal of other graphics programming. We do not have any
programming projects in this chapter, but these will come along quickly and you will be able to use
this sample program to get started on them.
Questions
1. There are other ways to do graphics besides API-based programming. You can use a number
of different modeling, painting, and other end-user tools. Distinguish between API-based
graphics and graphics done with a tool such as Photoshop™ or a commercial paint program.
The sample program in this chapter can give you an idea of how API-based graphics can look,
although it is only a simple program and much more complex programs are discussed in later
chapters.
2. Trace the behavior of the 3D geometry in the sample program through the steps of the geometry
pipeline: from the point where you define a unit cube in model space, through the
transformations that place it into world space, through the viewing transformation that creates
the 3D eye space, and finally through the projection and screen mapping that produce the 2D
image you see.
Exercises
3. Compile and execute the full sample program in the chapter so you can become familiar with
the use of your compiler for graphics programming. Exercise the reshape() function in the
code by dragging and resizing the window. As you manipulate the window, change the shape
of the window (make it narrower but not shorter, for example, or make it shorter but not
narrower) and see how the window and image respond.
Experiments
4. There are many ways you can experiment with the full sample program in this chapter. A few
of them include
(a) changing the size and upper left corner coordinates of the window [function main()]
(b) changing the locations of the hot and cold spots in the bar [function myinit()]
(c) changing the way the color of each bar is computed by changing the function that
determines the color [function setColor()] (see the later chapter on color for more on
this topic)
(d) changing the rate at which the image rotates by changing the amount the angle is increased
[function animate()]
(e) changing the values in the filter array to change the model of how heat diffuses in the bar
[function iterationStep()]
(f) changing the way the edge of the bar is treated, so that instead of simply repeating the
values at the edge, you get the values at the opposite edge of the bar, effectively allowing
temperatures to move from one edge to the other as if the bar were a torus [function
iterationStep()]
(g) changing the view of the bar from a perspective view to an orthogonal view [function
reshape()] (you will probably need to look up the details of orthogonal projections in
the appropriate chapter below).
Take as many of them as you can and add appropriate code changes to the code of the previous
exercise and observe the changes in the program behavior that result. Draw as many
conclusions as you can about the role of these various functions in creating the final image.
5. Continuing with the earlier theme of the reshape() function, look at its code and think about
how you might make it respond differently. The current version uses the window dimensions w
and h in the perspective projection definition to ensure that the aspect ratio of the original image
is preserved, but the window may cut off part of the image if it is too narrow. You could, for
example, increase the projection angle as the window becomes narrower. Change the code in
reshape() to change the behavior of the image in the window.
6. There are some sample programs available for this book, and there are an enormous number of
OpenGL programs available on the Web. Find several of these and create the graph of function
calls described in this chapter to verify (or refute) the claim there that functions tend to operate
in either program initialization or a single event callback. What does this tell you about the way
you develop a graphics program with OpenGL? Where do you find most of the user-defined
functions within the program?
Besides discussing viewing and projection, this chapter includes some topics that are related to
basic steps in the graphics pipeline. These include clipping, which is performed during the
projection (as well as the more general concept of clipping), defining the screen window on which
the image is presented, and specifying the viewport in the window that is to contain the actual
image. Other topics include double buffering (creating the image in an invisible window and then
swapping it with the visible window) and managing hidden surfaces. Finally, we show how you
can create a stereo view with two images, computed from viewpoints that represent the left and
right eyes, presented in adjacent viewports so they may be fused by someone with appropriate
vision skills.
After discussing the pipeline as a general feature of computer graphics, the chapter moves on to
discuss how each stage is created in OpenGL. We discuss the OpenGL functions that allow you to
define the viewing transformation and the orthogonal and perspective projections, and show how
they are used in a program and how they can respond to window manipulation. We also show
how the concepts of clipping, double buffering, and hidden surfaces are implemented, and show
how to implement the stereo viewing described above.
When the reader is finished with this chapter, he or she should be able to choose an appropriate
view and projection for a scene and should be able to define the view and projection and write the
necessary code to implement them in OpenGL. The reader should also understand the function of
double buffering and hidden surfaces in 3D graphics and be able to use them in graphics
programming.
Introduction
We emphasize 3D computer graphics consistently in this book because we believe that computer
graphics should be encountered through 3D processes and that 2D graphics can be considered
effectively as a special case of 3D graphics. But almost all of the viewing technologies that are
readily available to us are 2D—certainly monitors, printers, video, and film—and eventually even
the active visual retina of our eyes presents a 2D environment. So in order to present the images of
the scenes we define with our modeling, we must create a 2D representation of the 3D scenes. As
we saw in the graphics pipeline in the previous chapter, you begin by developing a set of models
that make up the elements of your scene and set up the way the models are placed in the scene,
resulting in a set of objects in a common world space. You then define the way the scene will be
viewed and the way that view is presented on the screen. In this early chapter, we are concerned
with the way we move from the world space to a 2D image with the tools of viewing and
projection.
We set the scene for this process in the last chapter, when we defined the geometry pipeline. We
begin at the point where we have the 3D world coordinates—that is, where we have a complete
scene fully defined in a 3D world. This point comes after we have done the modeling and model
transformations noted in the previous chapter and discussed in more detail in the two chapters that
follow this one. This chapter is about creating a view of that scene in our display space of a
computer monitor, a piece of film or video, or a printed page, whatever we want. To remind
ourselves of the steps in this process, the geometry pipeline (without the modeling stage) is again
shown in Figure 1.1.
Figure 1.1: the geometry pipeline from 3D world coordinates through 3D eye coordinates and 2D
eye coordinates to 2D screen coordinates
Let’s consider an example of a world space and look at just what it means to have a view as a
presentation of that space. One of the author’s favorite places is Yosemite National Park, which is
a wonderful example of a 3D world. Certainly there is a basic geometry in the park, made up of
stone, wood, and water; this geometry can be seen from a number of points. In Figure 1.2 we see
the classic piece of Yosemite geometry, the Half Dome monolith, from below in the valley and
from above at Glacier Point. This gives us an excellent example of two views of the same
geometry.
If you think about this area shown in these photographs, we can see the essential components of
viewing. First, you notice that your view depends first on where you are standing. If you are
standing on the valley floor, you see the face of the monolith in the classic view; if you are
standing on the rim of Yosemite Valley at about the same height as Half Dome, you get a view that
shows the profile of the rock. So your view depends on your position, which we call your eye
point. Second, the view also depends on the point you are looking at, which we will call the view
reference point. Both photos look generally towards the Half Dome monolith, or more
specifically, towards a point in space behind the dome. This makes a difference not only in the
view of the dome, but in the view of the region around the dome. In the classic Half Dome view
from the valley, if you look off to the right you see the south wall of the valley; in the view from
Glacier Point, if you look off to the right you see Vernal and Nevada falls on the Merced River
and, farther to the right, the high Sierra in the south of the park. The view also depends on the
breadth of field of your view, whether you are looking at a wide part of the scene or a narrow part;
again, the photograph at the left is a view of just Half Dome, while the one at the right is a much
broader view that takes in the dome and the region around it.
Once you have determined your view, it must then be translated into an image that can be presented
on your computer monitor. You may think of this in terms of recording an image on a digital
camera, because the result is the same: each point of the view space (each pixel in the image) must
be given a specific color. Doing that with the digital camera involves only capturing the light that
comes through the lens to that point in the camera’s sensing device, but doing it with computer
graphics requires that we calculate exactly what will be seen at that particular point when the view
is presented. We must define the way the scene is transformed into a two-dimensional space,
which involves a number of steps: taking into account all the questions of what parts are in front of
what other parts, what parts are out of view from the camera’s lens, and how the lens gathers light
from the scene to bring it into the camera. The best way to think about the lens is to compare two
very different kinds of lenses: one is a wide-angle lens that gathers light in a very wide cone, and
the other is a high-altitude photography lens that gathers light only in a very tight cylinder and
processes light rays that are essentially parallel as they are transferred to the sensor. Finally, once
the light from the continuous world comes into the camera, it is recorded on a digital sensor that
only captures a discrete set of points.
This model of viewing is paralleled quite closely by a computer graphics system, and it follows the
graphics pipeline that we discussed in the last chapter. You begin your work by modeling your
scene in an overall world space (you may actually start in several modeling spaces, because you
may model the geometry of each part of your scene in its own modeling space where it can be
defined easily, then place each part within a single consistent world space to define the scene).
This is very different from the viewing we discuss here but is covered in detail in the next chapter.
The fundamental operation of viewing is to define an eye within your world space that represents
the view you want to take of your modeling space. Defining the eye implies that you are defining a
coordinate system relative to that eye position, and you must then transform your modeling space
into a standard form relative to this coordinate system by defining, and applying, a viewing
transformation. The fundamental operation of projection, in turn, is to define a plane within 3-
dimensional space, define a mapping that projects the model onto that plane, and display that plane
in a given space on the viewing surface (we will usually think of a screen, but it could be a page, a
video frame, or a number of other spaces).
We will think of the 3D space we work in as the traditional X -Y -Z Cartesian coordinate space,
usually with the X - and Y -axes in their familiar positions and with the Z-axis coming toward the
viewer from the X -Y plane. This is a right-handed coordinate system, so-called because if you
orient your right hand with your fingers pointing from the X -axis towards the Y -axis, your thumb
will point towards the Z-axis. This orientation is commonly used for modeling in computer
graphics because most graphics APIs define the plane onto which the image is projected for
viewing as the X -Y plane, and project the model onto this plane in some fashion along the Z-axis.
The mechanics of the modeling transformations, viewing transformation, and projection are
managed by the graphics API, and the task of the graphics programmer is to provide the API with
the information it needs to carry out these operations.
Finally, it is sometimes useful to “cut away” part of an image so you can see things that would
otherwise be hidden behind some objects in a scene. We include a brief discussion of clipping
planes, a technique for accomplishing this action, because the system must clip away parts of the
scene that are not visible in the final image.
As a physical model, we can think of the viewing process in terms of looking through a rectangular
frame that is held in front of your eye. You can move yourself around in the world, setting your
eye into whatever position and orientation from which you wish to see the scene. This defines your
viewpoint and view reference point. The shape of the frame and the orientation you give it
determine the aspect ratio and the up direction for the image. Once you have set your position in
the world, you can hold up the frame to your eye and this will set your projection; by changing the
distance of the frame from the eye you change the breadth of field for the projection. Between
these two operations you define how you see the world in perspective through the frame. And
finally, if you put a piece of transparent material that is ruled in very small squares behind the
cardboard (instead of your eye) and you fill in each square to match the brightness you see in the
square, you will create a copy of the image that you can take away from the original location. Of
course, you only have a perspective projection instead of an orthogonal projection, but this model
of viewing is a good place to start in understanding how viewing and projection work.
As we noted above, the goal of the viewing process is to rearrange the world so it looks as it
would if the viewer’s eye were in a standard position, depending on the API’s basic model. When
we define the eye location, we give the API the information it needs to do this rearrangement. In
the next chapter on modeling, we will introduce the important concept of the scene graph, which
will integrate viewing and modeling. Here we give an overview of the viewing part of the scene
graph.
Figure 1.3: the eye coordinate system within the world coordinate system
In effect, you have defined a coordinate system within the world space relative to the eye. There
are many ways to create this definition, but basically they all involve specifying three pieces of data
in 3D space. Once this eye coordinate system is defined, we can apply an operation that changes
the coordinates of everything in the world into equivalent representations in the eye coordinate
system. This change of coordinates is a straightforward mathematical operation, performed by
creating a change-of-basis matrix for the new system and then applying it to the world-space
geometry. The transformation places the eye at the origin, looking along the Z-axis, and with the
Y -axis pointed upwards; this view is similar to that shown in Figure 1.4. The specifications allow
us to define the viewing transformation needed to move from the world coordinate system to the
eye coordinate system. Once the eye is in standard position, and all your geometry is adjusted in
the same way, the system can easily move on to project the geometry onto the viewing plane so the
view can be presented to the user.
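To make the change-of-basis operation concrete, the sketch below builds such a matrix from an
eye point, a view reference point, and an up vector. It follows the standard construction rather
than any particular API's internal code, and the type and function names are only for this
illustration.
#include <math.h>

// a sketch of building the viewing (change-of-basis) matrix from an eye point,
// a view reference point, and an up vector; an illustration, not API code
typedef struct { double x, y, z; } point3;

static point3 diff(point3 a, point3 b)
   { point3 r = { a.x-b.x, a.y-b.y, a.z-b.z }; return r; }
static double dot3(point3 a, point3 b)
   { return a.x*b.x + a.y*b.y + a.z*b.z; }
static point3 cross3(point3 a, point3 b)
   { point3 r = { a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x }; return r; }
static point3 unit3(point3 a)
   { double len = sqrt(dot3(a,a));
     point3 r = { a.x/len, a.y/len, a.z/len }; return r; }

void buildViewMatrix(point3 eye, point3 center, point3 up, double m[4][4])
{
   // the three basis vectors of the eye coordinate system
   point3 n = unit3(diff(eye, center));   // eye-space Z: from center toward eye
   point3 u = unit3(cross3(up, n));       // eye-space X: to the right
   point3 v = cross3(n, u);               // eye-space Y: the projected up vector
   double rows[3][3] = { {u.x,u.y,u.z}, {v.x,v.y,v.z}, {n.x,n.y,n.z} };
   point3 axes[3];
   int i, j;

   axes[0] = u;  axes[1] = v;  axes[2] = n;
   for (i = 0; i < 3; i++) {
      for (j = 0; j < 3; j++)
         m[i][j] = rows[i][j];            // rotation part: the change of basis
      m[i][3] = -dot3(axes[i], eye);      // translation part: move the eye to the origin
   }
   m[3][0] = m[3][1] = m[3][2] = 0.0;  m[3][3] = 1.0;
}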
In the next chapter we will discuss modeling, and part of that process is using transformations to
place objects that are defined in one position into a different position and orientations in world
space. This can be applied to defining the eye point, and we can think of starting with the eye in
standard position and applying transformations to place the eye where you want it. If we do that,
then the viewing transformation is defined by computing the inverse of the transformation that
placed the eye into the world. (If the concept of computing the inverse seems difficult, simply
think of undoing each of the pieces of the transformation; we will discuss this more in the chapter
on modeling).
Once you have organized the viewing information as we have described, you must organize the
information you send to the graphics system to define the way your scene is projected to the
screen. The graphics system provides ways to define the projection and, once the projection is
defined, the system will carry out the manipulations needed to map the scene to the display space.
These operations will be discussed later in this chapter.
Definitions
There are a small number of things that you must consider when thinking of how you will view
your scene. These are independent of the particular API or other graphics tools you are using, but
later in the chapter we will couple our discussion of these points with a discussion of how they are
handled in OpenGL. The things are:
• Your world must be seen, so you need to say how the view is defined in your model including
the eye position, view direction, field of view, and orientation. This defines the viewing
transformation that will be used to move from 3D world space to 3D eye space.
• In general, your world must be seen on a 2D surface such as a screen or a sheet of paper, so
you must define how the 3D world is projected into a 2D space. This defines the 3D clipping
and projection that will take the view from 3D eye space to 2D eye space.
• The region of the viewing device where the image is to be visible must be defined. This is the
window, which should not be confused with the concept of a window on your screen, though
they often will both refer to the same space.
• When your world is seen in the window on the 2D surface, it must be seen at a particular place,
so you must define the location where it will be seen. This defines the location of the viewport
within the window.
We will call these setting up your viewing environment, defining your projection, and defining
your window and viewport, respectively, and they are discussed in that order in the sections
below.
When you define a scene, you will want to do your work in the most natural world that would
contain the scene, which we called the model space in the graphics pipeline discussion of the
previous chapter. Objects defined in their individual model spaces are then placed in the world
space with modeling transformations, as described in the next chapter on modeling. This world
space is then transformed by the viewing transformation into a 3D space with the eye in standard
position. To define the viewing transformation, you must set up a view by putting your eyepoint
in the world space. This world is defined by the coordinate space you assumed when you modeled
your scene as discussed earlier. Within that world, you define four critical components for your
eye setup: where your eye is located, what point your eye is looking towards, how wide your field
of view is, and what direction is vertical with respect to your eye. When these are defined to your
graphics API, the geometry in your modeling is transformed with the viewing transformation to
create the view as it would be seen with the environment that you defined.
A graphics API defines the computations that transform your geometric model as if it were defined
in a standard position so it could be projected in a standard way onto the viewing plane. Each
graphics API defines this standard position and has tools to create the transformation of your
geometry so it can be viewed correctly. For example, OpenGL defines its viewing to take place in
a right-handed coordinate system and transforms all the geometry in your scene (and we do mean
all the geometry, including lights and directions, as we will see in later chapters) to place your eye
point at the origin, looking in the negative direction along the Z-axis. The eye-space orientation is
illustrated in Figure 1.4.
Figure 1.4: the standard view, with the eye at the origin looking along the Z-axis in the negative
direction
Of course, no graphics API assumes that you can only look at your scenes with this standard view
definition. Instead, you are given a way to specify your view very generally, and the API will
convert the geometry of the scene so it is presented with your eyepoint in this standard position.
This conversion is accomplished through the viewing transformation that is defined from your
view definition as we discussed earlier.
The information needed to define your view includes your eye position (its (x, y, z) coordinates),
the direction your eye is facing or the coordinates of a point toward which it is facing, the breadth
of your field of view, and the direction your eye considers to be up.
The viewing transformation, then, is the transformation that takes the scene as you define it in
world space and aligns the eye position with the standard model, giving you the eye space we
discussed in the previous chapter. The key actions that the viewing transformation accomplishes
are to rotate the world to align your personal up direction with the direction of the Y -axis, to rotate
it again to put the look-at direction in the direction of the negative Z-axis (or to put the look-at point
in space so it has the same X - and Y -coordinates as the eye point and a Z-coordinate less than the
Z-coordinate of the eye point), to translate the world so that the eye point lies at the origin, and
finally to scale the world so that the look-at point or look-at vector has the value (0, 0, –1). This is
a very interesting transformation because what it really does is to invert the set of transformations
that would move the eye point from its standard position to the position you define with your API
function as above. This is very important in the modeling chapter below, and is discussed in some
depth later in this chapter in terms of defining the view environment for the OpenGL API.
The viewing transformation above defines the 3D eye space, but that cannot be viewed on our
standard devices. In order to view the scene, it must be mapped to a 2D space that has some
correspondence to your display device, such as a computer monitor, a video screen, or a sheet of
paper. The technique for moving from the three-dimensional world to a two-dimensional world
uses a projection operation that you define based on some straightforward fundamental principles.
When you (or a camera) view something in the real world, everything you see is the result of light
that comes to the retina (or the film) through a lens that focuses the light rays onto that viewing
surface. This process is a projection of the natural (3D) world onto a two-dimensional space.
These projections in the natural world operate when light passes through the lens of the eye (or
camera), essentially a single point, and have the property that parallel lines going off to infinity
seem to converge at the horizon so things in the distance are seen as smaller than the same things
when they are close to the viewer. This kind of projection, where everything is seen by being
projected onto a viewing plane through or towards a single point, is called a perspective projection.
Standard graphics references show diagrams that illustrate objects projected to the viewing plane
through the center of view; the effect is that an object farther from the eye is seen as smaller in the
projection than the same object closer to the eye.
On the other hand, there are sometimes situations where you want to have everything of the same
size show up as the same size on the image. This is most common where you need to take careful
measurements from the image, as in engineering drawings. An orthographic projection
accomplishes this by projecting all the objects in the scene to the viewing plane by parallel lines.
For orthographic projections, objects that are the same size are seen in the projection with the same
size, no matter how far they are from the eye. Standard graphics texts contain diagrams showing
how objects are projected by parallel lines to the viewing plane.
In Figure 1.5 we show two images of a wireframe house from the same viewpoint. The left-hand
image of the figure is presented with a perspective projection, as shown by the difference in the
apparent sizes of the front and back ends of the building, and by the way that the lines outlining the
sides and roof of the building get closer as they recede from the viewer. The right-hand image of
the figure is shown with an orthogonal projection, as shown by the equal sizes of the front and
back ends of the building.
Figure 1.5: perspective image (left) and orthographic image (right) of a simple model
These two projections operate on points in 3D space in rather straightforward ways. For the
orthographic projection, all points are projected onto the (X ,Y )-plane in 3D eye space by simply
omitting the Z-coordinate. Each point in 2D eye space is the image of a line parallel to the Z-axis,
so the orthographic projection is sometimes called a parallel projection. For the perspective
projection, any point is projected onto the plane Z=1 in 3D eye space at the point where the line
from the point to the origin in 3D eye space meets that plane. Because of similar triangles, if the
point (x, y, z) is projected to the point (x′, y′), we must have x′ = x/z and y′ = y/z. Here each
point in 2D eye space is the image of a line through that point and the origin in 3D eye space.
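A tiny sketch of these two projections applied to a single point in 3D eye space, simply restating
the formulas above in code, might look like this:
// the two projections applied to a point in 3D eye space, restating the
// formulas above; a real system also keeps the depth value, as discussed next
void orthographicProject(double x, double y, double z, double *xp, double *yp)
{
   *xp = x;          // simply drop the Z-coordinate
   *yp = y;
}

void perspectiveProject(double x, double y, double z, double *xp, double *yp)
{
   *xp = x / z;      // project onto the plane Z = 1 along the line to the origin
   *yp = y / z;
}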
After a projection is applied, your scene is mapped to 2D eye space, as we discussed in the last
chapter. However, the z-values in your scene are not lost. As each point is changed by the
projection transformation, its z-value is retained for later computations such as depth tests or
perspective-corrected textures. In some APIs such as OpenGL, the z-value is not merely retained
but its sign is changed so that positive z-values will go away from the origin in a left-handed way.
This convention allows the use of positive numbers in depth operations, which makes them more
efficient.
View Volumes
A projection is often thought of in terms of its view volume, the region of space that is to be visible
in the scene after the projection. With any projection, the fact that the projection creates an image
on a rectangular viewing device implicitly defines a set of boundaries for the left, right, top, and
bottom sides of the scene; these correspond to the left, right, top, and bottom of the viewing space.
In addition, the conventions of creating images exclude objects that are too close to or too far
from the eye point, and these conventions give us the idea of front and back sides of the region of the
scene that can be viewed. Overall, then, the projection defines a region in three-dimensional space
that will contain all the parts of the scene that can be viewed. This region is called the viewing
volume for the projection. The viewing volumes for the perspective and orthogonal projections are
shown in Figure 1.6, with the eye point at the origin; this region is the space within the rectangular
volume (left, for the orthogonal projection) or the pyramid frustum (right, for the perspective
projection). Note how these view volumes match the definitions of the regions of 3D eye space
that will be visible in the final image.
Figure 1.6: the viewing volumes for the orthogonal (left) and perspective (right) projections
While the perspective view volume must be positioned with its apex at the eye point, the
orthogonal view volume may be defined wherever you need it because, being independent of the
calculation that makes the world appear from a particular point of view, an orthogonal view can
take in any part of space. This allows you to set up an orthogonal view of any part of your space,
or to move your view volume around to view any part of your model. In fact, this freedom to
place your viewing volume for the orthographic projection is not particularly important because
you could always use simple translations to center the region you will see.
One of the reasons we pay attention to the view volume is that only objects that are inside the view
volume for your projection will be displayed; anything else in the scene will be clipped, that is, be
identified in the projection process as invisible, and thus will not be handled further by the graphics
system. Any object that is partly within and partly outside the viewing volume will be clipped so
that precisely those parts inside the volume are seen, and we discuss the general concept and
process of clipping later in this chapter. The sides of the viewing volume correspond to the
projections of the sides of the rectangular space that is to be visible, but the front and back of the
volume are less obvious—they correspond to the nearest and farthest space that is to be visible in
the projection. These allow you to ensure that your image presents only the part of space that you
want, and prevent things that might lie behind your eye from being projected into the visible space.
The perspective projection is quite straightforward to compute, and although you do not need to
carry this out yourself we will find it very useful later on to understand how it works. Given the
general setup for the perspective viewing volume, let’s look at a 2D version in Figure 1.7. Here a
point (X, Y, Z) is projected toward the eye at the origin (0, 0, 0), and its image is the point where
that line meets the plane Z = 1. By similar triangles, the projected value is Y′ = Y/Z and, in three
dimensions, X′ = X/Z as well.
Figure 1.7: the 2D geometry of projecting a point (X, Y, Z) through the origin onto the plane Z = 1
In matrix terms, this corresponds to the transformation
   [ 1/Z   0    0 ]
   [  0   1/Z   0 ]
   [  0    0    1 ]
This matrix represents a transformation called the perspective transformation, but because this
matrix involves a variable in the denominator this transformation is not a linear mapping. That will
have some significance later on when we realize that we must perform perspective corrections on
some interpolations of object properties. Note here that we do not make any change in the value of
Z, so that if we have the transformed values of X ’ and Y ’ and keep the original value of Z, we can
reconstruct the original values as X = X ’*Z and Y = Y ’*Z. The perspective projection then is
done by applying the perspective transformation and using only the values of X’ and Y’ as output.
We noted just a bit earlier that parts of an image outside the view volume are clipped, or removed
from the active scene, before the scene is displayed. Clipping for an orthogonal projection is quite
straightforward because the boundary planes are defined by constant values of single coordinates:
X = Xleft, X = Xright, Y = Ybottom, Y = Ytop, Z = Znear, and Z = Zfar. Clipping a line segment
against any of these planes checks to see whether the line crosses the plane and, if it does, replaces
the entire line segment with the line segment that does not include the part outside the volume.
Algorithms for clipping are very well known and we do not include them here because we do not
want to distract the reader from the ideas of the projection.
On the other hand, clipping on the view volume for the perspective projection would require doing
clipping tests against the side planes that slope, and this is more complex. We can avoid this by
applying a bit of cleverness: apply the perspective transformation before you carry out the
clipping. Because each of the edges of the perspective view volume projects into a single point,
each edge is transformed by the perspective transformation into a line parallel to the Z-axis. Thus
the viewing volume for the perspective projection is transformed into a rectangular volume and the
clipping can be carried out just as it was for the orthogonal projection.
The scene as presented by the projection is still in 2D eye space, and the objects are all defined by
real numbers. However, the display space is discrete, so the next step is a conversion of the
geometry in 2D eye coordinates to discrete coordinates. This requires identifying discrete screen
points to replace the real-number eye geometry points, and introduces some sampling issues that
must be handled carefully, but graphics APIs do this for you. The actual display space used
depends on the window and the viewport you have defined for your image.
To a graphics system, a window is a rectangular region in your viewing space in which all of the
drawing from your program will be done. It is usually defined in terms of the physical units of the
drawing space. The window will be placed in your overall display device in terms of the device’s
coordinate system, which will vary between devices and systems. The window itself will have its
own coordinate system, and the window space in which you define and manage your graphics
content will be called screen space, and is identified with integer coordinates. The smallest
displayed unit in this space will be called a pixel, a shorthand for picture element. Note that the
window for drawing is a distinct concept from the window in a desktop display window system,
although in practice the two often refer to the same space.
You will recall that we have a final transformation in the graphics pipeline from the 2D eye
coordinate system to the 2D screen coordinate system. In order to understand that transformation,
you need to understand the relation between points in two corresponding rectangular spaces. In
this case, the rectangle that describes the scene to the eye is viewed as one space, and the rectangle
on the screen where the scene is to be viewed is presented as another. The same processes apply
to other situations that are particular cases of corresponding points in two rectangular spaces that
we will see later, such as the relation between the position on the screen where the cursor is when a
mouse button is pressed, and the point that corresponds to this in the viewing space, or points in
the world space and points in a texture space.
Figure 1.8: two corresponding rectangles, the left bounded by XMIN, XMAX, YMIN, and YMAX
and containing a point (x, y), the right bounded by L, R, B, and T and containing a point (u, v)
In Figure 1.8, we show two rectangles with boundaries and points named as shown. In this
example, we assume that the lower left corner of each rectangle has the smallest coordinate values
in the rectangle. So the right-hand rectangle has a smallest X -value of L and a largest X -value of
R, and a smallest Y -value of B and a largest Y -value of T, for example (think left, right, top, and
bottom in this case).
With the names that are used in the figure, we have the proportions
   (x − XMIN) : (XMAX − XMIN) :: (u − L) : (R − L)
   (y − YMIN) : (YMAX − YMIN) :: (v − B) : (T − B)
from which we can derive the equations:
   (x − XMIN)/(XMAX − XMIN) = (u − L)/(R − L)
   (y − YMIN)/(YMAX − YMIN) = (v − B)/(T − B)
and finally these two equations can be solved for the variables of either point in terms of the other,
giving x and y in terms of u and v as:
   x = XMIN + (u − L)(XMAX − XMIN)/(R − L)
   y = YMIN + (v − B)(YMAX − YMIN)/(T − B)
or the dual equations that solve for (u,v) in terms of (x, y).
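These equations translate directly into code. The small function below, which is only an
illustration and not part of any graphics API, maps a point (u, v) in the right-hand rectangle to
the corresponding point (x, y) in the left-hand rectangle:
// map a point (u,v) in the rectangle [L,R] x [B,T] to the corresponding point
// (x,y) in the rectangle [XMIN,XMAX] x [YMIN,YMAX], restating the equations above
void mapRectToRect(double u, double v,
                   double L, double R, double B, double T,
                   double XMIN, double XMAX, double YMIN, double YMAX,
                   double *x, double *y)
{
   *x = XMIN + (u - L) * (XMAX - XMIN) / (R - L);
   *y = YMIN + (v - B) * (YMAX - YMIN) / (T - B);
}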
This discussion was framed in very general terms with the assumption that all our values are real
numbers, because we were taking arbitrary ratios and treating them as exact values. This would
hold if we were talking about 2D eye space, but a moment’s thought will show that these relations
cannot hold in general for 2D screen space because integer ratios are only rarely exact. In the case
of interest to us, one of these spaces is 2D eye space and one is 2D screen space, so we must stop
to ask how the real-valued eye coordinates are converted to the integer coordinates of the screen.
We noted that the window has a separate coordinate system, but we were not more specific about
it. Your graphics API may use either of two conventions for window coordinates. The window
may have its origin, or (0,0) value, at either the upper left or lower left corner. In the discussion
above, we assumed that the origin was at the lower left because that is the standard mathematical
convention, but graphics hardware often puts the origin at the top left because that corresponds to
the lowest address of the graphics memory. If your API puts the origin at the upper left, you can
make a simple change of variable, Y′ = YMAX − Y; using the Y′ values instead of Y will put
you back into the situation described in the figure.
When you create your image, you can choose to present it in a distinct sub-rectangle of the window
instead of the entire window, and this part is called a viewport. A viewport is a rectangular region
within that window to which you can restrict your image drawing. In any window or viewport,
the ratio of its width to its height is called its aspect ratio. A window can have many viewports,
even overlapping if needed to manage the effect you need, and each viewport can have its own
image. Mapping an image to a viewport is done with exactly the same calculations we described
above, except that the boundaries of the drawing area are the viewport’s boundaries instead of the
window’s. The default behavior of most graphics systems is to use the entire window for the
viewport. A viewport is usually defined in the same terms as the window it occupies, so if the
window is specified in terms of physical units, the viewport probably will be also. However, a
viewport can also be defined in terms of its size relative to the window, in which case its boundary
points will be calculated from the window’s.
If your graphics window is presented in a windowed desktop system, you may want to be able to
manipulate your graphics window in the same way you would any other window on the desktop.
You may want to move it, change its size, and click on it to bring it to the front if another window
has been previously chosen as the top window. This kind of window management is provided by
the graphics API in order to make the graphics window behavior on your system compatible with
the behavior on all the other kinds of windows available. When you manipulate the desktop
window containing the graphics window, the contents of the window need to be managed to
maintain a consistent view. The graphics API tools will give you the ability to manage the aspect
ratio of your viewports and to place your viewports appropriately within your window when that
window is changed. If you allow the aspect ratio of a new viewport to be different than it was
when defined, you will see that the image in the viewport seems distorted, because the program is
trying to draw to the originally-defined viewport.
A single program can manage several different windows at once, drawing to each as needed for the
task at hand. Individual windows will have different identifiers, probably returned when the
window is defined, and these identifiers are used to specify which window will get the drawing
commands as they are given. Window management can be a significant problem, but most
graphics APIs have tools to manage this with little effort on the programmer’s part, producing the
kind of window you are accustomed to seeing in a current computing system—a rectangular space
that carries a title bar and can be moved around on the screen and reshaped. This is the space in
which all your graphical image will be seen. Of course, other graphical outputs such as video will
handle windows differently, usually treating the entire output frame as a single window without
any title or border.
Once you have defined the basic features for viewing your model, there are a number of other
things you can consider that affect how the image is created and presented. We will talk about
many of these over the next few chapters, but here we talk about hidden surfaces, clipping planes,
and double buffering.
Hidden surfaces
Most of the things in our world are opaque, so we only see the things that are nearest to us as we
look in any direction. This obvious observation can prove challenging for computer-generated
images, however, because a graphics system simply draws what we tell it to draw in the order we
tell it to draw them. In order to create images that have the simple “only show me what is nearest”
property we must use appropriate tools in viewing our scene.
Most graphics systems have a technique that uses the geometry of the scene in order to decide what
objects are in front of other objects, and can use this to draw only the part of the objects that are in
front as the scene is developed. This technique is generally called Z-buffering because it uses
information on the z-coordinates in the scene, as shown in Figure 1.4. In some systems it goes by
other names; for example, in OpenGL this is called the depth buffer. This buffer holds the z-value
of the nearest item in the scene for each pixel in the scene, where the z-values are computed from
the eye point in eye coordinates. This z-value is the depth value after the viewing transformation
has been applied to the original model geometry.
This depth value is not merely computed for each vertex defined in the geometry of a scene. When
a polygon is processed by the graphics pipeline, an interpolation process is applied as described in
the interpolation discussion in the chapter on the rendering pipeline. If a perspective projection is
selected, the interpolation can take perspective into account as described there. This process will
define a z-value, which is also the distance of that point from the eye in the z-direction, for each
pixel in the polygon as it is processed. This allows a comparison of the z-value of the pixel to be
plotted with the z-value that is currently held in the depth buffer. When a new point is to be
plotted, the system first makes this comparison to check whether the new pixel is closer to the
viewer than the current pixel in the image buffer and if it is, replaces the current point by the new
point. This is a straightforward technique that can be managed in hardware by a graphics board or
in software by simple data structures. There is a subtlety in this process for some graphics APIs
that should be understood, however. Because it is more efficient to compare integers than floating-
point numbers, the depth values in the buffer may be kept as unsigned integers, scaled to fit the
range between the near and far planes of the viewing volume with 0 as the front plane. This
integer conversion can cause a phenomenon called “Z-fighting” because of aliasing when floating-
point numbers are converted to integers. This can cause the depth buffer to show inconsistent
values for things that are supposed to be at equal distances from the eye. Integer conversion is
particularly a problem if the near and far planes are far apart, because in that case the integer depth
is coarser than if the planes are close. This problem is best controlled by trying to fit the near and
far planes of the view as closely as possible to the actual items being displayed. This makes each
integer value represent a smaller real number and so there is less likelihood of two real depths
getting the same integer representation.
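The per-pixel comparison described above can be summarized in a few lines of code. The sketch
below models the buffers in software simply to show the logic, with smaller depth values meaning
nearer the eye; the array sizes and the color type are placeholders.
// a software model of the depth-buffer test described above; real systems do
// this in hardware, and the sizes and types here are only for illustration
#define WIDTH  640
#define HEIGHT 480

typedef struct { unsigned char r, g, b; } Color;

float depthBuffer[HEIGHT][WIDTH];   // initialized to the largest possible depth
Color colorBuffer[HEIGHT][WIDTH];

void plotPixel(int x, int y, float depth, Color color)
{
   if (depth < depthBuffer[y][x]) {   // the new point is nearer than the stored one
      depthBuffer[y][x] = depth;      // so record its depth
      colorBuffer[y][x] = color;      // and its color
   }                                  // otherwise the new point is hidden and ignored
}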
There are other techniques for ensuring that only the genuinely visible parts of a scene are
presented to the viewer, however. If you can determine the depth (the distance from the eye) of
each object in your model, then you may be able to sort a list of the objects so that you can draw
them from back to front—that is, draw the farthest first and the nearest last. In doing this, you will
replace anything that is hidden by other objects that are nearer, resulting in a scene that shows just
the visible content. This is a classical technique called the painter’s algorithm (because it mimics
the way a painter could create an image using opaque paints) that was widely used in more limited
graphics systems before hardware depth buffering became common.
Double buffering
A buffer is a set of memory that is used to store the result of computations, and most graphics
APIs allow you to use two image buffers to store the results of your work. These are called the
color buffer and the back buffer; the contents of the color buffer are what you see on your graphics
screen. If you use only a single buffer, it is the color buffer, and as you generate your image it is
written into the color buffer. Thus the processes of clearing the buffer and writing new content
to the buffer (new parts of your image) will all be visible to your audience.
Because it can take time to create an image, and it can be distracting for your audience to watch an
image being built, it is unusual to use a single image buffer unless you are only creating one image.
Most of the time you would use both buffers, and write your graphics to the back buffer instead of
the color buffer. When your image is completed, you tell the system to switch the buffers so that
the back buffer (with the new image) becomes the color buffer and the viewer sees the new image.
When graphics is done this way, we say that we are using double buffering.
Because it can be disconcerting to watch the pixels changing as the image is created,
particularly if you are creating an animated image by drawing one image after another, double
buffering is essential to animated images. In fact, it is used quite frequently for other graphics
because it is more satisfactory to present a completed image instead of a developing image to a
user. You must remember, however, that when an image is completed you must specify that the
buffers are to be swapped, or the user will never see the new image!
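In OpenGL with the GLUT toolkit, for example, the two key steps are to request a double-buffered
display when the program starts and to swap the buffers when each image is complete. The sketch
below shows only these calls; the details are discussed later in this chapter.
// in main(), before the window is created: request two buffers
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);

// at the end of the display function: swap the completed back buffer into view
void display(void)
{
   glClear(GL_COLOR_BUFFER_BIT);
   // ... draw the new image into the back buffer ...
   glutSwapBuffers();
}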
Clipping planes
Clipping is the process of drawing only the portion of an image that lies on one side of a plane and
omitting the portion on the other side. Recall from the discussion of geometric fundamentals that a
plane is defined by a linear equation
Ax + By + Cz + D = 0
so it can be represented by the 4-tuple of real numbers (A,B,C,D). The plane divides the space
into two parts: those points (x,y,z) for which Ax + By + Cz + D is positive and those points for
which it is negative. When you define the clipping plane for your graphics API with the functions
it provides, you will probably specify it to the API by giving the four coefficients of the equation
above. The operation of the clipping process is that any points for which this value is negative will
not be displayed; any points for which it is positive or zero will be displayed.
Clipping defines parts of the scene that you do not want to display—parts that are to be left out for
any reason. Any projection operation automatically includes clipping, because it must leave out
objects in the space to the left, right, above, below, in front, and behind the viewing volume. In
effect, each of the planes bounding the viewing volume for the projection is also a clipping plane
for the image. You may also want to define other clipping planes for an image. One important
reason to include clipping might be to see what is inside an object instead of just seeing the object’s
surface; you can define clipping planes that go through the object and display only the part of the
object on one side or another of the plane. Your graphics API will probably allow you to define
other clipping planes as well.
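In OpenGL, for example, a user-defined clipping plane is given by the four coefficients of its plane
equation and must be explicitly enabled. The sketch below, which keeps only the points with
non-negative X, shows the idea; the details are discussed later in this chapter.
// a user-defined clipping plane in OpenGL: the plane x >= 0, given by the
// coefficients (A,B,C,D) = (1,0,0,0) of the equation Ax + By + Cz + D = 0
GLdouble clipEqn[4] = { 1.0, 0.0, 0.0, 0.0 };

glClipPlane(GL_CLIP_PLANE0, clipEqn);   // define the plane
glEnable(GL_CLIP_PLANE0);               // turn clipping against it on
// ... draw the geometry that is to be clipped ...
glDisable(GL_CLIP_PLANE0);              // turn it off for the rest of the scene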
In actual practice, there are often techniques for handling clipping that are even simpler than that
described above. For example, you might make only one set of comparisons to establish the
relationship between a vertex of an object and a set of clipping planes such as the boundaries of a
standard viewing volume. You would then be able to use these tests to drive a set of clipping
operations on the line segment. We could then extend the work of clipping on line segments to
clipping on the segments that are the boundaries of a polygon in order to clip parts of a polygon
against one or more planes. We leave the details to the standard literature on graphics techniques.
Stereo viewing
Stereo viewing gives us an opportunity to see some of these viewing processes in action. Let us
say quickly that stereo viewing should not be your first goal in creating images; it requires a bit of
experience with the basics of viewing before it makes sense. Here we describe binocular
viewing—viewing that requires you to converge your eyes beyond the computer screen or printed
image, but that gives you the full effect of 3D when the images are converged. Other techniques
are described in later chapters.
Stereo viewing is a matter of developing two views of a model from two viewpoints that represent
the positions of a person’s eyes, and then presenting those views in a way that the eyes can see
individually and resolve into a single image. This may be done in many ways, including creating
two individual printed or photographed images that are assembled into a single image for a viewing
system such as a stereopticon or a stereo slide viewer. (If you have a stereopticon, it can be very
interesting to use modern technology to create the images for this antique viewing system!) Later
in this chapter we describe how to present these as two viewports in a single window on the screen
with OpenGL.
When you set up two viewpoints in this fashion, you need to identify two eye points that are offset
by a suitable value in a plane perpendicular to the up direction of your view. It is probably
simplest if you define your up direction to be one axis (perhaps the z-axis) and your overall view
to be aligned with one of the axes perpendicular to that (perhaps the x-axis). You can then define
an offset that is about the distance between the eyes of the observer (or perhaps a bit less, to help
the viewer’s eyes converge), and move each eyepoint from the overall viewpoint by half that
offset. This makes it easier for each eye to focus on its individual image and let the brain’s
convergence create the merged stereo image. It is also quite important to keep the overall display
small enough so that the distance between the centers of the images in the display is not larger than
the distance between the viewer’s eyes so that he or she can focus each eye on a separate image.
The result can be quite startling if the eye offset is large so the pair exaggerates the front-to-back
differences in the view, or it can be more subtle if you use modest offsets that represent a realistic
eye separation.
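A minimal sketch of this two-viewport setup in OpenGL is given below; it assumes a view down
the negative Z-axis with the Y-axis up, and the eye distance, offset value, and drawScene()
function are placeholders. A fuller example appears later in this chapter.
// a minimal sketch of side-by-side stereo: two viewports in one window, each
// drawn from an eye point offset by half the eye separation; the values and
// drawScene() are placeholders for your own settings and drawing code
double offset = 0.1;    // roughly half the eye separation, in model units

glMatrixMode(GL_MODELVIEW);

glViewport(0, 0, width/2, height);                 // left half: left-eye view
glLoadIdentity();
gluLookAt(-offset, 0.0, 10.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
drawScene();

glViewport(width/2, 0, width/2, height);           // right half: right-eye view
glLoadIdentity();
gluLookAt( offset, 0.0, 10.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
drawScene();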
A significant number of people have physical limitations that do not allow their eyes to perform the
kind of convergence that this kind of stereo viewing requires. Some people have general
convergence problems which do not allow the eyes to focus together to create a merged image, and
some simply cannot seem to see beyond the screen to the point where convergence would occur.
In addition, if you do not get the spacing of the stereo pair right, or have the sides misaligned, or
allow the two sides to refresh at different times, or ... well, it can be difficult to get this to work
well for users. If some of your users can see the converged image and some cannot, that’s
probably as good as it’s going to be.
The OpenGL code below captures much of the code needed in the discussion that follows in this
section. It could be taken from a single function or could be assembled from several functions; in
the sample structure of an OpenGL program in the previous chapter we suggested that the viewing
and projection operations be separated, with the first part being at the top of the display()
function and the latter part being at the end of the init() and reshape() functions.
// Define the projection for the scene
glViewport(0,0,(GLsizei)w,(GLsizei)h);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(60.0,(GLdouble)w/(GLdouble)h,1.0,30.0);
Defining a window and viewport: The window was defined in the previous chapter by a set of
functions that initialize the window size and location and create the window. The details of
window management are intentionally hidden from the programmer so that an API can work across
many different platforms. In OpenGL, it is easiest to delegate the window setup to the GLUT
toolkit, where much of the system-dependent part of OpenGL is handled; a typical sequence of the
functions to do this is sketched below (the specific size, position, and title values are placeholders):
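// a typical GLUT window setup; the size, position, and title values here are
// placeholders, not the ones used in the sample program
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
glutInitWindowSize(500, 500);
glutInitWindowPosition(50, 50);
thisWindow = glutCreateWindow("Window title");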
The integer value thisWindow that is returned by glutCreateWindow can be used later to set
the window you just created as the active window to which you will draw. This is done with the
glutSetWindow function, as in
glutSetWindow(thisWindow);
which sets the window identified with thisWindow as the current window. If you need to
check which window is active, you can use the glutGetWindow() function, which returns the
identifier of the current window. In any case, no window is active until the main event loop is entered, as
described in the previous chapter.
A viewport is defined by the glViewport function, which specifies the lower left corner and the
width and height of the portion of the window that will be used by the display. This
function will normally be used in your initialization function for the program.
glViewport(VPLowerLeftX, VPLowerLeftY, VPWidth, VPHeight);
You can see the use of the viewport in the stereo viewing example below to create two separate
images within one window.
Reshaping the window: The window is reshaped when it is initially created or whenever it is moved
to another place or made larger or smaller in any of its dimensions. These reshape operations are
handled easily by OpenGL because the system generates an event whenever any of these
window reshapes happens, and there is an event callback for window reshaping. We will discuss
events and event callbacks in more detail later, but the reshape callback is registered by the function
glutReshapeFunc(reshape) which identifies a function reshape(GLint w,GLint h)
that is to be executed whenever the window reshape event occurs and that is to do whatever is
necessary to regenerate the image in the window.
The work that is done when a window is reshaped can involve defining the projection and the
viewing environment and updating the definition of the viewport(s) in the window, or can delegate
some of these to the display function. The reshape callback gets the dimensions of the window as
it has been reshaped, and you can use these to control the way the image is presented in the
reshaped window. For example, if you are using a perspective projection, the second parameter of
the projection definition is the aspect ratio, and you can set this with the ratio of the width and
height you get from the callback, as
gluPerspective(60.0,(GLdouble)w/(GLdouble)h,1.0,30.0);
This will let the projection compensate for the new window shape and retain the proportions of the
original scene. On the other hand, if you really only want to present the scene in a given aspect
ratio, then you can simply take the width and height and define a viewport in the window that has
the aspect ratio you want. If you want a square presentation, for example, then simply take the
smaller of the two values and define a square in the middle of the window as your viewport, and
then do all your drawing to that viewport.
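As one hedged example, a reshape() callback that keeps a square viewport centered in the window, along the lines just described, might look like the following sketch:
   void reshape(int w, int h)
   {
      int side = (w < h) ? w : h;              /* the smaller window dimension */

      /* center a square viewport in the (possibly non-square) window */
      glViewport((w - side)/2, (h - side)/2, side, side);

      glMatrixMode(GL_PROJECTION);
      glLoadIdentity();
      gluPerspective(60.0, 1.0, 1.0, 30.0);    /* square viewport, so aspect ratio 1.0 */
      glMatrixMode(GL_MODELVIEW);
   }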
Any viewport you may have defined in your window probably needs either to be defined inside the
reshape callback function so it can be redefined for resized windows or to be defined in the display
function where the changed window dimensions can be taken into account when it is defined. The
viewport probably should be designed directly in terms relative to the size or dimensions of the
window, so the parameters of the reshape function should be used. For example, if the window is
defined to have dimensions (width, height) as in the definition above, and if the viewport is
to comprise the right-hand half of the window, then the viewport's parameters are
(width/2, 0, width/2, height)
and the aspect ratio of that viewport is width/(2*height). If the window is resized, you will
probably want to recompute these values from the new width and height so that the viewport
remains the right-hand half of the resized window.
Defining a viewing environment: To define what is usually called the viewing transformation, you
must first ensure that you are working with the GL_MODELVIEW matrix, then set that matrix to
be the identity, and finally define the viewing environment by specifying two points and one
vector. The points are the eye point, the center of view (the point you are looking at), and the
vector is the up vector—a vector that will be projected to define the vertical direction in your image.
The only restrictions are that the eye point and center of view must be different, and the up vector
must not be parallel to the vector from the eye point to the center of view. As we saw earlier,
sample code to do this is:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
// eye point center of view up
gluLookAt(10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0);
The gluLookAt function may be invoked from the reshape function, or it may be put inside
the display function and variables may be used as needed to define the environment. In general,
we will lean towards including the gluLookAt operation at the start of the display function, as
we will discuss below. See the stereo view discussion below for an idea of what that can do.
The effect of the gluLookAt(...) function is to define a transformation that moves the eye
point from its default position and orientation. That default position and orientation has the eye at
the origin and looking in the negative z-direction, and oriented with the y-axis pointing upwards.
This is the same as if we invoked the gluLookAt function with the parameters
gluLookAt(0., 0., 0., 0., 0., -1., 0., 1., 0.).
When we change from the default value to the general eye position and orientation, we define a set
of transformations that give the eye point the position and orientation we define. The overall set of
transformations supported by graphics APIs will be discussed in the modeling chapter, but those
used for defining the eyepoint are:
1. a rotation about the Z-axis that aligns the Y-axis with the up vector,
2. a scaling to place the center of view at the correct distance along the negative Z-axis,
3. a translation that moves the center of view to the origin,
4. two rotations, about the X- and Y-axes, that position the eye point at the right point relative to
the center of view, and
5. a translation that puts the center of view at the right position.
In order to get the effect you want on your overall scene, then, the viewing transformation must be
the inverse of the transformation that placed the eye at the position you define, because it must act
on all the geometry in your scene to return the eye to the default position and orientation. Because
functions have the property that the inverse of a product is the product of the inverses in reverse
order, as in
(f * g)^(-1) = g^(-1) * f^(-1)
for any f and g, the viewing transformation is built by inverting each of these five transformations
in the reverse order. And because this must be done on all the geometry in the scene, it must be
applied last, so it must be specified before any of the geometry is defined. Because of this we will
usually see the gluLookAt(...) function as one of the first things to appear in the
display() function, and its operation is the same as applying the transformations
1. translate the center of view to the origin,
2. rotate about the X- and Y-axes to put the eye point on the positive Z-axis,
3. translate to put the eye point at the origin,
4. scale to put the center of view at the default distance along the negative Z-axis, and
5. rotate about the Z-axis to align the up vector with the Y-axis.
You may wonder why we are discussing at this point how the gluLookAt(...) function
defines the viewing transformation that goes into the modelview matrix, but we will need to know
about this later when we need to control the eye point as part of our modeling in more advanced
kinds of scenes.
A perspective projection is defined by first specifying that you want to work on the
GL_PROJECTION matrix, and then setting that matrix to be the identity. You then specify the
properties that will define the perspective transformation. In order, these are the field of view (an
angle, in degrees, that defines the width of your viewing area), the aspect ratio (a ratio of width to
height in the view; if the window is square this will probably be 1.0 but if it is not square, the
aspect ratio will probably be the same as the ratio of the window width to height), the zNear
value (the distance from the viewer to the plane that will contain the nearest points that can be
displayed), and the zFar value (the distance from the viewer to the plane that will contain the
farthest points that can be displayed). This sounds a little complicated, but once you’ve set it up a
couple of times you’ll find that it’s very simple. It can be interesting to vary the field of view,
though, to see the effect on the image.
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(60.0,1.0,1.0,30.0);
It is also possible to define your perspective projection by using the glFrustum function that
defines the projection in terms of the viewing volume containing the visible items, as was shown in
Figure 1.4 above. This call is
glFrustum( left, right, bottom, top, near, far );
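For reference, the following sketch shows a glFrustum call that produces the same projection as a gluPerspective call; here fovy, aspect, zNear, and zFar stand for the four gluPerspective parameters and are assumed to be already defined.
   /* build the gluPerspective(fovy, aspect, zNear, zFar) projection with glFrustum */
   GLdouble top   = zNear * tan(fovy * 3.14159265 / 360.0);  /* half field of view, in radians */
   GLdouble right = top * aspect;
   glMatrixMode(GL_PROJECTION);
   glLoadIdentity();
   glFrustum(-right, right, -top, top, zNear, zFar);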
The gluPerspective function is perhaps more natural to use, so we will not discuss the glFrustum
function in any more depth; we leave further exploration of it to the interested student.
In the Getting Started chapter, we introduced the structure of a program that uses OpenGL and saw
that the glutInitDisplayMode function, called from main(), is a way to define properties of the
display. This function also lets you request the depth buffer needed for hidden-surface removal by
specifying GLUT_DEPTH as one of its parameters.
glutInitDisplayMode (GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
If you want to turn off the depth test, there is a glDisable function as well as the glEnable
function. Note the use of the enable and disable functions in enabling and disabling the clipping
plane in the example code for stereo viewing.
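A minimal sketch of the depth-buffer calls, assuming GLUT_DEPTH was requested as above, is:
   /* in the initialization code, turn on the depth test */
   glEnable(GL_DEPTH_TEST);

   /* at the start of each redraw, clear both the color and the depth buffers */
   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);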
Double buffering is a standard facility, and you will note that the function above that initializes the
display mode includes a parameter GLUT_DOUBLE to set up double buffering. This indicates that
you will use two buffers, called the back buffer and the front buffer, in your drawing. The content
of the front buffer is displayed, and all drawing will take place to the back buffer. So in your
display() function, you will need to call glutSwapBuffers() when you have finished
creating the image; that will cause the back buffer to be exchanged with the front buffer and your
new image will be displayed. An added advantage of double buffering is that there are a few
techniques that use drawing to the back buffer and examination of that buffer’s contents without
swapping the buffers, so the work done in the back buffer will not be seen.
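A skeleton of a double-buffered display() function, then, might look like the following sketch:
   void display(void)
   {
      glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);  /* start a fresh image in the back buffer */
      /* ... define the view and draw the scene here ... */
      glutSwapBuffers();   /* exchange the back and front buffers to show the new image */
   }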
In addition to the clipping OpenGL performs on the standard view volume in the projection
operation, OpenGL allows you to define at least six clipping planes of your own, named
GL_CLIP_PLANE0 through GL_CLIP_PLANE5. The clipping planes are defined by the
function glClipPlane(plane, equation) where plane is one of the pre-defined clipping
planes above and equation is a vector of four GLfloat values. Once you have defined a
clipping plane, it is enabled or disabled by a glEnable(GL_CLIP_PLANEn) function or
equivalent glDisable(...) function. Clipping against a plane is performed on any geometric
primitive drawn while that clip plane is enabled and is not performed while the plane is disabled,
so you enable and disable clip planes as needed to get the effect you want in the scene.
Specifically, some example code looks like
GLfloat myClipPlane[] = { 1.0, 1.0, 0.0, -1.0 };
glClipPlane(GL_CLIP_PLANE0, myClipPlane);
glEnable(GL_CLIP_PLANE0);
...
glDisable(GL_CLIP_PLANE0);
The stereo viewing example at the end of this chapter includes the definition and use of clipping
planes.
In this section we describe the implementation of binocular viewing as described earlier in this
chapter. The technique we will use is to generate two views of a single model as if they were seen
from the viewer’s separate eyes, and present these in two viewports in a single window on the
screen. These two images are manipulated together by manipulating the model as a whole, while
the viewer resolves them into a single image by focusing each eye on its own image.
The process itself is fairly simple. First, create a window that is twice as wide as it is high, and
whose overall width is twice the distance between your eyes. Then when you display your model,
do so twice, with two different viewports that occupy the left and right half of the window. Each
display is identical except that the eye points in the left and right halves represent the position of the
left and right eyes, respectively. This can be done by creating a window with space for both
viewports with the window initialization function
#define W 600
#define H 300
width = W; height = H;
glutInitWindowSize(width,height);
Here the initial values set the width to twice the height, allowing each of the two viewports to be
initially square. We set up the view with the overall view at a distance of ep from the origin in the
x-direction and looking at the origin with the z-axis pointing up, and set the eyes to be at a given
offset distance from the overall viewpoint in the y-direction. We then define the left- and right-
hand viewports in the display() function as follows
// left-hand viewport
glViewport(0,0,width/2,height);
...
// eye point center of view up
gluLookAt(ep, -offset, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0);
... code for the actual image goes here
...
// right-hand viewport
glViewport(width/2,0,width/2,height);
...
// eye point center of view up
gluLookAt(ep, offset, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0);
... the same code as above for the actual image goes here
...
Summary
This chapter has discussed a number of topics in viewing and projection that are basic to computer
graphics and must be addressed in graphics programming. The viewing transformation is
determined by the eye point, view reference point, and up direction; the perspective projection is
determined by the width and height of the view (often expressed as the field-of-view angle and the
aspect ratio) together with the distances to the near and far clipping planes.
The viewing and projection operations can be expressed in terms of OpenGL functions and these
were presented along with a number of other OpenGL functions to provide window and viewport
management, double buffering, depth testing, and more general clipping operations.
With these concepts and operations, you can write a graphics program that has all its modeling
done in world space, and you can implement such techniques as stereo viewing. In the next few
chapters we will introduce general modeling techniques that will extend these abilities to be able to
write very general and capable graphics programs.
Questions
This set of questions covers your recognition of issues in viewing and projections as you see them
in your personal environment. They will help you see the effects of defining views and applying
projections and the other topics in this chapter.
1. Find a comfortable environment and examine the ways your view of that environment depends
on your eyepoint and your viewing direction. Note how objects seem to move in front of and
behind other objects as you move your eyepoint, and notice how objects move into the view
from one side and out of the view on the other side as you rotate your viewing direction. (It
may help if you make a paper or cardboard rectangle to look through as you do this.)
2. Because of the way our eyes work, we cannot see an orthogonal view of a scene. However, if
we keep our eyes oriented in a fixed direction and move around in a scene, the view directly
ahead of us will approximate a piece of an orthogonal view. For your familiar environment as
above, try this and see if you can sketch what you see at each point and put them together into a
single image.
3. Consider a painter’s algorithm approach to viewing your environment; write down the objects
you see in the order of farthest to nearest to your eye. Now move to another position in the
environment and imagine drawing the things you see in the same order you wrote them down
from the other viewpoint. What things are out of order and so would have the farther thing
drawn on top of the nearer thing? What conclusions can you draw about the calculations you
would need to do for the painter’s algorithm?
4. Imagine defining a plane through the middle of your environment so that everything on one
side of the plane is not drawn. Make this plane go through some of the objects you would
see, so that one part of the object would be visible and another part invisible. What would the
view of the environment look like? What would happen to the view if you switched the
visibility of the two sides of the plane?
Exercises
These exercises ask you to carry out some calculations that are involved in creating a view of a
scene and in doing the projection of the scene to the screen.
5. Take a standard perspective viewing definition with, say, a 45° field of view, an aspect ratio of
1.0, a distance to the front plane of the viewing frustum of 1.0, and a distance to the back
plane of the viewing frustum of 20.0. For a point P=(x,y,1.) in the front plane, derive the
parametric equation for the line segment within the frustum, all of which projects to P. Hint: the
7. In the numerically-modeled environment above, place your eyepoint in the (X ,Z)-center of the
space (the middle of the space left to right and front to back), and have your eye face the origin
at floor height. Calculate the coordinates of each point in the space relative to the eye
coordinate system, and try to identify a common process for each of these calculations.
Experiments
8. In the first chapter you saw the complete code for a simple program to display the concept of
heat transfer in a bar, and in the exercises you saw some discussion of the behavior of the
program when the window was manipulated. Working with the projection in the reshape()
function in that program, create other displays for the program: create an orthogonal
projection, and create a perspective projection that will always fit the image within the window.
9. In the chapter you saw the glEnable(...) function for depth testing, and you saw the
effect of depth testing in creating images where the objects that are nearer to the eye obscure
objects that are farther from the eye. In this experiment, disable depth testing with the function
glDisable(GL_DEPTH_TEST) and draw the same scene that you drew with depth testing
enabled. View the scene from several points of view and draw conclusions about why you
will get very different images from the same scene with different viewpoints.
In the next two experiments, we will work with the very simple model of the house in Figure 1.5,
though you are encouraged to replace that with a more interesting model of your own. The code
for a function to create the house centered around the origin that you can call from your
display() is given below to help you get started.
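One possible version of such a function, a simple box with a prism roof roughly centered at the origin, is sketched here; this is only a stand-in, not necessarily the exact house of Figure 1.5, so feel free to adjust the coordinates.
   /* a hypothetical house() function: a 2x2x2 box with a simple prism roof */
   void house(void)
   {
      /* four walls as one quad strip around the box */
      glBegin(GL_QUAD_STRIP);
         glVertex3f(-1.0, -1.0,  1.0);  glVertex3f(-1.0, 1.0,  1.0);
         glVertex3f( 1.0, -1.0,  1.0);  glVertex3f( 1.0, 1.0,  1.0);
         glVertex3f( 1.0, -1.0, -1.0);  glVertex3f( 1.0, 1.0, -1.0);
         glVertex3f(-1.0, -1.0, -1.0);  glVertex3f(-1.0, 1.0, -1.0);
         glVertex3f(-1.0, -1.0,  1.0);  glVertex3f(-1.0, 1.0,  1.0);
      glEnd();
      /* two triangular gable ends */
      glBegin(GL_TRIANGLES);
         glVertex3f(-1.0, 1.0,  1.0); glVertex3f(1.0, 1.0,  1.0); glVertex3f(0.0, 2.0,  1.0);
         glVertex3f( 1.0, 1.0, -1.0); glVertex3f(-1.0, 1.0, -1.0); glVertex3f(0.0, 2.0, -1.0);
      glEnd();
      /* two sloping roof quads */
      glBegin(GL_QUADS);
         glVertex3f(-1.0, 1.0,  1.0); glVertex3f(0.0, 2.0,  1.0);
         glVertex3f( 0.0, 2.0, -1.0); glVertex3f(-1.0, 1.0, -1.0);
         glVertex3f( 1.0, 1.0, -1.0); glVertex3f(0.0, 2.0, -1.0);
         glVertex3f( 0.0, 2.0,  1.0); glVertex3f(1.0, 1.0,  1.0);
      glEnd();
   }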
10. Create a program to draw the house with this function or to draw your own scene, and note
what happens to the view as you move your eyepoint around the scene, always looking at the
origin (0,0,0).
11. With the same program as above and a fixed eye point, experiment with the other parameters of
the perspective view: the front and back view planes, the aspect ratio of the view, and the field
of view of the projection. For each, note the effect so you can control these when you create
more sophisticated images later.
In the next two experiments, you will consider the matrices for the projection and viewing
transformations described in this chapter. For more on transformation matrices, see Chapter 4.
In OpenGL, we have the general glGet*v(...) inquiry functions that return the value of a
number of different system parameters. We can use these to retrieve the values of some of the
transformations that are discussed in this chapter. Specifically, we can get the values of the
projection transformation and viewing transformation for any projection or any view that we
define. In the following two problems we explore this possibility. In order to make this most
effective, you should write a small function to display a 4x4 matrix so that you can see its
components clearly.
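One possible helper for this, assuming the GLfloat v[4][4] layout used in the next experiment and the standard C stdio library, is sketched below; note that OpenGL stores matrices in column-major order.
   #include <stdio.h>   /* for printf */

   /* Display a 4x4 matrix retrieved from OpenGL.  OpenGL returns the sixteen
      values in column-major order, so the value in row r, column c is m[c][r]. */
   void printMatrix(GLfloat m[4][4])
   {
      int row, col;
      for (row = 0; row < 4; row++) {
         for (col = 0; col < 4; col++)
            printf("%10.4f ", m[col][row]);
         printf("\n");
      }
   }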
12. To get the value of the projection transformation, we use the function
glGetFloatv(GL_PROJECTION_MATRIX, v)
where v is an array of 16 floats that could be defined as
GLfloat v[4][4];
To see the matrix for any projection, whether perspective or orthogonal, simply insert the
function call above into your code any time after you have defined your projection, and print
out the matrix that is returned. If the projection is orthogonal, you should be able to identify
the parameters of the projection from components of the matrix; if the projection is a perspective projection, this
will be harder but you should start with the simple discussion of the perspective matrix in this
chapter. The experiment, then, is to take the matrix returned by this process from your
projection definition and change some values, reset the projection transformation with this new
matrix by
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glMultMatrixf(v);
This will redefine the projection transformation to the transformation whose matrix is v. You can
then observe the difference between the image produced by your original projection and the image
produced by the modified matrix.
13. To get the value of the viewing transformation, you can get the value of the OpenGL
modelview matrix, which is a product of the viewing and modeling transformations. So if you
have a view but no modeling transformations (that is, you have defined the view but have not
yet applied any scaling, rotation, or translation to any geometry) then you can use the function
glGetFloatv(GL_MODELVIEW_MATRIX, v)
where v is the same as defined above. This matrix can be fairly complicated, but if you set
only a few simple parameters for the view (only change the view point, only change the up
vector, etc.) then you should be able to identify the components of the viewing transformation
matrix that come from each part of the view definition. As in the previous experiment, take the
viewing transformation matrix returned by this process from your viewing definition, change
some values, reset the modelview matrix with this new matrix by the process above but using
the GL_MODELVIEW_MATRIX instead of GL_PROJECTION, and then observe the difference
between your original view and the new view.
The chapter has three distinct parts because there are three distinct levels of modeling that you will
use to create images. We begin with simple geometric modeling directly in world space: modeling
where you define the coordinates of each vertex of each component you will use at the point where
that component will reside in the final scene. This is straightforward but can be very time-
consuming to do for complex scenes, so we will also discuss importing models from various kinds
of modeling tools that can allow you to create parts of a scene more easily.
The second section describes the next step in modeling: creating your simple objects in standard
positions in their own individual model space and using modeling transformations to place them in
world space and give them any size, any orientation, and any position. This allows you to create a
set of simple models and extends their value by allowing you to use them very generally. This
involves a standard set of modeling transformations you can apply to your simple geometry in
order to create more general model components in your scene. This is a very important part of the
modeling process because it allows you to use appropriate transformations applied to standard
template objects, allowing you to create a modest number of graphic objects and then generalize them
and place them in your scene as needed. These transformations are also critical to the ability to
define and implement motion in your scenes because it is typical to move parts of your scene, such
as objects, lights, and the eyepoint, with transformations that are controlled by parameters that
change with time. This can allow you to extend your modeling to define animations that can
represent time-varying concepts. This second section is presented without any reference to a
particular graphics API, but in the next chapter we will show how the concepts here are expressed
in OpenGL.
In the third section of the chapter we give you an important tool to organize complex images by
introducing the concept of the scene graph, a modeling tool that gives you a unified approach to
defining all the objects and transformations that are to make up a scene and to specifying how they
are related and presented. We then describe how you work from the scene graph to write the code
that implements your model. This concept is new to the introductory graphics course but has been
used in some more advanced graphics tools, and we believe you will find that it makes the modeling
process much more straightforward for anything beyond a very simple scene. In the second level
of modeling discussed in this section, we introduce hierarchical modeling in which objects are
designed by assembling other objects to make more complex structures. These structures can
allow you to simulate actual physical assemblies and develop models of structures like physical
machines. Here we develop the basic ideas of scene graphs introduced earlier to get a structure that
allows individual components to move relative to each other in ways that would be difficult to
define from first principles.
This chapter requires an understanding of simple 3-dimensional geometry, knowledge of how to
represent points in 3-space, enough programming experience to be comfortable writing code that
calls API functions to do required tasks, ability to design a program in terms of simple data
structures such as stacks, and an ability to organize things in 3D space. When you have finished
this chapter you should be able to understand how to organize the geometry for a scene based on
simple model components and how to combine them with modeling transformations. You should
also be able to understand how to create complex, hierarchical scenes with a scene graph and how
to express such a scene graph in terms of graphics primitives.
Computer graphics deals with geometry and its representation in ways that allow it to be
manipulated and displayed by a computer. Because these notes are intended for a first course in
the subject, you will find that the geometry will be simple and will use familiar representations of
3-dimensional space. When you work with a graphics API, you will need to work with the kinds
of object representations that API understands, so you must design your image or scene in ways
that fit the API’s tools. For most APIs, this means using only a few simple graphics primitives,
such as points, line segments, and polygons.
The application programmer starts by defining a particular object with respect to a coordinate
system that includes a local origin lying somewhere in or around the object. This would naturally
happen if the object was created with some sort of modeling or computer-aided design system or
was defined by a mathematical function, and is described for some of these cases in subsequent
chapters. Modeling an object about its local origin involves defining it in terms of model
coordinates, a coordinate system that is used specifically to define a particular graphical object.
The model coordinates are defined by specifying the coordinates of each point by either defining
constant coordinates or by computing them from some known geometric object. This is done with
code that will look something like
vertex(x1, y1, z1);
vertex(x2, y2, z2);
...
vertex(xN, yN, zN);
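In OpenGL, for example, this metacode might take the concrete form sketched below; here x1 through zN simply stand for actual coordinate values, as in the metacode above.
   glBegin(GL_POLYGON);      /* or GL_POINTS, GL_LINE_STRIP, GL_TRIANGLES, ... */
      glVertex3f(x1, y1, z1);
      glVertex3f(x2, y2, z2);
      /* ... */
      glVertex3f(xN, yN, zN);
   glEnd();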
Because the coordinate system is part of an object’s design, it may be different for every part of a
scene. In order to integrate each object, built with its own coordinates, into a single overall 3D
world space, the object must be placed in the world space by using an appropriate modeling
transformation. Modeling transformations, like all the transformations we will describe throughout
the book, are functions that move objects while preserving their geometric properties. The
transformations that are available to us in a graphics system are rotations, translations, and scaling.
Rotations hold a line through the origin of a coordinate system fixed and rotate all the points in a
scene by a fixed angle around the line, translations add a fixed value to each of the coordinates of
each point in a scene, and scaling multiplies each coordinate of a point by a fixed value. These will
be discussed in much more detail in the chapter on modeling below. All transformations may be
represented as matrices, so sometimes in a graphics API you will see a mention of a matrix; this
almost always means that a transformation is involved.
In practice, graphics programmers use a relatively small set of simple, built-in transformations and
build up the model transformations through a sequence of these simple transformations. Because
each transformation works on the geometry it sees, we see the effect of the associative law for
functions; in a piece of code represented by metacode such as
transformOne(...);
transformTwo(...);
transformThree(...);
geometry(...);
we see that transformThree is applied to the original geometry, transformTwo to the
results of that transformation, and transformOne to the results of the second transformation.
Letting t1, t2, and t3 be the three transformations, respectively, we see by the application of
the associative law for function composition that
t1(t2(t3(geometry))) = (t1*t2*t3)(geometry)
This shows us that in a product of transformations, applied by multiplying on the left, the
transformation nearest the geometry is applied first, and that this principle extends across products
of any number of transformations.
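As one concrete illustration, using OpenGL calls that are covered fully in the next chapter, the metacode above might correspond to the following sketch:
   /* The transformation specified nearest the geometry is applied to it first:
      the cube below is scaled first, then rotated, then translated. */
   glTranslatef(2.0, 0.0, 0.0);       /* plays the role of transformOne   */
   glRotatef(45.0, 0.0, 1.0, 0.0);    /* plays the role of transformTwo   */
   glScalef(0.5, 0.5, 0.5);           /* plays the role of transformThree */
   glutSolidCube(1.0);                /* plays the role of geometry       */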
The modeling transformation for an object in a scene can change over time to create motion in a
scene. For example, in a rigid-body animation, an object can be moved through the scene just by
changing its model transformation between frames. This change can be made through standard
built-in facilities in most graphics APIs, including OpenGL; we will discuss how this is done later.
Definitions
We need to have some common terminology as we talk about modeling. We will think of
modeling as the process of defining the objects that are part of the scene you want to view in an
image. There are many ways to model a scene for an image; in fact, there are a number of
commercial programs you can buy that let you model scenes with very high-level tools. However,
for much graphics programming, and certainly as you are beginning to learn about this field, you
will probably want to do your modeling by defining your geometry in terms of relatively simple
primitives so you may be fully in control of the modeling process.
The space we will use for our modeling is simple Euclidean 3-space with standard coordinates,
which we will call the X-, Y-, and Z-coordinates. Figure 2.1 illustrates a point, a line segment, a
polygon, and a polyhedron—the basic elements of the computer graphics world that you will use
for most of your graphics. In this space a point is simply a single location in 3-space, specified by
its coordinates and often seen as a triple of real numbers such as (px, py, pz). A point is drawn on
the screen by lighting a single pixel at the screen location that best represents the location of that
point in space. To draw the point you will specify that you want to draw points and specify the
point’s coordinates, usually in 3-space, and the graphics API will calculate the coordinates of the
point on the screen that best represents that point and will light that pixel. Note that a point is
usually presented as a square, not a dot, as indicated in the figure. A line segment is determined by
its two specified endpoints, so to draw the line you indicate that you want to draw lines and define
the points that are the two endpoints. Again, these endpoints are specified in 3-space and the
graphics API calculates their representations on the screen, and draws the line segment between
them. A polygon is a region of space that lies in a plane and is bounded in the plane by a collection
of line segments. It is determined by a sequence of points (called the vertices of the polygon) that
specify a set of line segments that form its boundary, so to draw the polygon you indicate that you
want to draw polygons and specify the sequence of vertex points. A polyhedron is a region of 3-
space bounded by polygons, called the faces of the polyhedron. A polyhedron is defined by
specifying a sequence of faces, each of which is a polygon. Because figures in 3-space determined
by more than three vertices cannot be guaranteed to lie in a plane, polyhedra are often defined to
have triangular faces; a triangle always lies in a plane (because three points in 3-space determine a
plane). As we will see when we discuss lighting and shading in subsequent chapters, the direction
in which we go around the vertices of each face of a polygon is very important, and whenever you
design a polyhedron, you should plan your polygons so that their vertices are ordered in a
sequence that is counterclockwise as seen from outside the polyhedron (or, to put it another way,
clockwise as seen from inside the polyhedron).
Before you can create an image, you must define the objects that are to appear in that image
through some kind of modeling process. Perhaps the most difficult—or at least the most time-
consuming—part of beginning graphics programming is creating the models that are part of the
image you want to create. Part of the difficulty is in designing the objects themselves, which may
require you to sketch parts of your image by hand so you can determine the correct values for the
points used in defining it, for example, or it may be possible to determine the values for points
from some other technique. Another part of the difficulty is actually entering the data for the points
in an appropriate kind of data structure and writing the code that will interpret this data as points,
line segments, and polygons for the model. But until you get the points and their relationships
right, you will not be able to get the image right.
Besides defining a single point, line segment, or polygon, graphics APIs provide modeling
support for defining larger objects that are made up of several simple objects. These can involve
disconnected sets of objects such as points, line segments, quads, or triangles, or can involve
connected sets of points, such as line strips, quad strips, triangle strips, or triangle fans. This
allows you to assemble simpler components into more complex groupings and is often the only
way you can define polyhedra for your scene. Some of these modeling techniques involve a
concept called geometry compression, which allows you to define a geometric object using fewer
vertices than would normally be needed. The OpenGL support for geometry compression will be
discussed as part of the general discussion of OpenGL modeling processes. The discussions and
examples below will show you how to build your repertoire of techniques you can use for your
modeling.
Before going forward, however, we need to mention another way to specify points for your
models. In some cases, it can be helpful to think of your 3-dimensional space as embedded as an
affine subspace of 4-dimensional space. If we think of 4-dimensional space as having X, Y, Z,
and W components, this embedding identifies the three-dimensional space with the subspace W=1
of the four-dimensional space, so the point (x, y, z) is identified with the four-dimensional point
(x, y, z, 1). Conversely, the four-dimensional point (x, y, z, w) is identified with the three-
dimensional point (x/w, y/w, z/w) whenever w≠0. The four-dimensional representation of points
with a non-zero w component is called homogeneous coordinates, and calculating the three-
dimensional equivalent for a homogeneous representation by dividing by w is called homogenizing
the point. When we discuss transformations, we will sometimes think of them as 4x4 matrices
because we will need them to operate on points in homogeneous coordinates.
Not all points in 4-dimensional space can be identified with points in 3-space, however. The point
(x, y, z, 0) is not identified with a point in 3-space because it cannot be homogenized, but it is
identified with the direction defined by the vector <x, y, z>. This can be thought of as a “point at
infinity” in a certain direction. This has an application in the chapter below on lighting when we
discuss directional instead of positional lights, but in general we will not encounter homogeneous
coordinates often in these notes.
Some examples
In this section we will describe the kinds of simple objects that are directly supported by most
graphics APIs. We begin with very simple objects and proceed to more complex ones, but you
will find that both simple and complex objects will be needed in your work. With each kind of
primitive object, we will describe how that object is specified, and in later examples, we will create
a set of points and will then show the function call that draws the object we have defined.
Point
To draw a single point, we will simply define the coordinates of the point and give them to the
graphics API function that draws points. Such a function can typically handle one point or a
number of points, so if we want to draw only one point, we provide only one vertex; if we want to
draw more points, we provide more vertices. Points are extremely fast to draw, and it is not
unreasonable to draw tens of thousands of points if a problem merits that kind of modeling. On a
very modest-speed machine without any significant graphics acceleration, a 50,000 point model
can be re-drawn in a small fraction of a second.
Line segments
To draw a single line segment, we must simply supply two vertices to the graphics API function
that draws lines. Again, this function will probably allow you to specify a number of line
segments and will draw them all; for each segment you simply need to provide the two endpoints
of the segment. Thus you will need to specify twice as many vertices as the number of line
segments you wish to produce.
The simple way that a graphics API handles lines hides an important concept, however. A line is a
continuous object with real-valued coordinates, and it is displayed on a discrete object with integer
screen coordinates. This is, of course, the difference between model space and eye space on one
hand and screen space on the other. While we focus on geometric thinking in terms that overlook
the details of conversions from eye space to screen space, you need to realize that algorithms for
such conversions lie at the foundation of computer graphics and that your ability to think in higher-
level terms is a tribute to the work that has built these foundations.
Connected lines
Connected lines—collections of line segments that are joined “head to tail” to form a longer
connected group—are shown in Figure 2.2. These are often called line strips and line loops, and
your graphics API will probably provide a function for drawing them. The vertex list you use will
define the line segments by using the first two vertices for the first line segment, and then by using
each new vertex and its predecessor to define each additional segment. The difference between a
line strip and a line loop is that the former does not connect the last vertex defined to the first
vertex, leaving the figure open; the latter includes this extra segment and creates a closed figure.
Thus the number of line segments drawn by a line strip will be one fewer than the number of
vertices in the vertex list, while a line loop will draw the same number of segments as vertices.
This is a geometry compression technique because to define a line strip with N segments you only
specify N+1 vertices instead of 2N vertices; instead of needing to define two points per line
segment, each segment after the first only needs one vertex to be defined.
Triangle
To draw one or more unconnected triangles, your graphics API will provide a simple triangle-
drawing function. With this function, each set of three vertices will define an individual triangle, so
the vertex list will contain three times as many vertices as the number of triangles you want to draw.
Sequence of triangles
Triangles are the foundation of most truly useful polygon-based graphics, and they have some very
useful capabilities. Graphics APIs often provide two different geometry-compression techniques
to assemble sequences of triangles into your image: triangle strips and triangle fans. These
techniques can be very helpful if you are defining a large graphic object in terms of the triangles
that make up its boundaries, when you can often find ways to include large parts of the object in
triangle strips and/or fans. The behavior of each is shown in Figure 2.3 below. Note that this
figure and similar figures that show simple geometric primitives are presented as if they were
drawn in 2D space. In fact they are not, but in order to make them look three-dimensional we
would need to use some kind of shading, which is a separate concept discussed in a later chapter
(and which is used to present the triangle fan of Figure 2.18). We thus ask you to think of these as
three-dimensional, even though they look flat.
Most graphics APIs support both techniques by interpreting the vertex list in different ways. To
create a triangle strip, the first three vertices in the vertex list create the first triangle, and each
vertex after that creates a new triangle with the two vertices immediately before it. We will see in
later chapters that the order of points around a polygon is important, and we must point out that
these two techniques behave quite differently with respect to polygon order; for triangle fans, the
orientation of all the triangles is the same (clockwise or counterclockwise), while for triangle
strips, the orientation of alternate triangles is reversed. This may require some careful coding
when lighting models are used. To create a triangle fan, the first three vertices create the first
triangle and each vertex after that creates a new triangle with the point immediately before it and the
first point in the list. In each case, the number of triangles defined by the vertex list is two less
than the number of vertices in the list, so these are very efficient ways to specify triangles.
Quadrilateral
A convex quadrilateral, often called a “quad,” is any convex 4-sided figure; the term distinguishes it
from a general quadrilateral, which need not be convex. The function in your
graphics API that draws quads will probably allow you to draw a number of them. Each
quadrilateral requires four vertices in the vertex list, so the first four vertices define the first
quadrilateral, the next four the second quadrilateral, and so on, so your vertex list will have four
times as many points as there are quads. The sequence of vertices is that of the points as you go
around the perimeter of the quadrilateral. In an example later in this chapter, we will use six
quadrilaterals to define a cube that will be used in later examples.
You can frequently find large objects that contain a number of connected quads. Most graphics
APIs have functions that allow you to define a sequence of quads. The vertices in the vertex list
are taken as vertices of a sequence of quads that share common sides. For example, the first four
vertices can define the first quad; the last two of these, together with the next two, define the next
quad; and so on. The order in which the vertices are presented is shown in Figure 2.4. Note the
order of the vertices; instead of the expected sequence around each quad, the points in each pair
have the same order. Thus the sequence 3-4 is the opposite of the order you would expect, and this
same sequence goes on in each additional pair of extra points. This difference is critical to note
when you are implementing quad strip constructions. It might be helpful to think of this in terms
of triangles, because a quad strip acts as though its vertices were specified as if it were really a
triangle strip — vertices 1/2/3 followed by 2/3/4 followed by 3/4/5 etc.
Figure 2.4: the order in which vertices are used in a quad strip
As an example of the use of quad strips and triangle fans, let's create a model of a sphere. As
we will see in the next chapter, both the GLU and GLUT toolkits include pre-built sphere models,
but the sphere is a familiar object and it can be helpful to see how to create familiar things with new
tools. There may also be times when you need to do things with a sphere that are difficult with the
pre-built objects, so it is useful to have this example in your “bag of tricks.”
In the chapter on mathematical fundamentals, we will describe the use of spherical coordinates in
modeling. We can use spherical coordinates to model the sphere at first, and then we can later
convert to Cartesian coordinates as we describe in that chapter to present the model to the graphics
system for actual drawing. Let’s think of creating a model of the sphere with N divisions around
the equator and N/2 divisions along the prime meridian. In each case, then, the angular division
will be theta = 360/N degrees. Let’s also think of the sphere as having a unit radius, so it will be
easier to work with later when we have transformations. Then the basic structure would be:
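A minimal sketch of that structure, written in informal pseudocode with each vertex given in spherical coordinates (radius, longitude, latitude) and assuming M = N/2 latitude divisions, might look like the following; the caps at the poles are triangle fans and the bands between them are quad strips.
   doTriangleFan()                          // south polar cap
      set vertex at (1, 0, -90)             // the south pole itself
      for i = 0 to 360                      // longitude, in steps of 360/N
         set vertex at (1, i, -90+180/M)
   endTriangleFan()
   for j = -90+180/M to 90-180/M            // latitude, in steps of 180/M, omitting the caps
      doQuadStrip()                         // one quad strip per latitude band
         for i = 0 to 360                   // longitude, in steps of 360/N
            set vertex at (1, i, j)
            set vertex at (1, i, j+180/M)
            set vertex at (1, i+360/N, j)
            set vertex at (1, i+360/N, j+180/M)
      endQuadStrip()
   doTriangleFan()                          // north polar cap
      set vertex at (1, 0, 90)              // the north pole itself
      for i = 0 to 360                      // longitude, in steps of 360/N
         set vertex at (1, i, 90-180/M)
   endTriangleFan()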
Because we’re working with a sphere, the quad strips as we have defined them are planar, so there
is no need to divide each quad into two triangles to get planar surfaces as we might want to do for
other kinds of objects. Note the order in which we set the points in the triangle fans and in the
quad strips, as we described when we introduced these concepts; this is not immediately an
obvious order and you may want to think about it a bit. When you do, you will find that the point
sequence for a quad strip is exactly the same as the point sequence for a triangle strip.
General polygon
Some images need to include more general kinds of polygons. While these can be created by
constructing them manually as collections of triangles and/or quads, it might be easier to define and
display a single polygon. A graphics API will allow you to define and display a single polygon by
specifying its vertices, and the vertices in the vertex list are taken as the vertices of the polygon in
sequence order. As we will note in the chapter on mathematical fundamentals, many APIs can
only handle convex polygons—polygons for which any two points in the polygon also have the
entire line segment between them in the polygon. We refer you to that later discussion for more
details, but we include Figure 2.5 below to illustrate the difference.
Figure 2.5: a convex polygon and a non-convex polygon
An interesting property of convex polygons is that if you take two adjacent vertices and then write
the remaining vertices in order as they occur around the polygon, you have exactly the same vertex
sequence as if you were defining a triangle fan. Thus just as was the case for quad strips and
triangle strips, you can see a way to implement a convex polygon as a triangle fan.
Polyhedron
In Figure 2.1 we saw that a polyhedron is one of the basic objects we use in our modeling,
especially because we focus almost exclusively on 3D computer graphics. We specify a
polyhedron by specifying all the polygons that make up its boundary. In general, most graphics
APIs leave the specification of polyhedrons up to the user, which can make them fairly difficult
objects to define as you are learning the subject. With experience, however, you will develop a set
of polyhedra that you are familiar with and can use with comfort.
When you create a point, line, or polygon in your image, the system will define the pixels on the
screen that represent the geometry within the discrete integer-coordinate 2D screen space. The
standard way of selecting pixels is all-or-none: a pixel is computed to be either in the geometry, in
which case it is colored as the geometry specifies, or not in the geometry, in which case it is left in
whatever color it already was. Because of the relatively coarse nature of screen space, this all-or-
nothing approach can leave a great deal to be desired because it creates jagged edges along the
boundary between geometry and background. This appearance is called aliasing, and it is shown in the
left-hand image of Figure 2.6.
There are a number of techniques to reduce the effects of aliasing, and collectively the techniques
are called antialiasing. They all work by recognizing that the boundary of a true geometry can go
through individual pixels in a way that only partially covers a pixel. Each technique finds a way to
account for this varying coverage and then lights the pixel according to the amount of coverage of
the pixel with the geometry. Because the background may vary, this variable lighting is often
managed by controlling the blending value for the pixel’s color, using the color (R, G, B, A)
where (R, G, B) is the geometry color and A is the proportion of the pixel covered by the object’s
geometry. An image that uses antialiasing is shown in the right-hand image of Figure 2.6. For
more detail on color blending, see the later chapter on color.
As we said, there are many ways to determine the coverage of a pixel by the geometry. One way
that is often used for very high-quality images is to supersample the pixel, that is, to assume a
much higher image resolution than is really present and to see how many of these “subpixels” lie in
the geometry. The proportion of subpixels that would be covered will serve as the proportion for
the antialiasing value. However, supersampling is not an ordinary function of a graphics API, so
we would expect a simpler approach to be used. Because APIs use linear geometries—all the basic
geometry is polygon-based—it is possible to calculate exactly how the 2D world space line
intersects each pixel and then how much of the pixel is covered. This is a more standard kind of
API computation, though the details will certainly vary between APIs and even between different
implementations of an API. You may want to look at your API’s manuals for more details.
When you define the geometry of an object, you may also want or need to define the direction the
object faces as well as the coordinate values for the point. This is done by defining a normal for
the object. Normals are often fairly easy to obtain. In the appendix to this chapter you will see
ways to calculate normals for plane polygons fairly easily; for many of the kinds of objects that are
available with a graphics API, normals are built into the object definition; and if an object is defined
by mathematical formulas, you can often get normals by doing some straightforward calculations.
The sphere described above is a good example of getting normals by calculation. For a sphere, the
normal to the sphere at a given point is the radius vector at that point. For a unit sphere with center
at the origin, the radius vector to a point has the same components as the coordinates of the point.
So if you know the coordinates of the point, you know the normal at that point.
To add the normal information to the modeling definition, then, you can simply use functions that
set the normal for a geometric primitive, as you would expect to have from your graphics API, and
you would get code that looks something like the following excerpt from the example above:
for j = -90+180/M to 90-180/M          // latitude without sphere caps
   doQuadStrip()                       // one quad strip per band around the sphere at any latitude
      for i = 0 to 360                 // longitude
         set normal to (1, i, j)
         set vertex at (1, i, j)
         set vertex at (1, i, j+180/M)
         set vertex at (1, i+360/N, j)
         set vertex at (1, i+360/N, j+180/M)
   endQuadStrip()
When you define a polyhedron for your graphics work, as we discussed above, there are many
ways you can hold the information that describes a polyhedral graphics object. One of the simplest
is the triangle list—an array of triples, with each set of three triples representing a separate triangle.
Drawing the object is then a simple matter of reading three triples from the list and drawing the
triangle. A good example of this kind of list is the STL graphics file format discussed in the
chapter below on graphics hardcopy and whose formal specifications are in the Appendix.
A more effective, though a bit more complex, approach is to create three lists. The first is a vertex
list, and it is simply an array of triples that contains all the vertices that would appear in the object.
If the object is a polygon or contains polygons, the second list is an edge list that contains an entry
for each edge, given as a pair of indices into the vertex list, and the third is a face list that defines
each face as a sequence of indices into the edge list.
As an example, let’s consider the classic cube, centered at the origin and with each side of length
two. For the cube let’s define the vertex array, edge array, and face array that define the cube, and
let’s outline how we could organize the actual drawing of the cube. We will return to this example
later in this chapter and from time to time as we discuss other examples throughout the notes.
We begin by defining the data and data types for the cube. The vertices are points, each given as an
array of three coordinates, the edges are pairs of indices of points in the vertex list, and the faces
are quadruples of indices of edges in the edge list. The normals are vectors, one per face, also
given as arrays of three components. In C, these would be given as follows:
typedef float point3[3];
typedef int edge[2];
typedef int face[4]; // each face of a cube has four edges
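The vertex list can be filled in several ways; one assignment of coordinates that is consistent with the edge list below, assuming that vertices 0 through 3 lie on the x = -1 face and vertices 4 through 7 on the x = +1 face, would be:
point3 vertices[8] = {{-1.0, -1.0, -1.0}, {-1.0, -1.0,  1.0},
                      {-1.0,  1.0, -1.0}, {-1.0,  1.0,  1.0},
                      { 1.0, -1.0, -1.0}, { 1.0, -1.0,  1.0},
                      { 1.0,  1.0, -1.0}, { 1.0,  1.0,  1.0}};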
edge edges[24] = {{ 0, 1 }, { 1, 3 }, { 3, 2 }, { 2, 0 },
{ 0, 4 }, { 1, 5 }, { 3, 7 }, { 2, 6 },
{ 4, 5 }, { 5, 7 }, { 7, 6 }, { 6, 4 },
{ 1, 0 }, { 3, 1 }, { 2, 3 }, { 0, 2 },
{ 4, 0 }, { 5, 1 }, { 7, 3 }, { 6, 2 },
{ 5, 4 }, { 7, 5 }, { 6, 7 }, { 4, 6 }};
Notice that in our edge list, each edge is actually listed twice—once for each direction in which
the edge can be drawn. We need this distinction to allow us to be sure our faces are oriented
properly, as we will describe in the discussion on lighting and shading in later chapters. For now,
we simply ensure that each face is drawn with edges in a counterclockwise direction as seen from
outside that face of the cube. Drawing the cube, then, proceeds by working our way through the
face list and determining the actual points that make up the cube so they may be sent to the generic
(and fictitious) setVertex(...) and setNormal(...) functions. In a real application we
would replace these with the corresponding vertex and normal functions of whatever graphics API
we are using.
We added a simple structure for a list of normals, with one normal per face, which echoes the
structure of the faces. This supports what is often called flat shading, or shading where each face
has a single color. In many applications, though, you might want to have smooth shading, where
colors blend smoothly across each face of your polygon. For this, each vertex needs to have its
individual normal representing the perpendicular to the object at that vertex. In this case, you often
need to specify the normal each time you specify a vertex, and a normal list that follows the vertex
list would allow you to do that easily. For the code above, for example, we would not have a per-
face normal but instead each setVertex operation could be replaced by the pair of operations
setNormal(normals[edges[cube[face][0]][0]]);
setVertex(vertices[edges[cube[face][0]][0]]);
Neither the simple triangle list nor the more complex structure of vertex, normal, edge, and face
lists takes into account the very significant savings in memory you can get by using geometry
compression techniques. There are a number of these techniques, but we only talked about line
strips, triangle strips, triangle fans, and quad strips above because these are more often supported
by a graphics API. Geometry compression approaches not only save space but are also more
efficient for the graphics system, because they allow the system to retain some of the
information it generates in rendering one triangle or quad when it goes on to generate the next one.
Interesting and complex graphic objects can be difficult to create, because it can take a lot of work
to measure or calculate the detailed coordinates of each vertex needed. There are more automatic
techniques being developed, including 3D scanning and detailed laser rangefinding that carefully
measure distances and angles to points on the object being measured, but they are out
of the reach of most college classrooms. So what do we do to get interesting objects? There are
four approaches.
The first way to get models is to buy them from commercial providers of 3D models.
There is a serious market for some kinds of models, such as medical models of human structures,
from the medical and legal worlds. This can be expensive, but it avoids having to develop the
expertise to do professional modeling and then putting in the time to create the actual models. If
you are interested, an excellent source is viewpoint.com; they can be found on the Web.
A second way to get models is to find them in places where people make them available to the
public. If you have friends in some area of graphics, you can ask them about any models they
know of. If you are interested in molecular models, the protein data bank (with URL
http://www.pdb.bnl.gov) has a wide range of structure models available at no charge. If
A third way to get models is to digitize them yourself with appropriate kinds of digitizing devices.
There are a number of these available with their accuracy often depending on their cost, so if you
need to digitize some physical objects you can compare the cost and accuracy of a number of
possible kinds of equipment. The digitizing equipment will probably come with tools that capture
the points and store the geometry in a standard format, which may or may not be easy to use for
your particular graphics API. If it happens to be one that your API does not support, you may
need to convert that format to one you use or to find a tool that does that conversion.
A fourth way to get models is to create them yourself. There are a number of tools that support
high-quality interactive 3D modeling, and it is perfectly reasonable to create your models with such
tools. This has the same issue as digitizing models in terms of the format of the file that the tools
produce, but a good tool should be able to save the models in several formats, one of which you
should be able to use fairly easily with your graphics API. It is also possible to create interesting
models analytically, using mathematical approaches to generate the vertices. This is perhaps
slower than getting them from other sources, but you have final control over the form and quality
of the model, so it may well be worth the effort. This will be discussed in the chapter on
interpolation and spline modeling, for example.
If you get models from various sources, you will probably find that they come in a number of
different kinds of data format. There are a large number of widely used formats for storing
graphics information, and it sometimes seems as though every graphics tool uses a file format of
its own. Some available tools will open models with many formats and allow you to save them in
a different format, essentially serving as format converters as well as modeling tools. In any case,
you are likely to end up needing to understand some model file formats and to write your own
filters that read those formats into the kind of data structures you want for your program. Models
that are “free” may end up costing more than models you buy if the purchased ones save you that
conversion work, but that is a tradeoff for you to decide. An excellent resource on file formats is
the Encyclopedia of Graphics File Formats, published by O’Reilly & Associates, and we refer you
to that book for details on particular formats.
As we said above, modeling can be the most time-consuming part of creating an image, but you
simply aren’t going to create a useful or interesting image unless the modeling is done carefully and
well. If you are concerned about the programming part of the modeling for your image, it might be
best to create a simple version of your model and get the programming (or other parts that we
haven’t talked about yet) done for that simple version. Once you are satisfied that the
programming works and that you have gotten the other parts right, you can replace the simple
model—the one with just a few polygons in it—with the one that represents what you really want
to present.
Introduction
Transformations are probably the key point in creating significant images in any graphics system.
It is extremely difficult to model everything in a scene in the place where it is to be placed, and it is
even worse if you want to move things around in real time through animation and user control.
Transformations let you define each object in a scene in any space that makes sense for that object,
and then place it in the world space of a scene as the scene is actually viewed. Transformations can
also allow you to place your eyepoint and move it around in the scene.
Among the modeling transformations, there are three fundamental kinds: rotations, translations,
and scaling. These all maintain the basic geometry of any object to which they may be applied, and
are fundamental tools to build more general models than you can create with only simple modeling
techniques. Later in this chapter we will describe the relationship between objects in a scene and
how you can build and maintain these relationships in your programs.
The real power of modeling transformation, though, does not come from using these simple
transformations on their own. It comes from combining them to achieve complete control over
your modeled objects. The individual simple transformations are combined into a composite
modeling transformation that is applied to your geometry at any point where the geometry is
specified. The modeling transformation can be saved at any state and later restored to that state to
allow you to build up transformations that locate groups of objects consistently. As we go through
the chapter we will see several examples of modeling through composite transformations.
Finally, the use of simple modeling and transformations together allows you to generate more
complex graphical objects, but these objects can take significant time to display. You may want to
store these objects in pre-compiled display lists that can execute much more quickly.
Definitions
In this section we outline the concept of a geometric transformation, describe the fundamental
transformations used in computer graphics, and show how these can be used to build very
general graphical object models for your scenes.
Transformations
A transformation is a function that takes geometry and produces new geometry. The geometry can
be anything a computer graphics system works with—a projection, a view, a light, a direction, or
a graphical object itself.
Our vehicle for looking at transformations will be the creation and movement of a rugby ball. This
ball is basically an ellipsoid (an object that is formed by rotating an ellipse around its major axis),
so it is easy to create from a sphere using scaling. Because the ellipsoid is different along one axis
from its shape on the other axes, it will also be easy to see its rotations, and of course it will be
easy to see it move around with translations. So we will first discuss scaling and show how it is
used to create the ball, then discuss rotation and show how the ball can be rotated around one of its
short axes, then discuss translations and show how the ball can be moved to any location we wish,
and finally will show how the transformations can work together to create a rotating, moving ball
like we might see if the ball were kicked. The ball is shown with some simple lighting and shading
as described in the chapters below on these topics.
Figure 2.8: a sphere scaled by 2.0 in the y-direction to make a rugby ball (left)
and the same sphere unscaled (right)
Scaling changes the entire coordinate system in space by multiplying each of the coordinates of
each point by a fixed value. Each time it is applied, this changes each dimension of everything in
the space. A scaling transformation requires three values, each of which controls the amount by
which one of the three coordinates is changed, and a graphics API function to apply a scaling
transformation will take three real values as its parameters. Thus if we have a point (x, y, z) and
specify the three scaling values as Sx, Sy, and Sz, then applying the scaling transformation
changes the point to (x*Sx, y*Sy, z*Sz). If we take a simple sphere that is centered at the origin
and scale it by 2.0 in one direction (in our case, the y-coordinate or vertical direction), we get the
rugby ball that is shown in Figure 2.8 next to the original sphere. It is important to note that this
scaling operates on everything in the space, so if we also happen to have a unit sphere at a position
farther out along the axis, scaling will move that sphere farther away from the origin and will also
multiply each of its coordinates by the scaling amounts, possibly distorting its shape. This shows
that it is most useful to apply scaling to an object defined at the origin so only the dimensions of the
object will be changed.
Translation takes everything in your space and changes each point’s coordinates by adding a fixed
value to each coordinate. The effect is to move everything that is defined in the space by the same
amount. To define a translation transformation, you need to specify the three values that are to be
added to the three coordinates of each point. A graphics API function to apply a translation, then,
will take these three values as its parameters. A translation shows a very consistent treatment of
everything in the space, so a translation is usually applied after any scaling or rotation in order to
take an object with the right size and right orientation and place it correctly in space.
Figure 2.9: a sequence of images of the rugby ball as transformations move it through space
Finally, we put these three kinds of transformations together to create a sequence of images of the
rugby ball as it moves through space, rotating as it goes, shown in Figure 2.9. This sequence was
created by first defining the rugby ball with a scaling transformation and a translation putting it on
the ground appropriately, creating a composite transformation as discussed in the next section.
Then rotation and translation values were computed for several times in the flight of the ball,
allowing us to rotate the ball by slowly-increasing amounts and placing it as if it were in a standard
gravity field. Each separate image was created with a set of transformations that can be generically
described by
translate( Tx, Ty, Tz )
rotate( angle, x-axis )
scale( 1., 2., 1. )
drawBall()
Transformations are mathematical operations that map 3D space to 3D space, and so mathematics
has standard ways to represent them. This is discussed in the next chapter, and processes such as
composite transformations are linked to the standard operations on these objects.
Composite transformations
In order to achieve the image you want, you may need to apply more than one simple
transformation to achieve what is called a composite transformation. For example, if you want to
create a rectangular box with height A, width B, and depth C, with center at (C1, C2, C3), and
oriented at an angle α relative to the Z-axis, you could start with a cube one unit on a side and with
center at the origin, and get the box you want by applying the following sequence of operations:
first, scale the cube to the right size to create the rectangular box with dimensions A, B, and C,
second, rotate the cube by the angle α to the right orientation, and
third, translate the cube to the position C1, C2, C3.
This sequence is critical because of the way transformations work in the whole space. For
example, if we rotated first and then scaled with different scale factors in each dimension, we
would introduce distortions in the box. If we translated first and then rotated, the rotation would
move the box to an entirely different place. Because the order is very important, we find that there
are certain sequences of operations that give predictable, workable results, and the order above is
the one that works best: apply scaling first, apply rotation second, and apply translation last.
The order of transformations is important in ways that go well beyond the translation and rotation
example above. In general, transformations are an example of noncommutative operations,
operations for which f*g ≠ g*f (that is, f(g(x)) ≠ g(f(x)) ). Unless you have some experience with
noncommutative operations from a course such as linear algebra, this may be a new idea. But let’s
look at the operations we described above: if we take the point (1, 1, 0) and apply a rotation by 90°
around the Z-axis, we get the point (-1, 1, 0). If we then apply a translation by (2, 0, 0) we get
the point (1, 1, 0) again. However, if we start with (1, 1, 0) and first apply the translation, we get
(3, 1, 0), and if we then apply the rotation, we get the point (-1, 3, 0), which is certainly not the same as
(1, 1, 0). That is, using some pseudocode for rotations, translations, and point setting, the two
code sequences
    rotate(90, 0, 0, 1)        translate(2, 0, 0)
    translate(2, 0, 0)         rotate(90, 0, 0, 1)
    setPoint(1, 1, 0)          setPoint(1, 1, 0)
produce very different results; that is, the rotate and translate operations are not commutative.
This behavior is not limited to mixing different kinds of transformations. Different sequences of
rotations can result in different images as well. For example, consider the same rotations specified
in two different orders, as in the two sequences below; the results are shown in Figure 2.10.
    rotate(60, 0, 0, 1)        rotate(90, 0, 1, 0)
    rotate(90, 0, 1, 0)        rotate(60, 0, 0, 1)
    scale(3, 1, .5)            scale(3, 1, .5)
    cube()                     cube()
Figure 2.10: the results from two different orderings of the same rotations
Transformations are implemented as matrices for computational purposes. Recall that we are able
to represent points as 4-tuples of real numbers; transformations are implemented as 4x4 matrices
that map the space of 4-tuples into itself. Although we will not explicitly use this representation in
our work, it is used by graphics APIs and helps explain how transformations work; for example,
you can understand why transformations are not commutative by understanding that matrix
multiplication is not commutative. (Try it out for yourself!) And if we realize that a 4x4 matrix is
equivalent to an array of 16 real numbers, we can think of transformation stacks as stacks of such
matrices. While this book does not require matrix operations for transformations, there may be
times when you’ll need to manipulate transformations in ways that go beyond your API, so be
aware of this.
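If you want to see this non-commutativity concretely, the short C sketch below multiplies a translation matrix and a rotation matrix in both orders and prints the differing results. It is only a sketch: it assumes the column-major storage of 16 floats that OpenGL uses, and the matrices T and R correspond to the translate-by-(2, 0, 0) and rotate-90°-about-the-Z-axis example above.

    #include <stdio.h>

    /* c = a * b for 4x4 matrices stored column-major: element (row, col) is at index col*4 + row */
    void matmul(const float a[16], const float b[16], float c[16]) {
        int row, col, k;
        for (col = 0; col < 4; col++)
            for (row = 0; row < 4; row++) {
                float s = 0.0f;
                for (k = 0; k < 4; k++)
                    s += a[k*4 + row] * b[col*4 + k];
                c[col*4 + row] = s;
            }
    }

    int main(void) {
        /* T translates by (2, 0, 0); R rotates by 90 degrees about the Z-axis */
        float T[16] = { 1,0,0,0,  0,1,0,0,  0,0,1,0,  2,0,0,1 };
        float R[16] = { 0,1,0,0, -1,0,0,0,  0,0,1,0,  0,0,0,1 };
        float TR[16], RT[16];
        matmul(T, R, TR);    /* translation applied after the rotation */
        matmul(R, T, RT);    /* rotation applied after the translation */
        printf("last columns differ: TR moves the origin to (%g, %g), RT to (%g, %g)\n",
               TR[12], TR[13], RT[12], RT[13]);
        return 0;
    }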
When it comes time to apply transformations to your models, we need to think about how we
represent the problem for computational purposes. Mathematical notation can be applied in many
ways, so your previous mathematical experience may or may not help you very much in deciding
how you can think about this problem. In order to have a good model for thinking about complex
transformation sequences, we will define the sequence of transformations as last-specified, first-
applied, or in another way of thinking about it, we want to apply our functions so that the function
nearest to the geometry is applied first. We can also think about this in terms of building
composite functions by multiplying the individual functions, and with the convention above we
want to compose each new function by multiplying it on the right of the previous functions. So the
standard operation sequence we see above would be achieved by the following algebraic sequence
of operations:
translate * rotate * scale * geometry
or, thinking of multiplication as function composition, as
translate(rotate(scale(geometry)))
This might be implemented by a sequence of function calls like that below that is not intended to
represent any particular API:
    translate(C1, C2, C3);   // translate to the desired point
    rotate(alpha, Z);        // rotate by the angle alpha around the Z-axis
    scale(A, B, C);          // scale by the desired amounts
    cube();                  // define the geometry of the cube
At first glance, this sequence looks to be exactly the opposite of the sequence noted above. In fact,
however, we readily see that the scaling operation is the function closest to the geometry (which is
expressed in the function cube()) because of the last-specified, first-applied nature of
transformations. In Figure 2.11 we see the sequence of operations as we proceed from the plain
cube (at the left), to the scaled cube next, then to the scaled and rotated cube, and finally to the cube
that uses all the transformations (at the right). The application is to create a long, thin, rectangular
bar that is oriented at a 45° angle upwards and lies above the definition plane.
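In OpenGL, which we take up in the next chapter, the same sequence of calls might look like the sketch below. Here cube() stands for a hypothetical function that draws a unit cube centered at the origin, and C1, C2, C3, A, B, C, and alpha are the values from the box example above; note that glRotatef() takes its angle in degrees and an axis vector.

    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
      glTranslatef(C1, C2, C3);           // translate to the desired point
      glRotatef(alpha, 0.0, 0.0, 1.0);    // rotate by alpha degrees around the Z-axis
      glScalef(A, B, C);                  // scale the unit cube to dimensions A x B x C
      cube();                             // hypothetical function drawing a unit cube at the origin
    glPopMatrix();                        // discard these transformations afterwards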
In general, you can read the overall transformation that is applied to a model by considering the
total sequence of transformations in the order in which they are specified, down to the geometry
on which they work.
In defining a scene, we often want to define some standard pieces and then assemble them in
standard ways, and then use the combined pieces to create additional parts, and go on to use these
parts in additional ways. To do this, we need to create individual parts through functions that do
not pay any attention to ways the parts will be used later, and then be able to assemble them into a
whole. Eventually, we can see that the entire image will be a single whole that is composed of its
various parts.
The key issue is that there is some kind of transformation in place when you start to define the
object. When we begin to put the simple parts of a composite object in place, we will use some
transformations but we need to undo the effect of those transformations when we put the next part
in place. In effect, we need to save the state of the transformations when we begin to place a new
part, and then to return to that transformation state (discarding any transformations we may have
added past that mark) to begin to place the next part. Note that we are always adding and
discarding at the end of the list; this tells us that this operation has the computational properties of a
stack. We may define a stack of transformations and use it to manage this process as follows:
• as transformations are defined, they are multiplied into the current transformation in the order
noted in the discussion of composite transformations above,
• when we begin to place a new part, we push a copy of the current transformation onto the stack
so that the state at that point is saved, and
• when the part is finished, we pop the stack to restore the transformation state that was in effect
before the part was placed, discarding any transformations that were added for it.
A minimal sketch of this stack discipline is given below.
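This sketch anticipates the OpenGL stack calls glPushMatrix() and glPopMatrix() that we will meet in the next chapter; drawPart(), px, py, pz, and partAngle are hypothetical placeholders for the transformations and geometry of one part of a composite object.

    glPushMatrix();                         // save the current transformation state
      glTranslatef(px, py, pz);             // transformations that place this part only
      glRotatef(partAngle, 0.0, 1.0, 0.0);
      drawPart();                           // hypothetical function that defines the part's geometry
    glPopMatrix();                          // restore the saved state before placing the next part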
Compiling geometry
Creating a model and transforming it into world space can take a good deal of work. You may
need to compute vertex coordinates in model space and you will need to apply modeling
transformations to these coordinates to get the final vertex coordinates in world space that are
finally sent to the projection and rendering processes. If the model is used frequently, and if it must
be re-calculated each time it is drawn, it can make a scene quite slow to display. As we will see
later, applying a transformation to a vertex involves a matrix multiplication that takes up to 16
multiplications and 12 additions per transformation and per vertex, although in practice many
transformations can be done with far fewer operations.
As a way to save time in displaying the image, many graphics APIs allow you to “compile” the
geometry in a model in a way that will allow it to be displayed much more quickly. This compiled
geometry is basically what is sent to the rendering pipeline as the display list, as described in
Chapter 8 below. When the compiled model is displayed, no re-calculation of vertices and no
computation of transformations are needed, and only the saved results of these computations are
sent to the graphics system. Geometry that is to be compiled should be carefully chosen so that it
is not changed between displays; if changes are needed, you will need to re-compile the object.
Once you have seen what parts you can compile, you can compile them and use the compiled
versions to make the display faster. We will discuss how OpenGL compiles geometry in the next
chapter. If you use another graphics API, look for details in its documentation.
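As a hedged preview of the OpenGL mechanism we will describe in the next chapter, compiled geometry is handled there through display lists. In the sketch below, drawSphere() is a hypothetical function that issues the sphere's vertices for the rugby ball example.

    GLuint ballList;                        // identifier for the compiled geometry

    // at initialization time, compile the geometry once
    ballList = glGenLists(1);
    glNewList(ballList, GL_COMPILE);
      glPushMatrix();
        glScalef(1.0, 2.0, 1.0);            // the modeling transformation for the rugby ball
        drawSphere();                       // hypothetical function that issues the sphere's vertices
      glPopMatrix();
    glEndList();

    // at display time, replay the saved results with no recomputation
    glCallList(ballList);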
An example
To help us see how you can make useful graphical objects from simple modeling, let’s consider a
3D arrow that we could use to point out things in a 3D scene. Our goal is to make an arrow such
as the one in Figure 2.12, with the arrow oriented downward and aligned with the Y-axis.
In order to make this arrow, we start with two simpler shapes that are themselves useful. (These
are sufficiently useful that they are provided as built-in functions in the GLU and GLUT toolkits
we describe in the next chapter.) These simpler shapes are designed to be in standard positions
and have standard sizes.
The first of these simple shapes is a cylinder. We will design this as a template that can be made to
have any size and any orientation by using simple transformations. Our template orientation will
be to have the centerline of the cylinder parallel to the X-axis, and our template size will be to have the
cylinder have radius 1 and length 1. We will design the cylinder to have the cross-section of a
regular polygon with NSIDES sides. This will look strange, but will be easy to scale. The
template is shown in the left-hand side of Figure 2.13, and a sketch of the code for the cylinder
template function cylinder() will look like:
    angle = 0.;
    anglestep = 2.*M_PI/(float)NSIDES;   // cos() and sin() take radians, so step in radians
    for (i = 0; i < NSIDES; i++) {
        nextangle = angle + anglestep;
        beginQuad();
        vertex(0., cos(angle), sin(angle));
        vertex(1., cos(angle), sin(angle));
        vertex(1., cos(nextangle), sin(nextangle));
        vertex(0., cos(nextangle), sin(nextangle));
        endQuad();
        angle = nextangle;
    }
Figure 2.13: the templates of the parts of the arrow: the cylinder (left) and the cone (right)
The second simple shape is a cone whose centerline is the Y -axis with a base of radius 1 and a
height of 1. As with the cylinder, this template will be easy to scale and orient as needed for
various uses. We will again use a regular polygon of NSIDES sides for the base of the cone. The
template shape is shown in the right-hand side of Figure 2.13, and a sketch of the code for the
cone template cone() will look like:
    angle = 0.;
    anglestep = 2.*M_PI/(float)NSIDES;   // cos() and sin() take radians, so step in radians
    beginTriangleFan();
    vertex(0., 1., 0.);                  // the apex of the cone
    for (i = 0; i <= NSIDES; i++, angle += anglestep)
        vertex(cos(angle), 0., sin(angle));  // points around the base circle
    endTriangleFan();
With both template shapes implemented, we can then build a template arrow function arrow3D()
as sketched below. We will use a cylinder twice as long and with half the radius of the original
cylinder template, oriented along the Z-axis, and the cone in its original form, to form the shape of
an arrow. We will then move that shape so its point is at the origin, and will rotate the shape to lie
along the Z-axis as defined.
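A possible sketch of arrow3D() is given below, written with the OpenGL transformation calls we will meet in the next chapter. The particular placements are illustrative assumptions rather than the only way to assemble the arrow: the shaft is the X-axis cylinder template rotated to lie along the Z-axis and scaled to length 2 and radius 0.5, and the head is the Y-axis cone template rotated to point along the Z-axis at the far end of the shaft.

    void arrow3D(void) {
        glPushMatrix();                        // the shaft
          glRotatef(-90.0, 0.0, 1.0, 0.0);     // turn the X-axis template to lie along the Z-axis
          glScalef(2.0, 0.5, 0.5);             // twice as long, half the radius
          cylinder();
        glPopMatrix();
        glPushMatrix();                        // the head
          glTranslatef(0.0, 0.0, 2.0);         // assumed placement at the far end of the shaft
          glRotatef(90.0, 1.0, 0.0, 0.0);      // turn the Y-axis cone to point along the Z-axis
          cone();
        glPopMatrix();
        // a final translation or rotation here could put the arrow's point at the origin
        // and give the arrow the orientation the scene needs, as described in the text
    }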
As we noted above, you must take a great deal of care with transformation order. It can be very
difficult to look at an image that has been created with mis-ordered transformations and understand
just how that erroneous example happened. In fact, there is a skill in what we might call “visual
debugging”—looking at an image and seeing that it is not correct, and figuring out what errors
might have caused the image as it is seen. It is important that anyone working with images become
skilled in this kind of debugging. However, obviously you cannot tell that an image is wrong
unless you know what a correct image should be, so you must know in general what you should
be seeing. As an obvious example, if you are doing scientific images, you must know the science
well enough to know when an image makes no sense.
In this chapter, we have defined modeling as the process of defining and organizing a set of
geometry that represents a particular scene. While modern graphics APIs can provide you with a
great deal of assistance in rendering your images, modeling is usually supported less well and
programmers may find considerable difficulty with modeling when they begin to work in computer
graphics. Organizing a scene with transformations, particularly when that scene involves
hierarchies of components and when some of those components are moving, involves relatively
complex concepts that need to be organized very systematically to create a successful scene. This
is even more difficult when the eye point is one of the moving or hierarchically-organized parts.
Hierarchical modeling has long been done by using trees or tree-like structures to organize the
components of the model, and we will find this kind of approach to be very useful.
Recent graphics systems, such as Java3D and VRML 2, have formalized the concept of a scene
graph as a powerful tool both for modeling scenes and for organizing the drawing process for
those scenes. By understanding and adapting the structure of the scene graph, we can organize a
careful and formal tree approach to both the design and the implementation of hierarchical models.
This can give us tools to manage not only modeling the geometry of such models, but also
animation and interactive control of these models and their components. In this section we will
introduce the scene graph structure and will adapt it to a slightly simplified modeling graph that you
can use to design scenes. We will also identify how this modeling graph gives us the three key
transformations that go into creating a scene: the projection transformation, the viewing
transformation, and the modeling transformation(s) for the scene’s content. This structure is very
general and lets us manage all the fundamental principles in defining a scene and translating it into a
graphics API. Our terminology is based on the scene graph of Java3D and should help
anyone who uses that system understand the way scene graphs work there.
The fully-developed scene graph of the Java3D API has many different aspects and can be complex
to understand fully, but we can abstract it somewhat to get an excellent model to help us think
about scenes that we can use in developing the code to implement our modeling. A brief outline of
the Java3D scene graph in Figure 2.14 will give us a basis to discuss the general approach to
graph-structured modeling as it can be applied to beginning computer graphics. Remember that we
will be simplifying some aspects of this graph before applying it to our modeling.
A virtual universe holds one or more (usually one) locales, which are essentially positions in the
universe to put scene graphs. Each scene graph has two kinds of branches: content branches,
which are to contain shapes, lights, and other content, and view branches, which are to contain
viewing information. This division is somewhat flexible, but we will use this standard approach to
build a framework to support our modeling work.
The content branch of the scene graph is organized as a collection of nodes that contains group
nodes, transform groups, and shape nodes, as seen in the left-hand branch of Figure 2.14. A
group node is a grouping structure that can have any number of children; besides simply
organizing its children, a group can include a switch that selects which children to present in a
scene. A transform group is a collection of modeling transformations that affect all the geometry
that lies below it. The transformations will be applied to any of the transform group’s children
with the convention that transforms “closer” to the geometry (geometry that is defined in shape
nodes lower in the graph) are applied first. A shape node includes both geometry and appearance
data for an individual graphic unit. The geometry data includes standard 3D coordinates, normals,
and texture coordinates.
Figure 2.14: an outline of the Java3D scene graph: a virtual universe containing a locale, whose
scene graph has a content branch (group nodes, transform groups, and shape nodes) and a view
branch (leading to a view node)
The view branch of the scene graph includes the specification of the display device, and thus the
projection appropriate for that device, as shown in the right-hand branch of Figure 2.14. It also
specifies the user’s position and orientation in the scene and includes a wide range of abstractions
of the different kinds of viewing devices that can be used by the viewer. It is intended to permit
viewing the same scene on any kind of display device, including sophisticated virtual reality
devices. This is a much more sophisticated approach than we need for our relatively simple
modeling. We will simply consider the eye point as part of the geometry of the scene, so we set
the view by including the eye point in the content branch and get the transformation information for
the eye point in order to create the view transformations in the view branch.
In addition to the modeling aspect of the scene graph, Java3D also uses it to organize the
processing as the scene is rendered. Because the scene graph is processed from the bottom up, the
content branch is processed first, followed by the viewing transformation and then the projection
transformation. However, the system does not guarantee any particular sequence in processing the
node’s branches, so it can optimize processing by selecting a processing order for efficiency, or
can distribute the computations over a networked or multiprocessor system. Thus the Java3D
programmer must be careful to make no assumptions about the state of the system when any shape
node is processed. We will not ask the system to process the scene graph itself, however, because
we will only use the scene graph to develop our modeling code.
We will develop a scene graph to design the modeling for an example scene to show how this
process can work. To begin, we present an already-completed scene so we can analyze how it can
be created, and we will take that analysis and show how the scene graph can give us other ways to
present the scene. Consider the scene as shown in Figure 2.15, where a helicopter is flying above
a landscape and the scene is viewed from a fixed eye point. (The helicopter is the small green
object toward the top of the scene, about 3/4 of the way across the scene toward the right.) This
scene contains two principal objects: a helicopter and a ground plane. The helicopter is made up
of a body and two rotors, and the ground plane is modeled as a single set of geometry with a
texture map. There is some hierarchy to the scene because the helicopter is made up of smaller
components, and the scene graph can help us identify this hierarchy so we can work with it in
rendering the scene. In addition, the scene contains a light and an eye point, both at fixed
locations. The first task in modeling such a scene is now complete: to identify all the parts of the
scene, organize the parts into a hierarchical set of objects, and put this set of objects into a viewing
context. We must next identify the relationships among the parts of the scene so we may
create the tree that represents the scene. Here we note the relationship among the ground and the
parts of the helicopter. Finally, we must put this information into a graph form.
The initial analysis of the scene in Figure 2.15, organized along the lines of view and content
branches, leads to an initial (and partial) graph structure shown in Figure 2.16.
Figure 2.16: a scene graph that organizes the modeling of our simple scene, with a content branch
(the helicopter with its rotors, and the ground) and a view branch (the projection and the view)
This initial structure is compatible with the simple OpenGL viewing approach we discussed in the
previous chapter and the modeling approach earlier in this chapter, where the view is implemented
by using a built-in function that sets the viewpoint, and the modeling is built from relatively simple
primitives. This approach only takes us so far, however, because it does not integrate the eye into
the scene graph. It can be difficult to compute the parameters of the viewing function if the eye
point is embedded in the scene and moves with the other content, and later we will address that part
of the question of rendering the scene.
While we may have started to define the scene graph, we are not nearly finished. The initial scene
graph of Figure 2.16 is incomplete because it merely includes the parts of the scene and describes
which parts are associated with what other parts. To expand this first approximation to a more
complete graph, we must add several things to the graph:
• the transformation information that describes the relationship among items in a group node, to
be applied separately on each branch as indicated,
• the appearance information for each shape node, indicated by the shaded portion of those
nodes,
• the light and eye position, either absolute (as used in Figure 2.15 and shown below in Figure
2.17) or relative to other components of the model (as described later in the chapter), and
• the specification of the projection and view in the view branch.
These are all included in the expanded version of the scene graph with transformations,
appearance, eyepoint, and light shown in Figure 2.17.
The content branch of this graph handles all the scene modeling and is very much like the content
branch of the scene graph. It includes all the geometry nodes of the graph in Figure 2.16 as well
as appearance information; includes explicit transformation nodes to place the geometry into correct
sizes, positions, and orientations; includes group nodes to assemble content into logical groupings;
and adds the light and the eye point as content of the scene.
Figure 2.17: the more complete graph, including transformations and appearance, with the eye
point and the light placed in the content branch alongside the helicopter and the ground
The view branch of this graph is similar to the view branch of the scene graph but is treated much
more simply, containing only projection and view components. The projection component
includes the definition of the projection (orthogonal or perspective) for the scene and the definition
of the window and viewport for the viewing. The view component includes the information
needed to create the viewing transformation, and because the eye point is placed in the content
branch, this is simply a copy of the set of transformations that position the eye point in the scene as
represented in the content branch.
The appearance part of the shape node is built from color, lighting, shading, texture mapping, and
several other kinds of operations. Eventually each vertex of the geometry will have not only
geometry, in terms of its coordinates, but also normal components, texture coordinates, and
several other properties. Here, however, we are primarily concerned with the geometry content of
the shape node; much of the rest of these notes is devoted to building the appearance properties of
the shape node, because the appearance content is perhaps the most important part of graphics for
building high-quality images.
Figure 2.18: the scene graph after integrating the viewing transformation into the content branch;
the view branch holds only the default view, and the inverse of the eye placement transforms
appears at the top of the content branch
When you have a well-defined set of transformations that place the eye point in a scene, we saw in
the earlier chapter on viewing how you can take advantage of that information to organize the scene
graph in a way that defines the viewing transformation explicitly and simply uses the default
view for the scene. As we noted there, the real effect of the viewing transformation is to act as the
inverse of the transformations that place the eye in the scene.
The scene graph for a particular image is not unique because there are many ways to organize a
scene and many ways to organize the way you carry out the graphic operations that the scene graph
specifies. Once you have written a first scene graph for a scene, you may want to think some more
about the scene to see whether there is another way to organize the scene graph to create a more
efficient program from the scene graph or to make the scene graph present a more clear description
of the scene. Remember that the scene graph is a design tool, and there are always many ways to
create a design for any problem.
It is very important to note that the scene graph need not describe a static geometry. The
transformations in the scene graph may be defined with parameters instead of constant values, and
event callbacks can affect the graph by controlling these parameters through user interaction or
through computed values. This is discussed in the re-write guidelines in the next section. This can
permit a single graph to describe an animated scene or even alternate views of the scene. The
graph may thus be seen as having some components with external controllers, and the controllers
are the event callback functions.
We need to extract information on the three key kinds of transformations from this graph in order
to create the code that implements our modeling work. The projection transformation is
straightforward and is built from the projection information in the view branch, and this is easily
managed from tools in the graphics API. Because this is so straightforward, we really do not need
to include it in our graph. The viewing transformation is readily created from the transformation
information in the view by analyzing the eye placement transformations as we saw above, so it is
straightforward to extract this and, more important, to create this transformation from the inverse
of the eyepoint transformations. This is discussed in the next section of the chapter. Finally, the
modeling transformations for the various components are built by working with the various
transformations in the content branch as the components are drawn, and are discussed later in this
chapter.
Because all the information we need for both the primitive geometry and all the transformations is
held in this simple graph, we will call it the modeling graph for our scene. This modeling graph,
basically a scene graph without a view branch but with the viewing information organized at the
top as the inverse of the eyepoint placement transformations, will be the basis for the coding of our
scenes as we describe in the remainder of the chapter.
In a scene graph with no view specified, we assume that the default view puts the eye at the origin
looking in the negative z-direction with the y-axis upward. If we use a set of transformations to
position the eye differently, then the viewing transformation is built by inverting those
transformations to restore the eye to the default position. This inversion takes the sequence of
transformations that positioned the eye and inverts the primitive transformations in reverse order,
so if T1 T2 T3 ... TK is the original transformation sequence, the inverse is TK^u ... T3^u T2^u T1^u,
where the superscript u indicates inversion, or “undo” as we might think of it.
Each of the primitive scaling, rotation, and translation transformations is easily inverted. For the
scaling transformation scale(Sx, Sy, Sz), we note that the three scale factors are used to
multiply the values of the three coordinates when this is applied. So to invert this transformation,
we must divide each coordinate by the corresponding scale factor; that is, the inverse is
scale(1/Sx, 1/Sy, 1/Sz).
For the rotation transformation rotate(angle, line) that rotates space by the value angle
around the fixed line line, we must simply rotate the space by the same angle in the reverse
direction. Thus the inverse of the rotation transformation is rotate(-angle, line).
For the translation transformation translate(Tx, Ty, Tz) that adds the three translation
values to the three coordinates of any point, we must simply subtract those same three translation
values when we invert the transformation. Thus the inverse of the translation transformation is
translate(-Tx, -Ty, -Tz).
Putting this together with the information on the order of operations for the inverse of a composite
transformation above, we can see that, for example, the inverse of the set of operations (written as
if they were in your code)
translate(Tx, Ty, Tz)
rotate(angle, line)
scale(Sx, Sy, Sz)
is the set of operations
scale(1/Sx, 1/Sy, 1/Sz)
rotate(-angle, line)
translate(-Tx, -Ty, -Tz)
Now let us apply this process to the viewing transformation. Deriving the eye transformations
from the tree is straightforward. Because we suggest that the eye be considered one of the content
components of the scene, we can place the eye at any position relative to other components of the
scene. When we do so, we can follow the path from the root of the content branch to the eye to
obtain the sequence of transformations that lead to the eye point. That sequence of transformations
is the eye transformation that we may record in the view branch.
Figure 2.19: the same scene as in Figure 2.15 but with the eye point following directly behind the
helicopter
In Figure 2.19 we show the change that results in the view of Figure 2.15 when we define the eye
to be immediately behind the helicopter, and in Figure 2.20 we show the corresponding change in
the scene graph.
This change in the position of the eye means that the set of transformations that lead to the eye
point in the view branch must be changed, but the mechanism of writing the inverse of these
transformations before beginning to write the definition of the scene graph still applies; only the
actual transformations to be inverted will change. You might, for example, have a menu switch
that specified that the eye was to be at a fixed point or at a point following the helicopter; then the
code for inverting the eye position would be a switch statement that implemented the appropriate
transformations depending on the menu choice. This is how the scene graph will help you to
organize the viewing process that was described in the earlier chapter on viewing.
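A hedged sketch of such a switch statement follows. All the names here (viewChoice, FIXED_EYE, FOLLOW_HELICOPTER, and the position and angle variables) are hypothetical, and the comments state the eye placements that are assumed; your own placement transformations, inverted in reverse order, would take their place.

    // invert the transformations that placed the eye, depending on the menu choice
    switch (viewChoice) {
      case FIXED_EYE:
        // assumes the eye was placed by translate(eyeX,eyeY,eyeZ) then rotate(eyeAngle about Y)
        glRotatef(-eyeAngle, 0.0, 1.0, 0.0);
        glTranslatef(-eyeX, -eyeY, -eyeZ);
        break;
      case FOLLOW_HELICOPTER:
        // assumes the helicopter was placed by translate(heliX,heliY,heliZ) then rotate(heading
        // about Y), and the eye was then offset by translate(0,0,followDist) behind it
        glTranslatef(0.0, 0.0, -followDist);
        glRotatef(-heading, 0.0, 1.0, 0.0);
        glTranslatef(-heliX, -heliY, -heliZ);
        break;
    }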
Figure 2.20: the change in the scene graph (the helicopter group, with its transforms, now has the
eye as a child along with the body and the two rotors) to implement the view in Figure 2.19
With this scene graph, we can identify the set of transformations Ta Tb Tc Td ... Ti Tj Tk that are applied
to put the helicopter in the scene, and the transformations Tu Tv ... Tz that place the eye point relative
to the helicopter. The implementation of the structure of Figure 2.18, then, is to begin the display
code with the standard view, followed by Tz^-1 ... Tv^-1 Tu^-1 and then Tk^-1 Tj^-1 Ti^-1 ... Td^-1 Tc^-1 Tb^-1 Ta^-1,
before you begin to write the code for the standard scene as described in Figure 2.21 below.
The process of placing the eye point can readily be generalized. For example, if you should want
to design a scene with several possible eye points and allow a user to choose among them, you can
design the view branch by creating one view for each eye point and using the set of
transformations leading to each eye point as the transformation for the corresponding view. You
can then invert each of these sets of transformations to create the viewing transformation for each
of the eye points. The choice of eye point will then create a choice of view, and the viewing
transformation for that view can then be chosen to implement the user choice.
Because the viewing transformation is performed before the modeling transformations, we see
from Figure 2.18 that the inverse transformations for the eye must be applied before the content
branch is analyzed and its operations are placed in the code. This means that the display operation
must begin with the inverse of the eye placement transformations, which has the effect of moving
the eye to the top of the content branch and placing the inverse of the eye path at the front of each
set of transformations for each shape node.
In almost all of the images we expect to create, we would use the hidden-surface abilities provided
by our graphics API. As we described in the last chapter, this will probably use some sort of
depth buffer or Z-buffer, and the comparisons of depths for hidden surface resolution are done as
the parts of the scene are drawn.
However, there may be times when you will want to avoid depth testing and take control of the
sequence of drawing your scene components. One such time is described later in the chapter on
color and blending, where you need to create a back-to-front drawing sequence in order to simulate
transparency with blending operations. In order to do this, you will need to know the depth of
each of the pieces of your scene, or the distance of that piece from the eye point. This is easy
enough to do if the scene is totally static, but when you allow pieces to move or the eye to move, it
becomes much less simple.
The solution to this problem lies in doing a little extra work as you render your scene. Before you
actually draw anything, but after you have updated whatever transformations you will use and
whatever choices you will make to draw the current version of the scene, apply the same
operations but use a tool called a projection that you will find with most graphics APIs. The
projection operation allows you to calculate the coordinates of any point in your model space when
it is transformed by the viewing and projection transformations into a point in 3D eye space. The
depth of that point, then, is simply the Z-coordinate of the projected value. You can draw the
entire scene, then, using the projection operation instead of the rendering operation, get the depth
values for each piece of the scene, and use the depth values to determine the order in which you
will draw the parts. The scene graph will help you make sure you have the right transformations
when you project each of the parts, ensuring that you have the right depth values.
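In OpenGL the projection tool described here is gluProject(). The hedged sketch below assumes that (cx, cy, cz) is a representative point of one piece of the scene in model space, that the modelview matrix currently holds the same transformations that will be used to draw that piece, and that pieceDepth[] and thisPiece are hypothetical names for the depths you would sort on before drawing.

    GLdouble model[16], proj[16], winX, winY, winZ;
    GLint viewport[4];

    glGetDoublev(GL_MODELVIEW_MATRIX, model);      // the current modeling/viewing transformation
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);
    if (gluProject(cx, cy, cz, model, proj, viewport, &winX, &winY, &winZ) == GL_TRUE)
        pieceDepth[thisPiece] = winZ;              // winZ is the projected depth of this piece
    // compute a depth for every piece, sort the pieces by decreasing depth,
    // and draw them in that order to get the back-to-front sequence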
Because the modeling graph as we defined it above is intended as a learning tool and not a
production tool, we will resist the temptation to formalize its definition beyond the terms we used
there:
• shape node containing two components
- geometry content
- appearance content
• transformation node
• group node
• projection node
• view node
Because we do not want to look at any kind of automatic parsing of the modeling graph to create
the scene, we will merely use the graph to help organize the structure and the relationships in the
model to help you organize your code to implement your simple or hierarchical modeling. This is
quite straightforward and is described in detail below.
Once you know how to organize all the components of the model in the modeling graph, you next
need to write the code to implement the model. This turns out to be straightforward, and you can
use a simple set of re-write guidelines that allow you to rewrite the graph as code. In this set of
rules, we assume that transformations are applied in the reverse of the order they are declared, as
they are in OpenGL, for example. This is consistent with your experience with tree handling in
your programming courses, because you have usually discussed an expression tree which is
parsed in leaf-first order. It is also consistent with the Java3D convention that transforms that are
“closer” to the geometry (nested more deeply in the scene graph) are applied first.
In the example for Figure 2.19 above, we would use the tree to write code as shown in skeleton
form in Figure 2.21. Most of the details, such as the inversion of the eye placement
transformation, the parameters for the modeling transformations, and the details of the appearance
of individual objects, have been omitted, but we have used indentation to show the pushing and
popping of the modeling transformation stack so we can see the operations between these pairs
easily. This is straightforward to understand and to organize.
display()
    set the viewport and projection as needed
    initialize modelview matrix to identity
    define viewing transformation
        invert the transformations that set the eye location
        set eye through gluLookAt with default values
    define light position                // note absolute location
    push the transformation stack        // ground
        translate
        rotate
        scale
        define ground appearance (texture)
        draw ground
    pop the transformation stack
    push the transformation stack        // helicopter
        translate
        rotate
        scale
        push the transformation stack    // top rotor
            translate
            rotate
            scale
            define top rotor appearance
            draw top rotor
        pop the transformation stack
        push the transformation stack    // back rotor
            translate
            rotate
            scale
            define back rotor appearance
            draw back rotor
        pop the transformation stack
        // assume no transformation for the body
        define body appearance
        draw body
    pop the transformation stack
    swap buffers
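A hedged OpenGL fragment for the first part of this skeleton might look like the code below; the eye placement is assumed to be a pure translation by (eyeX, eyeY, eyeZ), lightPos is a hypothetical light position array, drawGround() is a hypothetical function, and the helicopter would be handled with the same push/pop pattern as in the skeleton above.

    void display(void) {
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        gluLookAt(0.0, 0.0, 0.0,  0.0, 0.0, -1.0,  0.0, 1.0, 0.0);  // the default view
        glTranslatef(-eyeX, -eyeY, -eyeZ);           // invert the (assumed) eye placement
        glLightfv(GL_LIGHT0, GL_POSITION, lightPos); // light at an absolute location
        glPushMatrix();                              // the ground
          glTranslatef(0.0, -1.0, 0.0);              // illustrative placement values
          glScalef(100.0, 1.0, 100.0);
          drawGround();
        glPopMatrix();
        // ... push/pop blocks for the helicopter and its rotors, as in the skeleton ...
        glutSwapBuffers();
    }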
Other variations in this scene could be developed by changing the position of the light from its
current absolute position to a position relative to the ground (by placing the light as a part of the
branch group containing the ground) or to a position relative to the helicopter (by placing the light
as a part of the branch group containing the helicopter). The eye point could similarly be placed
relative to other parts of the scene, as we did when we attached it to the helicopter above.
We emphasize that you should include appearance content with each shape node. Many of the
appearance parameters involve a saved state in APIs such as OpenGL and so parameters set for one
shape will be retained unless they are re-set for the new shape. It is possible to design your scene
so that shared appearances will be generated consecutively in order to increase the efficiency of
rendering the scene, but this is a specialized organization that is inconsistent with more advanced
APIs such as Java3D. Thus it is very important to re-set the appearance with each shape to avoid
accidentally retaining an appearance that you do not want for objects presented in later parts of your
scene.
Example
We want to further emphasize the transformation behavior in writing the code for a model from the
modeling graph by considering another small example. Let us consider a very simple rabbit’s head
as shown in Figure 2.22. This would have a large ellipsoidal head, two small spherical eyes, and
two middle-sized ellipsoidal ears. So we will use the ellipsoid (actually a scaled sphere, as we saw
earlier) as our basic part and will put it in various places with various orientations as needed.
The modeling graph for the rabbit’s head is shown in Figure 2.23. This figure includes all the
transformations needed to assemble the various parts (eyes, ears, main part) into a unit. The
fundamental geometry for all these parts is the sphere, as we suggested above. Note that the
transformations for the left and right ears include rotations; these can easily be designed to use a
parameter for the angle of the rotation so that you could make the rabbit’s ears wiggle back and
forth.
Figure 2.23: the modeling graph for the rabbit’s head, with a Head group node above the
transformations (scales, translates, and rotates) that place the main head part, the two eyes, and
the two ears
The transformation stack we have used informally above is a very important consideration in using
a scene graph structure. It may be provided by your graphics API or it may be something you
need to create yourself; even if it is provided by the API, there may be limits on the depth of the stack
that will be inadequate for some projects and you may need to create your own. We will discuss
this in terms of the OpenGL API later in this chapter.
The example of transformation stacks is, in fact, a larger example—an example of using standard
objects to define a larger object. In a program that defined a scene that needed rabbits, we would
create the rabbit head with a function rabbitHead() that contains the code described by the
modeling graph above, and then call that function, with appropriate transformations, wherever a
rabbit’s head is needed.
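A possible sketch of such a rabbitHead() function is shown below; the sizes and placements are illustrative assumptions, glutSolidSphere() is the GLUT sphere used as the basic part, and the earAngle parameter corresponds to the ear rotations in the modeling graph that would let the ears wiggle.

    void rabbitHead(float earAngle) {
        glPushMatrix();                        // the main head: a large ellipsoid
          glScalef(1.0, 1.4, 1.0);
          glutSolidSphere(1.0, 20, 20);
        glPopMatrix();
        glPushMatrix();                        // the left eye: a small sphere
          glTranslatef(-0.4, 0.5, 0.8);
          glutSolidSphere(0.15, 10, 10);
        glPopMatrix();
        glPushMatrix();                        // the right eye
          glTranslatef(0.4, 0.5, 0.8);
          glutSolidSphere(0.15, 10, 10);
        glPopMatrix();
        glPushMatrix();                        // the left ear: a long, thin ellipsoid
          glTranslatef(-0.4, 1.6, 0.0);
          glRotatef(-earAngle, 0.0, 0.0, 1.0);
          glScalef(0.25, 1.0, 0.25);
          glutSolidSphere(1.0, 12, 12);
        glPopMatrix();
        glPushMatrix();                        // the right ear
          glTranslatef(0.4, 1.6, 0.0);
          glRotatef(earAngle, 0.0, 0.0, 1.0);
          glScalef(0.25, 1.0, 0.25);
          glutSolidSphere(1.0, 12, 12);
        glPopMatrix();
    }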
Summary
You have seen all the concepts you need for polygon-based modeling, as used in many graphics
APIs. You know how to define an object in model space (that is, in a 3D space that is set up just
for the object) in terms of graphics primitives such as points, line segments, triangles, quads, and
polygons; how to apply the modeling transformations of scaling, translation, and rotation to place
objects into a common world space so that the viewing and projection operations can be applied to
them; and how to organize a hierarchy of objects in a scene with the scene graph so that the code
for the scene can be written easily. You also know how to change transformations so that you can
add motion to a scene. You are now ready to look at how the OpenGL graphics API implements
these concepts so you can begin doing solid graphics programming, and we will take you there in
the next chapter.
Questions
1. We know that we can model any polyhedron with triangles, but why can you model a sphere
with triangle fans for the polar caps and quad strips for the rest of the object?
2. Put yourself in a familiar environment, but imagine the environment simplified so that it is
made up of only boxes, cylinders, and other very basic shapes. Imagine further that your
environment has only one door and that everything in the room has to come in that door. Write
out the sequence of transformations that would have to be done to place everything in its place
in your environment. Now imagine that each of these basic shapes starts out as a standard
shape: a unit cube, a cylinder with diameter one and height one, and the like; write out the
sequence of transformations that would have to be done to make each object from these basic
objects. Finally, if the door would only admit basic objects, put together these two processes
to write out the full transformations to create the objects and place them in the space.
3. Now take the environment above and write a scene graph that describes the whole scene, using
the basic shapes and transformations you identified in the previous question. Also place your
eye in the scene graph starting with a standard view of you standing in the doorway and facing
directly into the room. Now imagine that on a table in the space there is a figure of a ballerina
spinning around and around, and identify the way the transformations in the scene graph
would handle this moving object.
Exercises
4. Calculate the coordinates of the vertices of the simpler regular polyhedra: the cube, the
tetrahedron, and the octahedron. For the octahedron and tetrahedron, try using spherical coordinates
and converting them to rectangular coordinates; see the chapter on mathematics for modeling.
5. Verify that for any x, y, z, and nonzero w, the point (x/w, y/w, z/w, 1) is the intersection of the
line through (x, y, z, w) and (0, 0, 0, 0) with the hyperplane { (a, b, c, 1) | arbitrary a, b, c }.
Show that this means that an entire line in 4D space is represented by a single point in
homogeneous coordinates in 3D space.
6. Show how you can define a cube as six quads. Show how you can refine that definition to
write a cube as two quad strips. Can you write a cube as one quad strip?
8. Define a polygon in 2D space that is reasonably large and has a side that is not parallel to
one of the axes. Find a unit square in the 2D space that intersects that side, and calculate the
proportion of the polygon that lies within the unit square. If the square represents a pixel,
draw conclusions about what proportion of the pixel’s color should come from the polygon
and what proportion from the background.
9. The code for the normals to a quad on a sphere as shown in Figure 2.7 is not accurate because
it uses the normal at a vertex instead of the normal in the middle of the quad. How should you
calculate the normal so that it is the face normal and not a vertex normal?
10.Make a basic object with no symmetries, and apply simple rotation, simple translation, and
simple scaling to it; compare the results with the original object. Then apply second and third
transformations after you have applied the first transformation and again see what you get.
Show why the order of the transformations matters by applying the same transformations in
different order and seeing the results.
11.Scene graphs are basically trees, though different branches may share common shape objects.
As trees, they can be traversed in any way that is convenient. Show how you might choose the
way you would traverse a scene graph in order to draw back-to-front if you knew the depth of
each object in the tree.
12.Add a mouth and tongue to the rabbit’s head, and modify the scene graph for the rabbit’s head
to have the rabbit stick out its tongue and wiggle it around.
13.Define a scene graph for a carousel, or merry-go-round. This object has a circular disk as its
floor, a cone as its roof, a collection of posts that connect the floor to the roof, and a collection
of animals in a circle just inside the outside diameter of the floor, each parallel to the tangent to
the floor at the point on the edge nearest the animal. The animals will go up and down in a
periodic way as the carousel goes around. You may assume that each animal is a primitive and
not try to model it, but you should carefully define all the transformations that build the
carousel and place the animals.
Experiments
14.Get some of the models from the avalon.viewpoint.com site and examine the model file
to see how you could present the model as a sequence of triangles or other graphics primitives.
15.Write the code for the scene graph of the familiar space from question 3, including the code that
manages the inverse transformations for the eye point. Now identify a simple path for the eye,
created by parametrizing some of the transformations that place the eye, and create an animation
of the scene as it would be seen from the moving eye point.
As we did in the similar experiments in Chapter 1, once you have retrieved the modelview matrix
for the simple transformations, you should change appropriate values in the matrix and re-set the
modelview matrix to this modified matrix. You should then re-draw the figure with the modified
matrix and compare the effects of the original and modified matrix to see the graphic effects, not
just the numerical effects.
16.Start with a simple scaling, set for example with the function glScalef(α, β, γ), and
then get the values of the modelview matrix. You should be able to see the scaling values as
the diagonal values in this matrix. Try using different values of the scale factors and first get
and then print out the matrix in good format.
17.Do as above for a rotation, set for example with the function glRotatef(α, x, y, z)
where x, y, and z are set to be able to isolate the rotation by the angle α around individual axes.
For the x–axis, for example, set x = 1 and y = z = 0. Print out the matrix in good format and
identify the components of the matrix that come from the angle α through trigonometric
functions. Hint: use some simple angles such as 30°, 45°, or 60°.
18.Do as above for a translation, set for example with the function glTranslatef(α, β, γ),
and then get the values for the modelview matrix. Identify the translation values as a column of
values in the matrix. Experiment with different translation values and see how the matrix
changes.
19.Now that you have seen the individual modeling matrices, combine them to see how making
composite transformations compares with the resulting matrix. In particular, take two of the
simple transformations you have examined above and compose them, and see if the matrix of
the composite is the product of the two original matrices. Hint: you may have to think about
the order of multiplication of the matrices.
20.We claimed that composing transformations was not commutative, and justified our statement
by noting that matrix multiplication is not commutative. However, you can verify this much
more directly by composing two transformations and getting the resulting matrix, and then
composing the transformations in reverse order and getting the resulting matrix. The two
matrices should not be equal under most circumstances; check this and see. If you happened to
get the matrices equal, check whether your simple transformations might not have been too
simple and if so, make them a bit more complex and try again.
Projects
21.(A scene graph parser) Define a scene graph data structure as a graph (or tree) with nodes that
have appropriate modeling or transformation statements. For now, these can be pseudocode as
we have used it in this chapter. Write a tree walker that generates the appropriate sequence of
statements to present a scene to a graphics API. Can you see how to make some of the
transformations parametric so you can generate motion in the scene? Can you see how to
generate the statements that invert the eyepoint placement transformations if the eye point is not
in standard position?
In defining your model for your program, you will use a single function to specify the geometry of
your model to OpenGL. This function specifies that geometry is to follow, and its parameter
defines the way in which that geometry is to be interpreted for display:
glBegin(mode);
// vertex list: point data to create a primitive object in
// the drawing mode you have indicated
// appearance information such as normals and texture
// coordinates may also be specified here
glEnd();
The vertex list is interpreted as needed for each drawing mode, and both the drawing modes and
the interpretation of the vertex list are described in the discussions below. This pattern of
glBegin(mode) - vertex list - glEnd uses different values of the mode to establish
the way the vertex list is used in creating the image. Because you may use a number of different
kinds of components in an image, you may use this pattern several times for different kinds of
drawing. We will see a number of examples of this pattern in this module.
In OpenGL, point (or vertex) information is presented to the computer through a set of functions
that go under the general name of glVertex*(…). These functions enter the numeric value of
the vertex coordinates into the OpenGL pipeline for the processing to convert them into image
information. We say that glVertex*(…)is a set of functions because there are many functions
that differ only in the way they define their vertex coordinate data. You may want or need to
specify your coordinate data in any standard numeric type, and these functions allow the system to
respond to your needs.
• If you want to specify your vertex data as three separate real numbers, or floats (we'll use
the variable names x, y, and z, though they could also be float constants), you can use
glVertex3f(x,y,z). Here the character f in the name indicates that the arguments are
floating-point; we will see below that other kinds of data formats may also be specified for
vertices.
• If you want to define your coordinate data in an array, you could declare your data in a
form such as GLfloat x[3] and then use glVertex3fv(x) to specify the vertex.
Adding the letter v to the function name specifies that the data is in vector form (actually a
pointer to the memory that contains the data, but an array’s name is really such a pointer).
Other dimensions besides 3 are also possible, as noted below.
Additional versions of the functions allow you to specify the coordinates of your point in two
dimensions (glVertex2*); in three dimensions specified as integers (glVertex3i), doubles
(glVertex3d), or shorts (glVertex3s); or as four-dimensional points (glVertex4*). The
four-dimensional version uses homogeneous coordinates, as described earlier in this chapter. You
will see some of these used in the code examples later in this chapter.
One of the most important things to realize about modeling in OpenGL is that you can call your
own functions between a glBegin(mode) and glEnd() pair to determine vertices for your
vertex list. Any vertices these functions define by making a glVertex*(…) function call will be
added to the vertex list for this drawing mode. This allows you to do whatever computation you
need to calculate vertex coordinates instead of creating them by hand, saving yourself significant
effort and possibly allowing you to create images that you could not generate by hand. For
example, you may include various kinds of loops to calculate a sequence of vertices, or you may
include logic to decide which vertices to generate. An example of this way to generate vertices is
given among the first of the code examples toward the end of this module.
Another important point about modeling is that a great deal of other information can go between a
glBegin(mode) and glEnd() pair. We will see the importance of including information about
vertex normals in the chapters on lighting and shading, and of including information on texture
coordinates in the chapter on texture mapping. So this simple construct can be used to do much
more than just specify vertices. Although you may carry out whatever processing you need within
the glBegin(mode) and glEnd() pair, there are a limited number of OpenGL operations that
are permitted here. In general, the available OpenGL operations here are glVertex, glColor,
glNormal, glTexCoord, glEvalCoord, glEvalPoint, glMaterial, glCallList,
and glCallLists, although this is not a complete list. Your OpenGL manual will give you
additional information if needed.
The mode for drawing points with the glBegin function is named GL_POINTS, and any vertex
data between glBegin and glEnd is interpreted as the coordinates of a point we wish to draw.
If we want to draw only one point, we provide only one vertex between glBegin and glEnd; if
we want to draw more points, we provide more vertices between them. If you use points and want
to make each point more visible, the function glPointSize(float size) allows you to set
the size of each point, where size is any positive real value and the default size is 1.0.
The code below draws a sequence of points along a simple spiral. This code takes advantage of the fact
that we can use ordinary programming processes to define our models, showing we need not
hand-calculate points when we can determine them by an algorithmic approach. We specify the
vertices of a point through a function pointAt() that calculates the coordinates and calls the
glVertex*() function itself, and then we call that function within the glBegin/glEnd pair.
The function calculates points on a spiral along the z-axis, with x- and y-coordinates determined by
functions of a parameter t that is derived from the loop index and drives the entire spiral; here fx, fy,
and g stand for functions assumed to be defined elsewhere in the program.
void pointAt(int i) {
   float t = (float)i/10.0;  // parameter driving the spiral, derived from the loop index
   // fx, fy, and g are functions assumed to be defined elsewhere in the program
   glVertex3f(fx(t)*cos(g(t)), fy(t)*sin(g(t)), 0.2*(float)(5-i));
}
glBegin(GL_POINTS);
for ( i=0; i<10; i++ )
   pointAt(i);
glEnd();
Line segments
To draw line segments, we use the GL_LINES mode for glBegin/glEnd. For each segment
we wish to draw, we define the vertices for the two endpoints of the segment. Thus between
glBegin and glEnd each pair of vertices in the vertex list defines a separate line segment.
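As a quick sketch (our own example, not from the text), the following fragment draws three separate
segments, one along each coordinate axis, by listing the endpoints in pairs:
glBegin(GL_LINES);
   glVertex3f(0.0, 0.0, 0.0); glVertex3f(2.0, 0.0, 0.0); // segment along the X-axis
   glVertex3f(0.0, 0.0, 0.0); glVertex3f(0.0, 2.0, 0.0); // segment along the Y-axis
   glVertex3f(0.0, 0.0, 0.0); glVertex3f(0.0, 0.0, 2.0); // segment along the Z-axis
glEnd();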
Line strips
Connected lines are called line strips in OpenGL, and you can specify them by using the mode
GL_LINE_STRIP for glBegin/glEnd. The vertex list defines the line segments as noted in
the general discussion of connected lines above, so if you have N vertices, you will have N-1 line
segments. With either line segments or connected lines, we can set the line width to emphasize (or
de-emphasize) a line. Heavier line widths tend to attract more attention and give more emphasis
than lighter line widths. The line width is set with the glLineWidth(float width)
function. The default value of width is 1.0, but any positive width can be used.
As an example of a line strip, let’s consider a parametric curve. Such curves in 3-space are often
interesting objects of study. The code below defines a rough spiral in 3-space that is a good
(though simple) example of using a single parameter to define points on a parametric curve so it
can be drawn for study.
glBegin(GL_LINE_STRIP);
for ( i=0; i<=10; i++ )
glVertex3f(2.0*cos(3.14159*(float)i/5.0),
2.0*sin(3.14159*(float)i/5.0),0.5*(float)(i-5));
glEnd();
This can be made much more sophisticated by increasing the number of line segments, and the
code can be cleaned up a bit as described in the code fragment below. Simple experiments with the
step and zstep variables will let you create other versions of the spiral as experiments.
#define PI 3.14159
#define N 100
step = 2.0*PI/(float)N;
zstep = 2.0/(float)N;
glBegin(GL_LINE_STRIP);
for ( i=0; i<=N; i++)
glVertex3f(2.0*sin(step*(float)i),2.0*cos(step*(float)i),
-1.0+zstep*(float)i);
glEnd();
If this spiral is presented in a program that includes some simple rotations, you can see the spiral
from many points in 3-space. Among the things you will be able to see are the simple sine and
cosine curves, as well as one period of the generic shifted sine curve.
Line loops
A line loop is just like a line strip except that an additional line segment is drawn from the last
vertex in the list to the first vertex in the list, creating a closed loop. There is little more to be said
about line loops; they are specified by using the mode GL_LINE_LOOP.
Sequence of triangles
Because there are two different modes for drawing sequences of triangles, we’ll consider two
examples in this section. The first is a triangle fan, used to define an object whose vertices can be
seen as radiating from a central point. An example of this might be the top and bottom of a sphere,
where a triangle fan can be created whose first point is the north or south pole of the sphere. The
second is a triangle strip, which is often used to define very general kinds of surfaces, because
most surfaces seem to have the kind of curvature that keeps rectangles of points on the surface
from being planar. In this case, triangle strips are much better than quad strips as a basis for
creating curved surfaces that will show their surface properties when lighted.
The triangle fan (that defines a cone, in this case) is organized with its vertex at point
(0.0, 1.0, 0.0) and with a circular base of radius 0.5 in the X –Z plane. Thus the cone is oriented
towards the y-direction and is centered on the y-axis. This provides a surface with unit diameter
and height, as shown in Figure 3.1. When the cone is used in creating a scene, it can easily be
defined to have whatever size, orientation, and location you need by applying appropriate modeling
transformations in an appropriate sequence. Here we have also added normals and flat shading to
emphasize the geometry of the triangle fan, although the code does not reflect this.
glBegin(GL_TRIANGLE_FAN);
glVertex3f(0., 1.0, 0.); // the point of the cone
for (i=0; i <= numStrips; i++) { // include i == numStrips so the last triangle closes the fan
angle = 2. * (float)i * PI / (float)numStrips;
glVertex3f(0.5*cos(angle), 0.0, 0.5*sin(angle));
// code to calculate normals would go here
}
glEnd();
The surface rendering can then be organized as a nested loop, where each iteration of the loop
draws a triangle strip that presents one section of the surface. Each section is one unit in the X
direction that extends across the domain in the Z direction. The code for such a strip is shown
below, and the resulting surface is shown in Figure 3.2. Again, the code that calculates the
normals is omitted; this example is discussed further and the normals are developed in the later
chapter on shading. This kind of surface is explored in more detail in the chapters on scientific
applications of graphics.
for ( i=0; i<XSIZE-1; i++ )
for ( j=0; j<ZSIZE-1; j++ ) {
glBegin(GL_TRIANGLE_STRIP);
glVertex3f(XX(i),vertices[i][j],ZZ(j));
glVertex3f(XX(i+1),vertices[i+1][j],ZZ(j));
glVertex3f(XX(i),vertices[i][j+1],ZZ(j+1));
glVertex3f(XX(i+1),vertices[i+1][j+1],ZZ(j+1));
glEnd();
}
Figure 3.2: the full surface created by triangle strips, with a single strip highlighted in cyan
This example is a white surface lighted by three lights of different colors, a technique we describe
in the chapter on lighting. This surface example is also briefly revisited in the quads discussion below.
Quads
To create a set of one or more distinct quads you use glBegin/glEnd with the GL_QUADS
mode. As described earlier, this will take four vertices for each quad. An example of an object
based on quadrilaterals would be the function surface discussed in the triangle strip above. For
quads, the code for the surface looks much like the triangle strip version, with the four corners of
each grid cell listed in order around the quad.
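The original listing is not reproduced in this fragment; a minimal sketch of what it might look like,
assuming the same vertices array and the XX and ZZ coordinate macros used in the triangle strip
code above, is:
for ( i=0; i<XSIZE-1; i++ )
for ( j=0; j<ZSIZE-1; j++ ) {
   glBegin(GL_QUADS);
   // note the vertex order: around the quad, not zig-zag as in the triangle strip
   glVertex3f(XX(i), vertices[i][j], ZZ(j));
   glVertex3f(XX(i+1), vertices[i+1][j], ZZ(j));
   glVertex3f(XX(i+1), vertices[i+1][j+1], ZZ(j+1));
   glVertex3f(XX(i), vertices[i][j+1], ZZ(j+1));
   glEnd();
}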
Quad strips
In a fairly common application, we can create long, narrow tubes with square cross-section. This
can be used as the basis for drawing 3-D coordinate axes or for any other application where you
might want to have, say, a beam in a structure. The quad strip defined below creates the tube
oriented along the Z-axis with the cross-section centered on that axis. The dimensions given make
a unit tube—a tube that is one unit in each dimension, making it actually a cube. These dimensions
will make it easy to scale to fit any particular use.
#define RAD 0.5
#define LEN 1.0
glBegin(GL_QUAD_STRIP);
glVertex3f( RAD, RAD, LEN ); // start of first side
glVertex3f( RAD, RAD, 0.0 );
glVertex3f(-RAD, RAD, LEN );
glVertex3f(-RAD, RAD, 0.0 );
glVertex3f(-RAD,-RAD, LEN ); // start of second side
glVertex3f(-RAD,-RAD, 0.0 );
glVertex3f( RAD,-RAD, LEN ); // start of third side
glVertex3f( RAD,-RAD, 0.0 );
glVertex3f( RAD, RAD, LEN ); // start of fourth side, closing the tube
glVertex3f( RAD, RAD, 0.0 );
glEnd();
You can also get the same object by using the GLUT cube that is discussed below and applying
appropriate transformations to center it on the Z-axis.
General polygon
The GL_POLYGON mode for glBegin/glEnd is used to allow you to display a single convex
polygon. The vertices in the vertex list are taken as the vertices of the polygon in sequence order,
and we remind you that the polygon needs to be convex. It is not possible to display more than
one polygon with this operation because the function will always assume that whatever points it
receives go in the same polygon.
The definition of GL_POLYGON mode is that it displays a convex polygon, but what if you give it
a non-convex polygon? (Examples of convex and non-convex polygons are given in Figure 3.3,
repeated from the previous chapter.) As we saw in the previous chapter, a convex polygon can be
represented by a triangle fan so OpenGL tries to draw the polygon using a triangle fan. This will
cause very strange-looking figures if the original polygon is not convex!
Probably the simplest kind of multi-sided convex polygon is the regular N-gon, an N-sided figure
with all edges of equal length and all interior angles between edges of equal size. This is simply
created (in this case, for N=7), again using trigonometric functions to determine the vertices.
#define PI 3.14159
#define N 7
step = 2.0*PI/(float)N;
glBegin(GL_POLYGON);
for ( i=0; i<=N; i++)
glVertex3f(2.0*sin(step*(float)i),
2.0*cos(step*(float)i),0.0);
glEnd();
Note that this polygon lives in the X –Y plane; all the Z-values are zero. This polygon is also in the
default color (white) because we have not specified the color to be anything else. This is an
example of a “canonical” object—an object defined not primarily for its own sake, but as a template
that can be used as the basis of building another object as noted later, when transformations and
object color are available. An interesting application of regular polygons is to create regular
polyhedra—closed solids whose faces are all regular N-gons. These polyhedra are created by
writing a function to draw a simple N-gon and then using transformations to place these properly
in 3-space to be the boundaries of the polyhedron.
Antialiasing
To use the built-in OpenGL antialiasing, choose the various kinds of point, line, or polygon
smoothing with the glEnable(...) function. Each implementation of OpenGL will define a
default behavior for smoothing, so you may want to override that default by defining your choice
with the glHint(...) function. The appropriate pairs of enable/hint are shown here:
glEnable(GL_LINE_SMOOTH);
glHint(GL_LINE_SMOOTH_HINT,GL_NICEST);
glEnable(GL_POINT_SMOOTH);
glHint(GL_POINT_SMOOTH_HINT,GL_NICEST);
glEnable(GL_POLYGON_SMOOTH);
glHint(GL_POLYGON_SMOOTH_HINT,GL_NICEST);
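Note that point and line smoothing work by blending the partially covered edge pixels with what is
behind them, so in practice you will usually also need blending enabled for the smoothing to be
visible. A typical combination (a sketch, not code from the original text) is:
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_LINE_SMOOTH);
glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);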
There is a more sophisticated kind of polygon smoothing involving entire scene antialiasing, done
by drawing the scene into the accumulation buffer with slight offsets so that boundary pixels will
be chosen differently for each version. This is a time-consuming process and is generally
considered a more advanced use of OpenGL than we are assuming in this book. We do discuss
the accumulation buffer in a later chapter when we discuss motion blur, but we will not go into
more detail here.
Because a cube is made up of six square faces, it is very tempting to try to make the cube from a
single quad strip. Looking at the geometry, though, it is impossible to make a single quad strip go
around the cube; in fact, the largest quad strip you can create from a cube’s faces has only four
quads. It is possible to create two quad strips of three faces each for the cube (think of how a
baseball is stitched together), but here we will only use a set of six quads whose vertices are the
eight vertex points of the cube. Below we repeat the declarations of the vertices, normals, edges,
and faces of the cube from the previous chapter. We will use the glVertex3fv(…) vertex
specification function within the specification of the quads for the faces.
typedef float point3[3];
typedef int edge[2];
typedef int face[4]; // each face of a cube has four edges
edge edges[24] = {{ 0, 1 }, { 1, 3 }, { 3, 2 }, { 2, 0 },
{ 0, 4 }, { 1, 5 }, { 3, 7 }, { 2, 6 },
{ 4, 5 }, { 5, 7 }, { 7, 6 }, { 6, 4 },
{ 1, 0 }, { 3, 1 }, { 2, 3 }, { 0, 2 },
{ 4, 0 }, { 5, 1 }, { 7, 3 }, { 6, 2 },
{ 5, 4 }, { 7, 5 }, { 6, 7 }, { 4, 6 }};
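The vertex, normal, and face arrays themselves are not repeated above. One self-consistent sketch of
what they could look like, assuming a 2x2x2 cube centered at the origin (the actual values are those
given in the previous chapter and may differ), is:
point3 vertices[8] = {{-1.0,-1.0,-1.0}, {-1.0,-1.0, 1.0},
                      {-1.0, 1.0,-1.0}, {-1.0, 1.0, 1.0},
                      { 1.0,-1.0,-1.0}, { 1.0,-1.0, 1.0},
                      { 1.0, 1.0,-1.0}, { 1.0, 1.0, 1.0}};
point3 normals[6] = {{-1.0, 0.0, 0.0}, { 1.0, 0.0, 0.0},
                     { 0.0,-1.0, 0.0}, { 0.0, 1.0, 0.0},
                     { 0.0, 0.0,-1.0}, { 0.0, 0.0, 1.0}};
// each face lists the four edges whose starting vertices trace the face counterclockwise
// as seen from outside the cube, so the face normals above point outward
face cube[6] = {{ 0,  1,  2,  3}, {23, 22, 21, 20},
                { 4,  8, 17, 12}, {14,  6, 10, 19},
                {15,  7, 11, 16}, { 5,  9, 18, 13}};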
As we said before, drawing the cube proceeds by working our way through the face list and
determining the actual points that make up the cube. We will expand the function we gave earlier
to write the actual OpenGL code below. Each face is presented individually in a loop within the
glBegin-glEnd pair, and with each face we include the normal for that face. Note that only the
first vertex of each edge of the face is used, because the GL_QUADS drawing mode
takes each set of four vertices as the vertices of a quad; it is not necessary to close the quad by
including the first point twice.
void cube(void) {
int face, edge;
glBegin(GL_QUADS);
for (face = 0; face < 6; face++) {
glNormal3fv(normals[face]);
for (edge = 0; edge < 4; edge++)
glVertex3fv(vertices[edges[cube[face][edge]][0]]);
}
glEnd();
}
This cube is shown in Figure 3.4, presented through the six steps of adding individual faces (the
faces are colored in the typical RGBCMY sequence so you may see each added in turn). This
approach to defining the geometry is actually a fairly elegant way to define a cube, and takes very
little coding to carry out. However, this is not the only approach we could take to defining a cube.
Because the cube is a regular polyhedron with six faces that are squares, it is possible to define the
cube by defining a standard square and then using transformations to create the faces from this
master square. Carrying this out is left as an exercise for the student.
This approach to modeling an object includes the important feature of specifying the normals (the
vectors perpendicular to each face) for the object. We will see in the chapters on lighting and
shading that in order to get the added realism of lighting on an object, we must provide information
on the object’s normals, and it was straightforward to define an array that contains a normal for
each face. Another approach would be to provide an array that contains a normal for each vertex if
you would want smooth shading for your model; see the chapter on shading for more details. We
will not pursue these ideas here, but you should be thinking about them when you consider
modeling issues with lighting.
Modeling with polygons alone would require you to write many standard graphics elements that
are so common that any reasonable graphics system should include them. OpenGL includes the
OpenGL Utility Library, GLU, with many useful functions, and most releases of OpenGL also
include the OpenGL Utility Toolkit, GLUT. We saw in the first chapter that GLUT includes
window management functions, and both GLU and GLUT include a number of built-in graphical
elements that you can use. This chapter describes a number of these elements.
The objects that these toolkits provide are defined with several parameters that define the details,
such as the resolution in each dimension of the object with which the object is to be presented.
Many of these details are specific to the particular object and will be described in more detail when
we describe each of these.
The GLU toolkit provides several general quadric objects, which are objects defined by quadric
equations (polynomial equations in three variables with degree no higher than two in any term),
including spheres (gluSphere), cylinders (gluCylinder), and disks (gluDisk). Each
GLU primitive is declared as a GLUquadric and is allocated with the function
GLUquadric* gluNewQuadric( void )
Each quadric object is a surface of revolution around the Z-axis. Each is modeled in terms of
subdivisions around the Z-axis, called slices, and subdivisions along the Z-axis, called stacks.
Figure 3.5 shows an example of a typical pre-built quadric object, a GLUT wireframe sphere,
modeled with a small number of slices and stacks so you can see the basis of this definition.
The GLU quadrics are very useful in many modeling circumstances because you can use scaling
and other transformations to create many common objects from them. The GLU quadrics are also
useful because they have capabilities that support many of the OpenGL rendering capabilities that
support creating interesting images. You can determine the drawing style with the
gluQuadricDrawStyle() function that lets you select whether you want the object filled,
wireframe, silhouette, or drawn as points. You can get normal vectors to the surface for lighting
models and smooth shading with the gluQuadricNormals() function that lets you choose
whether you want no normals, or normals for flat or smooth shading. Finally, with the
gluQuadricTexture() function you can have texture coordinates generated automatically for the
quadric, which will be useful when we reach texture mapping.
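As a brief sketch (our own example, not code from the text), a quadric might be configured like this
before it is drawn:
GLUquadric* myQuad = gluNewQuadric();
gluQuadricDrawStyle(myQuad, GLU_FILL);  // filled surface; GLU_LINE, GLU_SILHOUETTE, GLU_POINT are the alternatives
gluQuadricNormals(myQuad, GLU_SMOOTH);  // normals for smooth shading; GLU_FLAT and GLU_NONE are the alternatives
gluQuadricTexture(myQuad, GL_TRUE);     // also generate texture coordinates
gluSphere(myQuad, 1.0, 20, 20);         // radius 1.0, 20 slices, 20 stacks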
Below we describe each of the GLU primitives by listing its function prototype; more details may
be found in the GLU section of your OpenGL manual.
GLU cylinder:
void gluCylinder(GLUquadric* quad, GLdouble base, GLdouble top,
GLdouble height, GLint slices, GLint stacks)
quad identifies the quadrics object you previously created with gluNewQuadric
base is the radius of the cylinder at z = 0, the base of the cylinder
top is the radius of the cylinder at z = height, and
height is the height of the cylinder.
GLU disk:
The GLU disk is different from the other GLU primitives because it is only two-dimensional, lying
entirely within the X –Y plane. Thus instead of being defined in terms of stacks, the second
granularity parameter is loops, the number of concentric rings that define the disk.
void gluDisk(GLUquadric* quad, GLdouble inner, GLdouble outer,
GLint slices, GLint loops)
quad identifies the quadrics object you previously created with gluNewQuadric
inner is the inner radius of the disk (may be 0).
outer is the outer radius of the disk.
GLU sphere:
void gluSphere(GLUquadric* quad, GLdouble radius, GLint slices,
GLint stacks)
quad identifies the quadrics object you previously created with gluNewQuadric
radius is the radius of the sphere.
Models provided by GLUT are more oriented to geometric solids, except for the teapot object.
They do not have as wide a usage in general situations because they are of fixed shape and many
cannot be modeled with varying degrees of complexity. They also do not include shapes that can
readily be adapted to general modeling situations. Finally, there is no general way to create a
texture map for these objects, so it is more difficult to make scenes using them have stronger visual
interest. The GLUT models include a cone (glutSolidCone), cube (glutSolidCube),
dodecahedron (12-sided regular polyhedron, glutSolidDodecahedron), icosahedron (20-
sided regular polyhedron, glutSolidIcosahedron), octahedron (8-sided regular polyhedron,
glutSolidOctahedron), a sphere (glutSolidSphere), a teapot (the Utah teapot, an icon
of computer graphics sometimes called the “teapotahedron”, glutSolidTeapot), a tetrahedron
(4-sided regular polyhedron, glutSolidTetrahedron), and a torus (glutSolidTorus).
Figure 3.6: several GLU and GLUT objects as described in the text
While we only listed the “solid” versions of the GLUT primitives, they include both solid and
wireframe versions. Each object has a canonical position and orientation, typically being centered
at the origin and lying within a standard volume and, if it has an axis of symmetry, that axis is
aligned with the z-axis. As with the GLU standard primitives, the GLUT cone, sphere, and torus
allow you to specify the granularity of the primitive’s modeling, but the others do not. You should
not take the term “solid” for the GLUT objects too seriously; they are not actually solid but are
simply bounded by polygons. “Solid” merely means that the shapes are filled in, in contrast with
the glutWireSphere and similar constructs. If you clip the “solid” objects you will find that
they are, in fact, hollow.
If you have GLUT with your OpenGL, you should check the GLUT manuals for the details on
these solids and on many other important capabilities that GLUT will add to your OpenGL system.
If you do not already have it, you can download the GLUT code from the OpenGL Web site for
many different systems and install it in your OpenGL area so you may use it readily with your
system.
Selections from the overall collection of GLU and GLUT objects are shown in Figure 3.6 to show
the range of items you can create with these tools. From top left and moving clockwise, we see a
gluCylinder, a gluDisk, a glutSolidCone, a glutSolidIcosahedron, a glutSolidTorus, and a
glutSolidTeapot. You should think about how you might use various transformations to create
other figures from these basic parts.
An example
...
switch (selectedObject) {
case (1): {
myQuad=gluNewQuadric();
slices = stacks = resolution;
gluSphere( myQuad , radius , slices , stacks );
break;
}
case (2): {
myQuad=gluNewQuadric();
slices = stacks = resolution;
gluCylinder( myQuad, 1.0, 1.0, 1.0, slices, stacks );
break;
}
case (3): {
glutSolidDodecahedron(); break;
}
case (4): {
nsides = rings = resolution;
glutSolidTorus( 1.0, 2.0, nsides, rings);
break;
}
case (5): {
glutSolidTeapot(2.0); break;
}
}
...
}
One of the differences between student programming and professional programming is that
students are often asked to create applications or tools for the sake of learning how to create them,
not for the sake of producing working, useful products. The graphics primitives that are the subject
of the first section of this module are the kind of tools that students are often asked to use, because
they require more analysis of fundamental geometry and are good learning tools. However, working
programmers developing real applications will often find it useful to use pre-constructed templates
and tools such as the GLU or GLUT graphics primitives. You are encouraged to use the GLU and
GLUT objects whenever they meet the needs of your work.
Transformations in OpenGL
In OpenGL, there are only two kinds of transformations: projection transformations and
modelview transformations. The latter includes both the viewing and modeling transformations.
We have already discussed projections and viewing, so here we will focus on the transformations
used in modeling.
Among the modeling transformations, there are three fundamental kinds: rotations, translations,
and scaling. In OpenGL, these are applied with the built-in functions (actually function sets)
glRotate, glTranslate, and glScale, respectively. As we have found with other
OpenGL function sets, there are different versions of each of these, varying only in the kind of
parameters they take.
The rotation function glRotatef(angle, x, y, z) rotates the model space by angle degrees around the
line from the origin through the point (x, y, z). This rotation follows the right-hand rule, so the
rotation will be counterclockwise as viewed from the direction of the vector (x, y, z). The simplest
rotations are those around the three coordinate axes, so that glRotatef(angle, 1., 0., 0.) will rotate
the model space around the X-axis.
As we saw earlier in the chapter, there are many transformations that go into defining exactly how
a piece of geometry is presented in a graphics scene. When we consider the overall order of
transformations for the entire model, we must consider not only the modeling transformations but
also the projection and viewing transformations. If we consider the total sequence of
transformations in the order in which they are specified, we will have the sequence:
P V T0 T1 … Tn Tn+1 … Tlast
with P being the projection transformation, V the viewing transformation, and T0, T1, … Tlast the
transformations specified in the program to model the scene, in order (T0 is first, Tlast is last and is
closest to the actual geometry). The projection transformation is defined in the reshape()
function; the viewing transformation is defined in the init() function, in the reshape function,
or at the beginning of the display() function so it is defined at the beginning of the modeling
process. But the sequence in which the transformations are applied is actually the reverse of the
sequence above: Tlast is actually applied first, and V and finally P are applied last. You need to
understand this sequence very well, because it’s critical to understand how you build complex,
hierarchical models.
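As a small sketch (our own example, not from the text) of what this ordering means in code, the
fragment below specifies a translation before a rotation, but the geometry drawn by the hypothetical
drawSquare() function is rotated first and then translated:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0); // V, the viewing transformation
glTranslatef(2.0, 0.0, 0.0);    // T0, specified first, applied to the geometry last
glRotatef(90.0, 0.0, 0.0, 1.0); // T1, specified last, applied to the geometry first
drawSquare(); // the square is rotated about Z, then moved 2 units along X, then viewed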
Simple transformations:
All the code examples use a standard set of axes, which are not included here, and the following
definition of the simple square:
void square (void)
{
typedef GLfloat point[3];
point v[4] = {{12.0, -1.0, -1.0},
{12.0, -1.0, 1.0},
{12.0, 1.0, 1.0},
{12.0, 1.0, -1.0} };
glBegin (GL_QUADS);
glVertex3fv(v[0]);
glVertex3fv(v[1]);
glVertex3fv(v[2]);
glVertex3fv(v[3]);
glEnd();
}
To display the simple rotations example, we use the following display function:
void display( void )
{ int i;
  float theta = 0.0; // angle for the rotations; it could also be a global updated for animation
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
axes(10.0);
for (i=0; i<8; i++) {
glPushMatrix();
glRotatef(theta, 0.0, 0.0, 1.0);
if (i==0)
glColor3f(1.0, 0.0, 0.0);
else
glColor3f(1.0, 1.0, 1.0);
square();
theta += 45.0;
glPopMatrix();
}
glutSwapBuffers();
}
To display the simple translations example, we use the following display function:
void display( void )
{ int i;
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
axes(10.0);
for (i=0; i<=12; i++) {
glPushMatrix();
glTranslatef(-2.0*(float)i, 0.0, 0.0);
if (i==0) glColor3f(1.0, 0.0, 0.0);
else glColor3f(1.0, 1.0, 1.0);
square();
glPopMatrix();
}
glutSwapBuffers();
}
To display the simple scaling example, we use the following display function:
void display( void )
{ int i;
float s;
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
axes(10.0);
for (i=0; i<6; i++) {
glPushMatrix();
s = (6.0-(float)i)/6.0;
glScalef( s, s, s );
if (i==0)
glColor3f(1.0, 0.0, 0.0);
else
glColor3f(1.0, 1.0, 1.0);
square();
glPopMatrix();
}
glutSwapBuffers();
}
As a further example of using transformations with the GLU quadrics, the following display-function
fragment models a simple cartoon head from scaled spheres: a wide gray sphere for the head, two
small black spheres for the eyes, and two elongated, tilted pink spheres for the ears.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glPushMatrix();
// model the head
glColor3f(0.4, 0.4, 0.4); // dark gray head
glScalef(3.0, 1.0, 1.0);
myQuad = gluNewQuadric();
gluSphere(myQuad, 1.0, 10, 10);
glPopMatrix();
glPushMatrix();
// model the left eye
glColor3f(0.0, 0.0, 0.0); // black eyes
glTranslatef(1.0, -0.7, 0.7);
glScalef(0.2, 0.2, 0.2);
myQuad = gluNewQuadric();
gluSphere(myQuad, 1.0, 10, 10);
glPopMatrix();
glPushMatrix();
// model the right eye
glTranslatef(1.0, 0.7, 0.7);
glScalef(0.2, 0.2, 0.2);
myQuad = gluNewQuadric();
gluSphere(myQuad, 1.0, 10, 10);
glPopMatrix();
glPushMatrix();
// model the left ear
glColor3f(1.0, 0.6, 0.6); // pink ears
glTranslatef(-1.0, -1.0, 1.0);
glRotatef(-45.0, 1.0, 0.0, 0.0);
glScalef(0.5, 2.0, 0.5);
myQuad = gluNewQuadric();
gluSphere(myQuad, 1.0, 10, 10);
glPopMatrix();
glPushMatrix();
// model the right ear
glColor3f(1.0, 0.6, 0.6); // pink ears
glTranslatef(-1.0, 1.0, 1.0);
glRotatef(45.0, 1.0, 0.0, 0.0);
glScalef(0.5, 2.0, 0.5);
myQuad = gluNewQuadric();
gluSphere(myQuad, 1.0, 10, 10);
glPopMatrix();
In OpenGL, the stack for the modelview matrix is required to be at least 32 deep, but this can be
inadequate for some complex models if the hierarchy is more than 32 layers deep. In this case, as we
mentioned in the previous chapter, you need to know that a transformation is a 4x4 matrix of
GLfloat values that is stored in a single array of 16 elements. You can create your own stack of
these arrays that can have any depth you want, and then push and pop transformations as you wish
on that stack. To deal with the modelview transformation itself, there are functions that allow you
to save and to set the modelview transformation as you wish. You can capture the current value of
the transformation with the function
glGetFloatv(GL_MODELVIEW_MATRIX, viewProj);
(here we have declared GLfloat viewProj[16]), and you can use the functions
glLoadIdentity();
glMultMatrixf( viewProj );
to set the current modelview matrix to the value of the matrix viewProj, assuming that you were
in modelview mode when you execute these functions.
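A minimal sketch of such a user-managed stack (the names and the fixed depth are our own choices,
and error checking is omitted) might look like this:
#define MY_STACK_DEPTH 100
GLfloat myStack[MY_STACK_DEPTH][16]; // our own stack of 4x4 modelview matrices
int myStackTop = 0;
void myPushMatrix(void) { // save a copy of the current modelview matrix
   glGetFloatv(GL_MODELVIEW_MATRIX, myStack[myStackTop++]);
}
void myPopMatrix(void) { // restore the most recently saved matrix
   glLoadMatrixf(myStack[--myStackTop]); // equivalent to glLoadIdentity() followed by glMultMatrixf()
}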
In an example somewhat like the more complex eye-following-helicopter example above, we built
a small program in which the eye follows a red sphere at a distance of 4 units as the sphere flies in
a circle above some geometry. In this case, the geometry is a cyan plane on which are placed
several cylinders at the same distance from the center as the sphere flies, along with some
coordinate axes. A snapshot from this very simple model is shown in Figure 3.7.
Figure 3.7: the eye following a sphere flying over some cylinders on a plane
The display function code that implements this viewing is shown after the figure. You will note that
the display function begins with the default view and is followed by the transformations
translate by –4 in Z
translate by –5 in X and –0.75 in Y
rotate by –theta around Y
that invert the placement of the eye (which follows the sphere at a distance of 4 units), and then,
for the sphere itself, the placement transformations
rotate by theta around Y
translate by 5 in X and 0.75 in Y
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
// Define eye position relative to the sphere it is to follow
// place eye in scene with default definition
gluLookAt(0.,0.,0., 0.,0.,-1., 0.,1.,0.);
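// (a sketch of the portion elided here, following the transformations listed in
//  the text; the variable name theta and the drawing calls are assumptions)
glTranslatef(0.0, 0.0, -4.0);     // translate by -4 in Z
glTranslatef(-5.0, -0.75, 0.0);   // translate by -5 in X and -0.75 in Y
glRotatef(-theta, 0.0, 1.0, 0.0); // rotate by -theta around Y: the eye placement is now inverted
// ... draw the plane, the cylinders, and the coordinate axes here ...
glPushMatrix();
glRotatef(theta, 0.0, 1.0, 0.0);  // place the red sphere with the forward transformations
glTranslatef(5.0, 0.75, 0.0);
// ... draw the red sphere here ...
glPopMatrix();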
glutSwapBuffers();
}
In the previous chapter we discussed the idea of compiling geometry in order to make display
operations more efficient. In OpenGL, graphics objects can be compiled into what is called a
display list, which will contain the final geometry of the object as it is ready for display. Sample
code and an explanation of display lists are given below.
Display lists are relatively easy to create in OpenGL. First, choose an unsigned integer (often you
will just use small integer constants, such as 1, 2, ...) to serve as the name of your list. Then
before you create the geometry for your list, call the function glNewList. Code whatever
geometry you want into the list, and at the end, call the function glEndList. Everything
between the new list and the end list functions will be executed whenever you call glCallList
with a valid list name as parameter. All the operations between glNewList and glEndList
will be carried out, and only the actual set of instructions to the drawing portion of the OpenGL
system will be saved. When the display list is executed, then, those instructions are simply sent to
the drawing system; any operations needed to generate these instructions are omitted.
Because display lists are generally intended to be defined only once but used often, you do not
want to create a list in a function such as the display() function that is called often. Thus it is
common to create them in the init() function or in a function called from within init().
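The code that creates the list is not shown in this fragment; a minimal sketch of what it might look
like, placed in init() and using the list name 1 that the Display() function below calls, is:
void init(void) {
   ...
   glNewList(1, GL_COMPILE); // compile the list, but do not execute it now
   glBegin(GL_TRIANGLES);
   // ... geometry and appearance calls for the object go here ...
   glEnd();
   glEndList();
   ...
}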
void Display(void) {
...
glCallList(1);
...
}
You will note that the display list was created in GL_COMPILE mode, and it was not executed (the
object was not displayed) until the list was called. It is also possible to have the list displayed as it
is created if you create the list in GL_COMPILE_AND_EXECUTE mode.
OpenGL display lists are named by nonzero unsigned integer values (technically, GLuint values)
and there are several tools available in OpenGL to manage these name values. We will assume in a
first graphics course that you will not need many display lists and that you can manage a small
number of list names yourself, but if you begin to use a number of display lists in a project, you
should look into the glGenLists, glIsList, and glDeleteLists functions to help you
manage the lists properly.
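For example (a sketch under the assumption that you need several lists at once), you can reserve and
release a block of list names like this:
GLuint base = glGenLists(4); // reserve four consecutive unused list names
// build each list with glNewList(base + i, GL_COMPILE) ... glEndList(),
// draw it later with glCallList(base + i); glIsList(base + i) tests whether a name is in use
glDeleteLists(base, 4); // release all four names when they are no longer needed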
Summary
In this chapter we have presented the way the OpenGL graphics API allows you to define
geometry and transformations to support the geometric modeling presented in the previous chapter.
We have seen the OpenGL implementations of the point, line segment, triangle, quad, and polygon
primitives as well as the line strip, triangle strip, triangle fan, and quad strip geometry compression
techniques. We have also seen a number of geometric objects that are available through the GLU
and GLUT graphics utilities and how they are used in creating scenes; while this chapter focuses
only on geometry, we will see in later chapters that these GLU and GLUT primitives are also very
easy to use with the OpenGL appearance tools.
We have also seen the way OpenGL implements the scaling, translation, and rotation
transformations and how OpenGL manages the modelview transformation stack, allowing you to
implement scene graphs easily with OpenGL tools. Finally, OpenGL allows you to compile parts
of your geometry into display lists that you can re-use much more efficiently than would be done if
you simply compute the geometry in immediate mode.
At this point you should have a good understanding of all the steps in the graphics pipeline for
most polygon-based graphics, so you should be able to write complete graphics programs that
include sound geometry. The only additional modeling presented in this book will be the surface
modeling discussed in later chapters.
Questions
1. The OpenGL primitives include quads as well as triangles, but is it really necessary to have a
quad primitive? Is there anything you can do with quads that you couldn’t do with triangles?
2. The GLU and GLUT objects are handy, but they can be created pretty easily from the standard
OpenGL triangle and quad primitives. Describe how to do this for as many of the objects as you can,
including at least the gluSphere, gluCylinder, gluDisk, glutSolidCone, and glutSolidCube.
3. The GLU objects and some GLUT objects use parameters, often called slices and stacks or
loops, to define the granularity of the objects. For those objects that are defined with these
parameters, can you think of another way you could define the granularity? What are the
advantages of using small values for these parameters? What are the advantages of using large
values for these parameters? Are there any GLU or GLUT objects that do not use slices and
stacks? Why is this so?
4. Are the values for the parameters on the scaling and translation transformations in OpenGL in
model space, world space, or 3D eye space? Give reasons for your answer.
5. The angle of the OpenGL rotation transformation is given in degrees around an arbitrary fixed
line. What is the space in which the fixed line for the rotation is defined? In many application
areas, angles are expressed in radians instead of degrees; what is the conversion formula that
must be applied to radians to get degrees?
6. In the heat diffusion example in the Getting Started chapter, print the source code and highlight
each OpenGL function that it uses. For each, explain what it does and why it’s at this point in
the code.
7. Consider the model in the display() function in the example of the previous question.
Compare the number of operations needed to create and display this model, including all the
transformations, with the number of glVertex(...) function calls that would be used in a
display list. Draw conclusions about the relative efficiency of display lists and simple
modeling.
8. The question above is actually misleading because the model in the heat diffusion example
changes with each idle() callback. Why does it not make sense to try to use display lists for
the heat diffusion problem?
9. In the carousel exercise in the previous chapter, place the eyepoint on one of the objects in the
carousel and change the scene graph from the previous exercise to include this eye placement.
Then write the scene graph that inverts the eyepoint to place it in standard position.
Exercises
Modeling is all about creating graphical objects, so the following sequence of exercises involves
making some general graphical objects you can use.
10.Define a “unit cartoon dumbbell” as a thin cylinder on the x-axis with endpoints at 1.0 and
–1.0, and with two spherical ends of modest size, each centered at one end of the cylinder. We will
use this object again in later exercises.
11.Let’s make a more realistic set of weights with solid disk weights of various sizes. Define a set
of standard disks with standard weights (5kg, 10kg, 20kg, say) with the weight of the disk as
a parameter to determine the thickness and/or radius of the weight, assuming that the weight is
proportional to the volume. Define a function that creates a barbell carrying a given weight that
is a combination of the standard weights, placing the disks appropriately on the bar. (Note that
we are not asking you to create realistic disks with a hole in the middle – yet.)
12.The 3D arrow that we used as an example in the previous chapter used general modeling
concepts, not specific functions. Use the GLU and GLUT modeling tools to implement the 3D
arrow as a working function you could use in an arbitrary scene.
13.Let’s proceed to create an object that is a cylinder with a cylindrical hole, both having the same
center line. Define the object to have unit length with inside and outside cylinders of defined
radii and with disks at the ends to close the pipe. Show how you could use this object to create
a more realistic kind of weight for the previous exercise.
14.All the cylinder-based objects we’ve defined so far have a standard orientation, but clearly we
will want to have cylinders with any starting point and any ending point, so we will need to be
able to give the cylinder any orientation. Let’s start by considering a cylinder with one end at
the origin and the other end at a point P = (x, y, z) that is one unit from the origin. Write a
function that rotates a unit cylinder with one end at the origin so the other end is at P. (Hint –
you will need to use two rotations, and you should think about arctangent functions.)
15.With the orientation problem for a cylinder solved by the previous exercise, write a function to
create a tube strip that connects points p0, p1, p2, ..., pn with tubes of radius r and with spheres
of the same radius at the points in order to join the tubes to make a smooth transition from one
tube to another. Use this to create a flexible bar between the two weights in the cartoon
dumbbell in an earlier exercise, and show the bar supported in the middle with a bending bar in
between.
16.Create a curve in the X –Y plane by defining a number of vertices and connecting them with a
line strip. Then create a general surface of revolution by rotating this curve around the Y -axis.
Make the surface model some object you know, such as a glass or a baseball bat, that has
rotational symmetry. Can you use transformations to generalize this to rotate a curve around
any line? How?
Experiments
17.OpenGL gives you the ability to push and pop the transformation stack, but you can actually do
a little better. Create a way to “mark” the stack, saving the top of the transformation stack
when you do, and return to any marked stack point by restoring the transformation to the one
that was saved when the stack was marked.
18.The previous experiment is nice, but it can easily lead to destroying the stack and leaving only
the saved transformation on the stack. Can you find a way to return to a marked stack point
and keep the stack as it was below that point?
20.(The small house) Design and create a simple house, with both exterior and interior walls,
doors, windows, ceilings in the rooms, and a simple gable roof. Use different colors for
various walls so they can be distinguished in any view. Set several different viewpoints
around and within the house and show how the house looks from each.
21.(A scene graph parser) Implement the scene graph parser that was designed in this project in
the previous chapter. Each transformation and geometry node is to contain the OpenGL
function names and arguments needed to carry out the transformations and implement the
geometry. The parser should be able to write the display() function for your scene.
We begin with an overview of rectangular 3D Cartesian space and the way points, lines, line
segments, and parametric curves are represented in a way that is compatible with computation. These are
extended to a discussion of vectors and vector computation, including dot and cross products and
their geometric meanings. Based on the vector view of 3D space, we introduce linear
transformations and show how the basic modeling transformations of scaling, translation, and
rotation are represented in matrix form and how the composition of transformations can be done as
matrix operations. We then go on to discuss planes and computations based on planes, and
polygons and convexity are introduced. Other coordinate systems, such as polar and cylindrical
coordinates, are introduced because they can be more natural for some modeling than rectangular
coordinates. The chapter closes with some considerations in collision detection.
Coordinate systems
The set of real numbers—often thought of as the set of all possible distances between points—is a
mathematical abstraction that is effectively modeled as a Euclidean straight line with two uniquely-
identified points. One point is identified with the number 0.0 (we will write all real numbers with
decimals, to meet the expectations of programming languages), called the origin, and the other is
identified with the number 1.0, which we call the unit point. The direction of the line from 0.0 to
1.0 is called the positive direction; the opposite direction of the line is called the negative direction.
These directions identify the parts of the lines associated with positive and negative numbers,
respectively.
If we have two straight lines that are perpendicular to each other and meet in a point, we can define
that point to be the origin for both lines, and choose two points the same distance from the origin
on each line as the unit points. A distance unit is defined to be used by each of the two lines, and
the points at this distance from the intersection point are marked, one to the right of the intersection
and one above it. This gives us the classical 2D coordinate system, often called the Cartesian
coordinate system. The vectors from the intersection point to the right-hand point (respectively the
point above the intersection) are called the unit X - and Y -direction vectors and are indicated by i
and j respectively. More formally, i = <1, 0> and j = <0, 1>. Points in this system are
represented by an ordered pair of real numbers, (X , Y ), and this is probably the most familiar
coordinate system to most people. These points may also be represented by a vector <X,Y> from
the origin to the point, and this vector may be expressed in terms of the unit direction vectors as
Xi+Yj.
In 2D Cartesian coordinates, any two lines that are not parallel will meet in a point. The lines make
four angles when they meet, and the acute angle is called the angle between the lines. If two line
segments begin at the same point, they make a single angle that is called the angle between the line
segments. These angles are measured with the usual trigonometric functions, and we assume that
the reader will have a modest familiarity with trigonometry. Some of the reasons for this
assumption can be found in the discussions below on polar and spherical coordinates, and in the
description of the dot product and cross product. We will discuss more about the trigonometric
aspects of graphics when we get to that point in modeling or lighting.
The 3D world in which we will do most of our computer graphics work is based on 3D Cartesian
coordinates that extend the ideas of 2D coordinates above. This is usually presented in terms of
three lines that meet at a single point, which is identified as the origin for all three lines, that have
their unit points the same distance from that point, and that are mutually perpendicular. Each point
is represented by an ordered triple of real numbers (x,y,z). The three lines
correspond to three unit direction vectors, each from the origin to the unit point of its respective
line; these are named i, j, and k for the X -, Y -, and Z-axis, respectively. As above, these
are i = <1, 0, 0>, j = <0, 1, 0>, and k = <0, 0, 1>. These are called the canonical basis for the
space, and the point can be represented as xi+yj+zk. Any ordered triple of real numbers is
identified with the point in the space that lies an appropriate distance from the two-axis planes, with
the first (x) coordinate being the distance from the Y -Z plane, the second (y) coordinate being the
distance from the X -Z plane, and the third (z) coordinate being the distance from the X -Y plane.
This is all illustrated in Figure 4.1.
Figure 4.1: right-hand coordinate system with origin (left) and with a point identified
by its coordinates; left-hand coordinate system (right)
3D coordinate systems can be either right-handed or left-handed: the third axis can be the cross
product of the first two axes, or it can be the negative of that cross product, respectively. (We will
talk about cross products a little later in this chapter.) The “handed-ness” comes from a simple
technique: if you hold your hand in space with your fingers along the first axis and curl your
fingers towards the second axis, your thumb will point in a direction perpendicular to the first two
axes. If you do this with the right hand, the thumb points in the direction of the third axis in a
right-handed system. If you do it with the left hand, the thumb points in the direction of the third
axis in a left-handed system.
Some computer graphics systems use right-handed coordinates, and this is probably the most
natural coordinate system for most uses. For example, this is the coordinate system that naturally
fits electromagnetic theory, because the relationship between a moving current in a wire and the
magnetic field it generates is a right-hand coordinate relationship. The modeling in OpenGL is
based on a right-hand coordinate system.
On the other hand, there are other places where a left-handed coordinate system is natural. If you
consider a space with a standard X -Y plane as the front of the space and define Z as the distance
back from that plane, then the values of Z naturally increase as you move back into the space. This
is a left-hand relationship.
In 2D space, you may be familiar with the division of Cartesian space into four quadrants, each
containing points with the same algebraic signs in each component. The first quadrant contains the
points for which both components are positive.
The same idea applies in 3D space, but there are eight such regions and they are called octants.
The names for each octant are less standardized than are the four quadrants in 2D space, but the
region in which all of x, y, and z are positive is called the first octant, and we will sometimes think
about views in which the eye point is in this octant.
Returning to the real number line, in this model any real number is identified with the unique point on the line that is
• at the distance from the origin which is that number times the distance from 0.0 to 1.0, and
• in the direction of the number’s sign.
We have heard that a line is determined by two points; let’s see how that can work. Let the first
point be P0 = (X0, Y0, Z0) and the second point be P1 = (X1, Y1, Z1). Let's call P0 the origin and
P1 the unit point. Points on the segment are obtained by starting at the “first” point P0 and offsetting by a
fraction of the difference vector P1-P0. This difference vector is sometimes called the direction
vector for the line, especially if it has been normalized (made to have length 1). Then any point
P = (X,Y,Z ) on the line can be expressed in vector terms by
P = P0 + t(P1− P0) = (1− t)P0 + tP1
for a single value of a real variable t. This computation is actually done for each coordinate, with a
separate equation for each of X , Y , and Z as follows:
X = X0 + t(X1 − X0) = (1 − t)X0 + tX1
Y = Y0 + t(Y1 − Y0) = (1 − t)Y0 + tY1
Z = Z0 + t(Z1 − Z0) = (1 − t)Z0 + tZ1
Thus any line segment can be determined by a single parameter, which is why a line is called a
one–dimensional object. Points along the line are determined by values of the parameter, as
illustrated in Figure 4.2 below that shows the coordinates of the points along a line segment
determined by values of t from 0 to 1 in increments of 0.25.
Figure 4.2: a parametric line segment with points determined by some values of the parameter
This representation for a line segment (or an entire line, if you place no restrictions on the value of
the parameter t) also allows you to compute intersections involving lines. The reverse concept is
also useful, so if you have a known point on the line, you can calculate the value of the parameter t
that would produce that point. For example, if a line intersects a plane or another geometric object
at a point Q, a vector calculation of the form P0 + t ∗(P1− P0) = Q would allow you to calculate
the value of the parameter t that gives the intersection point on the line. This calculation might
involve only a single equation or all three equations, depending on the situation, but your goal is to
compute the value of t that represents the point in question. This is often the basis for geometric
computations such as the intersection of a line and a plane.
As an application of the concept of parametric lines, let’s see how we can compute the distance
from a point in 3-space to a line. If the point is P0=(u,v,w) and the line is given by parametric
equations
x = a + bt
y = c + dt
z = e + ft
then for any point P = (x,y,z) on the line, the square of the distance from P to P0 is given by
(a + bt − u)² + (c + dt − v)² + (e + ft − w)²
which is a quadratic equation in t. This quadratic is minimized by taking its derivative and looking
for the point where the derivative is 0:
2b(a + bt − u) + 2d(c + dt − v) + 2f(e + ft − w) = 0
which is a simple linear equation in t. Its unique solution,
t = (b(u − a) + d(v − c) + f(w − e)) / (b² + d² + f²),
allows you to calculate the point P on the line that is nearest to the point P0.
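A small sketch of this computation in C (the function name and parameter layout are our own) is:
/* nearest point P on the line (a + b t, c + d t, e + f t) to the point (u, v, w) */
void nearestPointOnLine(float a, float b, float c, float d, float e, float f,
                        float u, float v, float w, float P[3]) {
   /* solve the linear equation obtained by setting the derivative of the
      squared distance to zero */
   float t = (b*(u - a) + d*(v - c) + f*(w - e)) / (b*b + d*d + f*f);
   P[0] = a + b*t;
   P[1] = c + d*t;
   P[2] = e + f*t;
}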
In standard Euclidean geometry, two points determine a line as we noted above. In fact, in the
same way we talked about any line having unique origin identified with 0.0 and unit point
identified with 1.0, a line segment—the points on the line between these two particular points—can
be identified as the points corresponding to values between 0 and 1. It is done by much the same
process as we used to illustrate the 1-dimensional nature of a line above. That is, just as in the
discussion of lines above, if the two points are P0 and P1, we can identify any point between them
as P = (1− t)P0 + tP1 for a unique value of t between 0 and 1. This is called the parametric form
for the line segment between the two points. If we change our limits of the value of t, we can get
other variations on the idea of a line. For example, if we allow t to range over any interval, we get
a different line segment. As another example, if we take the equation above for all nonnegative
values of t, we get a line that has an initial point but no ending point; such a line is called a ray.
The line segment gives us an example of determining a continuous set of points by functions from
the interval [0,1] to 3-space. In general, if we consider any set of continuous functions x(t), y(t),
and z(t) that are defined on [0,1], the set of points they generate is called a parametric curve in 3-
space. There are some very useful applications of such curves. For example, you can display the
locations of a moving point in space, you can compute the positions along a curve from which you
will view a scene in a fly-through, or you can describe the behavior of a function of two variables
on a domain that lies on a curve in 2-space.
Vectors
Vectors in 3-space are triples of real numbers written as <a, b, c>. These may be identified with
points, or they may be viewed as representing the motion needed to go from one point to another in
space. The latter viewpoint will be one we use often.
The length of a vector is defined as the square root of the sum of the squares of the vector’s
components, written ||<a,b,c>|| = √(a² + b² + c²). A unit vector is a vector whose length is 1, and
unit vectors are very important in a number of modeling and rendering computations; basically a
unit vector can be treated as a pure direction. If V = <a,b,c> is any vector, we can make it a unit
vector by dividing each of its components by its length: <a/||V||, b/||V||, c/||V||>. Doing this is called
normalizing the vector.
There are two computations on vectors that we will need to understand, and sometimes to perform,
in developing the geometry for our graphic images. The first is the dot product of two vectors.
This produces a single real value that represents the projection of one vector on the other and its
value is the product of the lengths of the two vectors times the cosine of the angle between them.
The dot product computation is quite simple: it is simply the sum of the componentwise products
of the vectors. If the two vectors are A = <a1, a2, a3> and B = <b1, b2, b3>, then the dot product is
A • B = a1·b1 + a2·b2 + a3·b3.
There is an important alternate meaning of the dot product. A straightforward calculation shows
that U • V = ||U|| ∗ ||V|| ∗ cos(Θ), where ||V|| denotes the length of a vector and Θ is the angle
between the vectors. This links the algebraic calculation of the dot product with geometric
properties of vectors, and is very important. For example, we see that if two vectors are parallel
the dot product is simply the product of their lengths with the sign saying whether they are oriented
in the same or opposite directions. However, if the vectors are orthogonal—the angle between
them is 90°—then the dot product is zero. If the angle between them is acute, then the dot product
will be positive, no matter what the orientation of the vectors, because the cosine of any angle
between –90° and 90° is positive; if the angle between them is obtuse, then the dot product will be
negative. Note that another useful application of the fact is that you can compute the angle between
two vectors if you know the vectors’ lengths and dot product.
Figure 4.3: the projection of the vector U onto the vector V
The relationship between the dot product and the cosine of the included angle also allows us to
look at the component of any vector that lies in the direction of another vector, which we call the
projection of one vector on another. As we see in Figure 4.3, with any two vectors we can
construct a right triangle in which one side is one of the vectors and the other is the projection of
the first on the second. Because the side of the triangle that lies in the direction of V has length
||U|| ∗ cos(Θ), and because of the definition of the dot product in terms of the cosine of the included
angle, the length of the projection of U onto V is actually (U • V) / ||V||. This is especially useful
when V is a unit vector because then the dot product alone gives the length of the projection, and
this is one of the reasons for normalizing the vectors we use.
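In C, the dot product and the quantities derived from it might be sketched as follows; the function names are our own, and the angle and projection functions assume their arguments are nonzero vectors.

#include <math.h>

/* Dot product of two vectors. */
float dot3(float u[3], float v[3])
{
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
}

/* Angle in radians between two nonzero vectors, from U.V = ||U|| ||V|| cos(theta). */
float angleBetween(float u[3], float v[3])
{
    float lu = sqrtf(u[0]*u[0] + u[1]*u[1] + u[2]*u[2]);
    float lv = sqrtf(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return acosf(dot3(u, v) / (lu * lv));
}

/* Length of the projection of U onto a nonzero vector V: (U.V)/||V||. */
float projectionLength(float u[3], float v[3])
{
    float lv = sqrtf(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return dot3(u, v) / lv;
}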
The second computation is the cross product, or vector product, of two vectors. The cross product
of two vectors is a third vector that is perpendicular to each of the original vectors and whose
length is the product of the two vector lengths times the sine of the angle between them. Thus if
two vectors are parallel, the cross product is zero; if they are orthogonal, the cross product has
length equal to the product of the two lengths; if they are both unit vectors, the cross product is the
sine of the included angle. The computation of the cross product can be expressed as the
determinant of a matrix whose first row is the three standard unit vectors, whose second row is the
first vector of the product, and whose third row is the second vector of the product. Denoting the
unit direction vectors in the X , Y , and Z directions as i, j, and k, as above, we can express the
cross product of two vectors < a,b,c > and < u,v,w > in terms of a determinant:
                            | i  j  k |
< a,b,c > × < u,v,w > = det | a  b  c |
                            | u  v  w |

    = i det | b  c | − j det | a  c | + k det | a  b |
            | v  w |         | u  w |         | u  v |

    = < bw − cv, cu − aw, av − bu >
As you may remember from determinants, if two adjacent rows or columns of a matrix are
switched, the determinant’s sign is changed. This shows that UxV = –VxU.
As an example of cross products, consider the two points we saw earlier, but treat them as
vectors: u = <3.0, 4.0, 5.0> and v = <5.0, -1.5, 4.0>. The length of u is the square root of
u•u, or 7.071, and the length of v is the square root of v•v, or 6.576. Then we see that
u•v = 15.0 – 6.0 + 20.0 = 29.0, and the cosine of the angle between u and v is
29.0/(7.071*6.576) or 0.624. Further, the cross product of the two vectors is computed as the determinant

    det | i     j     k   |
        | 3.0   4.0   5.0 |
        | 5.0  −1.5   4.0 |

Carrying out the 2x2 determinants gives us u x v = 23.5 i + 13.0 j – 24.5 k as the cross product.
You should check to see that this product is orthogonal to both u and v by computing the dot
products, which should be 0.
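The following C sketch implements the componentwise formula and checks it against the example above; the function name is ours.

#include <stdio.h>

/* Cross product w = u x v, using the componentwise formula above. */
void cross3(float u[3], float v[3], float w[3])
{
    w[0] = u[1]*v[2] - u[2]*v[1];
    w[1] = u[2]*v[0] - u[0]*v[2];
    w[2] = u[0]*v[1] - u[1]*v[0];
}

int main(void)
{
    float u[3] = { 3.0f, 4.0f, 5.0f };
    float v[3] = { 5.0f, -1.5f, 4.0f };
    float w[3];
    cross3(u, v, w);
    /* should print 23.5 13 -24.5, matching the example in the text */
    printf("%g %g %g\n", w[0], w[1], w[2]);
    return 0;
}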
The cross product has a “handedness” property and is said to be a right-handed operation. That is,
if you align the fingers of your right hand with the direction of the first vector and curl your fingers
towards the second vector, your right thumb will point in the direction of the cross product. Thus
the order of the vectors is important; if you reverse the order, you reverse the sign of the product
(recall that interchanging two rows of a determinant will change its sign), so the cross product
operation is not commutative. As a simple example, with i, j, and k as above, we see that
i x j = k but that j x i = -k. In general, if you consider the arrangement of Figure 4.4,
if you think of the three direction vectors as being wrapped around as if they were visible from the
first octant of 3-space, the product of any two is the third direction vector if the letters are in
counterclockwise order around the circle in Figure 4.4, and the negative of the third if the order is
clockwise. Note also that the cross product of two collinear vectors (one of the vectors is a
constant multiple of the other) will always be zero, so the geometric interpretation of the cross
product does not apply in this case.
Figure 4.4: the cyclic arrangement of the unit direction vectors i, j, and k
The cross product can be very useful when you need to define a vector perpendicular to two given
vectors; the most common application of this is defining a normal vector to a polygon by
computing the cross product of two edge vectors. For a triangle as shown in Figure 4.5 with
vertices A, B, and C in order counterclockwise from the “front” side of the triangle, the normal
vector can be computed by creating the two difference vectors P = C – B and Q = A – C and taking
their cross product N = P × Q, as shown in Figure 4.5.
Figure 4.5: the normal to a triangle as the cross product of two edges
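One possible C sketch of this construction, with the vertex order and the difference vectors exactly as in Figure 4.5, is shown below; the function name is ours, and the resulting normal is not normalized.

/* Normal to the triangle with vertices A, B, C, taken counterclockwise
   from the front: P = C - B, Q = A - C, N = P x Q (not normalized). */
void triangleNormal(float a[3], float b[3], float c[3], float n[3])
{
    float p[3], q[3];
    int i;
    for (i = 0; i < 3; i++) {
        p[i] = c[i] - b[i];
        q[i] = a[i] - c[i];
    }
    n[0] = p[1]*q[2] - p[2]*q[1];
    n[1] = p[2]*q[0] - p[0]*q[2];
    n[2] = p[0]*q[1] - p[1]*q[0];
}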
Reflection vectors
There are several times in computer graphics where it is important to calculate a vector that is a
reflection of another vector in some surface. One example is in specular light calculations; we will
see later that the brightness of specular light at a point (shiny light reflected from the surface
similarly to the way a mirror reflects light) will depend on the angle between the vector to the eye
from that point and the reflection of the vector from that point to the light. Another example is in
any model where objects hit a surface and are reflected from it, where the object’s velocity vector
after the bounce is the reflection of its incoming velocity vector. In these cases, we need to know
the normal to the surface at the point where the vector to be reflected hits the surface, and the
calculation is fairly straightforward. Figure 4.6 shows the situation we are working with.
Figure 4.6: the reflection of an incoming vector P to an outgoing vector Q in a surface with unit normal N
In this figure, recall that N is normalized, that is, N is a unit vector perpendicular to the surface.
Let N* be the vector that Q makes when it is projected on N. Because of the symmetry of the
figure, we have N* = –(N•P)N, so that X = P + N* = P – (N•P)N. But Q + P = 2X, so
Q = 2(P – (N•P)N) – P, from which Q = P – 2(N•P)N. This is an easy calculation and the code
is left to the reader.
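One possible version of that calculation, left to the reader in the text, is sketched below in C; it assumes the normal N has already been normalized, and the function name is ours.

/* Reflect the incoming vector P in a surface with unit normal N:
   Q = P - 2 (N . P) N.  Assumes N has already been normalized. */
void reflect3(float p[3], float n[3], float q[3])
{
    float ndotp = n[0]*p[0] + n[1]*p[1] + n[2]*p[2];
    q[0] = p[0] - 2.0f * ndotp * n[0];
    q[1] = p[1] - 2.0f * ndotp * n[1];
    q[2] = p[2] - 2.0f * ndotp * n[2];
}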
Transformations
In the previous two chapters we discussed transformations rather abstractly: as functions that
operate on 3D space to produce given effects. In the spirit of this chapter, however, we describe
how these functions are represented concretely for computation and, in particular, the
representation of each of the basic scaling, rotation, and translation matrices.
To begin, we recall that we earlier introduced the notion of homogeneous coordinates for points in
3D space: we identify the 3D point (x, y, z) with the homogeneous 4D point (x, y, z, 1). The
transformations T in 3D computer graphics are all linear functions on 4D space and so may be
represented as 4x4 matrices: applying a transformation to a point is done by multiplying its 4x4
matrix by the homogeneous coordinates of the point, written as a column vector, and composing two
transformations is done by multiplying their matrices. The obvious way to compute such a matrix
product, taking each entry as a sum of four componentwise products, is straightforward but fairly
slow; you should be able to find ways to speed it up. Geometrically, this treats the matrices as sets
of vectors: the left-hand matrix is composed of row vectors and the right-hand matrix of column
vectors, and the product of the matrices is composed of the dot products of each row vector from
the left with each column vector on the right.
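A straightforward row-by-column product might be coded as in the sketch below; note that this sketch uses the row-major m[row][column] layout that matches the matrices as they are written in these notes, while OpenGL itself stores its matrices in column-major order. The function name is ours.

/* Straightforward 4x4 matrix product c = a * b: each entry of c is the
   dot product of a row of a with a column of b.  Row-major layout. */
void matMult4(float a[4][4], float b[4][4], float c[4][4])
{
    int i, j, k;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            c[i][j] = 0.0f;
            for (k = 0; k < 4; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}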
So with this background, let’s proceed to consider how the basic transformations look as matrices.
For scaling, the OpenGL function glScalef(sx, sy, sz) is expressed as the matrix

    sx  0   0   0
    0   sy  0   0
    0   0   sz  0
    0   0   0   1

and for translation, the OpenGL function glTranslatef(tx, ty, tz) is expressed as the matrix

    1   0   0   tx
    0   1   0   ty
    0   0   1   tz
    0   0   0   1
For rotation, we have a more complex situation. OpenGL allows you to define a rotation around
any given line with the function glRotatef(angle, x, y, z), where angle is the amount of
rotation (in degrees) and <x, y, z> is the direction vector of the line around which the rotation is to
be done. We can write a matrix for each of the rotations around the coordinate axes. For the rotation
around the X-axis, glRotatef(angle, 1., 0., 0.), the matrix is as follows; note that the
first component, the X-component, is not changed by this matrix.

    1   0            0             0
    0   cos(angle)   –sin(angle)   0
    0   sin(angle)   cos(angle)    0
    0   0            0             1

Similarly, for the rotation around the Z-axis, glRotatef(angle, 0., 0., 1.), the Z-component is unchanged and the matrix is:

    cos(angle)   –sin(angle)   0   0
    sin(angle)   cos(angle)    0   0
    0            0             1   0
    0            0             0   1
For the Y-axis, there is a difference because the cross product of the X- and Z-axes is in the
opposite direction to the Y-axis. This means that the angle relative to the Y-axis is the negative of
the angle relative to the cross product, giving us a change in the sign of the sine function. So the
matrix for glRotatef(angle, 0., 1., 0.) is:

    cos(angle)    0   sin(angle)   0
    0             1   0            0
    -sin(angle)   0   cos(angle)   0
    0             0   0            1
The formula for a rotation around an arbitrary line is more complex and is given in the OpenGL
manual, so we will not present it here.
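As a sketch, the following C function fills in the rotation-about-Y matrix shown above from an angle in degrees, using the same row-major layout as the matrices in the text; in a real OpenGL program you would normally just call glRotatef, and if you did want to hand this matrix to glMultMatrixf you would first have to transpose it into OpenGL's column-major order. The function name is ours.

#include <math.h>

/* Fill a 4x4 rotation matrix about the Y-axis, angle in degrees,
   laid out in row-major order as in the matrices shown above. */
void rotationY(float angleDegrees, float m[4][4])
{
    float a = angleDegrees * 3.14159265f / 180.0f;
    float c = cosf(a), s = sinf(a);
    int i, j;
    for (i = 0; i < 4; i++)              /* start from the identity */
        for (j = 0; j < 4; j++)
            m[i][j] = (i == j) ? 1.0f : 0.0f;
    m[0][0] =  c;  m[0][2] = s;
    m[2][0] = -s;  m[2][2] = c;
}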
Planes
We saw above that a line could be defined in terms of a single parameter, so it is often called a one-
dimensional space. A plane, on the other hand, is a two-dimensional space, determined by two
parameters. If we have any two non-parallel lines that meet in a single point, we recall that they
determine a plane that can be thought of as all points that are translations of the given point by
vectors that are linear combinations of the direction vectors of the two lines. Thus any plane in
space is seen as two-dimensional where each of the two lines contributes one of the dimensional
components. In practice, we usually don’t have two lines in the plane but have three points in the
plane that do not lie in a single straight line, and we get the two lines by letting each of two
different pairs of points determine a line. Because each pair of points lies in the plane, so does
each of the two lines they generate, and so we have two lines.
In more general terms, let’s consider the vector N =< A,B,C > defined as the cross product of the
two vectors determined by the two lines. Then N is perpendicular to each of the two vectors and
hence to any line in the plane. In fact, this can be taken as defining the plane: the plane is defined
by all lines through the fixed point perpendicular to N. If we take a fixed point in the plane,
(U,V,W), and a variable point in the plane, (x,y,z), we can use the dot product to express the
perpendicular relationship as
< A, B, C > • < x − U, y − V, z − W > = 0.
When we expand this dot product we see
A(x − U) + B(y − V) + C(z − W) = Ax + By + Cz + (−AU − BV − CW) = 0.
This allows us to give an equation for the plane:
Ax + By + Cz + D = 0
for an appropriate value of D. Thus the coefficients of the variables in the plane equation exactly
match the components of the vector normal to the plane—a very useful fact from time to time.
Let’s consider an example here that will illustrate both the previous section and this section. If we
consider three points A = (1.0, 2.0, 3.0), B = (2.0, 1.0, -1.0), and C = (-1.0, 2.0, 1.0), we can
easily see that they do not lie on a single straight line in 3D space. Thus these three points define a
plane; let’s calculate the plane’s equation.
To begin, the difference vectors are A–B = <-1.0, 1.0, 4.0> and B–C = <3.0, -1.0, -2.0> for the
original points, so these two vectors applied to any one of the points will determine two lines in the
plane. We then compute the cross product (B–C)x(A–B) of these two vectors through a sequence
of 2x2 determinants as we outlined above, and we get the result
(B–C) × (A–B) = det | i    j    k  |
                    |  3   −1   −2 |
                    | −1    1    4 |

    = i det | −1  −2 | − j det |  3  −2 | + k det |  3  −1 |
            |  1   4 |         | −1   4 |         | −1   1 |

    = <–2.0, –10.0, 2.0>
Thus the equation of the plane is –2X – 10Y + 2Z + D = 0, and putting in the coordinates of B we
can calculate the constant D as 16.0, giving a final equation of –2X – 10Y + 2Z + 16 = 0. Here
any point for which the plane equation yields a positive value lies on the side of the plane in the
direction the normal is facing, and any point that yields a negative value lies on the other side of the
plane.
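Putting the pieces of this example together, a small C sketch that computes the plane through three non-collinear points might look like the following; the function name and the (A, B, C, D) array layout are our own choices, and calling it with the three points A, B, and C above should produce the coefficients (–2, –10, 2) with D = 16.

/* Plane Ax + By + Cz + D = 0 through three non-collinear points:
   the normal <A,B,C> is the cross product of two edge vectors and
   D comes from substituting one of the points. */
void planeFromPoints(float p0[3], float p1[3], float p2[3], float plane[4])
{
    float u[3], v[3];
    int i;
    for (i = 0; i < 3; i++) {
        u[i] = p1[i] - p2[i];    /* B - C in the example above */
        v[i] = p0[i] - p1[i];    /* A - B in the example above */
    }
    plane[0] = u[1]*v[2] - u[2]*v[1];                               /* A */
    plane[1] = u[2]*v[0] - u[0]*v[2];                               /* B */
    plane[2] = u[0]*v[1] - u[1]*v[0];                               /* C */
    plane[3] = -(plane[0]*p0[0] + plane[1]*p0[1] + plane[2]*p0[2]); /* D */
}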
Just as we earlier defined a way to compute the distance from a point to a line, we also want to be
able to compute the distance from a point to a plane. This will be useful when we discuss collision
detection later, and may also have other applications.
Let’s consider a plane Ax+By+Cz+D=0 with normal vector N=<A,B,C>, unit normal vector
n=<a,b,c>, and an arbitrary point P=(u,v,w). Then select any point Q=(d,e,f) in the plane, and
consider the relationships shown in Figure 4.7. The diagram shows that the distance from the
point to the plane is the projection of the vector P–Q on the unit normal vector n, or (P–Q)•n. This
gives us an easy way to compute this distance, especially since we can choose the point Q any way
we wish.
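Under the same assumptions as the previous sketch (the plane stored as an array (A, B, C, D) with a nonzero normal), the distance computation is only a few lines of C:

#include <math.h>

/* Signed distance from the point P = (u,v,w) to the plane Ax+By+Cz+D=0.
   Dividing by the length of <A,B,C> is the same as projecting P - Q onto
   the unit normal n for any point Q in the plane. */
float pointPlaneDistance(float u, float v, float w, float plane[4])
{
    float lenN = sqrtf(plane[0]*plane[0] + plane[1]*plane[1] + plane[2]*plane[2]);
    return (plane[0]*u + plane[1]*v + plane[2]*w + plane[3]) / lenN;
}

The value is signed, which is often useful in itself: its sign tells you which side of the plane the point lies on, just as described above.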
Polygons
Most graphics systems, including OpenGL, do their modeling and rendering based on
polygons and polyhedra. A polygon is a plane region bounded by a sequence of directed line
segments with the property that the end of one segment is the same as the start of the next segment,
and the end of the last line segment is the start of the first segment. A polyhedron is a region of
3–space that is bounded by a set of polygons. Because polyhedra are composed of polygons, we
will focus on modeling with polygons, and this will be a large part of the basis for the modeling
chapter below.
The reason for modeling based on polygons is that many of the fundamental algorithms of graphics
have been designed for polygon operations. In particular, many of these algorithms operate by
interpolating values across the polygon; you will see this below in depth buffering, shading, and
other areas. In order to interpolate across a polygon, the polygon must be convex. Informally, a
polygon is convex if it has no indentations; formally, a polygon is convex if for any two points
in the polygon (either the interior or the boundary), the line segment between them lies entirely
within the polygon.
Because a polygon bounds a region of the plane, we can talk about the interior or exterior of the
polygon. In a convex polygon, this is straightforward because the figure is defined by its
bounding planes or lines, and we can simply determine which side of each is “inside” the figure.
If your graphics API only allows you to define convex polygons, this is all you need consider. In
general, though, polygons can be non-convex and we would like to define the concept of “inside”
for them. Because this is less simple, we look to convex figures for a starting point and notice that
if a point is inside the figure, any ray from an interior point (line extending in only one direction
from the point) must exit the figure in precisely one point, while if a point is outside the figure, if
the ray hits the polygon it must both enter and exit, and so crosses the boundary of the figure in
either 0 or 2 points. We extend this idea to general polygons by saying that a point is inside the
polygon if a ray from the point crosses the boundary of the polygon an odd number of times, and
is outside the polygon if a ray from the point crosses the boundary of the polygon an even number
of times. This is illustrated in Figure 4.8. In this figure, points A, C, E, and G are outside the
polygons and points B, D, and F are inside. Note carefully the case of point G; our definition of
inside and outside might not be intuitive in some cases.
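A common way to implement this even–odd test for a 2D polygon is the following C sketch, which casts a ray in the +x direction and counts how many edges it crosses; the function name is ours, and points lying exactly on an edge may be classified either way.

/* Even-odd (crossing-count) test: is the 2D point (px,py) inside the
   polygon with n vertices vx[i], vy[i]?  Casts a ray in the +x direction
   and returns 1 if the number of edge crossings is odd. */
int pointInPolygon(int n, float vx[], float vy[], float px, float py)
{
    int i, j, inside = 0;
    for (i = 0, j = n - 1; i < n; j = i++) {
        /* does edge (j,i) straddle the horizontal line y = py? */
        if ((vy[i] > py) != (vy[j] > py)) {
            /* x coordinate where the edge crosses that line */
            float x = vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j]);
            if (x > px)                /* crossing is to the right of the point */
                inside = !inside;
        }
    }
    return inside;
}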
Another way to think about convexity is in terms of linear combinations of points. We can define a
convex sum of points P0, P1, ... Pn as a sum ∑ c i Pi where each of the coefficients ci is non-
negative and the sum of the coefficients is exactly 1. If we recall that the parametric definition of a
line segment is (1-t)P0+tP1, we note that this is a convex sum. So if a polygon is convex, all points on the line segment between any two of its points, being convex sums of those points, lie within the polygon.
Figure 4.8: Interior and exterior points of a convex polygon (left) and two non-convex polygons
(center and right)
A convex polygon also has a broader property: any point in the polygon is a convex sum of
vertices of the polygon. Because this means that the entire interior of the polygon can be expressed
as a convex sum of the vertices, we would expect that interpolation processes such as depth
(described in an earlier chapter) and color smoothing (described in a later chapter) could be
expressed by the same convex sum of these properties for the vertices. Thus convexity is a very
important property for geometric objects in computer graphics systems.
As we suggested above, most graphics systems, and certainly OpenGL, require that all polygons
be convex in order to render them correctly. If you need to use a polygon that is not convex, you
may always subdivide it into triangles or other convex polygons and work with them instead of the
original polygon. As an alternative, OpenGL provides a facility to tessellate a polygon—divide it
into convex polygons—automatically, but this is a complex operation that we do not cover in these
notes.
Polyhedra
As we saw in the earlier chapters, polyhedra are volumes in 3D space that are bounded by
polygons. In order to work with a polyhedron you need to define the polygons that form its
boundaries. In terms of the scene graph, then, a polyhedron is a group node whose elements are
polygons. Most graphics APIs do not provide a rich set of pre-defined polyhedra that you can use
in modeling; in OpenGL, for example, you have only the Platonic solids and a few simple
polyhedral approximations of other objects (sphere, torus, etc.). A convex polyhedron is one for
which any two points in the object are connected by a line segment that is completely contained in
the object.
Because polyhedra are almost always defined in terms of polygons, we will not focus on them but
will rather focus on polygons. Thus in the next section when we talk about collision detection, the
most detailed level of testing will be to identify polygons that intersect.
Collision detection
There are times when we need to know whether two objects meet in order to understand the logic
of a particular scene, particularly when that scene involves moving objects. There are several ways
to handle collision detection, involving a little extra modeling and several stages of logic, and we
outline them here without too much detail because there isn’t any one right way to do it.
As we discuss testing below, we will need to know the actual coordinates of various points in 3D
world space. You can track the coordinates of a point as you apply the modeling transformation to
an object, but this can take a great deal of computation that we would otherwise give to the
graphics API, so this works against the approach we have been taking. But your API may have
the capability of giving you the world coordinates of a point with a simple inquiry function. In
OpenGL, for example, you can use the function glGetFloatv(GL_MODELVIEW_MATRIX, matrix) to
get the current value of the modelview matrix at any point; this fills the array matrix with the 16 real values
that is the matrix to be applied to your model at that point. If you treat this as a 4x4 matrix and
multiply it by the coordinates of any vertex, you will get the coordinates of the transformed vertex
in 3D eye space. This will give you a consistent space in which to make your tests as described
below.
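As a hedged sketch of that inquiry, the following C function uses glGetFloatv to fetch the current modelview matrix and applies it to a model-space vertex. Remember that OpenGL returns the matrix in column-major order, so the entry in row r and column c is m[c*4 + r]; the function name is ours, and we ignore the homogeneous w component, which is 1 for the usual modelview matrices.

#include <GL/gl.h>

/* Transform a model-space vertex into eye space with the current
   modelview matrix (which OpenGL stores in column-major order). */
void toEyeSpace(float vx, float vy, float vz, float eye[3])
{
    GLfloat m[16];
    float v[4] = { vx, vy, vz, 1.0f };   /* homogeneous coordinates */
    int row, col;

    glGetFloatv(GL_MODELVIEW_MATRIX, m);
    for (row = 0; row < 3; row++) {
        eye[row] = 0.0f;
        for (col = 0; col < 4; col++)
            eye[row] += m[col*4 + row] * v[col];
    }
}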
In order to simplify collision detection, it is usual to start thinking of possible collisions instead of
actual collisions. Quick rejection of possible collisions will make a big difference in speeding up
handling actual collisions. One standard approach is to use a substitute object instead of the real
object, such as a sphere or a box that surrounds the object closely. These are called bounding
objects, such as bounding spheres or bounding boxes, and they are chosen for the ease of collision
testing. It is easy to see if two spheres could collide, because this happens precisely when the
distance between their centers is less than the sum of the radii of the spheres. It is also easy to see
if two rectangular boxes intersect because in this case, you can test the relative values of the larger
and smaller dimensions of each box in each direction. Of course, you must be careful that the
bounding objects are defined after all transformations are done for the original object, or you may
distort the bounding object and make the tests more difficult.
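The bounding-sphere test, for example, is only a few lines of C (the function name is ours); comparing squared distances avoids taking a square root.

/* Quick bounding-sphere rejection: two spheres with centers c1, c2 and
   radii r1, r2 can only collide if the squared distance between their
   centers is no more than the square of the sum of the radii. */
int spheresMayCollide(float c1[3], float r1, float c2[3], float r2)
{
    float dx = c1[0] - c2[0];
    float dy = c1[1] - c2[1];
    float dz = c1[2] - c2[2];
    float sum = r1 + r2;
    return (dx*dx + dy*dy + dz*dz) <= sum*sum;
}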
As you test for collisions, then, you start by testing for collisions between the bounding objects of
your original objects. When you find a possible collision, you must then move to more detailed
tests based on the actual objects. We will assume that your objects are defined by a polygonal
boundary, and in fact we will assume that the boundary is composed of triangles. So the next set
of tests are for possible collisions between triangles. Unless you know which triangles in one
object are closest to which triangles in another object, you may need to test all possible pairs of
triangles, one in each object, so we might start with a quick rejection of triangles.
Just as we could tell when two bounding objects were too far apart to collide, we should be able to
tell when a triangle in one object is too far from the bounding object of the other object to collide.
If that bounding object is a sphere, you could see whether the coordinates of the triangle’s vertices
(in world space) are farther from that sphere than the longest side of the triangle, for example, or if
you have more detailed information on the triangle such as its circumcenter, you could test for the
circumcenter to be farther from the sphere than the radius of the circumcircle. The circumcenter of
a triangle is the common intersection of the three perpendicular bisectors of the sides of the triangle;
the circumcircle is the circle with center at the circumcenter that goes through the vertices of the
triangle. See Figure 4.9 for a sketch that illustrates this.
Figure 4.9: a sketch illustrating the triangle quick-rejection test
After we have ruled out impossible triangle collisions, we must consider the possible intersection
of a triangle in one object with a triangle in the other object. In this case we work with each line
segment bounding one triangle and with the plane containing the other triangle, and we compute
the point where the line meets the plane of the triangle. If the line segment is given by the
parametric equation Q0 + t∗(Q1 − Q0), with Q0 = (x0, y0, z0) and Q1 = (x1, y1, z1), and the plane of
the triangle is Ax + By + Cz + D = 0, we can readily calculate the value of t that gives the
intersection of the line and the plane by substituting the parametric form into the plane equation;
this gives
t = −(A∗x0 + B∗y0 + C∗z0 + D) / (A∗(x1 − x0) + B∗(y1 − y0) + C∗(z1 − z0))
whenever the denominator is nonzero (if it is zero, the segment is parallel to the plane). If this value of t
is not between 0 and 1, then the segment does not intersect the plane and we are finished. If the
segment does intersect the plane, we need to see if the intersection is within the triangle or not.
Once we know that the line is close enough to have a potential intersection, we move on to test
whether the point where the line meets the plane lies inside the triangle, as shown in Figure 4.10.
With the counterclockwise orientation of the triangle, any point on the inside of the triangle is to the
left (that is, in a counterclockwise direction) of the oriented edge for each edge of the triangle. The
location of this point can be characterized by the cross product of the edge vector and the vector
from the vertex to the point; if this cross product has the same orientation as the normal vector to
the triangle for each vertex, then the point is inside the triangle. If the point at the intersection of
the line segment and the plane of the triangle is Q, this means that we must have all of the
relations
N • ((P1− P0) × (Q − P0)) > 0
N • ((P2 − P1)× (Q − P1)) > 0
N • ((P0 − P2) × (Q − P2)) > 0
to express these relations between the point and each edge of the triangle.
Figure 4.10: testing whether the intersection point Q lies inside the triangle P0 P1 P2 by comparing the cross product (P1−P0)×(Q−P0) with the normal N
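Putting the two steps together, a C sketch of the segment–triangle test might look like the following; the function name is ours, the triangle's normal is recomputed inside the function so that the sketch is self-contained, and degenerate cases such as a segment lying in the plane of the triangle are simply reported as misses.

/* Does the segment Q0->Q1 intersect the triangle P0 P1 P2?
   Step 1: intersect the segment with the triangle's plane.
   Step 2: apply the three edge tests above to the intersection point. */
int segmentHitsTriangle(float q0[3], float q1[3],
                        float p0[3], float p1[3], float p2[3])
{
    float e1[3], e2[3], n[3], dir[3], q[3];
    float denom, t;
    int i;

    for (i = 0; i < 3; i++) {
        e1[i] = p1[i] - p0[i];
        e2[i] = p2[i] - p0[i];
        dir[i] = q1[i] - q0[i];
    }
    /* normal N = (P1-P0) x (P2-P0) */
    n[0] = e1[1]*e2[2] - e1[2]*e2[1];
    n[1] = e1[2]*e2[0] - e1[0]*e2[2];
    n[2] = e1[0]*e2[1] - e1[1]*e2[0];

    denom = n[0]*dir[0] + n[1]*dir[1] + n[2]*dir[2];
    if (denom == 0.0f) return 0;                 /* parallel to the plane */

    /* plane equation N.(X - P0) = 0 gives t directly */
    t = (n[0]*(p0[0]-q0[0]) + n[1]*(p0[1]-q0[1]) + n[2]*(p0[2]-q0[2])) / denom;
    if (t < 0.0f || t > 1.0f) return 0;          /* misses the segment */

    for (i = 0; i < 3; i++)
        q[i] = q0[i] + t * dir[i];               /* intersection point */

    /* edge tests: N . ((Pj+1 - Pj) x (Q - Pj)) must not be negative */
    {
        float *v[3];
        float edge[3], toQ[3], c[3];
        int j, k;
        v[0] = p0;  v[1] = p1;  v[2] = p2;
        for (j = 0; j < 3; j++) {
            k = (j + 1) % 3;
            for (i = 0; i < 3; i++) {
                edge[i] = v[k][i] - v[j][i];
                toQ[i]  = q[i] - v[j][i];
            }
            c[0] = edge[1]*toQ[2] - edge[2]*toQ[1];
            c[1] = edge[2]*toQ[0] - edge[0]*toQ[2];
            c[2] = edge[0]*toQ[1] - edge[1]*toQ[0];
            if (n[0]*c[0] + n[1]*c[1] + n[2]*c[2] < 0.0f)
                return 0;                        /* Q is outside this edge */
        }
    }
    return 1;
}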
Polar, cylindrical, and spherical coordinates
Up to this point we have emphasized Cartesian, or rectangular, coordinates for describing 2D and
3D geometry, but there are times when other kinds of coordinate systems are most useful. The
coordinate systems we discuss here are based on angles, not distances, in at least one of their components.
In 2D space, we can identify any point (X , Y ) with the line segment from the origin to that point.
This identification allows us to write the point in terms of the angle Θ the line segment makes with
the positive X -axis and the distance R of the point from the origin as:
X = R cos(Θ)
Y = R sin(Θ)
or, inversely,
R = sqrt(X² + Y²)
Θ = arccos(X / R)
where Θ is the value between 0 and 2π that is in the right quadrant to match the signs of X and Y .
This representation (R,Θ) is known as the polar form for the point, and the use of the polar form
for all points is called the polar coordinates for 2D space. This is illustrated in the left-hand image
in Figure 4.11.
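A hedged C sketch of these conversions is shown below; the function names are ours, and we use the C library function atan2, which chooses the quadrant of Θ automatically, instead of adjusting arccos(X/R) by hand.

#include <math.h>

/* Convert between 2D Cartesian and polar coordinates (angles in radians). */
void cartesianToPolar(float x, float y, float *r, float *theta)
{
    *r = sqrtf(x*x + y*y);
    *theta = atan2f(y, x);           /* in (-pi, pi], quadrant chosen automatically */
}

void polarToCartesian(float r, float theta, float *x, float *y)
{
    *x = r * cosf(theta);
    *y = r * sinf(theta);
}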
There are two alternatives to Cartesian coordinates for 3D space. Cylindrical coordinates add a
third linear dimension to 2D polar coordinates, giving the angle between the X -Z plane and the
plane through the Z-axis and the point, along with the distance from the Z-axis and the Z-value of
the point. Points in cylindrical coordinates are represented as (R,Θ,Z) with R and Θ as above and
with the Z-value as in rectangular coordinates. The right-hand image of Figure 4.11 shows the
structure of cylindrical coordinates for 3D space.
Figure 4.11: polar coordinates for a point (X,Y) in 2D space (left) and cylindrical coordinates for a point (X,Y,Z) in 3D space (right)
Cylindrical coordinates are a useful extension of a 2D polar coordinate model to 3D space. They are
not particularly common in graphics modeling, but can be very helpful when appropriate. For
example, if you have a planar object that has to remain upright with respect to a vertical direction,
but the object has to rotate to face the viewer in a scene as the viewer moves around, then it would
be appropriate to model the object’s rotation using cylindrical coordinates. An example of such an
object is a billboard, as discussed later in the chapter on high-efficiency graphics techniques.
Spherical coordinates represent 3D points in terms much like the latitude and longitude on the
surface of the earth. The latitude of a point is the angle from the equator to the point, and ranges
from 90° south to 90° north. The longitude of a point is the angle from the “prime meridian” to the
point, where the prime meridian is determined by the half-plane that runs from the center of the
earth through the Greenwich Observatory just east of London, England. The latitude and longitude
values uniquely determine any point on the surface of the earth, and any point in space can be
represented relative to the earth by determining what point on the earth’s surface meets a line from
the center of the earth to the point, and then identifying the point by the latitude and longitude of the
point on the earth’s surface and the distance to the point from the center of the earth.
Spherical coordinates can be very useful when you want to control motion to achieve smooth
changes in angles or distances around a point. They can also be useful if you have an object in
space that must constantly show the same face to the viewer as the viewer moves around; again,
this is another kind of billboard application and will be described later in these notes.
Writing a point in spherical coordinates as (R, Θ, Φ), with Θ the longitude and Φ the latitude as shown in Figure 4.12, the conversion from spherical to rectangular coordinates is given by the equations
x = R cos(Φ) sin(Θ)
y = R cos(Φ) cos(Θ)
z = R sin(Φ)
Converting from rectangular to spherical coordinates is not much more difficult. Again referring to
Figure 4.12, we see that R is the diagonal of a rectangular box and that the angles can be described in
terms of the trigonometric functions based on the sides. So we have the equations
R = sqrt(X² + Y² + Z²)
Φ = arcsin(Z / R)
Θ = arctan(X / Y)
Note that the inverse sine gives the principal value for the latitude (Φ), and the angle for the
longitude (Θ) is chosen between 0° and 360° so that the sine and cosine of Θ match the
algebraic sign (+ or -) of the X and Y coordinates. Figure 4.12 shows a sphere showing latitude
and longitude lines and containing an inscribed rectangular coordinate system, as well as the figure
needed to make the conversion between spherical and rectangular coordinates.
Figure 4.12: a point (R, Θ, Φ) in spherical coordinates, showing the lengths R sin(Φ), R cos(Φ), R cos(Φ) cos(Θ), and R cos(Φ) sin(Θ) used in converting between spherical and rectangular coordinates
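A matching C sketch for spherical coordinates, following the same convention as the equations above (Θ the longitude, Φ the latitude, both in radians), might look like this; the function names are ours.

#include <math.h>

/* Conversions using the convention above:
   x = R cos(phi) sin(theta), y = R cos(phi) cos(theta), z = R sin(phi). */
void sphericalToCartesian(float r, float theta, float phi,
                          float *x, float *y, float *z)
{
    *x = r * cosf(phi) * sinf(theta);
    *y = r * cosf(phi) * cosf(theta);
    *z = r * sinf(phi);
}

void cartesianToSpherical(float x, float y, float z,
                          float *r, float *theta, float *phi)
{
    *r = sqrtf(x*x + y*y + z*z);
    *phi = (*r > 0.0f) ? asinf(z / *r) : 0.0f;
    *theta = atan2f(x, y);     /* quadrant chosen from the signs of x and y */
}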
Higher dimensions?
While our perceptions and experience are limited to three dimensions, there is no such limit to the
kind of information we may want to display with our graphics system. Of course, we cannot deal
with these higher dimensions directly, so we will have other techniques to display higher-
dimensional information. There are some techniques for developing three-dimensional information
by projecting or combining higher-dimensional data, and some techniques for adding extra non-
spatial information to 3D information in order to represent higher dimensions. We will discuss
some ideas for higher-dimensional representations in later chapters in terms of visual
communications and science applications.
Summary
This chapter has presented a number of properties of 3D analytic geometry that can be very useful
when doing computer graphics. Whether you are doing modeling, laying out the geometry of
objects or defining other properties such as normal vectors, or whether you are defining the motion
of objects in your scene or the relationship between them, it is very helpful to have experience with
the kind of mathematical or geometric behaviors that will let you represent them so that your
program can manipulate them effectively. With the tools from this chapter, you will be able to do
this comfortably and fluently.
Questions
1. If you have any two coordinate systems defined in 2D space, show how you can convert either
one into the other by applying a scaling, rotation, and translation to either one (where the
operations are defined relative to the target coordinate system).
2. Can you do the same as asked in the previous question when you have two coordinate systems
in 3D space? Why or why not? (Hint: consider right-handed and left-handed coordinate
systems.) Can you do this if you use a new kind of transformation, called a reflection, that
reverses the sign of one particular coordinate while not changing the value of any other
coordinate? What might the 3x3 matrix for a reflection look like?
3. Pick any two distinct points in 3D space, and do the hand calculations to write the parametric
equations for a line segment that joins the points. How do you modify these equations in case
you want a ray that starts at one point and goes through the other? How do you modify them
in case you want the equations for a complete line through the two points?
Exercises
4. Write the complete set of functions for converting among Cartesian, cylindrical, and polar
coordinate systems. Make these general so you can use them as utilities if you want to model
an object using any of these coordinates.
5. Model a problem using polar or cylindrical coordinates and create a display for the problem by
building triangles, quads, or polygons in these coordinates and converting to standard OpenGL
geometry using the coordinate system conversions in the previous exercise.
Experiments
6. In case you have an object with dimension higher than three, there are many ways you might
be able to project it into 3D space so it could be viewed. For example, if you look at the 4D
cube, each of its 16 vertices has four coordinates.
Work out several ways that you could project a 4D object into 3D space by working with the
coordinates of each vertex, and for each way you come up with, implement a view of the 4D
cube with that projection.
7. Write a function that implements the “point inside a triangle” test, and apply this function to test
whether a particular line of your choice intersects a particular triangle of your choice, all in 3D
space. Extend this function to test whether two triangles intersect.
Introduction
Color is a fundamental concept for computer graphics. We need to be able to define colors for our
graphics that represent good approximations of real-world colors, and we need to be able to
manipulate colors as we develop our applications.
There are many ways to specify colors, but all depend principally on the fact that the human visual
system generally responds to colors through the use of three kinds of cells in the retina of the eye.
This response is complex and includes both physical and psychological processes, but the
fundamental fact of three kinds of stimulus is maintained by all the color models in computer
graphics. For most work, the usual model is the RGB (Red, Green, Blue) color model that
matches in software the physical design of computer monitors, which are made with a pattern of
three kinds of phosphor which emit red, green, and blue light when they are excited by an electron
beam. This RGB model is used for color specification in almost all computer graphics APIs, and it
is the basis for the discussion here. There are a number of other models of color, and we discuss a
few of these in this chapter. However, we refer you to textbooks and other sources, especially
Foley et al. [FvD], for additional discussions on color models and for more complete information
on converting color representations from one model to another.
Because the computer monitor uses three kinds of phosphor, and each phosphor emits light levels
based on the energy of the electron beam that is directed at it, a common approach is to specify a
color by the level of each of the three primaries. These levels are a proportion of the maximum light
energy that is available for that primary, so an RGB color is specified by a triple (r, g, b) where
each of the three components represents the amount of that particular component in the color and
where the ordering is the red-green-blue that is implicit in the name RGB. Colors can be specified
in a number of ways in the RGB system, but in this book we will specify each component by the
proportion of the maximum intensity we want for the color. This proportion for each primary is
represented by a real number between 0.0 and 1.0, inclusive. There are other ways to represent
colors, of course. In an integer-based system that is also often used, each color component can be
represented by an integer that depends on the color depth available for the system; if you have eight
bits of color for each component, which is a common property, the integer values are in the range 0
to 255. The real-number approach is used more commonly in graphics APIs because it is more
device-independent. In either case, the number represents the proportion of the available color of
that primary hue that is desired for the pixel. Thus the higher the number for a component, the
brighter is the light in that color, so with the real-number representation, black is represented by
(0.0, 0.0, 0.0) and white by (1.0, 1.0, 1.0). The RGB primaries are represented respectively by
red (1.0, 0.0, 0.0), green (0.0, 1.0, 0.0), and blue (0.0, 0.0, 1.0); that is, colors that are fully
bright in a single primary component and totally dark in the other primaries. Other colors are a mix
of the three primaries as needed.
While we say that the real-number representation for color is more device-independent, most
graphics hardware deals with colors using integers. Floating-point values are converted to integers
to save space and to speed operations, with the exact representation and storage of the integers
depending on the number of bits per color per pixel and on other hardware design issues. This
distinction sometimes comes up in considering details of color operations in your API, but is
generally something that you can ignore. Some color systems outside the usual graphics APIs use
special kinds of color capabilities and there are additional technologies for representing and creating
these capabilities. However, the basic concept of using floating-point values for colors is the same
as for our APIs. The color-generation process itself is surprisingly complex because the monitor
or other viewing device must generate perceptually-linear values, but most hardware generates
color with exponential, not linear, properties. All these color issues are hidden from the API
programmer, however, and are managed after being translated from the API representations of the
colors, allowing API-based programs to work relatively the same across a wide range of
platforms.
In addition to dealing with the color of light, modern graphics systems add a fourth component to
the question of color. This fourth component is called “the alpha channel” because that was its
original notation [POR], and it represents the opacity of the material that is being modeled. As is
the case with color, this is represented by a real number between 0.0 (no opacity — completely
transparent) and 1.0 (completely opaque — no transparency). This is used to allow you to create
objects that you can see through at some level, and can be a very valuable tool when you want to
be able to see more than just the things at the front of a scene. However, transparency is not
determined globally by the graphics API; it is determined by compositing the new object with
whatever is already present in the color buffer. Thus if you want to create an image that contains many
levels of transparency, you will need to pay careful attention to the sequence in which you draw
your objects, drawing the furthest first in order to get correct attenuation of the colors of
background objects.
Principles
The basic principle for using color with graphics is simple: once you have specified a color, all the
geometry that is specified after that point will be drawn in the specified color until that color is
changed. This means that if you want different parts of your scene to be presented in different
colors you will need to re-specify the color for each distinct part, but this is not difficult.
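As a small OpenGL fragment, assumed to live inside a display function of a program that has already set up its window, viewing, and projection, this might look like the following; the particular colors and vertices are only illustrations.

/* Each glColor3f call sets the current color, and every vertex that
   follows is drawn with that color until the color is changed. */
glColor3f(1.0f, 0.0f, 0.0f);        /* red */
glBegin(GL_TRIANGLES);
    glVertex3f(0.0f, 0.0f, 0.0f);
    glVertex3f(1.0f, 0.0f, 0.0f);
    glVertex3f(0.0f, 1.0f, 0.0f);
glEnd();

glColor3f(0.0f, 0.0f, 1.0f);        /* blue: affects everything after it */
glBegin(GL_TRIANGLES);
    glVertex3f(0.0f, 0.0f, 1.0f);
    glVertex3f(1.0f, 0.0f, 1.0f);
    glVertex3f(0.0f, 1.0f, 1.0f);
glEnd();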
In terms of the scene graph, color is part of the appearance node that accompanies each geometry
node. We will later see much more complex ways to determine an appearance, but for now color
is our first appearance issue. When you write your code from the scene graph, you will need to
write the code that manages the appearance before you write the code for the geometry in order to
have the correct appearance information in the system before the geometry is drawn.
Color may be represented in the appearance node in any way you wish. Below we will discuss
three different kinds of color models, and in principle you may represent color with any of these.
However, most graphics APIs support RGB color, so you may have to convert your color
specifications to RGB before you can write the appearance code. Later in this chapter we include
code to do two such conversions.
The RGB color model is associated with a geometric presentation of a color space. That space can
be represented by a cube consisting of all points (r, g, b) with each of r, g, and b having a value
that is a real number between 0 and 1. Because of the easy analogy between color triples and space coordinates, we can think of any RGB color as a point in or on this unit cube.
To illustrate the numeric properties of the RGB color system, we will create the edges of the color
cube as shown in Figure 5.1 below, which has been rotated to illustrate the colors more fully. To
do this, we create a small cube with a single color, and then draw a number of these cubes around
the edge of the geometric unit cube, with each small cube having a color that matches its location.
We see the origin (0,0,0) corner, farthest from the viewer, mostly by its absence because of the
black background, and the (1,1,1) corner nearest the viewer as white. The three axis directions are
the pure red, green, and blue corners. Creating this figure is discussed below in the section on
creating a model with a full spectrum of colors, and it would be useful to add an interior cube
within the figure shown that could be moved around the space interactively and would change
color to illustrate the color at its current position in the cube.

Figure 5.1: tracing the colors of the edges of the RGB cube
This figure suggests the nature of the RGB cube, but the entire RGB cube is much more
complete. It is shown from two points of view in Figure 5.2, from the white vertex and from the
black vertex, so you can see the full range of colors on the surface of the cube. Note that the three
vertices closest to the white vertex are the cyan, magenta, and yellow vertices, while the three
vertices closest to the black vertex are the red, green, and blue vertices. This illustrates the additive
nature of the RGB color model, with the colors getting lighter as the amounts of the primary colors increase.
Figure 5.2: two views of the RGB cube — from the white (left) and black (right) corners
Color is extremely important in computer graphics, and we can get color in our image in two ways:
by directly setting the color for the objects we are drawing, or by defining properties of object
surfaces and lights and having the color generated by a lighting model. In this chapter we only
think about the colors of objects, and save the color of lights and the way light interacts with
surfaces for a later chapter on lighting and shading. In general, the behavior of a scene will reflect
both these attributes—if you have a red object and illuminate it with a blue light, your object will
seem to be essentially black, because a red object reflects no blue light and the light contains no
other color than blue.
Luminance
Luminance of a color is the color’s brightness, or the intensity of the light it represents, without
regard for its actual color. This concept is particularly meaningful for emissive colors on the
screen, because these actually correspond to the amount of light that is emitted from the screen.
The concept of luminance is important for several reasons. One is that a number of members of
any population have deficiencies in the ability to distinguish different colors, the family of so-called
color blindness problems, but are able to distinguish differences in luminance. You need to take
luminance into account when you design your displays so that these persons can make sense of
them. Luminance is also important because part of the interpretation of an image deals with the
brightness of its parts, and you need to understand how to be sure that you use colors with the
right kind of luminance for your communication. For example, in the chapter on visual
communication we will see how we can use luminance information to get color scales that are
approximately uniform in terms of having the luminance of the color represent the numerical value
that the color is to represent.
For RGB images, luminance is quite easy to compute. Of the three primaries, green is the
brightest and so contributes most to the luminance of a color. Red is the next brightest, and blue is
the least bright. The actual luminance will vary from system to system and even from display
device to display device because of differences in the way color numbers are translated into
voltages and because of the way the phosphors respond. In general, though, we are relatively
accurate if we assume that luminance is calculated by the formula
luminance = 0.30*red + 0.59*green + 0.11*blue
so the overall brightness ratios are approximately 6:3:1 for green:red:blue.
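A one-line C helper for this calculation (the function name is ours) is sometimes handy when you are designing color schemes:

/* Approximate luminance of an RGB color using the weights above. */
float luminance(float r, float g, float b)
{
    return 0.30f*r + 0.59f*g + 0.11f*b;
}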
To see the effects of constant luminance, we can pass the plane 0.30*R + 0.59*G + 0.11*B = t through the
RGB color space and examine the cross-section it exposes in the color cube as the parameter t varies. An
example of this is shown in Figure 5.3 in both color and grayscale.
There are times when the RGB color model is not easy or natural to use. When we want to capture
a particular color, few of us think of the color in terms of the proportions of red, green, and blue
that are needed to create it. Other color models give us different ways to think about color that
make it more intuitive to specify a color. There are also some processes for which the RGB
approach does not model the reality of color production. We need to have a wider range of ways
to model color to accommodate these realities.
A more intuitive approach to color is found with two other color models: the HSV (Hue-
Saturation-Value) and HLS (Hue-Lightness-Saturation) models. These models represent color as a
hue (intuitively, a descriptive variation on a standard color such as red, or magenta, or blue, or
cyan, or green, or yellow) that is modified by setting its value (a property of the color’s darkness
or lightness) and its saturation (a property of the color’s vividness). This lets us find numerical
ways to say “the color should be a dark, vivid reddish-orange” by using a hue that is to the red side
of yellow, has a relatively low value, and has a high saturation.
Just as there is a geometric model for RGB color space, there is one for HSV color space: a cone
with a flat top, as shown in Figure 5.4 below. The distance around the circle in degrees represents
the hue, starting with red at 0, moving to green at 120, and blue at 240. The distance from the
vertical axis to the outside edge represents the saturation, or the amount of the primary colors in the
particular color. This varies from 0 at the center (no saturation, which makes no real coloring) to 1
at the edge (fully saturated colors). The vertical axis represents the value, from 0 at the bottom (no
color, or black) to 1 at the top. So an HSV color is a triple representing a point in or on the cone,
and the “dark, vivid reddish-orange” color would be something like (40.0, 1.0, 0.7). Code to
display this geometry interactively is discussed at the end of this chapter, and writing an interactive
display program gives a much better view of the space.
Figure 5.4: three views of HSV color space: side (left), top (middle), bottom (right)
The shape of the HSV model space can be a bit confusing. The top surface represents all the
lighter colors based on the primaries, because colors getting lighter have the same behavior as
colors becoming less saturated. The reason the geometric model tapers to a point at the bottom is
that there is no real color variation near black. In this model, the gray colors are the colors with a saturation of 0; they lie along the central vertical axis of the cone, and for them the hue is meaningless.
In the HLS color model, shown in Figure 5.5, the geometry is much the same as the HSV model
but the top surface is stretched into a second cone. Hue and saturation have the same meaning as in
HSV, but lightness replaces value, and lightness corresponds to the brightest colors at a value of
0.5. The rationale for the dual cone that tapers to a point at the top as well as the bottom is that as
colors get lighter, they lose their distinctions of hue and saturation in a way that is very analogous
with the way colors behave as they get darker. In some ways, the HLS model seems to come
closer to the way people talk about “tints” and “tones” when they talk about paints, with the
strongest colors at lightness 0.5 and becoming lighter (tints) as the lightness is increased towards
1.0, and becoming darker (tones) as the lightness is decreased towards 0.0. Just as in the HSV
case above, the grays form the center line of the cone with saturation 0, and the hue is
meaningless.
Figure 5.5: the HLS double cone from the red (left), green (middle), and blue(right) directions.
The top and bottom views of the HLS double cone look just like those of the HSV single cone, but
the side views of the HLS double cone are quite different. Figure 5.5 shows the HLS double cone
from the three primary-color sides: red, green, and blue respectively. The views from the top or
bottom are exactly those of the HSV cone and so are not shown here. The images in the figure do
not show the geometric shape very well; the discussion of this model in the code section below will
show you how this can be presented, and an interactive program to display this space will allow
you to interact with the model and see it more effectively in 3-space.
There are relatively simple functions that convert a color defined in one space into the same color as
defined in another space. We do not include all these functions in these notes, but they are covered
in [FvD], and the functions to convert HSV to RGB and to convert HLS to RGB are included in
the code discussions below about producing these figures.
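For reference, one common formulation of the HSV-to-RGB conversion, essentially the one given in [FvD], is sketched below in C; the function name and the parameter ranges (h in degrees in [0, 360), s and v in [0, 1]) are our own choices here.

/* Convert an HSV color to RGB.  h in degrees [0,360), s and v in [0,1]. */
void hsvToRgb(float h, float s, float v, float *r, float *g, float *b)
{
    int   i;
    float f, p, q, t;

    if (s == 0.0f) { *r = *g = *b = v; return; }   /* a gray; hue is meaningless */

    h = h / 60.0f;                 /* sector 0 to 5 around the cone */
    i = (int)h;
    f = h - (float)i;              /* fractional part within the sector */
    p = v * (1.0f - s);
    q = v * (1.0f - s * f);
    t = v * (1.0f - s * (1.0f - f));

    switch (i) {
        case 0:  *r = v; *g = t; *b = p; break;
        case 1:  *r = q; *g = v; *b = p; break;
        case 2:  *r = p; *g = v; *b = t; break;
        case 3:  *r = p; *g = q; *b = v; break;
        case 4:  *r = t; *g = p; *b = v; break;
        default: *r = v; *g = p; *b = q; break;    /* case 5 */
    }
}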
All the color models above are based on colors presented on a computer monitor or other device
where light is emitted to the eye. Such colors are called emissive colors, and operate by adding
light at different wavelengths as different screen cells emit light. The fact that most color presented
by programming comes from a screen makes this the primary way we think about color in
computer graphics systems. This is not the only way that color is presented to us, however.
When you read these pages in print, and not on a screen, the colors you see are generated by light
that is reflected from the paper through the inks on the page. Such colors can be called
transmissive colors and operate by subtracting colors from the light being reflected from the page.
This is a totally different process and needs separate treatment. Figure 5.6 illustrates this principle.
Transmissive color processes use inks or films that transmit only certain colors while filtering out
all others. Two examples are the primary inks for printing and the films for theater lights; the
primary values for transmissive color are cyan (which transmits both blue and green), magenta
(which transmits both blue and red), and yellow (which transmits both red and green). In
principle, if you use all three inks or filters (cyan, magenta, and yellow), you should have no light
transmitted and so you should see only black. In practice, actual materials are not perfect and
allow a little off-color light to pass, so this would produce a dark and muddy gray (the thing that
printers call “process black”); you therefore need to add an extra “real” black to the parts that are intended
to be really black. This cyan-magenta-yellow-black model is called CMYK color and is the basis
for printing and other transmissive processes. It is used to create color separations that combine to
form full-color images as shown in Figure 5.7, which shows a full-color image (left) and the sets
of yellow, cyan, black, and magenta separations (right-hand side, clockwise from top left) that are
used to create plates to print the color image. We will not consider the CMYK model further in this
discussion because its use is in printing and similar technologies, but not in graphics
programming. We will meet this approach to color again when we discuss graphics hardcopy,
however.
The numerical color models we have discussed above are device-independent; they assume that
colors are represented by real numbers and thus that there are an infinite number of colors available
to be displayed. This is, of course, an incorrect assumption, because computers lack the capability
of any kind of infinite storage. Instead, computers use color capabilities based on the amount of
memory allocated to holding color information.
The basic model we usually adopt for computer graphics is based on screen displays and can be
called direct color. For each pixel on the screen we store the color information directly in the
screen buffer. The number of bits of storage we use for each color primary is called the color
depth for our graphics. At the time of this writing, it is probably most common to use eight bits of
color for each of the R, G, and B primaries, so we often talk about 24-bit color. In fact, it is not
uncommon to include the alpha channel in the color discussion, with the model of RGBA color,
and if the system uses an 8-bit alpha channel we might hear that it has 32-bit color. This is not
universal, however; some systems use fewer bits to store colors, and not all systems use an equal
number of bits for each color, while some systems use more bits per color. The very highest-end
professional graphics systems, for example, often use 36-bit or 48-bit color. However, some
image formats do not offer the possibility of greater depth; the GIF format, for example, specifies
8-bit indexed color in its standard.
One important effect of color depth is that the exact color determined by your color computations
will not be displayed on the screen. Instead, the color is aliased by rounding it to a value that can
be represented with the color depth of your system. This can lead to serious effects called Mach
bands, shown in Figure 5.8. These occur because very small differences between adjacent color
representations are perceived visually to be significant. Because the human visual system is
extremely good at detecting edges, these differences are interpreted as strong edges and disrupt the
perception of a smooth image. You should be careful to look for Mach banding in your work, and
when you see it, you should try to modify your image to make it less visible. Figure 5.8 shows a
small image that contains some Mach bands, most visible in the tan areas toward the front of the
image.
Mach bands happen when you have regions of solid color that differ only slightly, which is not
uncommon when you have a limited number of colors. The human eye is exceptionally able to see
edges, possibly as an evolutionary step in hunting, and will identify even a small color difference
between two adjacent regions as a visible edge.
Color gamut
Color is not only limited by the number of colors that can be displayed, but also by limitations in
the technology of display devices. No matter what display technology you use—phosphors on a
video-based screen, ink on paper, or LCD cells on a flat panel—there is a limited range of color
that the technology can present. This is also true of more traditional color technologies, such as
color film. The range of a device is called its color gamut, and we must realize that the gamut of
our devices limits our ability to represent certain kinds of images. A significant discussion of the
color gamut of different devices is beyond the scope of the content we want to include for the first
course in computer graphics, but it is important to realize that there are serious limitations on the
colors you can produce on most devices.
In most graphics APIs, color can be represented as more than just a RGB triple; it can also include
a blending level (sometimes thought of as a transparency level) so that anything with this color will
have a color blending property. Thus color is represented by a quadruple (r,g,b,a) and the color
model that includes blending is called the RGBA model. The transparency level a for a color is
called the alpha value, and its value is a number between 0.0 and 1.0 that is actually a measure of
opacity instead of transparency. That is, if you use standard kinds of blending functions and if the
alpha value is 1.0, the color is completely opaque, but in the same situation if the alpha value is
0.0, the color is completely transparent. However, we are using the term “transparent” loosely
here, because the real property represented by the alpha channel is blending, not transparency. The
alpha channel was invented to permit image compositing [POR] in which an image could be laid
over another image and have part of the underlying image show through. So while we may say (or
sometimes even think) “transparent” we really mean blended.
This difference between blended and transparent colors can be very significant. If we think of
transparent colors, we are modeling the logical equivalent of colored glass. This kind of material
embodies transmissive, not emissive, colors — only certain wavelengths are passed through,
while the rest are absorbed. But this is not the model that is used for the alpha value; blended
colors operate by averaging emissive RGB colors, which is the opposite of the transmissive model
implied by transparency. The difference can be important in creating the effects you need in an
image. There is an additional issue to blending because averaging colors in RGB space may not
result in the intermediate colors you would expect; the RGB color model is one of the worst color
models for perceptual blending, but we have no real choice in most graphics APIs.
Blending creates some significant challenges if we want to create the impression of transparency.
To begin, we make the simple observation that if something is intended to seem transparent to
some degree, you must be able to see things behind it. This suggests a simple first step: if you are
working with objects having their alpha color component less than 1.0, it is useful and probably
important to allow the drawing of things that might be behind these objects. To do that, you
should draw all solid objects (objects with alpha component equal to 1.0) before drawing the
things you want to seem transparent, turn off the depth test while drawing items with blended
colors, and turn the depth test back on again after drawing them. This at least allows the
possibility that some concept of transparency is allowed.
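In OpenGL, this drawing order might look like the sketch below; drawOpaqueObjects() and
drawBlendedObjects() are hypothetical placeholders for your own scene code, not functions from
this chapter.

/* first draw everything that is fully opaque, with the depth test on */
glEnable(GL_DEPTH_TEST);
drawOpaqueObjects();              /* hypothetical: your geometry with alpha == 1.0 */

/* then draw the blended ("transparent") things with the depth test off */
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDisable(GL_DEPTH_TEST);
drawBlendedObjects();             /* hypothetical: your geometry with alpha < 1.0 */

/* restore the usual state for the rest of the frame */
glEnable(GL_DEPTH_TEST);
glDisable(GL_BLEND);

A common variation is to leave the depth test on and disable only depth writes with
glDepthMask(GL_FALSE) while the blended objects are drawn, so that they are still hidden by any
opaque objects in front of them.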
But it may not be enough to do this, and in fact this attempt at transparency may lead to more
confusing images than leaving the depth test intact. Let us consider the case where you have three
objects colored C1, C2, and C3, drawn in that order with an alpha value of 0.5 for each, and
arranged in a line in front of the eye as sketched below:
    eye    C1    C2    C3
When we draw the first object, the frame buffer will have color C1; no other coloring is involved.
When we draw the second object on top of the first, the frame buffer will have color
0.5*C1+0.5*C2, because the foreground (C2) has alpha 0.5 and the background (C1) is
included with weight 0.5 = 1-0.5. Finally, when the third object is drawn on top of the
others, the color will be
0.5*C3 + 0.5*(0.5*C1 + 0.5*C2), or 0.5*C3 + 0.25*C2 + 0.25*C1.
That is, the color of the most recent object drawn is emphasized much more than the color of the
other objects. This shows up clearly in the right-hand part of Figure 5.10 below, where the red
square is drawn after the other two squares. On the other hand, if you had drawn object three
before object 2, and object 2 before object 1, the color would have been
0.5*C1+0.25*C2+0.25*C3,
so the order in which you draw things, not the order in which they are placed in space, determines
the color.
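If you want to check this order dependence for yourself, the short C program below (our own
illustration, not code from this chapter) composites a single color channel of the three objects in
both orders and prints the two different results.

#include <stdio.h>

/* one channel of the usual blend: alpha*src + (1 - alpha)*dest */
static float blend(float src, float dest, float alpha) {
    return alpha * src + (1.0f - alpha) * dest;
}

int main(void) {
    float C1 = 1.0f, C2 = 0.5f, C3 = 0.0f;  /* sample values for one color channel */
    float a = 0.5f;                         /* alpha used for the blended objects  */

    /* the frame buffer starts out holding C1; C2 and C3 are then drawn over it */
    float order123 = blend(C3, blend(C2, C1, a), a);

    /* the same objects drawn in the opposite order */
    float order321 = blend(C1, blend(C2, C3, a), a);

    printf("drawn C1, C2, C3: %.3f\n", order123);  /* 0.25*C1 + 0.25*C2 + 0.5*C3 = 0.375 */
    printf("drawn C3, C2, C1: %.3f\n", order321);  /* 0.5*C1 + 0.25*C2 + 0.25*C3 = 0.625 */
    return 0;
}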
But this again emphasizes a difference between blending and transparency. If we were genuinely
modeling transparency, it would not make any difference which object was drawn first and which
last; each would subtract light in a way that is independent of the order of the others. So this
represents another challenge if you want to create an illusion of transparency with more than one
non-solid object.
The problem with the approaches above, and with the results shown in Figure 5.10 below, is that
the most recently drawn object is not necessarily the object that is nearest the eye. Our model of
blending actually works fairly well if the order of drawing is back-to-front in the scene. If we
consider the effect of actual partial transparency, we see that the colors of objects farther away
from the eye really are of less importance in the final scene than nearer colors. So if we draw the
objects in back-to-front order, our blending will model transparency much better. We will address
this with an example later in this chapter.
Indexed color
On some systems, the frame buffer is not large enough to handle three bytes per pixel in the
display. This was rather common on systems before the late 1990s and such systems are still
supported by graphics APIs. In these systems, we have what is called indexed color, where the
frame buffer stores a single integer value per pixel, and that value is an index into an array of RGB
color values called the color table. Typically the integer is simply an unsigned byte, so at most
256 colors are available at any one time, and it is up to the programmer to define the color table
the application is to use.
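As a sketch of how this might look with the GLUT toolkit (the particular indices and colors are our
own example, not part of this chapter's code), you request a color-index display, fill in some color
table entries, and then draw with indices rather than with RGB values:

/* request a color-index display rather than an RGBA one */
glutInitDisplayMode(GLUT_INDEX | GLUT_DOUBLE | GLUT_DEPTH);
glutCreateWindow("indexed color");

/* fill in a few entries of the color table: index -> (r, g, b) */
glutSetColor(1, 1.0, 0.0, 0.0);    /* index 1 is red   */
glutSetColor(2, 0.0, 1.0, 0.0);    /* index 2 is green */
glutSetColor(3, 0.0, 0.0, 1.0);    /* index 3 is blue  */

/* later, in the display function, draw with an index rather than an RGB triple */
glIndexi(2);                       /* the current color is whatever index 2 holds */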
Besides the extra computational difficulties caused by having to use color table entries instead of
actual RGB values in a scene, systems with indexed color are very vulnerable to color aliasing
problems. Mach banding is one such color aliasing problem, as are color approximations when
pseudocolor is used in scientific applications.
Besides the stereo viewing technique with two views from different points, there are other
techniques for doing 3D viewing that do not require an artificial eye convergence. When we
discuss texture maps in a later chapter, we will describe a 1D texture map technique that colors 3D
images more red in the nearer parts and more blue in the more distant parts. An example of this is
shown in Figure 1.10. This makes the images self-converge when you view them through a pair
of ChromaDepth™ glasses, as we will describe there, so more people can see the spatial properties
of the image, and it can be seen from anywhere in a room. There are also more specialized
techniques, such as presenting alternating-eye views of the image on a screen whose overscreen is
given alternating polarization, so that polarized glasses allow each eye to see only the view
intended for it, or using dual-screen technologies such as head-mounted displays. The extension of
the techniques above to these more specialized technologies is straightforward and is left to your
instructor if such technologies are available.
There is another interesting technique for creating full-color images that your user can view in 3D.
It involves the red/blue glasses that are familiar to some of us from the days of 3D movies in the
1950s and that you may sometimes see for 3D comic books or the like. However, most of those
were grayscale images, and the technique we will present here works for full-color images.
The images we will describe are called anaglyphs. For these we will generate images for both the
left and right eyes, and will combine the two images by using the red information from the left-eye
image and the blue and green information from the right-eye image as shown in Figure 5.11. The
resulting image will look similar to that shown in Figure 5.12, but when viewed through red/blue
glasses, each eye sees only the view intended for it and the scene takes on a genuine sense of depth.
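One common way to build such an anaglyph in OpenGL (a sketch only; setLeftEyeView(),
setRightEyeView(), and drawScene() are hypothetical stand-ins for your own viewing and drawing
code) is to draw the scene twice and use the color mask to keep only the red channel of the left-eye
view and the green and blue channels of the right-eye view:

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

/* left-eye view: keep only the red channel */
setLeftEyeView();                                   /* your left-eye viewing transformation */
glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_TRUE);
drawScene();

/* right-eye view: keep only the green and blue channels */
glClear(GL_DEPTH_BUFFER_BIT);                       /* clear depth but keep the red image */
setRightEyeView();                                  /* your right-eye viewing transformation */
glColorMask(GL_FALSE, GL_TRUE, GL_TRUE, GL_TRUE);
drawScene();

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);    /* restore the full color mask */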
Some examples
If you were to draw pieces of the standard coordinate planes and use colors with alpha less than
1.0 for the planes, you would be able to see through each coordinate plane to the other planes as
though the planes were made of a partially transparent plastic. We have modeled a set of three
squares, each lying in a coordinate plane and centered at the origin, and each defined with a rather
low alpha value of 0.5 so that the other squares can show through. In this section we consider the
effects of a few different drawing options on this view.
Figure 5.12: the partially transparent coordinate planes (left); the same coordinate planes fully
transparent but with the same alpha (center); the same coordinate planes with adjusted alpha values (right)
However, in the image in the center of Figure 5.12, we have disabled the depth test, and this
presents a more problematic situation. In this case, the result is something much more like
transparent planes, but the transparency is very confusing because the last plane drawn, the red
plane, always seems to be on top because its color is the brightest. This figure shows that the
OpenGL attempt at transparency is not necessarily a desirable property; it is quite difficult to get
information about the relationship of the planes from this image. Thus you should be careful about
the images you create whenever you choose to work with transparent or blended colors. This figure
is actually created by exactly the same code as the previous image, but with the depth test disabled
instead of enabled.
Finally, we change the alpha values of the three squares to account for the difference between the
weights in the final three-color region. Here we use 1.0 for the first color (blue), 0.5 for the
second color (green), but only 0.33 for the last color (red), and we see that this final image, the
right-hand image in Figure 5.12, has the following color weights in its various regions:
• 0.33 for each of the colors in the shared region,
• 0.5 for each of blue and green in the region they share,
• 0.33 each for red and green in the region they share,
• 0.33 for red and 0.67 for blue in the region they share,
• the original alpha values for the regions where there is only one color.
Note that the “original alpha values” give us a solid blue, a fairly strong green, and a weak red as
stand-alone colors. This gives us a closer approximation to the appearance of actual transparency
for these three colors, with particular attention to the clear gray in the area they all cover, but there
are still limitations to what this approach can do.
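A sketch of this adjusted-alpha drawing might look like the fragment below; drawXYSquare(),
drawXZSquare(), and drawYZSquare() are hypothetical functions for the three squares, and the
assignment of colors to particular coordinate planes is our own assumption rather than the code
that actually produced the figure.

glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDisable(GL_DEPTH_TEST);          /* depth test off so all three squares blend (our assumption) */

glColor4f(0.0, 0.0, 1.0, 1.0);     /* blue, drawn first, fully opaque */
drawXYSquare();
glColor4f(0.0, 1.0, 0.0, 0.5);     /* green, drawn second, alpha 0.5  */
drawXZSquare();
glColor4f(1.0, 0.0, 0.0, 0.33);    /* red, drawn last, alpha 0.33     */
drawYZSquare();

glEnable(GL_DEPTH_TEST);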
Let’s look at this example again from the point of view of depth-sorting the things we will draw.
In this case, the three planes intersect each other and must be subdivided into four pieces each so
that there is no overlap. Because there is no overlap of the parts, we can sort them so that the
pieces farther from the eye will be drawn first. This allows us to draw in back-to-front order,
where the blending provides a better model of how transparency operates. Figure 5.13 shows
how this would work. The technique of adjusting your model is not always as easy as this,
because it can be difficult to subdivide parts of a figure, but this shows its effectiveness.
There is another issue with depth-sorted drawing, however. If you are creating a scene that permits
the user either to rotate the eye point around your model or to rotate parts of your model, then the
model will not always have the same parts closest to the eye. In this case, you will need to use a
feature of your graphics API to identify the distance of each part from the eye point. This is usually
done by rendering a representative point in each part and getting its Z-value with the current eye
point. This is a more advanced operation than we are ready to discuss here, so we refer you to the
manuals for your API to see if it is supported and, if so, how it works.
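As one concrete possibility in OpenGL (a sketch under our own assumptions about how the pieces of
the model are stored, not code from this chapter), you can compute the window-space depth of a
representative point of each piece with the gluProject() utility function and then sort the pieces so
that the largest depths are drawn first:

#include <stdlib.h>
#include <GL/glu.h>     /* header location may differ on your system */

typedef struct {
    GLdouble point[3];  /* a representative point of the piece, in the current model coordinates */
    GLdouble depth;     /* its window-space z, filled in each frame */
    /* ... whatever else describes the piece ... */
} Piece;

/* compare so that larger depths (farther from the eye) come first */
static int farthestFirst(const void *a, const void *b) {
    const Piece *p = (const Piece *)a, *q = (const Piece *)b;
    return (p->depth < q->depth) - (p->depth > q->depth);
}

/* fill in the depths from the current modelview, projection, and viewport, then sort */
void sortPieces(Piece pieces[], int count) {
    GLdouble model[16], proj[16], winX, winY;
    GLint viewport[4];
    int i;

    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);

    for (i = 0; i < count; i++)
        gluProject(pieces[i].point[0], pieces[i].point[1], pieces[i].point[2],
                   model, proj, viewport, &winX, &winY, &pieces[i].depth);

    qsort(pieces, count, sizeof(Piece), farthestFirst);
}

With the default depth range, larger window-space z values are farther from the eye, so drawing the
sorted pieces in order gives the back-to-front order we want.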
Figure 5.13: the partially transparent planes broken into quadrants and drawn back-to-front
As you examine this figure, note that although each of the three planes has the same alpha value of
0.5, the difference in luminance between the green and blue colors is apparent in the way the plane
with the green in front looks different from the plane with the blue (or the red, for that matter) in
front. This goes back to the difference in luminance between colors that we discussed earlier in the
chapter.
Color in OpenGL
OpenGL uses the RGB and RGBA color models with real-valued components. These colors
follow the RGB discussion above very closely, so there is little need for any special comments on
color itself in OpenGL. Instead, we will discuss blending in OpenGL and then will give some
examples of code that uses color for its effects.
Specifying colors
In OpenGL, colors are specified with the glColor*(...) functions. These follow the usual OpenGL
naming pattern, in which the function name encodes the number of components, the type of the
data, and whether the data is passed as scalars or as a vector. Specifically we will see functions such as
glColor3f(r, g, b) — three real scalar color parameters, or
glColor4fv(c) — four real color parameters passed in the vector (array) c.
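For example (the color values here are arbitrary choices of our own), the two forms might be used
as follows:

GLfloat halfBlue[4] = { 0.0f, 0.4f, 1.0f, 0.5f };   /* a half-blended blue */

glColor3f(1.0f, 0.8f, 0.2f);    /* scalar form: an opaque gold; the alpha value defaults to 1.0 */
glColor4fv(halfBlue);           /* vector form: all four components come from the array */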