Introduction to Computer Graphics
Version 1.2, January 2018
David J. Eck
Hobart and William Smith Colleges
© 2015–2018, David J. Eck
Contents

Preface

1 Introduction
1.1 Painting and Drawing
1.2 Elements of 3D Graphics
1.3 Hardware and Software

2 Two-Dimensional Graphics
2.1 Pixels, Coordinates, and Colors
2.1.1 Pixel Coordinates
2.1.2 Real-number Coordinate Systems
2.1.3 Aspect Ratio
2.1.4 Color Models
2.2 Shapes
2.2.1 Basic Shapes
2.2.2 Stroke and Fill
2.2.3 Polygons, Curves, and Paths
2.3 Transforms
2.3.1 Viewing and Modeling
2.3.2 Translation
2.3.3 Rotation
2.3.4 Combining Transformations
2.3.5 Scaling
2.3.6 Shear
2.3.7 Window-to-Viewport
2.3.8 Matrices and Vectors
2.4 Hierarchical Modeling
2.4.1 Building Complex Objects
2.4.2 Scene Graphs
2.4.3 The Transform Stack
2.5 Java Graphics2D
2.5.1 Graphics2D
2.5.2 Shapes
2.5.3 Stroke and Fill
2.5.4 Transforms
2.5.5 BufferedImage and Pixels
2.6 HTML Canvas Graphics
2.6.1 The 2D Graphics Context
2.6.2 Shapes
2.6.3 Stroke and Fill
2.6.4 Transforms
2.6.5 Auxiliary Canvases
2.6.6 Pixel Manipulation
2.6.7 Images
2.7 SVG: A Scene Description Language
2.7.1 SVG Document Structure
2.7.2 Shapes, Styles, and Transforms
2.7.3 Polygons and Paths
2.7.4 Hierarchical Models
2.7.5 Animation

APPENDICES

B Blender
B.1 Blender Basics
B.1.1 The 3D View
B.1.2 Adding and Transforming Objects
B.1.3 Edit Mode
B.1.4 Light, Material, and Texture
B.1.5 Saving Your Work
B.1.6 More Features
B.2 Blender Modeling
B.2.1 Text
B.2.2 Curves
B.2.3 Proportional Editing

E Glossary
Preface
they try to run some of the samples locally instead of over the Web. For me, Firefox can run
such examples. This issue affects only some of the examples.
∗ ∗ ∗
I have taught computer graphics every couple of years or so for almost 30 years. As the
field developed, I had to make major changes almost every time I taught the course, but for
much of that time, I was able to structure the course primarily around OpenGL 1.1, a graphics
API that was in common use for an extended period. OpenGL 1.1 supported fundamental
graphics concepts in a way that was fairly easy to use. OpenGL is still widely supported, but,
for various reasons, the parts of it that were easy to use have been officially dropped from
the latest versions (although they are in practice supported on most desktop computers). The
result is a much more powerful API but one that is much harder to learn. In particular, modern
OpenGL in its pure form does not make for a good introduction to graphics programming.
My approach in this book is to use a subset of OpenGL 1.1 to introduce the fundamental
concepts of three-dimensional graphics. I then go on to cover WebGL—a version of OpenGL
that runs in a web browser—as an example of the more modern approach to computer graph-
ics. While OpenGL makes up the major foundation for the course, the real emphasis is on
fundamental concepts such as geometric modeling and transformations; hierarchical modeling
and scene graphs; color, lighting, and textures; and animation.
Chapter 1 is a short overview of computer graphics. It introduces many concepts that will
be covered in much more detail in the rest of the book.
Chapter 2 covers two-dimensional graphics in Java, JavaScript, and SVG, with an emphasis
on ideas such as transformations and scene graphs that carry over to three dimensions.
Chapter 3 and Chapter 4 cover OpenGL 1.1. While OpenGL 1.1 is fairly primitive by
today’s standard, it includes many basic features that are still fundamental to three-dimensional
computer graphics, in a form that is an easier starting point for people new to 3D graphics.
Only part of the API is covered.
Chapter 5 covers three.js, a higher-level 3D graphics API for Web graphics using JavaScript.
This chapter shows how fundamental concepts can be used in a higher-level interface.
Chapter 6 and Chapter 7 cover WebGL, a modern version of OpenGL for graphics on the
Web. WebGL is very low-level, and it requires the programmer to write “shader programs” to
implement many features that are built into OpenGL 1.1. Looking at the implementation is an
opportunity to understand more deeply how computers actually make 3D images.
And Chapter 8 looks briefly at some advanced techniques that are not possible in OpenGL.
Appendix A contains brief introductions to three programming languages that are used in the
book: Java, C, and JavaScript. Appendix B is meant to get readers started with the most basic
uses of Blender, a sophisticated 3D modeling program. I have found that introducing students
to Blender is a good way to help them develop their three-dimensional intuition. Appendix C
contains even briefer introductions to two 2D graphics programs, Gimp and Inkscape.
∗ ∗ ∗
Professor David J. Eck
Department of Mathematics and Computer Science
Hobart and William Smith Colleges
300 Pulteney Street
Geneva, New York 14456, USA
Email: [email protected]
WWW: http://math.hws.edu/eck/
January, 2018
Chapter 1
Introduction
The term “computer graphics” refers to anything involved in the creation or manipu-
lation of images on computer, including animated images. It is a very broad field, and one in
which changes and advances seem to come at a dizzying pace. It can be difficult for a beginner
to know where to start. However, there is a core of fundamental ideas that are part of the
foundation of most applications of computer graphics. This book attempts to cover those foun-
dational ideas, or at least as many of them as will fit into a one-semester college-level course.
While it is not possible to cover the entire field in a first course—or even a large part of it—this
should be a good place to start.
This short chapter provides an overview and introduction to the material that will be covered
in the rest of the book, without going into a lot of detail.
the pixels on the screen will be changed to match, and the displayed image will change.
A computer screen used in this way is the basic model of raster graphics. The term
“raster” technically refers to the mechanism used on older vacuum tube computer monitors:
An electron beam would move along the rows of pixels, making them glow. The beam was
moved across the screen by powerful magnets that would deflect the path of the electrons. The
stronger the beam, the brighter the glow of the pixel, so the brightness of the pixels could be
controlled by modulating the intensity of the electron beam. The color values stored in the
frame buffer were used to determine the intensity of the electron beam. (For a color screen,
each pixel had a red dot, a green dot, and a blue dot, which were separately illuminated by the
beam.)
A modern flat-screen computer monitor is not a raster in the same sense. There is no
moving electron beam. The mechanism that controls the colors of the pixels is different for
different types of screen. But the screen is still made up of pixels, and the color values for all
the pixels are still stored in a frame buffer. The idea of an image consisting of a grid of pixels,
with numerical color values for each pixel, defines raster graphics.
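As a conceptual sketch only (this is not the API of any particular graphics system, and the names are invented for illustration), a frame buffer can be pictured as nothing more than an array that holds one numerical color value for each pixel, so that drawing means changing values in the array:

public class FrameBufferSketch {
    public static void main(String[] args) {
        int width = 640, height = 480;
        int[][] frameBuffer = new int[height][width];  // one color value per pixel
        // "Drawing" just means changing color values in the grid.  Here the pixel
        // in row 100, column 200 is set to red, written as a packed RGB integer.
        frameBuffer[100][200] = 0xFF0000;
        System.out.println( "Color of the pixel in row 100, column 200: "
                              + Integer.toHexString(frameBuffer[100][200]) );
    }
}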
∗ ∗ ∗
Although images on the computer screen are represented using pixels, specifying individual
pixel colors is not always the best way to create an image. Another way is to specify the basic
geometric objects that it contains, shapes such as lines, circles, triangles, and rectangles. This
is the idea that defines vector graphics: Represent an image as a list of the geometric shapes
that it contains. To make things more interesting, the shapes can have attributes, such as
the thickness of a line or the color that fills a rectangle. Of course, not every image can be
composed from simple geometric shapes. This approach certainly wouldn’t work for a picture
of a beautiful sunset (or for most any other photographic image). However, it works well for
many types of images, such as architectural blueprints and scientific illustrations.
In fact, early in the history of computing, vector graphics was even used directly on computer
screens. When the first graphical computer displays were developed, raster displays were too
slow and expensive to be practical. Fortunately, it was possible to use vacuum tube technology
in another way: The electron beam could be made to directly draw a line on the screen, simply
by sweeping the beam along that line. A vector graphics display would store a display list
of lines that should appear on the screen. Since a point on the screen would glow only very
briefly after being illuminated by the electron beam, the graphics display would go through the
display list over and over, continually redrawing all the lines on the list. To change the image,
it would only be necessary to change the contents of the display list. Of course, if the display
list became too long, the image would start to flicker because a line would have a chance to
visibly fade before its next turn to be redrawn.
But here is the point: For an image that can be specified as a reasonably small number of
geometric shapes, the amount of information needed to represent the image is much smaller
using a vector representation than using a raster representation. Consider an image made up
of one thousand line segments. For a vector representation of the image, you only need to store
the coordinates of two thousand points, the endpoints of the lines. This would take up only a
few kilobytes of memory. To store the image in a frame buffer for a raster display would require
much more memory. Similarly, a vector display could draw the lines on the screen more quickly
than a raster display could copy the same image from the frame buffer to the screen. (As
soon as raster displays became fast and inexpensive, however, they quickly displaced vector
displays because of their ability to display all types of images reasonably well.)
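To attach some rough numbers to that comparison (the figures here are illustrative assumptions, not values from the text): if each coordinate is stored as a 4-byte number, the two thousand endpoints need about 2000*2*4 = 16,000 bytes, while a frame buffer for a 1920-by-1080 display, at 3 bytes per pixel, needs 1920*1080*3 bytes, which is more than six million bytes.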
∗ ∗ ∗
The divide between raster graphics and vector graphics persists in several areas of computer
graphics. For example, it can be seen in a division between two categories of programs that
can be used to create images: painting programs and drawing programs. In a painting
program, the image is represented as a grid of pixels, and the user creates an image by assigning
colors to pixels. This might be done by using a “drawing tool” that acts like a painter’s brush,
or even by tools that draw geometric shapes such as lines or rectangles. But the point in a
painting program is to color the individual pixels, and it is only the pixel colors that are saved.
To make this clearer, suppose that you use a painting program to draw a house, then draw a
tree in front of the house. If you then erase the tree, you’ll only reveal a blank background, not
a house. In fact, the image never really contained a “house” at all—only individually colored
pixels that the viewer might perceive as making up a picture of a house.
In a drawing program, the user creates an image by adding geometric shapes, and the image
is represented as a list of those shapes. If you place a house shape (or collection of shapes making
up a house) in the image, and you then place a tree shape on top of the house, the house is
still there, since it is stored in the list of shapes that the image contains. If you delete the tree,
the house will still be in the image, just as it was before you added the tree. Furthermore, you
should be able to select one of the shapes in the image and move it or change its size, so drawing
programs offer a rich set of editing operations that are not possible in painting programs. (The
reverse, however, is also true.)
A practical program for image creation and editing might combine elements of painting and
drawing, although one or the other is usually dominant. For example, a drawing program might
allow the user to include a raster-type image, treating it as one shape. A painting program
might let the user create “layers,” which are separate images that can be layered one on top of
another to create the final image. The layers can then be manipulated much like the shapes in
a drawing program (so that you could keep both your house and your tree in separate layers,
even if, in the image, the house is behind the tree).
Two well-known graphics programs are Adobe Photoshop and Adobe Illustrator. Photoshop
is in the category of painting programs, while Illustrator is more of a drawing program. In
the world of free software, the GNU image-processing program, Gimp, is a good alternative to
Photoshop, while Inkscape is a reasonably capable free drawing program. Short introductions
to Gimp and Inkscape can be found in Appendix C.
∗ ∗ ∗
The divide between raster and vector graphics also appears in the field of graphics file
formats. There are many ways to represent an image as data stored in a file. If the original
image is to be recovered from the bits stored in the file, the representation must follow some
exact, known specification. Such a specification is called a graphics file format. Some popular
graphics file formats include GIF, PNG, JPEG, and SVG. Most images used on the Web are
GIF, PNG, or JPEG. Modern web browsers also have support for SVG images.
GIF, PNG, and JPEG are basically raster graphics formats; an image is specified by storing
a color value for each pixel. GIF is an older file format, which has largely been superseded
by PNG, but you can still find GIF images on the web. (The GIF format supports animated
images, so GIFs are often used for simple animations on Web pages.) GIF uses an indexed
color model with a maximum of 256 colors. PNG can use either indexed or full 24-bit color,
while JPEG is meant for full color images.
The amount of data necessary to represent a raster image can be quite large. However,
the data usually contains a lot of redundancy, and the data can be “compressed” to reduce its
size. GIF and PNG use lossless data compression, which means that the original image
can be recovered perfectly from the compressed data. JPEG uses a lossy data compression
algorithm, which means that the image that is recovered from a JPEG file is not exactly the
same as the original image; some information has been lost. This might not sound like a good
idea, but in fact the difference is often not very noticeable, and using lossy compression usually
permits a greater reduction in the size of the compressed data. JPEG generally works well for
photographic images, but not as well for images that have sharp edges between different colors.
It is especially bad for line drawings and images that contain text; PNG is the preferred format
for such images.
SVG, on the other hand, is fundamentally a vector graphics format (although SVG im-
ages can include raster images). SVG is actually an XML-based language for describing two-
dimensional vector graphics images. “SVG” stands for “Scalable Vector Graphics,” and the
term “scalable” indicates one of the advantages of vector graphics: There is no loss of quality
when the size of the image is increased. A line between two points can be represented at any
scale, and it is still the same perfect geometric line. If you try to greatly increase the size of
a raster image, on the other hand, you will find that you don’t have enough color values for
all the pixels in the new image; each pixel from the original image will be expanded to cover a
rectangle of pixels in the scaled image, and you will get multi-pixel blocks of uniform color. The
scalable nature of SVG images makes them a good choice for web browsers and for graphical
elements on your computer’s desktop. And indeed, some desktop environments are now using
SVG images for their desktop icons.
∗ ∗ ∗
A digital image, no matter what its format, is specified using a coordinate system. A
coordinate system sets up a correspondence between numbers and geometric points. In two
dimensions, each point is assigned a pair of numbers, which are called the coordinates of the
point. The two coordinates of a point are often called its x-coordinate and y-coordinate,
although the names “x” and “y” are arbitrary.
A raster image is a two-dimensional grid of pixels arranged into rows and columns. As
such, it has a natural coordinate system in which each pixel corresponds to a pair of integers
giving the number of the row and the number of the column that contain the pixel. (Even in
this simple case, there is some disagreement as to whether the rows should be numbered from
top-to-bottom or from bottom-to-top.)
For a vector image, it is natural to use real-number coordinates. The coordinate system for
an image is arbitrary to some degree; that is, the same image can be specified using different
coordinate systems. I do not want to say a lot about coordinate systems here, but they will be a
major focus of a large part of the book, and they are even more important in three-dimensional
graphics than in two dimensions.
of more basic shapes, if it is not itself considered to be basic. To make a two-dimensional image
of the scene, the scene is projected from three dimensions down to two dimensions. Projection
is the equivalent of taking a photograph of the scene. Let’s look at how it all works in a little
more detail.
First, the geometry. . . . We start with an empty 3D space or “world.” Of course, this
space exists only conceptually, but it’s useful to think of it as real and to be able to visualize it
in your mind. The space needs a coordinate system that associates each point in the space with
three numbers, usually referred to as the x, y, and z coordinates of the point. This coordinate
system is referred to as “world coordinates.”
We want to build a scene inside the world, made up of geometric objects. For example,
we can specify a line segment in the scene by giving the coordinates of its two endpoints,
and we can specify a triangle by giving the coordinates of its three vertices. The smallest
building blocks that we have to work with, such as line segments and triangles, are called
geometric primitives. Different graphics systems make different sets of primitives available,
but in many cases only very basic shapes such as lines and triangles are considered primitive.
A complex scene can contain a large number of primitives, and it would be very difficult to
create the scene by giving explicit coordinates for each individual primitive. The solution,
as any programmer should immediately guess, is to chunk together primitives into reusable
components. For example, for a scene that contains several automobiles, we might create a
geometric model of a wheel. An automobile can be modeled as four wheels together with
models of other components. And we could then use several copies of the automobile model in
the scene. Note that once a geometric model has been designed, it can be used as a component
in more complex models. This is referred to as hierarchical modeling.
Suppose that we have constructed a model of a wheel out of geometric primitives. When
that wheel is moved into position in the model of an automobile, the coordinates of all of its
primitives will have to be adjusted. So what exactly have we gained by building the wheel? The
point is that all of the coordinates in the wheel are adjusted in the same way. That is, to place
the wheel in the automobile, we just have to specify a single adjustment that is applied to the
wheel as a whole. The type of “adjustment” that is used is called a geometric transform (or
geometric transformation). A geometric transform is used to adjust the size, orientation, and
position of a geometric object. When making a model of an automobile, we build one wheel.
We then apply four different transforms to the wheel model to add four copies of the wheel
to the automobile. Similarly, we can add several automobiles to a scene by applying different
transforms to the same automobile model.
The three most basic kinds of geometric transform are called scaling, rotation, and trans-
lation. A scaling transform is used to set the size of an object, that is, to make it bigger or
smaller by some specified factor. A rotation transform is used to set an object’s orientation,
by rotating it by some angle about some specific axis. A translation transform is used to set
the position of an object, by displacing it by a given amount from its original position. In
this book, we will meet these transformations first in two dimensions, where they are easier to
understand. But it is in 3D graphics that they become truly essential.
∗ ∗ ∗
Next, appearance. . . . Geometric shapes by themselves are not very interesting. You
have to be able to set their appearance. This is done by assigning attributes to the geometric
objects. An obvious attribute is color, but getting a realistic appearance turns out to be a lot
more complicated than simply specifying a color for each primitive. In 3D graphics, instead of
color, we usually talk about material. The term material here refers to the properties that
determine the intrinsic visual appearance of a surface. Essentially, this means how the surface
interacts with light that hits the surface. Material properties can include a basic color as well
as other properties such as shininess, roughness, and transparency.
One of the most useful kinds of material property is a texture. In most general terms,
a texture is a way of varying material properties from point-to-point on a surface. The most
common use of texture is to allow different colors for different points. This is done by using
a 2D image as a texture, which can be applied to a surface so that the image looks like it is
“painted” onto the surface. However, texture can also refer to changing values for things like
transparency or “bumpiness.” Textures allow us to add detail to a scene without using a huge
number of geometric primitives; instead, you can use a smaller number of textured primitives.
A material is an intrinsic property of an object, but the actual appearance of the object
also depends on the environment in which the object is viewed. In the real world, you don’t
see anything unless there is some light in the environment. The same is true in 3D graphics:
you have to add simulated lighting to a scene. There can be several sources of light in a
scene. Each light source can have its own color, intensity, and direction or position. The light
from those sources will then interact with the material properties of the objects in the scene.
Support for lighting in a graphics system can range from fairly simple to very complex and
computationally intensive.
∗ ∗ ∗
Finally, the image. . . . In general, the ultimate goal of 3D graphics is to produce 2D
images of the 3D world. The transformation from 3D to 2D involves viewing and projection.
The world looks different when seen from different points of view. To set up a point of view,
we need to specify the position of the viewer and the direction that the viewer is looking. It
is also necessary to specify an “up” direction, a direction that will be pointing upwards in the
final image. This can be thought of as placing a “virtual camera” into the scene. Once the
view is set up, the world as seen from that point of view can be projected into 2D. Projection
is analogous to taking a picture with the camera.
The final step in 3D graphics is to assign colors to individual pixels in the 2D image. This
process is called rasterization, and the whole process of producing an image is referred to as
rendering the scene.
In many cases the ultimate goal is not to create a single image, but to create an animation,
consisting of a sequence of images that show the world at different times. In an animation, there
are small changes from one image in the sequence to the next. Almost any aspect of a scene
can change during an animation, including coordinates of primitives, transformations, material
properties, and the view. For example, an object can be made to grow over the course of an
animation by gradually increasing the scale factor in a scaling transformation that is applied to
the object. And changing the view during an animation can give the effect of moving or flying
through the scene. Of course, it can be difficult to compute the necessary changes. There are
many techniques to help with the computation. One of the most important is to use a “physics
engine,” which computes the motion and interaction of objects based on the laws of physics.
(However, you won’t learn about physics engines in this book.)
applications. (Today, you probably have more graphics computing power on your smart phone.)
OpenGL is supported by the graphics hardware in most modern computing devices, including
desktop computers, laptops, and many mobile devices. This section will give you a bit of
background about the history of OpenGL and about the graphics hardware that supports it.
In the first desktop computers, the contents of the screen were managed directly by the
CPU. For example, to draw a line segment on the screen, the CPU would run a loop to set the
color of each pixel that lies along the line. Needless to say, graphics could take up a lot of the
CPU’s time. And graphics performance was very slow, compared to what we expect today. So
what has changed? Computers are much faster in general, of course, but the big change is that
in modern computers, graphics processing is done by a specialized component called a GPU,
or Graphics Processing Unit. A GPU includes processors for doing graphics computations; in
fact, it can include a large number of such processors that work in parallel to greatly speed up
graphical operations. It also includes its own dedicated memory for storing things like images
and lists of coordinates. GPU processors have very fast access to data that is stored in GPU
memory—much faster than their access to data stored in the computer’s main memory.
To draw a line or perform some other graphical operation, the CPU simply has to send
commands, along with any necessary data, to the GPU, which is responsible for actually car-
rying out those commands. The CPU offloads most of the graphical work to the GPU, which
is optimized to carry out that work very quickly. The set of commands that the GPU under-
stands make up the API of the GPU. OpenGL is an example of a graphics API, and most GPUs
support OpenGL in the sense that they can understand OpenGL commands, or at least that
OpenGL commands can efficiently be translated into commands that the GPU can understand.
OpenGL is not the only graphics API. The best-known alternative is probably Direct3D,
a 3D graphics API used for Microsoft Windows. OpenGL is more widely available, since it is
not limited to Microsoft, but Direct3D is supported by most graphics cards, and it has often
introduced new features earlier than OpenGL.
∗ ∗ ∗
I have said that OpenGL is an API, but in fact it is a series of APIs that have been subject
to repeated extension and revision. The current version, in early 2018, is 4.6, and it is very
different from the 1.0 version from 1992. Furthermore, there is a specialized version called
OpenGL ES for “embedded systems” such as mobile phones and tablets. And there is also
WebGL, for use in Web browsers, which is basically a port of OpenGL ES 2.0. Furthermore, a
new API named Vulkan has been defined as a replacement for OpenGL; Vulkan is a complex,
low-level API designed for speed and efficiency rather than ease-of-use, and it will likely not
completely replace OpenGL for some time, if ever. It will be useful to know something about
how and why OpenGL has changed.
First of all, you should know that OpenGL was designed as a “client/server” system. The
server, which is responsible for controlling the computer’s display and performing graphics com-
putations, carries out commands issued by the client. Typically, the server is a GPU, including
its graphics processors and memory. The server executes OpenGL commands. The client is the
CPU in the same computer, along with the application program that it is running. OpenGL
commands come from the program that is running on the CPU. However, it is actually possible
to run OpenGL programs remotely over a network. That is, you can execute an application
program on a remote computer (the OpenGL client), while the graphics computations and
display are done on the computer that you are actually using (the OpenGL server).
The key idea is that the client and the server are separate components, and there is a
communication channel between those components. OpenGL commands and the data that
they need are communicated from the client (the CPU) to the server (the GPU) over that
channel. The capacity of the channel can be a limiting factor in graphics performance. Think
of drawing an image onto the screen. If the GPU can draw the image in microseconds, but it
takes milliseconds to send the data for the image from the CPU to the GPU, then the great speed
of the GPU is irrelevant—most of the time that it takes to draw the image is communication
time.
For this reason, one of the driving factors in the evolution of OpenGL has been the desire
to limit the amount of communication that is needed between the CPU and the GPU. One
approach is to store information in the GPU’s memory. If some data is going to be used several
times, it can be transmitted to the GPU once and stored in memory there, where it will be
immediately accessible to the GPU. Another approach is to try to decrease the number of
OpenGL commands that must be transmitted to the GPU to draw a given image.
OpenGL draws primitives such as triangles. Specifying a primitive means specifying coor-
dinates and attributes for each of its vertices. In the original OpenGL 1.0, a separate command
was used to specify the coordinates of each vertex, and a command was needed each time the
value of an attribute changed. To draw a single triangle would require three or more commands.
Drawing a complex object made up of thousands of triangles would take many thousands of
commands. Even in OpenGL 1.1, it became possible to draw such an object with a single
command instead of thousands. All the data for the object would be loaded into arrays, which
could then be sent in a single step to the GPU. Unfortunately, if the object was going to be
drawn more than once, then the data would have to be retransmitted each time the object was
drawn. This was fixed in OpenGL 1.5 with Vertex Buffer Objects. A VBO is a block of
memory in the GPU that can store the coordinates or attribute values for a set of vertices.
This makes it possible to reuse the data without having to retransmit it from the CPU to the
GPU every time it is used.
Similarly, OpenGL 1.1 introduced texture objects to make it possible to store several
images on the GPU for use as textures. This means that texture images that are going to
be reused several times can be loaded once into the GPU, so that the GPU can easily switch
between images without having to reload them.
∗ ∗ ∗
As new capabilities were added to OpenGL, the API grew in size. But the growth was still
outpaced by the invention of new, more sophisticated techniques for doing graphics. Some of
these new techniques were added to OpenGL, but the problem is that no matter how many
features you add, there will always be demands for new features—as well as complaints that all
the new features are making things too complicated! OpenGL was a giant machine, with new
pieces always being tacked onto it, but still not pleasing everyone. The real solution was to
make the machine programmable. With OpenGL 2.0, it became possible to write programs
to be executed as part of the graphical computation in the GPU. The programs are run on the
GPU at GPU speed. A programmer who wants to use a new graphics technique can write a
program to implement the feature and just hand it to the GPU. The OpenGL API doesn’t have
to be changed. The only thing that the API has to support is the ability to send programs to
the GPU for execution.
The programs are called shaders (although the term doesn’t really describe what most of
them actually do). The first shaders to be introduced were vertex shaders and fragment
shaders. When a primitive is drawn, some work has to be done at each vertex of the primitive,
such as applying a geometric transform to the vertex coordinates or using the attributes and
global lighting environment to compute the color of that vertex. A vertex shader is a program
that can take over the job of doing such “per-vertex” computations. Similarly, some work has
to be done for each pixel inside the primitive. A fragment shader can take over the job of
performing such “per-pixel” computations. (Fragment shaders are also called pixel shaders.)
The idea of programmable graphics hardware was very successful—so successful that in
OpenGL 3.0, the usual per-vertex and per-fragment processing was deprecated (meaning that
its use was discouraged). And in OpenGL 3.1, it was removed from the OpenGL standard,
although it is still present as an optional extension. In practice, all the original features of
OpenGL are still supported in desktop versions of OpenGL and will probably continue to be
available in the future. On the embedded system side, however, with OpenGL ES 2.0 and later,
the use of shaders is mandatory, and a large part of the OpenGL 1.1 API has been completely
removed. WebGL, the version of OpenGL for use in web browsers, is based on OpenGL ES 2.0,
and it also requires shaders to get anything at all done. Nevertheless, we will begin our study of
OpenGL with version 1.1. Most of the concepts and many of the details from that version are
still relevant, and it offers an easier entry point for someone new to 3D graphics programming.
OpenGL shaders are written in GLSL (OpenGL Shading Language). Like OpenGL itself,
GLSL has gone through several versions. We will spend some time later in the course studying
GLSL ES 1.0, the version used with WebGL 1.0 and OpenGL ES 2.0. GLSL uses a syntax
similar to the C programming language.
∗ ∗ ∗
As a final remark on GPU hardware, I should note that the computations that are done for
different vertices are pretty much independent, and so can potentially be done in parallel. The
same is true of the computations for different fragments. In fact, GPUs can have hundreds or
thousands of processors that can operate in parallel. Admittedly, the individual processors are
much less powerful than a CPU, but then typical per-vertex and per-fragment computations
are not very complicated. The large number of processors, and the large amount of parallelism
that is possible in graphics computations, makes for impressive graphics performance even on
fairly inexpensive GPUs.
Chapter 2
Two-Dimensional Graphics
With this chapter, we begin our study of computer graphics by looking at the two-
dimensional case. Things are simpler, and a lot easier to visualize, in 2D than in 3D, but most
of the ideas that are covered in this chapter will also be very relevant to 3D.
The chapter begins with four sections that examine 2D graphics in a general way, without
tying it to a particular programming language or graphics API. The coding examples in these
sections are written in pseudocode that should make sense to anyone with enough programming
background to be reading this book. In the next three sections, we will take quick looks at 2D
graphics in three particular languages: Java with Graphics2D, JavaScript with HTML <canvas>
graphics, and SVG. We will see how these languages use many of the general ideas from earlier
in the chapter.
(By the way, readers might be interested in the labs that I used when I taught a
course from this textbook in 2017. They can be found on that course’s web page, at
http://math.hws.edu/eck/cs424/index_f17.html.)
[Figure: two 12-by-8 grids of pixels illustrating two pixel coordinate systems. Columns are numbered 0 through 11 in both. In the grid on the left, rows are numbered 0 through 7 from top to bottom; in the grid on the right, rows are numbered 0 through 7 from bottom to top. The pixel with coordinates (3,5) is marked in each grid, and it is a different pixel in the two systems.]
Note in particular that the pixel that is identified by a pair of coordinates (x,y) depends on the
choice of coordinate system. You always need to know what coordinate system is in use before
you know what point you are talking about.
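For example (a detail spelled out here for illustration), if one coordinate system numbers the rows of a grid that is height pixels tall from the top starting at 0, and another numbers the same rows from the bottom, then the two row numbers for a single physical row are related by rowFromBottom = (height - 1) - rowFromTop. The pixel itself doesn’t move; only the numbers that name it change.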
Row and column numbers identify a pixel, not a point. A pixel contains many points;
mathematically, it contains an infinite number of points. The goal of computer graphics is not
really to color pixels—it is to create and manipulate images. In some ideal sense, an image
should be defined by specifying a color for each point, not just for each pixel. Pixels are an
approximation. If we imagine that there is a true, ideal image that we want to display, then
any image that we display by coloring pixels is an approximation. This has many implications.
Suppose, for example, that we want to draw a line segment. A mathematical line has no
thickness and would be invisible. So we really want to draw a thick line segment, with some
specified width. Let’s say that the line should be one pixel wide. The problem is that, unless
the line is horizontal or vertical, we can’t actually draw the line by coloring pixels. A diagonal
geometric line will cover some pixels only partially. It is not possible to make part of a pixel
black and part of it white. When you try to draw a line with black and white pixels only,
the result is a jagged staircase effect. This effect is an example of something called “aliasing.”
Aliasing can also be seen in the outlines of characters drawn on the screen and in diagonal or
curved boundaries between any two regions of different color. (The term aliasing likely comes
from the fact that ideal images are naturally described in real-number coordinates. When you
try to represent the image using pixels, many real-number coordinates will map to the same
integer pixel coordinates; they can all be considered as different names or “aliases” for the same
pixel.)
Antialiasing is a term for techniques that are designed to mitigate the effects of aliasing.
The idea is that when a pixel is only partially covered by a shape, the color of the pixel should be
a mixture of the color of the shape and the color of the background. When drawing a black line
on a white background, the color of a partially covered pixel would be gray, with the shade of
gray depending on the fraction of the pixel that is covered by the line. (In practice, calculating
this area exactly for each pixel would be too difficult, so some approximate method is used.)
Here, for example, is a geometric line, shown on the left, along with two approximations of that
line made by coloring pixels. The lines are greatly magnified so that you can see the individual
pixels. The line on the right is drawn using antialiasing, while the one in the middle is not:
Note that antialiasing does not give a perfect image, but it can reduce the “jaggies” that are
caused by aliasing (at least when it is viewed on a normal scale).
There are other issues involved in mapping real-number coordinates to pixels. For example,
which point in a pixel should correspond to integer-valued coordinates such as (3,5)? The center
of the pixel? One of the corners of the pixel? In general, we think of the numbers as referring
to the top-left corner of the pixel. Another way of thinking about this is to say that integer
coordinates refer to the lines between pixels, rather than to the pixels themselves. But that
still doesn’t determine exactly which pixels are affected when a geometric shape is drawn. For
example, here are two lines drawn using HTML canvas graphics, shown greatly magnified. The
lines were specified to be colored black with a one-pixel line width:
The top line was drawn from the point (100,100) to the point (120,100). In canvas graphics,
integer coordinates correspond to the lines between pixels, but when a one-pixel line is
drawn, it extends one-half pixel on either side of the infinitely thin geometric line. So for the
top line, the line as it is drawn lies half in one row of pixels and half in another row. The
graphics system, which uses antialiasing, rendered the line by coloring both rows of pixels gray.
The bottom line was drawn from the point (100.5,100.5) to (120.5,100.5). In this case, the line
lies exactly along one line of pixels, which gets colored black. The gray pixels at the ends of
the bottom line have to do with the fact that the line only extends halfway into the pixels at
its endpoints. Other graphics systems might render the same lines differently.
The interactive demo c2/pixel-magnifier.html lets you experiment with pixels and antialias-
ing. Interactive demos can be found on the web pages in the on-line version of this book. If you
have downloaded the web site, you can also find the demos in the folder named demos. (Note
that in any of the interactive demos that accompany this book, you can click the question mark
icon in the upper left for more information about how to use it.)
∗ ∗ ∗
All this is complicated further by the fact that pixels aren’t what they used to be. Pixels
today are smaller! The resolution of a display device can be measured in terms of the number
of pixels per inch on the display, a quantity referred to as PPI (pixels per inch) or sometimes
DPI (dots per inch). Early screens tended to have resolutions of somewhere close to 72 PPI.
At that resolution, and at a typical viewing distance, individual pixels are clearly visible. For a
while, it seemed like most displays had about 100 pixels per inch, but high resolution displays
today can have 200, 300 or even 400 pixels per inch. At the highest resolutions, individual
pixels can no longer be distinguished.
The fact that pixels come in such a range of sizes is a problem if we use coordinate systems
based on pixels. An image created assuming that there are 100 pixels per inch will look tiny on a
400 PPI display. A one-pixel-wide line looks good at 100 PPI, but at 400 PPI, a one-pixel-wide
line is probably too thin.
In fact, in many graphics systems, “pixel” doesn’t really refer to the size of a physical
pixel. Instead, it is just another unit of measure, which is set by the system to be something
appropriate. (On a desktop system, a pixel is usually about one one-hundredth of an inch. On
a smart phone, which is usually viewed from a closer distance, the value might be closer to
1/160 inch. Furthermore, the meaning of a pixel as a unit of measure can change when, for
example, the user applies a magnification to a web page.)
Pixels cause problems that have not been completely solved. Fortunately, they are less of a
problem for vector graphics, which is mostly what we will use in this book. For vector graphics,
pixels only become an issue during rasterization, the step in which a vector image is converted
into pixels for display. The vector image itself can be created using any convenient coordinate
system. It represents an idealized, resolution-independent image. A rasterized image is an
approximation of that ideal image, but how to do the approximation can be left to the display
hardware.
setCoordinateSystem(left,right,bottom,top)
The graphics system would then be responsible for automatically transforming the coordinates
from the specified coordinate system into pixel coordinates. Such a subroutine might not be
available, so it’s useful to see how the transformation is done by hand. Let’s consider the general
case. Given coordinates for a point in one coordinate system, we want to find the coordinates
for the same point in a second coordinate system. (Remember that a coordinate system is just
a way of assigning numbers to points. It’s the points that are real!) Suppose that the horizontal
and vertical limits are oldLeft, oldRight, oldTop, and oldBottom for the first coordinate system,
and are newLeft, newRight, newTop, and newBottom for the second. Suppose that a point
has coordinates (oldX,oldY) in the first coordinate system. We want to find the coordinates
(newX,newY) of the point in the second coordinate system.
[Figure: the same point shown in two coordinate rectangles. The first has horizontal limits oldLeft and oldRight and vertical limits oldTop and oldBottom; the second has limits newLeft, newRight, newTop, and newBottom.]
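The computation can be sketched in code as follows. This is a minimal illustration written in Java rather than in this chapter's pseudocode, and the method name is invented; the idea is simply that the point's relative position between the old limits is preserved between the new limits:

// Convert the point (oldX,oldY), given in a coordinate system with the old
// limits, into the coordinates (newX,newY) of the same point in a coordinate
// system with the new limits.
static double[] convertCoordinates(
                  double oldX, double oldY,
                  double oldLeft, double oldRight, double oldTop, double oldBottom,
                  double newLeft, double newRight, double newTop, double newBottom ) {
    double newX = newLeft +
            ((oldX - oldLeft) / (oldRight - oldLeft)) * (newRight - newLeft);
    double newY = newTop +
            ((oldY - oldTop) / (oldBottom - oldTop)) * (newBottom - newTop);
    return new double[] { newX, newY };
}

For example, with old horizontal limits 0 and 800 and new horizontal limits -5 and 5, an oldX of 400 lies halfway across the old range and so maps to a newX of 0, halfway across the new range.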
It is not always a bad thing to use different units of length in the vertical and horizontal
directions. However, suppose that you want to use coordinates with limits left, right, bottom,
and top, and that you do want to preserve the aspect ratio. In that case, depending on the
shape of the display rectangle, you might have to adjust the values either of left and right or
of bottom and top to make the aspect ratios match:
[Figure: a requested coordinate region with horizontal and vertical limits -5 to 5, shown in display rectangles of three different shapes. The requested limits are extended either horizontally (for example to -7 and 7) or vertically (for example to -5 and 8) so that the aspect ratio of the coordinate region matches the aspect ratio of the display rectangle.]
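One way to make that adjustment can be sketched as follows (again in Java, with invented names, and assuming left < right and bottom < top). The requested range is only ever expanded, never cut off:

// Expand either the horizontal or the vertical range of the requested limits
// so that the shape of the coordinate rectangle matches the shape of the
// display rectangle, which is displayWidth by displayHeight pixels.
static double[] preserveAspect( double left, double right,
                                double bottom, double top,
                                double displayWidth, double displayHeight ) {
    double requestedAspect = (right - left) / (top - bottom);
    double displayAspect = displayWidth / displayHeight;
    if (displayAspect > requestedAspect) {
        // The display is relatively wider; widen the range of x-values.
        double extra = ((top - bottom) * displayAspect - (right - left)) / 2;
        left -= extra;
        right += extra;
    }
    else if (displayAspect < requestedAspect) {
        // The display is relatively taller; expand the range of y-values.
        double extra = ((right - left) / displayAspect - (top - bottom)) / 2;
        bottom -= extra;
        top += extra;
    }
    return new double[] { left, right, bottom, top };
}

For example, requesting limits -5 to 5 in both directions for a display area that is 700 pixels wide and 500 pixels tall would leave the vertical limits alone and expand the horizontal limits to -7 and 7.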
We will look more deeply into geometric transforms later in the chapter, and at that time, we’ll
see some program code for setting up coordinate systems.
how those colors are chosen. This is just a fact about the way our eyes actually work; it might
have been different. Three basic colors can produce a reasonably large fraction of the set of
perceivable colors, but there are colors that you can see in the world that you will never see on
your computer screen. (This whole discussion only applies to people who actually have three
kinds of cone cell. Color blindness, where someone is missing one or more kinds of cone cell, is
surprisingly common.)
The range of colors that can be produced by a device such as a computer screen is called
the color gamut of that device. Different computer screens can have different color gamuts,
and the same RGB values can produce somewhat different colors on different screens. The color
gamut of a color printer is noticeably different—and probably smaller—than the color gamut
of a screen, which explains why a printed image probably doesn’t look exactly the same as it
did on the screen. (Printers, by the way, make colors differently from the way a screen does it.
Whereas a screen combines light to make a color, a printer combines inks or dyes. Because of
this difference, colors meant for printers are often expressed using a different set of basic colors.
A common color model for printer colors is CMYK, using the colors cyan, magenta, yellow, and
black.)
In any case, the most common color model for computer graphics is RGB. RGB colors are
most often represented using 8 bits per color component, a total of 24 bits to represent a color.
This representation is sometimes called “24-bit color.” An 8-bit number can represent 2^8, or
256, different values, which we can take to be the integers from 0 to 255. A color is
then specified as a triple of integers (r,g,b) in that range.
This representation works well because 256 shades of red, green, and blue are about as many
as the eye can distinguish. In applications where images are processed by computing with color
components, it is common to use additional bits per color component, to avoid visual effects
that might occur due to rounding errors in the computations. Such applications might use a
16-bit integer or even a 32-bit floating point value for each color component. On the other
hand, sometimes fewer bits are used. For example, one common color scheme uses 5 bits for
the red and blue components and 6 bits for the green component, for a total of 16 bits for a
color. (Green gets an additional bit because the eye is more sensitive to green light than to red
or blue.) This “16-bit color” saves memory compared to 24-bit color and was more common
when memory was more expensive.
There are many other color models besides RGB. RGB is sometimes criticized as being
unintuitive. For example, it’s not obvious to most people that yellow is made of a combination
of red and green. The closely related color models HSV and HSL describe the same set of
colors as RGB, but attempt to do it in a more intuitive way. (HSV is sometimes called HSB,
with the “B” standing for “brightness.” HSV and HSB are exactly the same model.)
The “H” in these models stands for “hue,” a basic spectral color. As H increases, the color
changes from red to yellow to green to cyan to blue to magenta, and then back to red. The
value of H is often taken to range from 0 to 360, since the colors can be thought of as arranged
around a circle with red at both 0 and 360 degrees.
The “S” in HSV and HSL stands for “saturation,” and is taken to range from 0 to 1. A
saturation of 0 gives a shade of gray (the shade depending on the value of V or L). A saturation
of 1 gives a “pure color,” and decreasing the saturation is like adding more gray to the color.
“V” stands for “value,” and “L” stands for “lightness.” They determine how bright or dark the
color is. The main difference is that in the HSV model, the pure spectral colors occur for V=1,
while in HSL, they occur for L=0.5.
Let’s look at some colors in the HSV color model. The illustration below shows colors with
a full range of H-values, for S and V equal to 1 and to 0.5. Note that for S=V=1, you get
bright, pure colors. S=0.5 gives paler, less saturated colors. V=0.5 gives darker colors.
It’s probably easier to understand color models by looking at some actual colors and how
they are represented. The interactive demo c2/rgb-hsv.html lets you experiment with the RGB
and HSV color models.
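As a small concrete illustration, here is how a fully saturated, full-value hue of 60 degrees (which should be yellow) can be converted to RGB using Java's standard Color class. Java calls the model HSB and expects the hue as a fraction of the color circle rather than as a number of degrees:

import java.awt.Color;

public class HsvExample {
    public static void main(String[] args) {
        Color c = Color.getHSBColor( 60/360.0f, 1.0f, 1.0f );  // H = 60 degrees, S = 1, V = 1
        System.out.println( c.getRed() + "," + c.getGreen() + "," + c.getBlue() );
        // Prints 255,255,0 -- that is, red plus green, which is yellow.
    }
}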
∗ ∗ ∗
Often, a fourth component is added to color models. The fourth component is called alpha,
and color models that use it are referred to by names such as RGBA and HSLA. Alpha is not a
color as such. It is usually used to represent transparency. A color with maximal alpha value is
fully opaque; that is, it is not at all transparent. A color with alpha equal to zero is completely
transparent and therefore invisible. Intermediate values give translucent, or partly transparent,
colors. Transparency determines what happens when you draw with one color (the foreground
color) on top of another color (the background color). If the foreground color is fully opaque,
it simply replaces the background color. If the foreground color is partly transparent, then it
is blended with the background color. Assuming that the alpha component ranges from 0 to 1,
the color that you get can be computed as
new color = (alpha)*(foreground color) + (1 - alpha)*(background color)
This computation is done separately for the red, blue, and green color components. This is
called alpha blending. The effect is like viewing the background through colored glass; the
color of the glass adds a tint to the background color. This type of blending is not the only
possible use of the alpha component, but it is the most common.
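As a sketch of that computation, here is the blending formula applied to a single color component, written as a small Java helper (the helper is invented for this discussion and is not part of any particular API); the same method would be called once each for red, green, and blue:

// Blend one component of a foreground color over the same component of a
// background color.  Alpha is assumed to range from 0.0 (transparent) to 1.0 (opaque).
static double blendComponent( double alpha, double foreground, double background ) {
    return alpha*foreground + (1 - alpha)*background;
}

For example, drawing black (component value 0) over white (component value 255) with alpha equal to 0.5 gives 127.5 for each component, a middle gray; this is the same kind of computation that antialiasing uses for partially covered pixels.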
An RGBA color model with 8 bits per component uses a total of 32 bits to represent a color.
This is a convenient number because integer values are often represented using 32-bit values. A
32-bit integer value can be interpreted as a 32-bit RGBA color. How the color components are
arranged within a 32-bit integer is somewhat arbitrary. The most common layout is to store the
alpha component in the eight high-order bits, followed by red, green, and blue. (This should
probably be called ARGB color.) However, other layouts are also in use.
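In code, packing and unpacking such a 32-bit value is a matter of bit shifting and masking. Here is a sketch that uses the common layout just described, with alpha in the high-order byte (the method names are invented):

// Extract the four components, each in the range 0 to 255, from a 32-bit ARGB color.
static int[] unpackARGB( int argb ) {
    int alpha = (argb >> 24) & 0xFF;
    int red   = (argb >> 16) & 0xFF;
    int green = (argb >> 8) & 0xFF;
    int blue  = argb & 0xFF;
    return new int[] { alpha, red, green, blue };
}

// Combine four components, each in the range 0 to 255, into a 32-bit ARGB color.
static int packARGB( int alpha, int red, int green, int blue ) {
    return (alpha << 24) | (red << 16) | (green << 8) | blue;
}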
2.2 Shapes
We have been talking about low-level graphics concepts like pixels and coordinates, but
fortunately we don’t usually have to work on the lowest levels. Most graphics systems let you
work with higher-level shapes, such as triangles and circles, rather than individual pixels. And
a lot of the hard work with coordinates is done using transforms rather than by working with
coordinates directly. In this section and the next, we will look at some of the higher-level
capabilities that are typically provided by 2D graphics APIs.
On the left are three wide lines with no cap, a round cap, and a square cap. The geometric line
segment is shown as a dotted line. (The no-cap style is called “butt.”) To the right are four
lines with different patterns of dots and dashes. In the middle are three different styles of line
joins: mitered, rounded, and beveled.
∗ ∗ ∗
The basic rectangular shape has sides that are vertical and horizontal. (A tilted rectangle
generally has to be made by applying a rotation.) Such a rectangle can be specified with two
points, (x1,y1) and (x2,y2), that give the endpoints of one of the diagonals of the rectangle.
Alternatively, the width and the height can be given, along with a single base point, (x,y). In
that case, the width and height have to be positive, or the rectangle is empty. The base point
(x,y) will be the upper left corner of the rectangle if y increases from top to bottom, and it will
be the lower left corner of the rectangle if y increases from bottom to top.
[Figure: a rectangle with its width and height labeled.]
Suppose that you are given points (x1,y1) and (x2,y2), and that you want to draw the rectangle
that they determine. And suppose that the only rectangle-drawing command that you have
available is one that requires a point (x,y), a width, and a height. For that command, x must
be the smaller of x1 and x2, and the width can be computed as the absolute value of x1 minus
x2. And similarly for y and the height. In pseudocode,
DrawRectangle from points (x1,y1) and (x2,y2):
x = min( x1, x2 )
y = min( y1, y2 )
width = abs( x1 - x2 )
height = abs( y1 - y2 )
DrawRectangle( x, y, width, height )
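In Java, for example, this computation might be wrapped in a small utility method. This is only a sketch; it assumes a Graphics2D graphics context, g2, of the kind covered later in this chapter, and it uses the Rectangle2D shape class from the java.awt.geom package:

    // Fill the rectangle determined by two corner points (x1,y1) and (x2,y2).
    static void fillRectFromCorners( Graphics2D g2, double x1, double y1,
                                     double x2, double y2 ) {
        double x = Math.min(x1, x2);
        double y = Math.min(y1, y2);
        double width = Math.abs(x1 - x2);
        double height = Math.abs(y1 - y2);
        g2.fill( new java.awt.geom.Rectangle2D.Double(x, y, width, height) );
    }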
A common variation on rectangles is to allow rounded corners. For a “round rect,” the
corners are replaced by elliptical arcs. The degree of rounding can be specified by giving the
horizontal radius and vertical radius of the ellipse. Here are some examples of round rects. For
the shape at the right, the two radii of the ellipse are shown:
My final basic shape is the oval. (An oval is also called an ellipse.) An oval is a closed curve
that has two radii. For a basic oval, we assume that the radii are vertical and horizontal. An
oval with this property can be specified by giving the rectangle that just contains it. Or it can
be specified by giving its center point and the lengths of its vertical radius and its horizontal
radius. In this illustration, the oval on the left is shown with its containing rectangle and with
its center point and radii:
The oval on the right is a circle. A circle is just an oval in which the two radii have the same
length.
If ovals are not available as basic shapes, they can be approximated by drawing a large
number of line segments. The number of lines that is needed for a good approximation depends
on the size of the oval. It’s useful to know how to do this. Suppose that an oval has center
point (x,y), horizontal radius r1, and vertical radius r2. Mathematically, the points on the oval
are given by
( x + r1*cos(angle), y + r2*sin(angle) )
where angle takes on values from 0 to 360 if angles are measured in degrees or from 0 to 2π if
they are measured in radians. Here sin and cos are the standard sine and cosine functions. To
get an approximation for an oval, we can use this formula to generate some number of points
and then connect those points with line segments. In pseudocode, assuming that angles are
measured in radians and that pi represents the mathematical constant π,
Draw Oval with center (x,y), horizontal radius r1, and vertical radius r2:
for i = 0 to numberOfLines:
angle1 = i * (2*pi/numberOfLines)
angle2 = (i+1) * (2*pi/numberOfLines)
a1 = x + r1*cos(angle1)
b1 = y + r2*sin(angle1)
a2 = x + r1*cos(angle2)
b2 = y + r2*sin(angle2)
Draw Line from (a1,b1) to (a2,b2)
For a circle, of course, you would just have r1 = r2. This is the first time we have used the
sine and cosine functions, but it won’t be the last. These functions play an important role in
computer graphics because of their association with circles, circular motion, and rotation. We
will meet them again when we talk about transforms in the next section.
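For instance, a Java version of this pseudocode might look as follows. It is only a sketch: it assumes a Graphics2D graphics context, g2, of the kind covered in Section 2.5, and it uses the Line2D shape class from java.awt.geom:

    // Approximate an oval by drawing numberOfLines line segments.
    static void drawOvalApproximation( Graphics2D g2, double x, double y,
                                       double r1, double r2, int numberOfLines ) {
        for (int i = 0; i < numberOfLines; i++) {
            double angle1 = i * (2*Math.PI/numberOfLines);
            double angle2 = (i+1) * (2*Math.PI/numberOfLines);
            double a1 = x + r1*Math.cos(angle1);
            double b1 = y + r2*Math.sin(angle1);
            double a2 = x + r1*Math.cos(angle2);
            double b2 = y + r2*Math.sin(angle2);
            g2.draw( new java.awt.geom.Line2D.Double(a1, b1, a2, b2) );
        }
    }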
2.2.2 Stroke and Fill

There are two ways to make a shape visible in a drawing: You can stroke it, which means dragging a pen along its boundary, or, if it is a closed shape, you can fill it, which means coloring its interior. For a shape whose boundary intersects itself, it is not always obvious which regions count as the interior. Each region determined by the shape has a winding number, which counts the net number of times the boundary winds around a point in that region.

(Illustration: two self-intersecting shapes, shown stroked on the left, with their regions labeled by winding numbers such as 1 and -1.)
The shapes are also shown filled using the two fill rules. For the shapes in the center, the fill
rule is to color any region that has a non-zero winding number. For the shapes shown on the
right, the rule is to color any region whose winding number is odd; regions with even winding
number are not filled.
There is still the question of what a shape should be filled with. Of course, it can be filled
with a color, but other types of fill are possible, including patterns and gradients. A pattern
is an image, usually a small image. When used to fill a shape, a pattern can be repeated
horizontally and vertically as necessary to cover the entire shape. A gradient is similar in that
it is a way for color to vary from point to point, but instead of taking the colors from an
image, they are computed. There are a lot of variations to the basic idea, but there is always
a line segment along which the color varies. The color is specified at the endpoints of the line
segment, and possibly at additional points; between those points, the color is interpolated. For
other points on the line that contains the line segment, the pattern on the line segment can
be repeated, or the color of the endpoint can simply be extended. For a linear gradient, the
color is constant along lines perpendicular to the basic line segment, so you get lines of solid
color going in that direction. In a radial gradient, the color is constant along circles centered
at one of the endpoints of the line segment. And that doesn’t exhaust the possibilities. To give
you an idea what patterns and gradients can look like, here is a shape, filled with two gradients
and two patterns:
The first shape is filled with a simple linear gradient defined by just two colors, while the second
shape uses a radial gradient.
Patterns and gradients are not necessarily restricted to filling shapes. Stroking a shape is,
after all, the same as filling a band of pixels along the boundary of the shape, and that can be
done with a gradient or a pattern, instead of with a solid color.
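As a concrete illustration, here is roughly how a gradient fill and a pattern fill can be set up in Java's Graphics2D API, which is covered later in this chapter. This is only a sketch: the graphics context g2, the Shape named shape, and the BufferedImage named patternImage are assumed to exist already.

    // A linear gradient from red at (0,0) to yellow at (200,0).
    g2.setPaint( new GradientPaint(0, 0, Color.RED, 200, 0, Color.YELLOW) );
    g2.fill( shape );

    // A pattern fill: patternImage is repeated to cover the shape.
    g2.setPaint( new TexturePaint( patternImage,
            new Rectangle2D.Double(0, 0, patternImage.getWidth(),
                                         patternImage.getHeight()) ) );
    g2.fill( shape );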
Finally, I will mention that a string of text can be considered to be a shape for the purpose
of drawing it. The boundary of the shape is the outline of the characters. The text is drawn
by filling that shape. In some graphics systems, it is also possible to stroke the outline of the
shape that defines the text. In the following illustration, the string “Graphics” is shown, on
top, filled with a pattern and, below that, filled with a gradient and stroked with solid black:
2.2.3 Polygons, Curves, and Paths

A polygon is a closed shape consisting of a sequence of line segments. Each of the line segments is called a side of the polygon, and a point where two sides meet is called a vertex.
Sometimes, polygons are required to be “simple,” meaning that the polygon has no self-
intersections. That is, all the vertices are different, and a side can only intersect another
side at its endpoints. And polygons are usually required to be “planar,” meaning that all the
vertices lie in the same plane. (Of course, in 2D graphics, everything lies in the same plane, so
this is not an issue. However, it does become an issue in 3D.)
How then should we draw polygons? That is, what capabilities would we like to have in a
graphics API for drawing them? One possibility is to have commands for stroking and for filling
polygons, where the vertices of the polygon are given as an array of points or as an array of
x-coordinates plus an array of y-coordinates. In fact, that is sometimes done; for example, the
Java graphics API includes such commands. Another, more flexible, approach is to introduce
the idea of a “path.” Java, SVG, and the HTML canvas API all support this idea. A path is
a general shape that can include both line segments and curved segments. Segments can, but
don’t have to be, connected to other segments at their endpoints. A path is created by giving
a series of commands that tell, essentially, how a pen would be moved to draw the path. While
a path is being created, there is a point that represents the pen’s current location. There will
be a command for moving the pen without drawing, and commands for drawing various kinds
of segments. For drawing polygons, we need commands such as
• createPath() — start a new, empty path
• moveTo(x,y) — move the pen to the point (x,y), without adding a segment to the
path; that is, without drawing anything
• lineTo(x,y) — add a line segment to the path that starts at the current pen location
and ends at the point (x,y), and move the pen to (x,y)
• closePath() — add a line segment from the current pen location back to the starting
point, unless the pen is already there, producing a closed path.
(For closePath, I need to define “starting point.” A path can be made up of “subpaths.” A
subpath consists of a series of connected segments. A moveTo always starts a new subpath.
A closePath ends the current subpath and implicitly starts a new one. So “starting point”
means the position of the pen after the most recent moveTo or closePath.)
Suppose that we want a path that represents the triangle with vertices at (100,100),
(300,100), and (200, 200). We can do that with the commands
createPath()
moveTo( 100, 100 )
lineTo( 300, 100 )
lineTo( 200, 200 )
closePath()
The closePath command at the end could be replaced by lineTo(100,100), to move the pen
back to the first vertex.
A path represents an abstract geometric object. Creating one does not make it visible on
the screen. Once we have a path, to make it visible we need additional commands for stroking
and filling the path.
Earlier in this section, we saw how to approximate an oval by drawing, in effect, a regular
polygon with a large number of sides. In that example, I drew each side as a separate line
segment, so we really had a bunch of separate lines rather than a polygon. There is no way to
fill such a thing. It would be better to approximate the oval with a polygonal path. For an oval
with center (x,y) and radii r1 and r2:
createPath()
moveTo( x + r1, y )
for i = 1 to numberOfLines-1
angle = i * (2*pi/numberOfLines)
lineTo( x + r1*cos(angle), y + r2*sin(angle) )
closePath()
Using this path, we could draw a filled oval as well as stroke it. Even if we just want to draw
the outline of a polygon, it’s still better to create the polygon as a path rather than to draw
the line segments as separate sides. With a path, the computer knows that the sides are part of
single shape. This makes it possible to control the appearance of the “join” between consecutive
sides, as noted earlier in this section.
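In Java, for example, the polygonal approximation could be built as a single Path2D object (a class that is covered later in this chapter) and then filled or stroked. This is just a sketch, assuming a Graphics2D context g2 and given values for x, y, r1, r2, and numberOfLines:

    Path2D.Double p = new Path2D.Double();
    p.moveTo( x + r1, y );
    for (int i = 1; i < numberOfLines; i++) {
        double angle = i * (2*Math.PI/numberOfLines);
        p.lineTo( x + r1*Math.cos(angle), y + r2*Math.sin(angle) );
    }
    p.closePath();
    g2.fill(p);   // the approximated oval can be filled...
    g2.draw(p);   // ...or stroked, with proper joins between the sides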
∗ ∗ ∗
I noted above that a path can contain other kinds of segments besides lines. For example,
it might be possible to include an arc of a circle as a segment. Another type of curve is a
Bezier curve. Bezier curves can be used to create very general curved shapes. They are fairly
intuitive, so that they are often used in programs that allow users to design curves interactively.
Mathematically, Bezier curves are defined by parametric polynomial equations, but you don’t
need to understand what that means to use them. There are two kinds of Bezier curve in
common use, cubic Bezier curves and quadratic Bezier curves; they are defined by cubic and
quadratic polynomials respectively. When the general term “Bezier curve” is used, it usually
refers to cubic Bezier curves.
A cubic Bezier curve segment is defined by the two endpoints of the segment together with
two control points. To understand how it works, it’s best to think about how a pen would
draw the curve segment. The pen starts at the first endpoint, headed in the direction of the
first control point. The distance of the control point from the endpoint controls the speed of
the pen as it starts drawing the curve. The second control point controls the direction and
speed of the pen as it gets to the second endpoint of the curve. There is a unique cubic curve
that satisfies these conditions.
The illustration above shows three cubic Bezier curve segments. The two curve segments on
the right are connected at an endpoint to form a longer curve. The curves are drawn as thick
black lines. The endpoints are shown as black dots and the control points as blue squares, with
a thin red line connecting each control point to the corresponding endpoint. (Ordinarily, only
the curve would be drawn, except in an interface that lets the user edit the curve by hand.)
Note that at an endpoint, the curve segment is tangent to the line that connects the endpoint
to the control point. Note also that there can be a sharp point or corner where two curve
segments meet. However, one segment will merge smoothly into the next if control points are
properly chosen.
This will all be easier to understand with some hands-on experience. The interactive demo
c2/cubic-bezier.html lets you edit cubic Bezier curve segments by dragging their endpoints and
control points.
When a cubic Bezier curve segment is added to a path, the path’s current pen location acts
as the first endpoint of the segment. The command for adding the segment to the path must
specify the two control points and the second endpoint. A typical command might look like
cubicCurveTo( cx1, cy1, cx2, cy2, x, y )
This would add a curve from the current location to point (x,y), using (cx1,cy1) and (cx2,cy2)
as the control points. That is, the pen leaves the current location heading towards (cx1,cy1),
and it ends at the point (x,y), arriving there from the direction of (cx2,cy2).
Quadratic Bezier curve segments are similar to the cubic version, but in the quadratic case,
there is only one control point for the segment. The curve leaves the first endpoint heading
in the direction of the control point, and it arrives at the second endpoint coming from the
direction of the control point. The curve in this case will be an arc of a parabola.
Again, this is easier to understand with some hands-on experience. Try the interactive
demo c2/quadratic-bezier.html.
2.3 Transforms
In Section 2.1, we discussed coordinate systems and how it is possible to transform
coordinates from one coordinate system to another. In this section, we’ll look at that idea a
little more closely, and also look at how geometric transformations can be used to place graphics
objects into a coordinate system.
2.3.1 Viewing and Modeling

In a typical application, we have a rectangle made of pixels, with its natural pixel coordinates, where an image will be displayed; that rectangle is called the viewport. We also have a set of geometric objects that are defined in a possibly different coordinate system, generally one that uses real-number coordinates. Those objects make up the scene or “world” that we want to view, and the coordinates that we use to define the scene are called world coordinates.
For 2D graphics, the world lies in a plane. It’s not possible to show a picture of the entire
infinite plane. We need to pick some rectangular area in the plane to display in the image.
Let’s call that rectangular area the window , or view window. A coordinate transform is used
to map the window to the viewport.
(Illustration: on the left, a view window in world coordinates, with x ranging from -4 to 4 and y from -3 to 3, containing the points (-1,2) and (3,-1); on the right, an 800-by-600 pixel viewport. A transformation T maps the window to the viewport, carrying those points to (300,100) and (700,400).)
In this illustration, T represents the coordinate transformation. T is a function that takes world
coordinates (x,y) in some window and maps them to pixel coordinates T(x,y) in the viewport.
(I’ve drawn the viewport and window with different sizes to emphasize that they are not the
same thing, even though they show the same objects, but in fact they don’t even exist in the
same space, so it doesn’t really make sense to compare their sizes.) In this example, as you can
check,
T(x,y) = ( 800*(x+4)/8, 600*(3-y)/6 )
Look at the rectangle with corners at (-1,2) and (3,-1) in the window. When this rectangle is
displayed in the viewport, it is displayed as the rectangle with corners T(-1,2) and T(3,-1). In
this example, T(-1,2) = (300,100) and T(3,-1) = (700,400).
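Just to make the computation concrete, here is how T from this example could be written as a small Java function. (You would not normally write such a function yourself; the graphics system applies the transformation for you.)

    // Map world coordinates in the window -4 <= x <= 4, -3 <= y <= 3
    // to pixel coordinates in an 800-by-600 viewport.
    static double[] T( double x, double y ) {
        double px = 800*(x + 4)/8;   // x = -4 maps to 0, x = 4 maps to 800
        double py = 600*(3 - y)/6;   // y = 3 maps to 0, y = -3 maps to 600
        return new double[] { px, py };
    }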
We use coordinate transformations in this way because it allows us to choose a world coor-
dinate system that is natural for describing the scene that we want to display, and it is easier
to do that than to work directly with viewport coordinates. Along the same lines, suppose that
we want to define some complex object, and suppose that there will be several copies of that
object in our scene. Or maybe we are making an animation, and we would like the object to
have different positions in different frames. We would like to choose some convenient coordinate
system and use it to define the object once and for all. The coordinates that we use to define
an object are called object coordinates for the object. When we want to place the object
into a scene, we need to transform the object coordinates that we used to define the object into
the world coordinate system that we are using for the scene. The transformation that we need
is called a modeling transformation. This picture illustrates an object defined in its own
object coordinate system and then mapped by three different modeling transformations into
the world coordinate system:
(Illustration: an object defined in its own object coordinate system, copied into the world coordinate system by three modeling transformations, labeled M1, M2, and M3.)
Remember that in order to view the scene, there will be another transformation that maps the
object from a view window in world coordinates into the viewport.
Now, keep in mind that the choice of a view window tells which part of the scene is shown
in the image. Moving, resizing, or even rotating the window will give a different view of the
scene. Suppose we make several images of the same car:
What happened between making the top image in this illustration and making the image on
the bottom left? In fact, there are two possibilities: Either the car was moved to the right, or
the view window that defines the scene was moved to the left. This is important, so be sure
you understand it. (Try it with your cell phone camera. Aim it at some objects, take a step
to the left, and notice what happens to the objects in the camera’s viewfinder: They move
to the right in the picture!) Similarly, what happens between the top picture and the middle
picture on the bottom? Either the car rotated counterclockwise, or the window was rotated
clockwise. (Again, try it with a camera—you might want to take two actual photos so that you
can compare them.) Finally, the change from the top picture to the one on the bottom right
could happen because the car got smaller or because the window got larger. (On your camera,
a bigger window means that you are seeing a larger field of view, and you can get that by
applying a zoom to the camera or by backing up away from the objects that you are viewing.)
There is an important general idea here. When we modify the view window, we change
the coordinate system that is applied to the viewport. But in fact, this is the same as leaving
that coordinate system in place and moving the objects in the scene instead. Except that to
get the same effect in the final image, you have to apply the opposite transformation to the
objects (for example, moving the window to the left is equivalent to moving the objects to the
right). So, there is no essential distinction between transforming the window and transforming
the object. Mathematically, you specify a geometric primitive by giving coordinates in some
natural coordinate system, and the computer applies a sequence of transformations to those
coordinates to produce, in the end, the coordinates that are used to actually draw the primitive
in the image. You will think of some of those transformations as modeling transforms and some
as coordinate transforms, but to the computer, it’s all the same.
The on-line version of this section includes the live demo c2/transform-equivalence-2d.html
that can help you to understand the equivalence between modeling transformations and view-
port transformations. Read the help text in the demo for more information.
We will return to this idea several times later in the book, but in any case, you can see that
geometric transforms are a central concept in computer graphics. Let’s look at some basic types
of transformation in more detail. The transforms we will use in 2D graphics can be written in
the form
x1 = a*x + b*y + e
y1 = c*x + d*y + f
where (x,y) represents the coordinates of some point before the transformation is applied, and
(x1,y1 ) are the transformed coordinates. The transform is defined by the six constants a, b, c,
d, e, and f. Note that this can be written as a function T, where
T(x,y) = ( a*x + b*y + e, c*x + d*y + f )
A transformation of this form is called an affine transform. An affine transform has the
property that, when it is applied to two parallel lines, the transformed lines will also be parallel.
Also, if you follow one affine transform by another affine transform, the result is again an affine
transform.
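In Java, an affine transform is represented by the class java.awt.geom.AffineTransform, one of whose constructors takes the same six constants. Note the parameter order, which is (a, c, b, d, e, f). In this sketch, the values of a through f and the point (x,y) are assumed to be given as doubles:

    // The affine transform x1 = a*x + b*y + e, y1 = c*x + d*y + f, in Java.
    AffineTransform transform = new AffineTransform( a, c, b, d, e, f );
    Point2D transformed = transform.transform( new Point2D.Double(x, y), null );
    // transformed.getX() and transformed.getY() are x1 and y1.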
2.3.2 Translation
A translation transform simply moves every point by a certain amount horizontally and a
certain amount vertically. If (x,y) is the original point and (x1,y1 ) is the transformed point,
then the formula for a translation is
x1 = x + e
y1 = y + f
where e is the number of units by which the point is moved horizontally and f is the amount by
which it is moved vertically. (Thus for a translation, a = d = 1, and b = c = 0 in the general
formula for an affine transform.) A 2D graphics system will typically have a function such as
translate( e, f )
to apply a translate transformation. The translation would apply to everything that is drawn
after the command is given. That is, for all subsequent drawing operations, e would be added
to the x-coordinate and f would be added to the y-coordinate. Let’s look at an example.
Suppose that you draw an “F” using coordinates in which the “F” is centered at (0,0). If
you say translate(4,2) before drawing the “F”, then every point of the “F” will be moved
horizontally by 4 units and vertically by 2 units before the coordinates are actually used, so
that after the translation, the “F” will be centered at (4,2):
The light gray “F” in this picture shows what would be drawn without the translation; the dark
red “F” shows the same “F” drawn after applying a translation by (4,2). The top arrow shows
that the upper left corner of the “F” has been moved over 4 units and up 2 units. Every point
in the “F” is subjected to the same displacement. Note that in my examples, I am assuming
that the y-coordinate increases from bottom to top. That is, the y-axis points up.
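In Java's Graphics2D API, for example, the corresponding command is g2.translate(e,f). Here is a sketch of the example above, where drawF() is a hypothetical subroutine that draws the “F” using coordinates centered at (0,0):

    g2.translate(4, 2);   // applies to everything drawn after this point
    drawF(g2);            // the "F" appears centered at (4,2)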
Remember that when you give the command translate(e,f ), the translation applies to all the
drawing that you do after that, not just to the next shape that you draw. If you apply another
transformation after the translation, the second transform will not replace the translation.
It will be combined with the translation, so that subsequent drawing will be affected by the
combined transformation. For example, if you combine translate(4,2) with translate(-1,5), the
result is the same as a single translation, translate(3,7). This is an important point, and there
will be a lot more to say about it later.
Also remember that you don’t compute coordinate transformations yourself. You just spec-
ify the original coordinates for the object (that is, the object coordinates), and you specify
the transform or transforms that are to be applied. The computer takes care of applying the
transformation to the coordinates. You don’t even need to know the equations that are used
for the transformation; you just need to understand what it does geometrically.
2.3.3 Rotation
A rotation transform, for our purposes here, rotates each point about the origin, (0,0). Every
point is rotated through the same angle, called the angle of rotation. For this purpose, angles
can be measured either in degrees or in radians. (The 2D graphics APIs that we will look at
later in this chapter use radians, but OpenGL uses degrees.) A rotation with a positive angle
rotates objects in the direction from the positive x-axis towards the positive y-axis. This is
counterclockwise in a coordinate system where the y-axis points up, as it does in my examples
here, but it is clockwise in the usual pixel coordinates, where the y-axis points down rather
than up. Although it is not obvious, when rotation through an angle of r radians about the
origin is applied to the point (x,y), then the resulting point (x1,y1 ) is given by
x1 = cos(r) * x - sin(r) * y
y1 = sin(r) * x + cos(r) * y
That is, in the general formula for an affine transform, e = f = 0, a = d = cos(r ), b = -sin(r ),
and c = sin(r ). Here is a picture that illustrates a rotation about the origin by the angle
negative 135 degrees:
Again, the light gray “F” is the original shape, and the dark red “F” is the shape that results
if you apply the rotation. The arrow shows how the upper left corner of the original “F” has
been moved.
A 2D graphics API would typically have a command rotate(r ) to apply a rotation. The
command is used before drawing the objects to which the rotation applies.
2.3.4 Combining Transformations

We are now in a position to see what can happen when you combine two transformations. Suppose that, before drawing some object, you give the commands

translate(4,0)
rotate(90)

Assume that angles are measured in degrees. The translation will then apply to all subsequent
drawing. But, because of the rotation command, the things that you draw after the translation
are rotated objects. That is, the translation applies to objects that have already been rotated.
An example is shown on the left in the illustration below, where the light gray “F” is the original
shape, and red “F” shows the result of applying the two transforms to the original. The original
“F” was first rotated through a 90 degree angle, and then moved 4 units to the right.
Note that transforms are applied to objects in the reverse of the order in which they are given
in the code (because the first transform in the code is applied to an object that has already
been affected by the second transform). And note that the order in which the transforms are
applied is important. If we reverse the order in which the two transforms are applied in this
example, by saying
rotate(90)
translate(4,0)
then the result is as shown on the right in the above illustration. In that picture, the original
“F” is first moved 4 units to the right and the resulting shape is then rotated through an angle
of 90 degrees about the origin to give the shape that actually appears on the screen.
For another example of applying several transformations, suppose that we want to rotate
a shape through an angle r about a point (p,q) instead of about the point (0,0). We can do
this by first moving the point (p,q) to the origin, using translate(-p,-q). Then we can do a
standard rotation about the origin by calling rotate(r ). Finally, we can move the origin back
to the point (p,q) by applying translate(p,q). Keeping in mind that we have to write the code
for the transformations in the reverse order, we need to say
translate(p,q)
rotate(r)
translate(-p,-q)
before drawing the shape. (In fact, some graphics APIs let us accomplish this transform with a
single command such as rotate(r,p,q). This would apply a rotation through the angle r about
the point (p,q).)
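Java's Graphics2D is one API that has such a command. Its g2.rotate(r,p,q), with the angle given in radians, is documented as being equivalent to the three-command sequence:

    // Rotate subsequent drawing through the angle r (in radians) about (p,q).
    // This single call has the same effect as the sequence
    //     g2.translate(p,q);  g2.rotate(r);  g2.translate(-p,-q);
    g2.rotate( r, p, q );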
2.3.5 Scaling
A scaling transform can be used to make objects bigger or smaller. Mathematically, a scaling
transform simply multiplies each x-coordinate by a given amount and each y-coordinate by a
given amount. That is, if a point (x,y) is scaled by a factor of a in the x direction and by a
factor of d in the y direction, then the resulting point (x1,y1 ) is given by
x1 = a * x
y1 = d * y
If you apply this transform to a shape that is centered at the origin, it will stretch the shape
by a factor of a horizontally and d vertically. Here is an example, in which the original light
gray “F” is scaled by a factor of 3 horizontally and 2 vertically to give the final dark red “F”:
The common case where the horizontal and vertical scaling factors are the same is called
uniform scaling . Uniform scaling stretches or shrinks a shape without distorting it.
When scaling is applied to a shape that is not centered at (0,0), then in addition to being
stretched or shrunk, the shape will be moved away from 0 or towards 0. In fact, the true
description of a scaling operation is that it pushes every point away from (0,0) or pulls every
point towards (0,0). If you want to scale about a point other than (0,0), you can use a sequence
of three transforms, similar to what was done in the case of rotation.
A 2D graphics API can provide a function scale(a,d ) for applying scaling transformations.
As usual, the transform applies to all x and y coordinates in subsequent drawing operations.
Note that negative scaling factors are allowed and will result in reflecting the shape as well
as possibly stretching or shrinking it. For example, scale(1,-1) will reflect objects vertically,
through the x-axis.
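One common use of a reflection is to flip the default pixel coordinate system so that the y-axis points upward, as it does in the examples in this section. Here is a sketch, assuming a Graphics2D context g2 and a viewport height given in pixels:

    g2.translate( 0, height );  // move the origin to the bottom left corner
    g2.scale( 1, -1 );          // reflect vertically, so y now increases upward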
It is a fact that every affine transform can be created by combining translations, rotations
about the origin, and scalings about the origin. I won’t try to prove that, but c2/transforms-
2d.html is an interactive demo that will let you experiment with translations, rotations, and
scalings, and with the transformations that can be made by combining them.
I also note that a transform that is made from translations and rotations, with no scaling,
will preserve length and angles in the objects to which it is applied. It will also preserve aspect
ratios of rectangles. Transforms with this property are called “Euclidean.” If you also allow
uniform scaling, the resulting transformation will preserve angles and aspect ratio, but not
lengths.
2.3.6 Shear
We will look at one more type of basic transform, a shearing transform. Although shears
can in fact be built up out of rotations and scalings if necessary, it is not really obvious how
to do so. A shear will “tilt” objects. A horizontal shear will tilt things towards the left (for
negative shear) or right (for positive shear). A vertical shear tilts them up or down. Here is an
example of horizontal shear:
A horizontal shear does not move the x-axis. Every other horizontal line is moved to the left or
to the right by an amount that is proportional to the y-value along that line. When a horizontal
shear is applied to a point (x,y), the resulting point (x1,y1 ) is given by
x1 = x + b * y
y1 = y
for some constant shearing factor b. Similarly, a vertical shear with shearing factor c is given
by the equations
x1 = x
y1 = c * x + y
Shear is occasionally called “skew,” but skew is usually specified as an angle rather than as a
shear factor.
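In Java, for example, Graphics2D has the command g2.shear(b,c), where b is the horizontal shearing factor and c is the vertical one; either factor can be zero.

    g2.shear( 0.3, 0 );   // horizontal shear: x1 = x + 0.3*y, y1 = y
    // ... subsequent drawing is tilted to the right ...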
2.3.7 Window-to-Viewport
The last transformation that is applied to an object before it is displayed in an image is the
window-to-viewport transformation, which maps the rectangular view window in the xy-plane
that contains the scene to the rectangular grid of pixels where the image will be displayed.
I’ll assume here that the view window is not rotated; that is, its sides are parallel to the x-
and y-axes. In that case, the window-to-viewport transformation can be expressed in terms of
translation and scaling transforms. Let’s look at the typical case where the viewport has pixel
coordinates ranging from 0 on the left to width on the right and from 0 at the top to height at
the bottom. And assume that the limits on the view window are left, right, bottom, and top.
In that case, the window-to-viewport transformation can be programmed as:
scale( width / (right-left), height / (bottom-top) );
translate( -left, -top )
These should be the last transforms that are applied to a point. Since transforms are applied
to points in the reverse of the order in which they are specified in the program, they should be
the first transforms that are specified in the program. To see how this works, consider a point
(x,y) in the view window. (This point comes from some object in the scene. Several modeling
transforms might have already been applied to the object to produce the point (x,y), and that
point is now ready for its final transformation into viewport coordinates.) The coordinates (x,y)
are first translated by (-left,-top) to give (x-left,y-top). These coordinates are then multiplied
by the scaling factors shown above, giving the final coordinates
x1 = width / (right-left) * (x-left)
y1 = height / (bottom-top) * (y-top)
Note that the point (left,top) is mapped to (0,0), while the point (right,bottom) is mapped to
(width,height), which is just what we want.
There is still the question of aspect ratio. As noted in Subsection 2.1.3, if we want to force
the aspect ratio of the window to match the aspect ratio of the viewport, it might be necessary
to adjust the limits on the window. Here is pseudocode for a subroutine that will do that, again
assuming that the top-left corner of the viewport has pixel coordinates (0,0):
subroutine applyWindowToViewportTransformation (
left, right, // horizontal limits on view window
bottom, top, // vertical limits on view window
width, height, // width and height of viewport
preserveAspect // should window be forced to match viewport aspect?
)
if preserveAspect :
// Adjust the limits to match the aspect ratio of the drawing area.
displayAspect = abs(height / width);
windowAspect = abs(( top-bottom ) / ( right-left ));
if displayAspect > windowAspect :
// Expand the viewport vertically.
excess = (top-bottom) * (displayAspect/windowAspect - 1)
top = top + excess/2
bottom = bottom - excess/2
else if displayAspect < windowAspect :
// Expand the viewport horizontally.
excess = (right-left) * (windowAspect/displayAspect - 1)
right = right + excess/2
left = left - excess/2
scale( width / (right-left), height / (bottom-top) )
translate( -left, -top )
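A Java version of this subroutine, using the Graphics2D API that is covered in Section 2.5, might look like the following sketch; the method name and parameters simply mirror the pseudocode above:

    static void applyWindowToViewportTransformation( Graphics2D g2,
            double left, double right,     // horizontal limits on view window
            double bottom, double top,     // vertical limits on view window
            int width, int height,         // width and height of viewport
            boolean preserveAspect ) {
        if (preserveAspect) {
            // Adjust the limits to match the aspect ratio of the drawing area.
            double displayAspect = Math.abs( (double)height / width );
            double windowAspect = Math.abs( (top-bottom) / (right-left) );
            if (displayAspect > windowAspect) {
                double excess = (top-bottom) * (displayAspect/windowAspect - 1);
                top = top + excess/2;
                bottom = bottom - excess/2;
            }
            else if (displayAspect < windowAspect) {
                double excess = (right-left) * (windowAspect/displayAspect - 1);
                right = right + excess/2;
                left = left - excess/2;
            }
        }
        g2.scale( width / (right-left), height / (bottom-top) );
        g2.translate( -left, -top );
    }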
2.3.8 Matrices and Vectors

The transforms that are used in computer graphics can be represented as matrices, and the points on which they operate can be represented as vectors. Rotation and scaling are linear transformations: a linear transformation of 2D points can be given as multiplication by a 2-by-2 matrix, and following one linear transformation by another corresponds to multiplying the two matrices. This is really nice, but there is a gaping problem: Translation is not a linear transformation. To bring translation into this framework, we do something that looks a little strange at first: Instead of representing a point in 2D as a pair of numbers (x,y), we represent it as the triple of numbers (x,y,1). That is, we add a one as the third coordinate. It turns out that
we can then represent rotation, scaling, and translation—and hence any affine transformation—
on 2D space as multiplication by a 3-by-3 matrix. The matrices that we need have a bottom
row containing (0,0,1). Multiplying (x,y,1) by such a matrix gives a new vector (x1,y1,1). We
ignore the extra coordinate and consider this to be a transformation of (x,y) into (x1,y1 ). For
the record, the 3-by-3 matrices for translation (Ta,b ), scaling (Sa,b ), and rotation (Rd ) in 2D
are
            | 1  0  a |              | a  0  0 |             | cos(d)  -sin(d)   0 |
   Ta,b  =  | 0  1  b |     Sa,b  =  | 0  b  0 |     Rd   =  | sin(d)   cos(d)   0 |
            | 0  0  1 |              | 0  0  1 |             |   0        0      1 |
You can compare multiplication by these matrices to the formulas given above for translation,
scaling, and rotation. However, you won’t need to do the multiplication yourself. For now,
the important idea that you should take away from this discussion is that a sequence of trans-
formations can be combined into a single transformation. The computer only needs to keep
track of a single matrix, which we can call the “current matrix” or “current transformation.”
To implement transform commands such as translate(a,b) or rotate(d), the computer simply
multiplies the current matrix by the matrix that represents the transform.
2.4 Hierarchical Modeling

Complex scenes are often built from simpler objects, which are themselves built from even simpler objects, in a hierarchical structure. This section looks at how geometric transformations make that kind of hierarchical modeling possible.

2.4.1 Building Complex Objects

An object is typically defined in its own natural coordinate system, with some convenient reference point at (0,0). A modeling transformation for the object can then be composed of a scaling, followed by a rotation, followed by a translation: scaling and rotation about the origin leave the reference point at (0,0), and once the object has been scaled and rotated, it's easy to use a translation to move the reference point
to any desired point in the scene. (Of course, in a particular case, you might not need all three
operations.) Remember that in the code, the transformations are specified in the opposite
order from the order in which they are applied to the object and that the transformations are
specified before drawing the object. So in the code, the translation would come first, followed
by the rotation and then the scaling. Modeling transforms are not always composed in this
order, but it is the most common usage.
The modeling transformations that are used to place an object in the scene should not
affect other objects in the scene. To limit their application to just the one object, we can
save the current transformation before starting work on the object and restore it afterwards.
How this is done differs from one graphics API to another, but let’s suppose here that there
are subroutines saveTransform() and restoreTransform() for performing those tasks. That is,
saveTransform will make a copy of the modeling transformation that is currently in effect and
store that copy. It does not change the current transformation; it merely saves a copy. Later,
when restoreTransform is called, it will retrieve that copy and will replace the current modeling
transform with the retrieved transform. Typical code for drawing an object will then have the
form:
saveTransform()
translate(dx,dy) // move object into position
rotate(r) // set the orientation of the object
scale(sx,sy) // set the size of the object
.
. // draw the object, using its natural coordinates
.
restoreTransform()
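Java's Graphics2D, for example, has no named save and restore subroutines, but the same effect can be had by saving a copy of the current transform and putting it back afterwards. A sketch:

    AffineTransform saved = g2.getTransform();  // getTransform() returns a copy
    g2.translate(dx, dy);    // move object into position
    g2.rotate(r);            // set the orientation of the object
    g2.scale(sx, sy);        // set the size of the object
       //  ... draw the object, using its natural coordinates ...
    g2.setTransform(saved);  // restore the saved transform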
Note that we don’t know and don’t need to know what the saved transform does. Perhaps
it is simply the so-called identity transform, which is a transform that doesn’t modify the
coordinates to which it is applied. Or there might already be another transform in place, such
as a coordinate transform that affects the scene as a whole. The modeling transform for the
object is effectively applied in addition to any other transform that was specified previously.
The modeling transform moves the object from its natural coordinates into its proper place in
the scene. Then on top of that, a coordinate transform that is applied to the scene as a whole
would carry the object along with it.
Now let’s extend this idea. Suppose that the object that we want to draw is itself a complex
picture, made up of a number of smaller objects. Think, for example, of a potted flower made
up of pot, stem, leaves, and bloom. We would like to be able to draw the smaller component
objects in their own natural coordinate systems, just as we do the main object. For example,
we would like to specify the bloom in a coordinate system in which the center of the bloom is
at (0,0). But this is easy: We draw each small component object, such as the bloom, in its own
coordinate system, and use a modeling transformation to move the sub-object into position
within the main object. We are composing the complex object in its own natural coordinate
system as if it were a complete scene.
On top of that, we can apply another modeling transformation to the complex object as
a whole, to move it into the actual scene; the sub-objects of the complex object are carried
along with it. That is, the overall transformation that applies to a sub-object consists of a
modeling transformation that places the sub-object into the complex object, followed by the
transformation that places the complex object into the scene.
In fact, we can build objects that are made up of smaller objects which in turn are made
up of even smaller objects, to any level. For example, we could draw the bloom’s petals in
their own coordinate systems, then apply modeling transformations to place the petals into the
natural coordinate system for the bloom. There will be another transformation that moves the
bloom into position on the stem, and yet another transformation that places the entire potted
flower into the scene. This is hierarchical modeling.
Let’s look at a little example. Suppose that we want to draw a simple 2D image of a cart
with two wheels.
This cart is used as one part of a complex scene in an example below. The body of the cart can
be drawn as a pair of rectangles. For the wheels, suppose that we have written a subroutine
drawWheel()
that draws a wheel. This subroutine draws the wheel in its own natural coordinate system. In
this coordinate system, the wheel is centered at (0,0) and has radius 1.
In the cart’s coordinate system, I found it convenient to use the midpoint of the base of
the large rectangle as the reference point. I assume that the positive direction of the y-axis
points upward, which is the common convention in mathematics. The rectangular body of the
cart has width 6 and height 2, so the coordinates of the lower left corner of the rectangle are
(-3,0), and we can draw it with a command such as fillRectangle(-3,0,6,2). The top of the cart
is a smaller red rectangle, which can be drawn in a similar way. To complete the cart, we need
to add two wheels to the object. To make the size of the wheels fit the cart, they need to be
scaled. To place them in the correct positions relative to the body of the cart, one wheel must be
translated to the left and the other wheel, to the right. When I coded this example, I had to
play around with the numbers to get the right sizes and positions for the wheels, and I found
that the wheels looked better if I also moved them down a bit. Using the usual techniques of
hierarchical modeling, we save the current transform before drawing each wheel, and we restore
it after drawing the wheel. This restricts the effect of the modeling transformation for the wheel
to that wheel alone, so that it does not affect any other part of the cart. Here is pseudocode
for a subroutine that draws the cart in its own coordinate system:
subroutine drawCart() :
saveTransform() // save the current transform
translate(-1.65,-0.1) // center of first wheel will be at (-1.65,-0.1)
scale(0.8,0.8) // scale to reduce radius from 1 to 0.8
drawWheel() // draw the first wheel
restoreTransform() // restore the saved transform
saveTransform() // save it again
translate(1.5,-0.1) // center of second wheel will be at (1.5,-0.1)
scale(0.8,0.8) // scale to reduce radius from 1 to 0.8
drawWheel()           // draw the second wheel
restoreTransform()    // restore the saved transform
.
.   // draw the body and top of the cart, for example with fillRectangle(-3,0,6,2)
.
You can probably guess how hierarchical modeling is used to draw the three windmills in
this example. There is a drawWindmill method that draws a windmill in its own coordinate
system. Each of the windmills in the scene is then produced by applying a different modeling
transform to the standard windmill. Furthermore, the windmill is itself a complex object that
is constructed from several sub-objects using various modeling transformations.
∗ ∗ ∗
It might not be so easy to see how different parts of the scene can be animated. In fact,
animation is just another aspect of modeling. A computer animation consists of a sequence
of frames. Each frame is a separate image, with small changes from one frame to the next.
From our point of view, each frame is a separate scene and has to be drawn separately. The
same object can appear in many frames. To animate the object, we can simply apply a different
modeling transformation to the object in each frame. The parameters used in the transformation
can be computed from the current time or from the frame number. To make a cart move from
left to right, for example, we might apply a modeling transformation
translate( frameNumber * 0.1, 0 )
to the cart, where frameNumber is the frame number. In each frame, the cart will be 0.1 units
farther to the right than in the previous frame. (In fact, in the actual program, the translation
that is applied to the cart is
translate( -3 + 13*(frameNumber % 300) / 300.0, 0 )
which moves the reference point of the cart from -3 to 13 along the horizontal axis every 300
frames. In the coordinate system that is used for the scene, the x-coordinate ranges from 0 to
7, so this puts the cart outside the scene for much of the loop.)
The really neat thing is that this type of animation works with hierarchical modeling.
For example, the drawWindmill method doesn’t just draw a windmill—it draws an animated
windmill, with turning vanes. That just means that the rotation applied to the vanes depends
on the frame number. When a modeling transformation is applied to the windmill, the rotating
vanes are scaled and moved as part of the object as a whole. This is an example of hierarchical
modeling. The vanes are sub-objects of the windmill. The rotation of the vanes is part of
the modeling transformation that places the vanes into the windmill object. Then a further
modeling transformation is applied to the windmill object to place it in the scene.
The file java2d/HierarchicalModeling2D.java contains the complete source code for a Java
version of this example. The next section of this book covers graphics programming in Java.
Once you are familiar with that, you should take a look at the source code, especially the
paintComponent() method, which draws the entire scene.
2.4.2 Scene Graphs

Logically, the objects that make up a hierarchical scene, together with the way they are used to build one another, can be represented by a structure called a scene graph. A scene graph is a tree-like structure in which a node represents an object and the arrows out of a node lead to the sub-objects from which that object is built. Here is what the scene graph for the example scene looks like:

(Scene graph diagram: nodes for the scene's top-level objects GROUND, SUN, WINDMILL, and CART, and for the sub-objects WHEEL, VANE, FILLED SQUARE, FILLED CIRCLE, CIRCLE, and LINE, with arrows showing where each object is used; two of the arrows are marked “(12)” to stand for 12 connections each.)
In this drawing, a single object can have several connections to one or more parent objects.
Each connection represents one occurrence of the object in its parent object. For example, the
“filled square” object occurs as a sub-object in the cart and in the windmill. It is used twice in
the cart and once in the windmill. (The cart contains two red rectangles, which are created as
squares with a non-uniform scaling; the pole of the windmill is made as a scaled square.) The
“filled circle” is used in the sun and is used twice in the wheel. The “line” is used 12 times in
the sun and 12 times in the wheel; I’ve drawn one thick arrow, marked with a 12, to represent
12 connections. The wheel, in turn, is used twice in the cart. (My diagram leaves out, for lack
of space, two occurrences of the filled square in the scene: It is used to make the road and the
line down the middle of the road.)
Each arrow in the picture can be associated with a modeling transformation that places
the sub-object into its parent object. When an object contains several copies of a sub-object,
each arrow connecting the sub-object to the object will have a different associated modeling
transformation. The object is the same for each copy; only the transformation differs.
Although the scene graph exists conceptually, in some applications it exists only implicitly.
For example, the Java version of the program that was mentioned above draws the image
“procedurally,” that is, by calling subroutines. There is no data structure to represent the
scene graph. Instead, the scene graph is implicit in the sequence of subroutine calls that
draw the scene. Each node in the graph is a subroutine, and each arrow is a subroutine
call. The various objects are drawn using different modeling transformations. As discussed in
Subsection 2.3.8, the computer only keeps track of a “current transformation” that represents
all the transforms that are applied to an object. When an object is drawn by a subroutine, the
program saves the current transformation before calling the subroutine. After the subroutine
returns, the saved transformation is restored. Inside the subroutine, the object is drawn in
its own coordinate system, possibly calling other subroutines to draw sub-objects with their
own modeling transformations. Those extra transformations will have no effect outside of the
subroutine, since the transform that is in effect before the subroutine is called will be restored
after the subroutine returns.
It is also possible for a scene graph to be represented by an actual data structure in the
program. In an object-oriented approach, the graphical objects in the scene are represented
by program objects. There are many ways to build an object-oriented scene graph API. For a
simple example implemented in Java, you can take a look at java2d/SceneGraphAPI2D.java.
This program draws the same animated scene as the previous example, but it represents the
scene with an object-oriented data structure rather than procedurally. The same scene graph
API is implemented in JavaScript in the live demo c2/cart-and-windmills.html, and you might
take a look at that after you read about HTML canvas graphics in Section 2.6.
In the example program, both in Java and in JavaScript, a node in the scene graph is
represented by an object belonging to a class named SceneGraphNode. SceneGraphNode is an
abstract class, and actual nodes in the scene graph are defined by subclasses of that class. For
example, there is a subclass named CompoundObject to represent a complex graphical object
that is made up of sub-objects. A variable, obj, of type CompoundObject includes a method
obj.add(subobj) for adding a sub-object to the compound object.
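To give a feel for what such an API can look like, here is a highly simplified sketch in Java. It is not the code from the sample program, just an outline consistent with the description above:

    import java.awt.Graphics2D;

    // A minimal scene graph sketch (not the sample program's actual code).
    abstract class SceneGraphNode {
        abstract void draw( Graphics2D g2 );   // each node knows how to draw itself
    }

    class CompoundObject extends SceneGraphNode {
        private java.util.List<SceneGraphNode> subobjects = new java.util.ArrayList<>();
        void add( SceneGraphNode subobj ) {
            subobjects.add( subobj );
        }
        void draw( Graphics2D g2 ) {
            for ( SceneGraphNode node : subobjects )
                node.draw( g2 );    // draw each sub-object in turn
        }
    }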
When implementing a scene graph as a data structure made up of objects, a decision has
to be made about how to handle transforms. One option is to allow transformations to be
associated with any node in the scene graph. In this case, however, I decided to use special
nodes to represent transforms as objects of type TransformedObject. A TransformedObject is a
SceneGraphNode that contains a link to another SceneGraphNode and also contains a modeling
transformation that is to be applied to that object. The modeling transformation is given in
terms of scaling, rotation, and translation amounts that are instance variables in the object.
It is worth noting that these are always applied in the order scale, then rotate, then translate,
no matter what order the instance variables are set in the code. If you want to do a trans-
lation followed by a rotation, you will need two TransformedObjects to implement it, since a
translation plus a rotation in the same TransformedObject would be applied in the order rotate-
then-translate. It is also worth noting that the setter methods for the scaling, rotation, and
translation have a return value that is equal to the object. This makes it possible to chain calls
to the methods into a single statement such as
transformedObject.setScale(5,2).setTranslation(3.5,0);
and even say things like
world.add(
new TransformedObject(windmill).setScale(0.4,0.4).setTranslation(2.2,1.3)
);
This type of chaining can make for more compact code and can eliminate the need for a lot of
extra temporary variables.
Another decision has to be made about how to handle color. One possibility would be to
make a ColoredObject class similar to TransformedObject. However, in this case I just added
a setColor() method to the main SceneGraphNode class. A color that is set on a compound
object is inherited by any sub-objects, unless a different color is set on the sub-object. In other
words, a color on a compound object acts as a default color for its sub-objects, but color can
be overridden on the sub-objects.
In addition to compound objects and transformed objects, we need scene graph nodes to
represent the basic graphical objects that occupy the bottom level of the scene graph. These
are the nodes that do the actual drawing in the end.
For those who are familiar with data structures, I will note that a scene graph is actually
an example of a “directed acyclic graph” or “dag.” The process of drawing the scene involves
a traversal of this dag. The term “acyclic” means that there can’t be cycles in the graph. For
a scene graph, this is the obvious requirement that an object cannot be a sub-object, either
directly or indirectly, of itself.
2.4.3 The Transform Stack

Drawing a hierarchical scene requires keeping track of the modeling transformations for objects that are nested inside other objects, and that is done with a stack of transforms. As an example, consider the filled circle that is used in the wheel, which is used in the cart, which is placed into the scene:

(Illustration: the FILLED CIRCLE node connected to the WHEEL, the WHEEL to the CART, and the CART to the scene, with modeling transforms such as scale(2,2), scale(0.8,0.8), translate(1.65,0), rotate(r), scale(0.3,0.3), and translate(dx,0) attached to the connections.)
The rotation amount for the wheel and the translation amount for the cart are shown as
variables, since they are different in different frames of the animation. When the computer
starts drawing the scene, the modeling transform that is in effect is the identity transform,
that is, no transform at all. As it prepares to draw the cart, it saves a copy of the current
transform (the identity) by pushing it onto the stack. It then modifies the current transform
by multiplying it by the modeling transforms for the cart, scale(0.3,0.3) and translate(dx,0).
When it comes to drawing the wheel, it again pushes the current transform (the modeling
transform for the cart as a whole) onto the stack, and it modifies the current transform to take
the wheel’s modeling transforms into account. Similarly, when it comes to the filled circle, it
saves the modeling transform for the wheel, and then applies the modeling transform for the
circle.
When, finally, the circle is actually drawn in the scene, it is transformed by the combined
transform. That transform places the circle directly into the scene, but it has been composed
from the transform that places the circle into the wheel, the one that places the wheel into the
cart, and the one that places the cart into the scene. After drawing the circle, the computer
replaces the current transform with one it pops from the stack. That will be the modeling
transform for the wheel as a whole, and that transform will be used for any further parts of the
wheel that have to be drawn. When the wheel is done, the transform for the cart is popped.
And when the cart is done, the original transform, the identity, is popped. When the computer
goes onto the next object in the scene, it starts the whole process again, with the identity
transform as the starting point.
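For the record, here is a sketch of how such a stack of transforms can be managed in Java; the helper class and its method names are made up for this illustration:

    import java.awt.Graphics2D;
    import java.awt.geom.AffineTransform;
    import java.util.ArrayDeque;

    class TransformStack {
        private final ArrayDeque<AffineTransform> stack = new ArrayDeque<>();
        void push( Graphics2D g2 ) {     // save a copy of the current transform
            stack.push( g2.getTransform() );
        }
        void pop( Graphics2D g2 ) {      // restore the most recently saved transform
            g2.setTransform( stack.pop() );
        }
    }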
This might sound complicated, but I should emphasize that it is something that the computer
does for you. Your responsibility is simply to design the individual objects, in their own natural
coordinate system. As part of that, you specify the modeling transformations that are applied
to the sub-objects of that object. You construct the scene as a whole in a similar way. The
computer will then put everything together for you, taking into account the many layers of
hierarchical structure. You only have to deal with one component of the structure at a time.
That’s the power of hierarchical design; that’s how it helps you deal with complexity.
2.5 Java Graphics2D

In the rest of this chapter, we look at specific implementations of two-dimensional graphics, starting in this section with Java's Graphics2D API. I will assume that you know the basics of Java programming. But even if you don't, you should be able to follow most
of the discussion of the graphics API itself. (See Section A.1 in Appendix A for a very basic
introduction to Java.)
The original version of Java had a much smaller graphics API. It was tightly focused on
pixels, and it used only integer coordinates. The API had subroutines for stroking and filling a
variety of basic shapes, including lines, rectangles, ovals, and polygons (although Java uses the
term draw instead of stroke). Its specification of the meaning of drawing operations was very
precise on the pixel level. Integer coordinates are defined to refer to the lines between pixels.
For example, a 12-by-8 pixel grid has x-coordinates from 0 to 12 and y-coordinates from 0 to
8, as shown below. The lines between pixels are numbered, not the pixels.
(Illustration: a 12-by-8 pixel grid with integer coordinates labeling the lines between pixels; the pixels filled by fillRect(3,2,5,3) are shown on the left, and the pixels colored by drawRect(3,2,5,3) are shown on the right.)
The command fillRect(3,2,5,3) fills the rectangle with upper left corner at (3,2), with width 5,
and with height 3, as shown on the left above. The command drawRect(3,2,5,3) conceptually
drags a “pen” around the outline of this rectangle. However, the pen is a 1-pixel square, and
it is the upper left corner of the pen that moves along the outline. As the pen moves along
the right edge of the rectangle, the pixels to the right of that edge are colored; as the pen
moves along the bottom edge, the pixels below the edge are colored. The result is as shown
on the right above. My point here is not to belabor the details, but to point out that having
a precise specification of the meaning of graphical operations gives you very fine control over
what happens on the pixel level.
Java’s original graphics did not support things like real-number coordinates, transforms,
antialiasing, or gradients. Just a few years after Java was first introduced, a new graphics API
was added that does support all of these. It is that more advanced API that we will look at
here.
2.5.1 Graphics2D
Java is an object-oriented language. Its API is defined as a large set of classes. The actual
drawing operations in the original graphics API were mostly contained in the class named
Graphics. In the newer API, drawing operations are methods in a class named Graphics2D,
which is a subclass of Graphics, so that all the original drawing operations are still available.
(A class in Java is contained in a collection of classes known as a “package.” Graphics and
Graphics2D, for example, are in the package named java.awt. Classes that define shapes and
transforms are in a package named java.awt.geom.)
A graphics system needs a place to draw. In Java, the drawing surface is often an object
of the class JPanel, which represents a rectangular area on the screen. The JPanel class has a
method named paintComponent() to draw its content. To create a drawing surface, you can
create a subclass of JPanel and provide a definition for its paintComponent() method. All
drawing should be done inside paintComponent(); when it is necessary to change the contents
of the drawing, you can call the panel’s repaint() method to trigger a call to paintComponent().
The paintComponent() method has a parameter of type Graphics, but the parameter that is
passed to the method is actually an object of type Graphics2D, and it can be type-cast to
Graphics2D to obtain access to the more advanced graphics capabilities. So, the definition of
the paintComponent() method usually looks something like this:
protected void paintComponent( Graphics g ) {
Graphics2D g2;
g2 = (Graphics2D)g; // Type-cast the parameter to Graphics2D.
.
. // Draw using g2.
.
}
In the rest of this section, I will assume that g2 is a variable of type Graphics2D, and I will
discuss some of the things that you can do with it. As a first example, I note that Graphics2D
supports antialiasing, but it is not turned on by default. It can be enabled in a graphics context
g2 with the rather intimidating command
g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    RenderingHints.VALUE_ANTIALIAS_ON);
For simple examples of graphics in complete Java programs, you can look at the sample
programs java2d/GraphicsStarter.java and java2d/AnimationStarter.java. They provide very
minimal frameworks for drawing static and animated images, respectively, using Graphics2D.
The program java2d/EventsStarter.java is a similar framework for working with mouse and key
events in a graphics program. You can use these programs as the basis for some experimentation
if you want to explore Java graphics.
2.5.2 Shapes
Drawing with the original Graphics class is done using integer coordinates, with the measurement
given in pixels. This works well in the standard coordinate system, but is not appropriate when
real-number coordinates are used, since the unit of measure in such a coordinate system will
not be equal to a pixel. We need to be able to specify shapes using real numbers. The Java
package java.awt.geom provides support for shapes defined using real number coordinates. For
example, the class Line2D in that package represents line segments whose endpoints are given
as pairs of real numbers.
Now, Java has two real number types: double and float. The double type can represent a
larger range of numbers than float, with a greater number of significant digits, and double is
the more commonly used type. In fact, doubles are simply easier to use in Java. However, float
values generally have enough accuracy for graphics applications, and they have the advantage
of taking up less space in memory. Furthermore, computer graphics hardware often uses float
values internally.
So, given these considerations, the java.awt.geom package actually provides two versions
of each shape, one using coordinates of type float and one using coordinates of type double.
This is done in a rather strange way. Taking Line2D as an example, the class Line2D itself
is an abstract class. It has two subclasses, one that represents lines using float coordinates
and one using double coordinates. The strangest part is that these subclasses are defined
as nested classes inside Line2D: Line2D.Float and Line2D.Double. This means that you can
declare a variable of type Line2D, but to create an object, you need to use Line2D.Double or
Line2D.Float:
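Line2D line;   // a variable of the abstract type Line2D
line = new Line2D.Double( 1, 2, 5, 7 );   // a segment from (1,2) to (5,7)

(This is just a sketch of the idiom; the endpoint values here are arbitrary, and new Line2D.Float(1,2,5,7) would create the same segment with its coordinates stored as float values.)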
More general shapes can be built as Path2D objects. A Path2D p is empty when it is first created. The method p.moveTo(x,y) moves a virtual pen to the point (x,y) without drawing anything; that point becomes the starting point of a new piece of the path. The method p.lineTo(x,y) draws a line from the
current pen position to (x,y), leaving the pen at (x,y). The method p.close() can be used to
close the path (or the current piece of the path) by drawing a line back to its starting point.
For example, the following code creates a triangle with vertices at (0,5), (2,-3), and (-4,1):
Path2D.Double p = new Path2D.Double();
p.moveTo(0,5);
p.lineTo(2,-3);
p.lineTo(-4,1);
p.close();
You can also add Bezier curve segments to a Path2D. Bezier curves were discussed in Sub-
section 2.2.3. You can add a cubic Bezier curve to a Path2D p with the method
p.curveTo( cx1, cy1, cx2, cy2, x, y );
This adds a curve segment that starts at the current pen position and ends at (x,y), using
(cx1,cy1 ) and (cx2,cy2 ) as the two control points for the curve. The method for adding a
quadratic Bezier curve segment to a path is quadTo. It requires only a single control point:
p.quadTo( cx, cy, x, y );
When a path intersects itself, its interior is determined by looking at the winding number,
as discussed in Subsection 2.2.2. There are two possible rules for determining whether a point
is interior: asking whether the winding number of the curve about that point is non-zero, or
asking whether it is odd. You can set the winding rule used by a Path2D p with
p.setWindingRule( Path2D.WIND_NON_ZERO );
or
p.setWindingRule( Path2D.WIND_EVEN_ODD );
If img is a BufferedImage, then g2.drawImage(img, x, y, null) will draw the image with its upper left corner at the point (x,y). (The fourth parameter is hard
to explain, but it should be specified as null for BufferedImages.) This draws the image at its
natural width and height, but a different width and height can be specified in the method:
g2.drawImage( img, x, y, width, height, null );
There is also a method for drawing a string of text. The method specifies the string and
the basepoint of the string. (The basepoint is the lower left corner of the string, ignoring
“descenders” like the tail on the letter “g”.) For example,
g2.drawString( "Hello World", 100, 50 );
Images and strings are subject to transforms in the same way as other shapes. Transforms are
the only way to get rotated text and images. For example, applying a rotation to the graphics context before drawing some text or an image rotates the entire string or image on the screen.
2.5.3 Stroke and Fill
A shape is stroked or filled with the commands g2.draw(shape) and g2.fill(shape). Here, g2 is of type Graphics2D, and shape can be of type Path2D, Line2D, Rectangle2D or any
of the other shape classes. These are often used on a newly created object, when that object
represents a shape that will only be drawn once. For example
g2.draw( new Line2D.Double( -5, -5, 5, 5 ) );
Of course, it is also possible to create shape objects and reuse them many times.
The “pen” that is used for stroking a shape is usually represented by an object of type
BasicStroke. The default stroke has line width equal to 1. That’s one unit in the current
coordinate system, not one pixel. To get a line with a different width, you can install a new
stroke with
g2.setStroke( new BasicStroke(width) );
The width in the constructor is of type float. It is possible to add parameters to the con-
structor to control the shape of a stroke at its endpoints and where two segments meet. (See
Subsection 2.2.1.) For example,
g2.setStroke( new BasicStroke( 5.0F,
BasicStroke.CAP_ROUND, BasicStroke.JOIN_BEVEL) );
It is also possible to make strokes out of dashes and dots, but I won’t discuss how to do it here.
∗ ∗ ∗
Stroking or filling a shape means setting the colors of certain pixels. In Java, the rule
that is used for coloring those pixels is called a “paint.” Paints can be solid colors, gradients,
or patterns. Like most things in Java, paints are represented by objects. If paint is such an
object, then
g2.setPaint(paint);
will set paint to be used in the graphics context g2 for subsequent drawing operations, until
the next time the paint is changed. (There is also an older method, g2.setColor (c), that works
only for colors and is equivalent to calling g2.setPaint(c).)
Solid colors are represented by objects of type Color. A color is represented internally as
an RGBA color. An opaque color, with maximal alpha component, can be created using the
constructor
new Color( r, g, b );
where r, g, and b are integers in the range 0 to 255 that give the red, green, and blue components
of the color. To get a translucent color, you can add an alpha component, also in the range 0
to 255:
new Color( r, g, b, a );
There is also a function, Color.getHSBColor (h,s,b), that creates a color from values in the HSB
color model (which is another name for HSV). In this case, the hue, saturation, and brightness
color components must be given as values of type float. And there are constants to represent
about a dozen common colors, such as Color.WHITE, Color.RED, and Color.YELLOW. For
example, here is how I might draw a square with a black outline and a light blue interior:
Rectangle2D square = new Rectangle2D.Double(-2,-2,4,4);
g2.setPaint( new Color(200,200,255) );
g2.fill( square );
g2.setStroke( new BasicStroke(0.1F) );
g2.setPaint( Color.BLACK );
g2.draw( square );
Beyond solid colors, Java has the class GradientPaint, to represent simple linear gradients,
and TexturePaint to represent pattern fills. (Image patterns used in a similar way
in 3D graphics are called textures.) Gradients and patterns were discussed in Subsection 2.2.2.
For these paints, the color that is applied to a pixel depends on the coordinates of the pixel.
To create a TexturePaint, you need a BufferedImage object to specify the image that it will
use as a pattern. You also have to say how coordinates in the image will map to drawing
coordinates in the display. You do this by specifying a rectangle that will hold one copy of
the image. So the constructor takes the form:
new TexturePaint( image, rect );
where image is the BufferedImage and rect is a Rectangle2D. Outside that specified rectangle,
the image is repeated horizontally and vertically. The constructor for a GradientPaint takes the
form
new GradientPaint( x1, y1, color1, x2, y2, color2, cyclic )
Here, x1, y1, x2, and y2 are values of type float; color1 and color2 are of type Color ; and cyclic
is boolean. The gradient color will vary along the line segment from the point (x1,y1 ) to the
point (x2,y2 ). The color is color1 at the first endpoint and is color2 at the second endpoint.
Color is constant along lines perpendicular to that line segment. The boolean parameter cyclic
says whether or not the color pattern repeats. As an example, here is a command that will
install a GradientPaint into a graphics context:
g2.setPaint( new GradientPaint( 0,0, Color.BLACK, 200,100, Color.RED, true ) );
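A TexturePaint can be installed in the same way. Here is a sketch, in which img is assumed to be a BufferedImage that has already been created or loaded, and the 50-by-50 anchor rectangle is an arbitrary choice:

Rectangle2D anchor = new Rectangle2D.Double( 0, 0, 50, 50 );  // one copy of img fills this rectangle
g2.setPaint( new TexturePaint( img, anchor ) );
g2.fill( someShape );  // someShape is filled with copies of img, repeated to cover it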
You should, by the way, note that the current paint is used for strokes as well as for fills.
The sample Java program java2d/PaintDemo.java displays a polygon filled with a
GradientPaint or a TexturePaint and lets you adjust their properties. The image files
java2d/QueenOfHearts.png and java2d/TinySmiley.png are part of that program, and they
must be in the same location as the compiled class files that make up that program when it is
run.
2.5.4 Transforms
Java implements geometric transformations as methods in the Graphics2D class. For example,
if g2 is a Graphics2D, then calling g2.translate(1,3) will apply a translation by (1,3) to objects
that are drawn after the method is called. The methods that are available correspond to the
transform functions discussed in Section 2.3:
• g2.scale(sx,sy) — scales by a horizontal scale factor sx and a vertical scale factor sy.
• g2.rotate(r) — rotates by the angle r about the origin, where the angle is measured in
radians. A positive angle rotates the positive x-axis in the direction of the positive y-axis.
• g2.rotate(r,x,y) — rotates by the angle r about the point (x,y).
• g2.translate(dx,dy) — translates by dx horizontally and dy vertically.
• g2.shear(sx,sy) — applies a horizontal shear amount sx and a vertical shear amount
sy. (Usually, one of the shear amounts is 0, giving a pure horizontal or vertical shear.)
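For example, here is a sketch, with arbitrary numbers, that draws a rectangle rotated by 30 degrees about its own center, with that center placed at the point (100,75). (Recall that transforms are applied to objects in the opposite of the order in which they appear in the code.)

g2.translate( 100, 75 );            // applied to the rectangle second: move its center to (100,75)
g2.rotate( 30 * Math.PI / 180 );    // applied to the rectangle first: rotate it about the origin
g2.fill( new Rectangle2D.Double( -40, -25, 80, 50 ) );  // a rectangle centered at the origin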
A transform in Java is represented as an object of the class AffineTransform. You can create
a general affine transform with the constructor
AffineTransform trns = new AffineTransform(a,b,c,d,e,f);
The transform trns will transform a point (x,y) to the point (x1,y1 ) given by
x1 = a*x + c*y + e
y1 = b*x + d*y + f;
You can apply the transform trns to a graphics context g2 by calling g2.transform(trns).
The graphics context g2 includes the current affine transform, which is the composition of
all the transforms that have been applied. Commands such as g2.rotate and g2.transform
modify the current transform. You can get a copy of the current transform by calling
g2.getTransform(), which returns an AffineTransform object. You can set the current transform
using g2.setTransform(trns). This replaces the current transform in g2 with the AffineTransform
trns. (Note that g2.setTransform(trns) is different from g2.transform(trns); the first command
replaces the current transform in g2, while the second modifies the current transform by
composing it with trns.)
The getTransform and setTransform methods can be used to implement hierarchical mod-
eling. The idea, as discussed in Section 2.4, is that before drawing an object, you should save
the current transform. After drawing the object, restore the saved transform. Any additional
modeling transformations that are applied while drawing the object and its sub-objects will
have no effect outside the object. In Java, this looks like
AffineTransform savedTransform = g2.getTransform();
drawObject();
g2.setTransform( savedTransform );
For hierarchical graphics, we really need a stack of transforms. However, if the hierarchy is
implemented using subroutines, then the above code would be part of a subroutine, and the value
of the local variable savedTransform would be stored on the subroutine call stack. Effectively,
we would be using the subroutine call stack to implement the stack of saved transforms.
In addition to modeling transformations, transforms are used to set up the window-to-
viewport transformation that establishes the coordinate system that will be used for drawing.
This is usually done in Java just after the graphics context has been created, before any drawing
operations. It can be done with a Java version of the applyWindowToViewportTransformation
function from Subsection 2.3.7. See the sample program java2d/GraphicsStarter.java for an
example.
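That function is not reproduced here, but a simplified sketch of the idea, which does not deal with preserving the aspect ratio, might look like the following; left, right, bottom, and top give the requested coordinate limits, and width and height give the size of the drawing area in pixels:

private static void applyWindowToViewportTransformation( Graphics2D g2,
                        double left, double right, double bottom, double top,
                        int width, int height ) {
    g2.scale( width / (right - left), height / (bottom - top) );  // applied to points second
    g2.translate( -left, -top );                                  // applied to points first
}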
∗ ∗ ∗
I will mention one more use for AffineTransform objects: Sometimes, you do need to explicitly
transform coordinates. For example, given object coordinates (x,y), I might need to know where
they will actually end up on the screen, in pixel coordinates. That is, I would like to transform
(x,y) by the current transform to get the corresponding pixel coordinates. The AffineTransform
class has a method for applying the affine transform to a point. It works with objects of type
Point2D. Here is an example:
AffineTransform trns = g2.getTransform();
Point2D.Double originalPoint = new Point2D.Double(x,y);
Point2D.Double transformedPoint = new Point2D.Double();
trns.transform( originalPoint, transformedPoint );
// transformedPoint now contains the pixel coords corresponding to (x,y)
int pixelX = (int)transformedPoint.x;
int pixelY = (int)transformedPoint.y;
One way I have used this is when working with strings. Often when displaying a string in a
transformed coordinate system, I want to transform the basepoint of a string, but not the string
itself. That is, I want the transformation to affect the location of the string but not its size
or rotation. To accomplish this, I use the above technique to obtain the pixel coordinates for
the transformed basepoint, and then draw the string at those coordinates, using an original,
untransformed graphics context.
The reverse operation is also sometimes necessary. That is, given pixel coordinates (px,py),
find the point (x,y) that is transformed to (px,py) by a given affine transform. For example,
when implementing mouse interaction, you will generally know the pixel coordinates of the
mouse, but you will want to find the corresponding point in your own chosen coordinate sys-
tem. For that, you need an inverse transform. The inverse of an affine transform T is
another transform that performs the opposite transformation. That is, if T(x,y) = (px,py),
and if R is the inverse transform, then R(px,py) = (x,y). In Java, the inverse transform of an
AffineTransform trns can be obtained with
AffineTransform inverse = trns.createInverse();
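The inverse can then be applied to a point just like any other AffineTransform. Here is a sketch, assuming that px and py hold the mouse's pixel coordinates; note also that createInverse() can throw a NoninvertibleTransformException, which the caller has to handle or declare:

Point2D.Double pixelPoint = new Point2D.Double( px, py );
Point2D.Double objectPoint = new Point2D.Double();
inverse.transform( pixelPoint, objectPoint );
// objectPoint now holds the drawing coordinates (x,y) that map to (px,py)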
(A final note: The older drawing methods from Graphics, such as drawLine, use integer
coordinates. It’s important to note that any shapes drawn using these older methods are
subject to the same transformation as shapes such as Line2D that are specified with real number
coordinates. For example, drawing a line with g.drawLine(1,2,5,7) will have the same effect as
drawing a Line2D that has endpoints (1.0,2.0) and (5.0,7.0). In fact, all drawing is affected by
the transformation of coordinates.)
2.5.5 BufferedImage and Pixels
A BufferedImage is an image that is stored in the computer's memory, and the BufferedImage class has methods for reading and setting the color of individual pixels. An image consists of rows and columns
of pixels. If OSC is a BufferedImage, then
int color = OSC.getRGB(x,y)
gets the integer that represents the color of the pixel in column number x and row number y.
Each color component is stored in an 8-bit field in the integer color value. The individual color
components can be extracted for processing using Java’s bit manipulation operators:
int red = (color >> 16) & 255;
int green = (color >> 8) & 255;
int blue = color & 255;
Similarly, given red, green, and blue color component values in the range 0 to 255, we can
combine those component values into a single integer and use it to set the color of a pixel in
the image:
int color = (red << 16) | (green << 8) | blue;
OSC.setRGB(x,y,color);
There are also methods for reading and setting the colors of an entire rectangular region of
pixels.
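For example, here is a sketch that copies every pixel color in a BufferedImage OSC into an int array and writes the (possibly modified) colors back; the bulk getRGB and setRGB methods take the region, the array, a starting offset in the array, and the number of array elements per row:

int w = OSC.getWidth(), h = OSC.getHeight();
int[] rgbs = new int[ w * h ];
OSC.getRGB( 0, 0, w, h, rgbs, 0, w );   // copy all pixel colors into the array, row by row
// ... process the color values in rgbs ...
OSC.setRGB( 0, 0, w, h, rgbs, 0, w );   // write the colors back into the image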
Pixel operations are used to implement two features of the sample program. First, there is
a “Smudge” tool. When the user drags with this tool, it’s like smearing wet paint. When the
user first clicks the mouse, the color components from a small square of pixels surrounding the
mouse position are copied into arrays. As the user moves the mouse, color from the arrays is
blended with the color of the pixels near the mouse position, smearing the colors along the path of the mouse.
The second use of pixel manipulation is in implementing “filters.” A filter, in this program, is
an operation that modifies an image by replacing the color of each pixel with a weighted average
of the colors of a 3-by-3 square of pixels. A “Blur” filter, for example, uses equal weights for all
pixels in the average, so the color of a pixel is changed to the simple average of the colors of
that pixel and its neighbors. Using different weights for each pixel can produce some striking
effects.
The pixel manipulation in the sample program produces effects that can’t be achieved with
pure vector graphics. I encourage you to learn more by looking at the source code. You might
also take a look at the live demos in the next section, which implement the same effects using
HTML canvas graphics.
2.6 HTML Canvas Graphics
2.6.1 The 2D Graphics Context
To draw on an HTML <canvas> element, you need a 2D graphics context for that canvas, which can be obtained with code of the form
canvas = document.getElementById("theCanvas");
graphics = canvas.getContext("2d");
The first line gets a reference to the canvas element on the web page, using its id. The second
line creates the graphics context for that canvas element. (This code will produce an error in a
web browser that doesn’t support canvas, so you might add some error checking such as putting
these commands inside a try..catch statement.)
Typically, you will store the canvas graphics context in a global variable and use the same
graphics context throughout your program. This is in contrast to Java, where you typically
get a new Graphics2D context each time the paintComponent() method is called, and that
new context is in its initial state with default color and stroke properties and with no applied
transform. When a graphics context is global, changes made to the state in one function call
will carry over to subsequent function calls, unless you do something to limit their effect. This
can actually lead to a fairly common type of bug: For example, if you apply a 30-degree rotation
in a function, those rotations will accumulate each time the function is called, unless you do
something to undo the previous rotation before applying the next rotation.
The rest of this section will be mostly concerned with describing what you can do with
a canvas graphics context. But here, for the record, is the complete source code for a very
minimal web page that uses canvas graphics:
<!DOCTYPE html>
<html>
<head>
<title>Canvas Graphics</title>
<script>
var canvas; // DOM object corresponding to the canvas
var graphics; // 2D graphics context for drawing on the canvas
function draw() {
// draw on the canvas, using the graphics context
graphics.fillText("Hello World", 10, 20);
}
function init() {
canvas = document.getElementById("theCanvas");
graphics = canvas.getContext("2d");
draw(); // draw something on the canvas
}
</script>
</head>
<body onload="init()">
<canvas id="theCanvas" width="640" height="480"></canvas>
</body>
</html>
For a more complete, though still minimal, example, look at the sample page
canvas2d/GraphicsStarter.html. (You should look at the page in a browser, but you should also read
the source code.) This example shows how to draw some basic shapes using canvas graphics,
and you can use it as a basis for your own experimentation. There are also three more advanced
“starter” examples: canvas2d/GraphicsPlusStarter.html adds some utility functions for drawing
shapes and setting up a coordinate system; canvas2d/AnimationStarter.html adds animation
and includes a simple hierarchical modeling example; and canvas2d/EventsStarter.html shows
how to respond to keyboard and mouse events.
2.6.2 Shapes
The default coordinate system on a canvas is the usual: The unit of measure is one pixel;
(0,0) is at the upper left corner; the x -coordinate increases to the right; and the y-coordinate
increases downward. The ranges of x and y values are given by the width and height properties
of the <canvas> element. The term “pixel” here for the unit of measure is not really correct.
Probably, I should say something like “one nominal pixel.” The unit of measure is one pixel
at typical desktop resolution with no magnification. If you apply a magnification to a browser
window, the unit of measure gets stretched. And on a high-resolution screen, one unit in the
default coordinate system might correspond to several actual pixels on the display device.
The canvas API supports only a very limited set of basic shapes. In fact, the only basic
shapes are rectangles and text. Other shapes must be created as paths. Shapes can be stroked
and filled. That includes text: When you stroke a string of text, a pen is dragged along the
outlines of the characters; when you fill a string, the insides of the characters are filled. It only
really makes sense to stroke text when the characters are rather large. Here are the functions
for drawing rectangles and text, where graphics refers to the object that represents the graphics
context:
• graphics.fillRect(x,y,w,h) — draws a filled rectangle with corner at (x,y), with width
w and with height h. If the width or the height is less than or equal to zero, nothing is
drawn.
• graphics.strokeRect(x,y,w,h) — strokes the outline of the same rectangle.
• graphics.clearRect(x,y,w,h) — clears the rectangle by filling it with fully transparent
pixels, allowing the background of the canvas to show. The background is determined by
the properties of the web page on which the canvas appears. It might be a background
color, an image, or even another canvas.
• graphics.fillText(str,x,y) — fills the characters in the string str. The left end of the
baseline of the string is positioned at the point (x,y).
• graphics.strokeText(str,x,y) — strokes the outlines of the characters in the string.
A path can be created using functions in the graphics context. The context keeps track of
a “current path.” In the current version of the API, paths are not represented by objects, and
there is no way to work with more than one path at a time or to keep a copy of a path for later
reuse. Paths can contain lines, Bezier curves, and circular arcs. Here are the most common
functions for working with paths:
• graphics.beginPath() — start a new path. Any previous path is discarded, and the
current path in the graphics context is now empty. Note that the graphics context also
keeps track of the current point, the last point in the current path. After calling
graphics.beginPath(), the current point is undefined.
• graphics.moveTo(x,y) — move the current point to (x,y), without adding anything to
the path. This can be used for the starting point of the path or to start a new, disconnected
segment of the path.
• graphics.lineTo(x,y) — add the line segment starting at current point and ending at
(x,y) to the path, and move the current point to (x,y).
• graphics.bezierCurveTo(cx1,cy1,cx2,cy2,x,y) — add a cubic Bezier curve to the
path. The curve starts at the current point and ends at (x,y). The points (cx1,cy1 ) and
(cx2,cy2 ) are the two control points for the curve. (Bezier curves and their control points
were discussed in Subsection 2.2.3.)
• graphics.quadraticCurveTo(cx,cy,x,y) — adds a quadratic Bezier curve from the
current point to (x,y), with control point (cx,cy).
• graphics.arc(x,y,r,startAngle,endAngle) — adds an arc of the circle with center
(x,y) and radius r. The next two parameters give the starting and ending angle of the arc.
They are measured in radians. The arc extends in the positive direction from the start
angle to the end angle. (The positive rotation direction is from the positive x-axis towards
the positive y-axis; this is clockwise in the default coordinate system.) An optional fifth
parameter can be set to true to get an arc that extends in the negative direction. After
drawing the arc, the current point is at the end of the arc. If there is a current point
before graphics.arc is called, then before the arc is drawn, a line is added to the path that
extends from the current point to the starting point of the arc. (Recall that immediately
after graphics.beginPath(), there is no current point.)
• graphics.closePath() — adds to the path a line from the current point back to the
starting point of the current segment of the curve. (Recall that you start a new segment
of the curve every time you use moveTo.)
Creating a curve with these commands does not draw anything. To get something visible to
appear in the image, you must fill or stroke the path.
The commands graphics.fill () and graphics.stroke() are used to fill and to stroke the current
path. If you fill a path that has not been closed, the fill algorithm acts as though a final line
segment had been added to close the path. When you stroke a shape, it’s the center of the
virtual pen that moves along the path. So, for high-precision canvas drawing, it’s common
to use paths that pass through the centers of pixels rather than through their corners. For
example, to draw a line that extends from the pixel with coordinates (100,200) to the pixel with
coordinates (300,200), you would actually stroke the geometric line with endpoints (100.5,200.5)
and (100.5,300.5). We should look at some examples. It takes four steps to draw a line:
graphics.beginPath();            // start a new path
graphics.moveTo(100.5,200.5);    // starting point of the new path
graphics.lineTo(300.5,200.5);    // add a line to the point (300.5,200.5)
graphics.stroke();               // draw the line
Attributes of a stroke, such as its width, are properties of the graphics context. The property graphics.lineWidth gives the width of strokes, in units of the current coordinate system. The property graphics.lineCap controls the appearance of the endpoints of a stroke; its possible values are “round”, “square”, and “butt”, and it can be set with a statement such as
graphics.lineCap = "round";
Similarly, graphics.lineJoin controls the appearance of the point where one segment of a stroke
joins another segment; its possible values are “round”, “bevel”, or “miter”. (Line endpoints
and joins were discussed in Subsection 2.2.1.)
Note that the values for graphics.lineCap and graphics.lineJoin are strings. This is a some-
what unusual aspect of the API. Several other properties of the graphics context take values
that are strings, including the properties that control the colors used for drawing and the font
that is used for drawing text.
Color is controlled by the values of the properties graphics.fillStyle and graphics.strokeStyle.
The graphics context maintains separate styles for filling and for stroking. A solid color for
stroking or filling is specified as a string. Valid color strings are ones that can be used in CSS,
the language that is used to specify colors and other style properties of elements on web pages.
Many solid colors can be specified by their names, such as “red”, “black”, and “beige”. An
RGB color can be specified as a string of the form “rgb(r,g,b)”, where the parentheses contain
three numbers in the range 0 to 255 giving the red, green, and blue components of the color.
Hexadecimal color codes are also supported, in the form “#XXYYZZ” where XX, YY, and ZZ
are two-digit hexadecimal numbers giving the RGB color components. For example,
graphics.fillStyle = "rgb(200,200,255)"; // light blue
graphics.strokeStyle = "#0070A0"; // a darker, greenish blue
The style can actually be more complicated than a simple solid color: Gradients and patterns
are also supported. As an example, a gradient can be created with a series of steps such as
var lineargradient = graphics.createLinearGradient(420,420,550,200);
lineargradient.addColorStop(0,"red");
lineargradient.addColorStop(0.5,"yellow");
lineargradient.addColorStop(1,"green");
graphics.fillStyle = lineargradient; // Use a gradient fill!
The first line creates a linear gradient that will vary in color along the line segment from the
point (420,420) to the point (550,200). Colors for the gradient are specified by the addColorStop
function: the first parameter gives the fraction of the distance from the initial point to the final
point where that color is applied, and the second is a string that specifies the color itself. A
color stop at 0 specifies the color at the initial point; a color stop at 1 specifies the color at the
final point. Once a gradient has been created, it can be used both as a fill style and as a stroke
style in the graphics context.
Finally, I note that the font that is used for drawing text is the value of the property
graphics.font. The value is a string that could be used to specify a font in CSS. As such, it can
be fairly complicated, but the simplest versions include a font-size (such as 20px or 150% ) and
a font-family (such as serif, sans-serif, monospace, or the name of any font that is accessible to
the web page). You can add italic or bold or both to the front of the string. Some examples:
graphics.font = "2cm monospace"; // the size is in centimeters
graphics.font = "bold 18px sans-serif";
graphics.font = "italic 150% serif"; // size is 150% of the usual size
The default is “10px sans-serif,” which is usually too small. Note that text, like all drawing,
is subject to coordinate transforms. Applying a scaling operation changes the size of the text,
and a negative scaling factor can produce mirror-image text.
2.6.4 Transforms
A graphics context has three basic functions for modifying the current transform by scaling,
rotation, and translation. There are also functions that will compose the current transform
with an arbitrary transform and for completely replacing the current transform:
• graphics.scale(sx,sy) — scale by sx in the x -direction and sy in the y-direction.
• graphics.rotate(angle) — rotate by angle radians about the origin. A positive rotation
is clockwise in the default coordinate system.
• graphics.translate(tx,ty) — translate by tx in the x -direction and ty in the y-
direction.
• graphics.transform(a,b,c,d,e,f) — apply the transformation x1 = a*x + c*y + e,
and y1 = b*x + d*y + f.
• graphics.setTransform(a,b,c,d,e,f) — discard the current transformation, and set
the current transformation to be x1 = a*x + c*y + e, and y1 = b*x + d*y + f.
Note that there is no shear transform, but you can apply a shear as a general transform. For
example, for a horizontal shear with shear factor 0.5, use
graphics.transform(1, 0, 0.5, 1, 0, 0)
To implement hierarchical modeling, as discussed in Section 2.4, you need to be able to save
the current transformation so that you can restore it later. Unfortunately, no way is provided
to read the current transformation from a canvas graphics context. However, the graphics
context itself keeps a stack of transformations and provides methods for pushing and popping
the current transformation. In fact, these methods do more than save and restore the current
transformation. They actually save and restore almost the entire state of the graphics context,
including properties such as current colors, line width, and font (but not the current path):
• graphics.save() — push a copy of the current state of the graphics context, including
the current transformation, onto the stack.
• graphics.restore() — remove the top item from the stack, containing a saved state of
the graphics context, and restore the graphics context to that state.
Using these methods, the basic setup for drawing an object with a modeling transform
becomes:
graphics.save(); // save a copy of the current state
graphics.translate(a,b); // apply modeling transformations
graphics.rotate(r);
graphics.scale(s,s);
.
. // Draw the object!
.
graphics.restore(); // restore the saved state
Note that if drawing the object includes any changes to attributes such as drawing color, those
changes will be also undone by the call to graphics.restore(). In hierarchical graphics, this is
usually what you want, and it eliminates the need to have extra statements for saving and
restoring things like color.
To draw a hierarchical model, you need to traverse a scene graph, either procedurally or
as a data structure. It’s pretty much the same as in Java. In fact, you should see that the
basic concepts that you learned about transformations and modeling carry over to the canvas
graphics API. Those concepts apply very widely and even carry over to 3D graphics APIs, with
just a little added complexity. The demo program c2/cart-and-windmills.html from Section 2.4
implements hierarchical modeling using the 2D canvas API.
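For instance, a purely procedural traversal might look something like the following sketch, where drawWheel and drawCart are hypothetical functions (not the demo's actual code), the numbers are arbitrary, and a suitable coordinate system is assumed to have been set up:

function drawWheel() {   // draws a wheel of radius 1, centered at the origin
    graphics.beginPath();
    graphics.arc( 0, 0, 1, 0, 2*Math.PI );
    graphics.stroke();
}

function drawCart() {    // draws a cart that uses two wheels as sub-objects
    graphics.save();
    graphics.translate( -1.5, 0 );   // modeling transform for the first wheel
    drawWheel();
    graphics.restore();
    graphics.save();
    graphics.translate( 1.5, 0 );    // modeling transform for the second wheel
    drawWheel();
    graphics.restore();
    graphics.strokeRect( -3, -2, 6, 1.5 );  // the body of the cart
}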
∗ ∗ ∗
Now that we know how to do transformations, we can see how to draw an oval using the
canvas API. Suppose that we want an oval with center at (x,y), with horizontal radius r1 and
with vertical radius r2. The idea is to draw a circle of radius 1 with center at (0,0), then
transform it. The circle needs to be scaled by a factor of r1 horizontally and r2 vertically. It
should then be translated to move its center from (0,0) to (x,y). We can use graphics.save()
and graphics.restore() to make sure that the transformations only affect the circle. Recalling
that the order of transforms in the code is the opposite of the order in which they are applied
to objects, this becomes:
graphics.save();
graphics.translate( x, y );
graphics.scale( r1, r2 );
graphics.beginPath();
graphics.arc( 0, 0, 1, 0, 2*Math.PI );  // a circle of radius 1
graphics.restore();
graphics.stroke();
Note that the current path is not affected by the calls to graphics.save() and graphics.restore().
So, in the example, the oval-shaped path is not discarded when graphics.restore() is called.
When graphics.stroke() is called at the end, it is the oval-shaped path that is stroked. On the
other hand, the line width that is used for the stroke is not affected by the scale transform that
was applied to the oval. Note that if the order of the last two commands were reversed, then
the line width would be subject to the scaling.
There is an interesting point here about transforms and paths. In the HTML canvas API,
the points that are used to create a path are transformed by the current transformation before
they are saved. That is, they are saved in pixel coordinates. Later, when the path is stroked
or filled, the current transform has no effect on the path (although it can affect, for example,
the line width when the path is stroked). In particular, you can’t make a path and then apply
different transformations. For example, you can’t make an oval-shaped path, and then use it to
draw several ovals in different positions. Every time you draw the oval, it will be in the same
place, even if different translation transforms are applied to the graphics context.
The situation is different in Java, where the coordinates that are stored in the path are
the actual numbers that are used to specify the path, that is, the object coordinates. When
the path is stroked or filled, the transformation that is in effect at that time is applied to the
path. The path can be reused many times to draw copies with different transformations. This
comment is offered as an example of how APIs that look very similar can have subtle differences.
The on-line version of this section has a live demo version of the SimplePaintProgram example that has the same
functionality. You can try it out to see how the various drawing tools work. Don’t forget to
try the “Smudge” tool! (It has to be applied to shapes that you have already drawn.)
For JavaScript, a web page is represented as a data structure, defined by a standard called
the DOM, or Document Object Model. For an off-screen canvas, we can use a <canvas> that is
not part of that data structure and therefore is not part of the page. In JavaScript, a <canvas>
can be created with the function call document.createElement(“canvas”). There is a way to add
this kind of dynamically created canvas to the DOM for the web page, but it can be used as an
off-screen canvas without doing so. To use it, you have to set its width and height properties,
and you need a graphics context for drawing on it. Here, for example, is some code that creates
a 640-by-480 canvas, gets a graphics context for the canvas, and fills the whole canvas with
white:
OSC = document.createElement("canvas"); // off-screen canvas
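A sketch of the rest of that code, assuming OSG as the name for the off-screen graphics context (the name used for it later in this section):

OSC.width = 640;     // set the size of the off-screen canvas
OSC.height = 480;
OSG = OSC.getContext("2d");   // graphics context for drawing on the off-screen canvas
OSG.fillStyle = "white";
OSG.fillRect( 0, 0, 640, 480 );   // fill the whole canvas with white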
The sample program lets the user drag the mouse on the canvas to draw some shapes. The
off-screen canvas holds the official copy of the picture, but it is not seen by the user. There is
also an on-screen canvas that the user sees. The off-screen canvas is copied to the on-screen
canvas whenever the picture is modified. While the user is dragging the mouse to draw a line,
oval, or rectangle, the new shape is actually drawn on-screen, over the contents of the off-screen
canvas. It is only added to the off-screen canvas when the user finishes the drag operation. For
the other tools, changes are made directly to the off-screen canvas, and the result is then copied
to the screen. This is an exact imitation of the Java program.
(The demo version mentioned above actually uses a somewhat different technique to accom-
plish the same thing. It uses two on-screen canvases, one located exactly on top of the other.
The lower canvas holds the actual image. The upper canvas is completely transparent, except
when the user is drawing a line, oval, or rectangle. While the user is dragging the mouse to
draw such a shape, the new shape is drawn on the upper canvas, where it hides the part of the
lower canvas that is beneath the shape. When the user releases the mouse, the shape is added
to the lower canvas and the upper canvas is cleared to make it completely transparent again.
Again, the other tools operate directly on the lower canvas.)
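The function graphics.getImageData(x,y,w,h) reads the colors of a rectangle of pixels from the canvas. For example, consider a call along these lines (reconstructed here to match the discussion that follows):

colors = graphics.getImageData( 0, 0, 20, 10 );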
This returns the color data for a 20-by-10 rectangle in the upper left corner of the canvas. The
return value, colors, is an object with properties colors.width, colors.height, and colors.data.
The width and height give the number of rows and columns of pixels in the returned data.
(According to the documentation, on a high-resolution screen, they might not be the same as
the width and height in the function call. The data can be for real, physical pixels on the
display device, not the “nominal” pixels that are used in the pixel coordinate system on the
canvas. There might be several device pixels for each nominal pixel. I’m not sure whether this
can really happen.)
The value of colors.data is an array, with four array elements for each pixel. The four
elements contain the red, green, blue, and alpha color components of the pixel, given as integers
in the range 0 to 255. For a pixel that lies outside the canvas, the four component values will
all be zero. The array is a value of type Uint8ClampedArray whose elements are 8-bit unsigned
integers limited to the range 0 to 255. This is one of JavaScript’s typed array datatypes,
which can only hold values of a specific numerical type. As an example, suppose that you just
want to read the RGB color of one pixel, at coordinates (x,y). You can set
pixel = graphics.getImageData(x,y,1,1);
Then the RGB color components for the pixel are R = pixel.data[0], G = pixel.data[1], and B
= pixel.data[2].
The function graphics.putImageData(imageData,x,y) is used to copy the colors from an
image data object into a canvas, placing it into a rectangle in the canvas with upper left corner
at (x,y). The imageData object can be one that was returned by a call to graphics.getImageData,
possibly with its color data modified. Or you can create a blank image data object by calling
graphics.createImageData(w,h) and fill it with data.
Let’s consider the “Smudge” tool in the sample program. When the user clicks the mouse
with this tool, I use OSG.getImageData to get the color data from a 9-by-9 square of pixels
surrounding the mouse location. OSG is the graphics context for the canvas that contains
the image. Since I want to do real-number arithmetic with color values, I copy the color
components into another typed array, one of type Float32Array, which can hold 32-bit floating
point numbers. Here is the function that I call to do this:
function grabSmudgeData(x, y) { // (x,y) gives mouse location
var colors = OSG.getImageData(x-5,y-5,9,9);
if (smudgeColorArray == null) {
// Make image data & array the first time this function is called.
smudgeImageData = OSG.createImageData(9,9);
smudgeColorArray = new Float32Array(colors.data.length);
}
for (var i = 0; i < colors.data.length; i++) {
// Copy the color component data into the Float32Array.
smudgeColorArray[i] = colors.data[i];
}
}
The floating point array, smudgeColorArray, will be used for computing new color values for
the image as the mouse moves. The color values from this array will be copied into the image
data object, smudgeImageData, which will then be used to put the color values into the image.
This is done in another function, which is called for each point that is visited as the user drags
the Smudge tool over the canvas:
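(The program's own function is not reproduced here; what follows is a sketch of the idea, with an assumed function name and assumed blending weights of 0.8 and 0.2.)

function swapSmudgeData(x, y) {  // (x,y) gives the mouse location
    var colors = OSG.getImageData(x-5, y-5, 9, 9);  // current colors of the 9-by-9 square
    for (var i = 0; i < colors.data.length; i += 4) {
        for (var j = i; j < i+3; j++) {   // red, green, and blue components of one pixel
            var newSmudge = smudgeColorArray[j]*0.8 + colors.data[j]*0.2;
            var newImage  = smudgeColorArray[j]*0.2 + colors.data[j]*0.8;
            smudgeColorArray[j] = newSmudge;      // update the smudge color
            smudgeImageData.data[j] = newImage;   // new color for the pixel in the image
        }
        smudgeImageData.data[i+3] = 255;  // alpha component, fully opaque
    }
    OSG.putImageData(smudgeImageData, x-5, y-5);  // copy the new colors into the image
}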
In this function, a new color is computed for each pixel in a 9-by-9 square of pixels around the
mouse location. The color is replaced by a weighted average of the current color of the pixel
and the color of the corresponding pixel in the smudgeColorArray. At the same time, the color
in smudgeColorArray is replaced by a similar weighted average.
It would be worthwhile to try to understand this example to see how pixel-by-pixel process-
ing of color data can be done. See the source code of the example for more details.
2.6.7 Images
For another example of pixel manipulation, we can look at image filters that modify an image
by replacing the color of each pixel with a weighted average of the color of that pixel and the
8 pixels that surround it. Depending on the weighting factors that are used, the result can be
as simple as a slightly blurred version of the image, or it can be something more interesting.
The on-line version of this section includes an interactive demo that lets you apply several
different image filters to a variety of images.
The filtering operation in the demo uses the image data functions getImageData,
createImageData, and putImageData that were discussed above. Color data from the entire image is
obtained with a call to getImageData. The results of the averaging computation are placed
in a new image data object, and the resulting image data is copied back to the image using
putImageData.
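As an illustration of the technique (not the demo's actual code), here is a sketch of a simple 3-by-3 blur, using equal weights for the pixels in each average; graphics is assumed to be the context for a canvas that is w-by-h nominal pixels:

function blurCanvas( graphics, w, h ) {
    var input = graphics.getImageData( 0, 0, w, h );
    var output = graphics.createImageData( input.width, input.height );
    var rows = input.height, cols = input.width;
    for (var r = 0; r < rows; r++) {
        for (var c = 0; c < cols; c++) {
            for (var k = 0; k < 3; k++) {  // red, green, blue components
                var sum = 0, count = 0;
                for (var dr = -1; dr <= 1; dr++) {
                    for (var dc = -1; dc <= 1; dc++) {
                        var rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                            sum += input.data[ 4*(rr*cols + cc) + k ];
                            count++;
                        }
                    }
                }
                output.data[ 4*(r*cols + c) + k ] = sum / count;  // average of the neighborhood
            }
            output.data[ 4*(r*cols + c) + 3 ] = 255;  // alpha, fully opaque
        }
    }
    graphics.putImageData( output, 0, 0 );
}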
The remaining question is, where do the original images come from, and how do they get
onto the canvas in the first place? An image on a web page is specified by an element in the
web page source such as
<img src="pic.jpg" width="400" height="300" id="mypic">
The src attribute specifies the URL from which the image is loaded. The optional id can be
used to reference the image in JavaScript. In the script,
image = document.getElementById("mypic");
gets a reference to the object that represents the image in the document structure. Once you
have such an object, you can use it to draw the image on a canvas. If graphics is a graphics
context for the canvas, then
graphics.drawImage(image, x, y);
draws the image with its upper left corner at (x,y). Both the point (x,y) and the image itself
are transformed by any transformation in effect in the graphics context. This will draw the
image using its natural width and height (scaled by the transformation, if any). You can also
specify the width and height of the rectangle in which the image is drawn:
graphics.drawImage(image, x, y, width, height);
With this version of drawImage, the image is scaled to fit the specified rectangle.
Now, what if the image that you want to draw onto the canvas is not part of the web page?
In that case, it is possible to load the image dynamically. This is much like making an off-screen
canvas, but you are making an “off-screen image.” Use the document object to create an img
element:
newImage = document.createElement("img");
An img element needs a src attribute that specifies the URL from which it is to be loaded. For
example,
newImage.src = "pic2.jpg";
As soon as you assign a value to the src attribute, the browser starts loading the image. The
loading is done asynchronously; that is, the computer continues to execute the script without
waiting for the load to complete. This means that you can’t simply draw the image on the line
after the above assignment statement: The image is very likely not done loading at that time.
You want to draw the image after it has finished loading. For that to happen, you need to
assign a function to the image’s onload property before setting the src. That function will be
called when the image has been fully loaded. Putting this together, here is a simple JavaScript
function for loading an image from a specified URL and drawing it on a canvas after it has
loaded:
function loadAndDraw( imageURL, x, y ) {
var image = document.createElement("img");
image.onload = doneLoading;
image.src = imageURL;
function doneLoading() {
graphics.drawImage(image, x, y);
}
}
A similar technique is used to load the images in the filter demo.
There is one last mystery to clear up. When discussing the use of an off-screen canvas in the
SimplePaintProgram example earlier in this section, I noted that the contents of the off-screen
canvas have to be copied to the main canvas, but I didn’t say how that can be done. In fact,
it is done using drawImage. In addition to drawing an image onto a canvas, drawImage can
be used to draw the contents of one canvas into another canvas. In the sample program, the
command
graphics.drawImage( OSC, 0, 0 );
is used to draw the off-screen canvas to the main canvas. Here, graphics is a graphics context
for drawing on the main canvas, and OSC is the object that represents the off-screen canvas.
2.7 SVG: A Scene Description Language
The first three lines say that this is an XML SVG document. The rest of the document is an
<svg> element that acts as a container for the entire scene description. You’ll need to know a
little about XML syntax. First, an XML “element” in its general form looks like this:
<elementname attrib1="value1" attrib2="value2">
...content...
</elementname>
The element starts with a “start tag,” which begins with a “<” followed by an identifier that
is the name of the tag, and ending with a “>”. The start tag can include “attributes,” which
have the form name=“value”. The name is an identifier; the value is a string. The value
must be enclosed in single or double quotation marks. The element ends with an “end tag,”
which has an element name that matches the element name in the start tag and has the form
</elementname>. Element names and attribute names are case-sensitive. Between the start
and end tags comes the “content” of the element. The content can consist of text and nested
elements. If an element has no content, you can replace the “>” at the end of the start tag with
“/>”, and leave out the end tag. This is called a “self-closing tag.” For example,
<circle cx="5" cy="5" r="4" fill="red"/>
This is an actual SVG element that specifies a circle. It’s easy to forget the “/” at the end of
a self-closing tag, but it has to be there to have a legal XML document.
Looking back at the SVG document, the five lines starting with <svg are just a long start
tag. You can use the tag as shown, and customize the values of the width, height, viewBox,
and preserveAspectRatio attributes. The next line is a comment; comments in XML start with
“<!--” and end with “-->”.
The width and height attributes of the <svg> tag specify a natural or preferred size for the
image. It can be forced into a different size, for example if it is used in an <img> element on
a web page that specifies a different width and height. The size can be specified using units
of measure such as in for inches, cm for centimeters, and px, for pixels, with 90 pixels to the
inch. If no unit of measure is specified, pixels are used. There cannot be any space between
the number and the unit of measure.
The viewBox attribute sets up the coordinate system that will be used for drawing the
image. It is what I called the view window in Subsection 2.3.1. The value for viewBox is a
list of four numbers, giving the minimum x-value, the minimum y-value, the width, and the
height of the view window. The width and the height must be positive, so x increases from
left-to-right, and y increases from top-to-bottom. The four numbers in the list can be separated
either by spaces or by commas; this is typical for lists of numbers in SVG.
Finally, the preserveAspectRatio attribute tells what happens when the aspect ratio of the
viewBox does not match the aspect ratio of the rectangle in which the image is displayed.
The default value, “xMidYMid”, will extend the limits on the viewBox either horizontally or
vertically to preserve the aspect ratio, and the viewBox will appear in the center of the display
rectangle. If you would like your image to stretch to fill the display rectangle, ignoring the aspect
ratio, set the value of preserveAspectRatio to “none”. (The aspect ratio issue was discussed in
Subsection 2.3.7.)
Let’s look at a complete SVG document that draws a few simple shapes. Here’s the doc-
ument. You could probably figure out what it draws even without knowing any more about
SVG:
<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
width="300px" height="200px"
viewBox="0 0 3 2"
preserveAspectRatio="xMidYMid">
<rect x="0" y="0" width="3" height="2"
stroke="blue" fill="none" stroke-width="0.05"/>
<text x="0.2" y="0.5" font-size="0.4" fill="red">Hello World!</text>
<line x1="0.1" y1="0.7" x2="2.9" y2="0.7" stroke-width="0.05" stroke="blue"/>
<ellipse cx="1.5" cy="1.4" rx=".6" ry=".4" fill="rgb(0,255,180)"/>
<circle cx="0.4" cy="1.4" r="0.3"
fill="magenta" stroke="black" stroke-width="0.03"/>
<polygon points="2.2,1.7 2.4,1 2.9,1.7"
fill="none" stroke="green" stroke-width="0.02"/>
</svg>
In the drawing coordinate system for this example, x ranges from 0 to 3, and y ranges from
0 to 2. All values used for drawing, including stroke width and font size, are given in terms
of this coordinate system. Remember that you can use any coordinate system that you find
convenient! Note, by the way, that parts of the image that are not covered by the shapes that
are drawn will be transparent.
Here’s another example, with a larger variety of shapes. The source code for this example
has a lot of comments. It uses features that we will discuss in the remainder of this section.
You can take a look at the source code, svg/svg-starter.svg. (For example, open it in a text
editor, or open it in a web browser and use the browser’s “view source” command.)
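The geometry of a <rect> element is given by attributes x, y, width, and height. For example, an element along these lines (reconstructed to match the description that follows)

   <rect x="100" y="200" width="640" height="480"/>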
gives a rectangle with corner at (100,200), width 640, and height 480. (Note, by the way, that
the attributes in an XML element can be given in any order.) The rect element also has optional
attributes rx and ry that can be used to make “roundRects,” with their corners replaced by
elliptical arcs. The values of rx and ry give the horizontal and vertical radii of the elliptical
arcs.
Style attributes can be added to say how the shape should be stroked and filled. The default
is to use a black fill and no stroke. (More precisely, as we will see later, the default is for a
shape to inherit the values of style attributes from its environment. Black fill and no stroke is
the initial environment.) Here are some common style attributes:
• fill — specifies how to fill the shape. The value can be “none” to indicate that the shape
is not filled. It can be a color, in the same format as the CSS colors that are used in
the HTML canvas API. For example, it can be a common color name such as “black” or
“red”, or an RGB color such as “rgb(255,200,180)”. There are also gradient and pattern
fills, though I will not discuss them here.
• stroke — specifies how to stroke the shape, with the same possible values as “fill”.
• stroke-opacity and fill-opacity — are numbers between 0.0 and 1.0 that specify the
opacity of the stroke and fill. Values less than 1.0 give a translucent stroke or fill. The
default value, 1.0, means fully opaque.
• stroke-width — is a number that sets the line width to use for the stroke. Note that the
line width is subject to transforms. The default value is “1”, which is fine if the coordinate
system is using pixels as the unit of measure, but often too wide in custom coordinate
systems.
• stroke-linecap — determines the appearance of the endpoints of a stroke. The value
can be “square”, “round”, or “butt”. The default is “butt”. (See Subsection 2.2.1 for a
discussion of line caps and joins.)
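For example, here is a sketch (with arbitrary values) of a circle that uses several of these style attributes:

   <circle cx="50" cy="50" r="40"
           fill="rgb(255,200,180)" fill-opacity="0.5"
           stroke="black" stroke-width="3" stroke-opacity="0.8"/>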
∗ ∗ ∗
The transform attribute can be used to apply a transform or a series of transforms to a
shape. As an example, we can make a rectangle tilted 30 degrees from the horizontal:
<rect width="100" height="50" transform="rotate(30)"/>
The value “rotate(30)” represents a rotation of 30 degrees (not radians!) about the origin, (0,0).
The positive direction of rotation, as usual, rotates the positive x-axis in the direction of the
positive y-axis. You can specify a different center of rotation by adding arguments to rotate.
For example, to rotate the same rectangle about its center
<rect width="100" height="50" transform="rotate(30,50,25)"/>
Translation and scaling work as you probably expect, with transform values of the form
“translate(dx,dy)” and “scale(sx,sy)”. There are also shear transforms, but they go by the
names skewX and skewY, and the argument is a skew angle rather than a shear amount. For
example, the transform “skewX(45)” tilts the y-axis by 45 degrees and is equivalent to an
x-shear with shear factor 1. (The function that tilts the y-axis is called skewX because it
modifies, or skews, the x-coordinates of points while leaving their y-coordinates unchanged.)
For example, we can use skewX to tilt a rectangle and make it into a parallelogram:
<rect width="100" height="50" transform="skewX(-30)"/>
I used an angle of -30 degrees to make the rectangle tilt to the right in the usual pixel coordinate
system.
The value of the transform attribute can be a list of transforms, separated by spaces or
commas. The transforms are applied to the object, as usual, in the opposite of the order in
which they are listed. So,
<rect width="100" height="50"
transform="translate(0,50) rotate(45) skewX(-30)"/>
would first skew the rectangle into a parallelogram, then rotate the parallelogram by 45 degrees
about the origin, then translate it by 50 units in the y-direction.
∗ ∗ ∗
In addition to rectangles, SVG has lines, circles, ellipses, and text as basic shapes. Here
are some details. A <line> element represents a line segment and has geometric attributes
x1, y1, x2, and y2 to specify the coordinates of the endpoints of the line segment. These four
attributes have zero as default value, which makes it easier to specify horizontal and vertical
lines. For example,
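a horizontal line from (0,0) to (300,0) could be written as in this sketch, which relies on the default values of x1, y1, and y2:

   <line x2="300" stroke="black" stroke-width="2"/>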
A <path> element has an attribute named d whose value is a list of path commands, each given by a letter and some numerical data. The commands that we are already familiar with are coded by the letters M, L, C, and Q. The command for closing a
path segment is Z, and it requires no data. For example, the path data “M 10 20 L 100 200”
would draw a line segment from the point (10,20) to the point (100,200). You can combine
several connected line segments into one L command. For example, the <polygon> example
given above could be created using the <path> element
<path d="M 0,0 L 100,0 100,75 50,100 0,75 Z"/>
The Z at the end of the data closes the path by adding the final side to the polygon. (Note
that, as usual, you can use either commas or spaces in the data.)
The C command takes six numbers as data, to specify the two control points and the final
endpoint of the cubic Bezier curve segment. You can also give a multiple of six values to get
a connected sequence of curve segments. Similarly, the Q command uses four data values to
specify the control point and final endpoint of the quadratic Bezier curve segment. The large,
curvy, yellow shape shown in the picture earlier in this section was created as a path with two
line segments and two Bezier curve segments:
<path
d="M 20,70 C 150,70 250,350 380,350 L 380,380 C 250,380 150,100 20,100 Z"
fill="yellow" stroke-width="2" stroke="black"/>
SVG paths add flexibility by defining “relative” versions of the path commands, where the
data for the command is given relative to the current position. A relative move command, for
example, instead of telling where to move, tells how far to move from the current position. The
names of the relative versions of the path commands are lower case letters instead of upper
case. “M 10,20” means to move to the point with coordinates (10,20), while “m 10,20” means
to move 10 units horizontally and 20 units vertically from the current position. Similarly, if the
current position is (x,y), then the command “l 3,5”, where the first character is a lower case L,
draws a line from (x,y) to (x+3, y+5).
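As an example of setting default attribute values on a group, consider a group of the following form. The face geometry here is made up for illustration, but the attributes on the <g> element are the ones discussed in the next paragraph:
   <g fill="none" stroke="black" stroke-width="2" transform="scale(1,-1)">
      <circle cx="0" cy="0" r="100"/>                  <!-- outline of the head -->
      <circle cx="-35" cy="30" r="10" fill="black"/>   <!-- left eye -->
      <circle cx="35" cy="30" r="10" fill="black"/>    <!-- right eye -->
      <path d="M -45,-40 C -20,-60 20,-60 45,-40" stroke-width="4"/>  <!-- mouth -->
   </g>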
The nested shapes use fill=“none” stroke=“black” stroke-width=“2” as the default values
of those attributes. A default can be overridden by specifying a different value on an individual
element, as is done for the stroke-width of the <path> element in this example. Setting
transform=“scale(1,−1)” for the group flips the entire image vertically. I do this only because I
am more comfortable working in a coordinate system in which y increases from bottom-to-top
rather than top-to-bottom. Here is the simple line drawing of a face that is produced by this
group:
Now, suppose that we want to include multiple copies of an object in a scene. It shouldn’t
be necessary to repeat the code for drawing the object. It would be nice to have something like
reusable subroutines. In fact, SVG has something very similar: You can define reusable objects
inside a <defs> element. An object that is defined inside <defs> is not added to the scene, but
copies of the object can be added to the scene with a single command. For this to work, the
object must have an id attribute to identify it. For example, we could define an object that
looks like a plus sign:
<defs>
<g id="plus" stroke="black">
<line x1="-20" y1="0" x2="20" y2="0"/>
<line x1="0" y1="-20" x2="0" y2="20"/>
</g>
</defs>
A <use> element can then be used to add a copy of the plus sign object to the scene. The
syntax is
<use xlink:href="#plus"/>
The value of the xlink:href attribute must be the id of the object, with a “#” character added
at the beginning. (Don’t forget the #. If you leave it out, the <use> element will simply be
ignored.) You can add a transform attribute to the <use> element to apply a transformation
to the copy of the object. You can also apply style attributes, which will be used as default
values for the attributes in the copy. For example, we can draw several plus signs with different
transforms and stroke widths:
<use xlink:href="#plus" transform="translate(50,20)" stroke-width="5"/>
<use xlink:href="#plus" transform="translate(0,30) rotate(45)"/>
Note that we can’t change the color of the plus sign, since it already specifies its own stroke
color.
An object that has been defined in the <defs> section can also be used as a sub-object in
other object definitions. This makes it possible to create a hierarchy with multiple levels. Here
is an example from svg/svg-hierarchy.svg that defines a “wheel” object, then uses two copies of
the wheel as sub-objects in a “cart” object:
<defs>
<!-- Define an object that represents a wheel centered at (0,0) and with
radius 1. The wheel is made out of several filled circles, with
thin rectangles for the spokes. -->
<g id="wheel">
<circle cx="0" cy="0" r="1" fill="black"/>
<circle cx="0" cy="0" r="0.8" fill="lightGray"/>
<rect x="-0.9" y="-0.05" width="1.8" height=".1" fill="black"/>
<rect x="-0.9" y="-0.05" width="1.8" height=".1" fill="black"
transform="rotate(120)"/>
<rect x="-0.9" y="-0.05" width="1.8" height=".1" fill="black"
transform="rotate(240)"/>
<circle cx="0" cy="0" r="0.2" fill="black"/>
</g>
<!-- Define an object that represents a cart made out of two wheels,
with two rectangles for the body of the cart. -->
<g id="cart">
<use xlink:href="#wheel" transform="translate(-1.5,-0.1) scale(0.8,0.8)"/>
<use xlink:href="#wheel" transform="translate(1.5,-0.1) scale(0.8,0.8)"/>
<rect x="-3" y="0" width="6" height="2"/>
<rect x="-2.3" y="1.9" width="2.6" height="1"/>
</g>
</defs>
The SVG file goes on to add one copy of the wheel and four copies of the cart to the image.
The four carts have different colors and transforms. Here is the image:
2.7.5 Animation
SVG has a number of advanced features that I won’t discuss here, but I do want to mention one:
animation. It is possible to animate almost any property of an SVG object, including geometry,
style, and transforms. The syntax for animation is itself fairly complex, and I will only do a few
examples. But I will tell you enough to produce a fairly complex hierarchical animation like
the “cart-and-windmills” example that was discussed and used as a demo in Subsection 2.4.1.
An SVG version of that animation can be found in svg/cart-and-windmills.svg. (But note that
some web browsers do not implement SVG animations correctly or at all.)
Many attributes of a shape element can be animated by adding an <animate> element to
the content of the shape element. Here is an example that makes a rectangle move across the
image from left to right:
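A sketch of such an element, using the attribute values that are discussed below and the same rectangle as in the keyframe example later in this section, would be:
   <rect x="0" y="210" width="40" height="40">
      <animate attributeName="x" from="0" to="430"
               dur="7s" repeatCount="indefinite"/>
   </rect>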
Note that the <animate> is nested inside the <rect>. The attributeName attribute tells which
attribute of the <rect> is being animated, in this case, x. The from and to attributes say that
x will take on values from 0 to 430. The dur attribute is the “duration”, that is, how long
the animation lasts; the value “7s” means “7 seconds.” The attribute repeatCount=“indefinite”
means that after the animation completes, it will start over, and it will repeat indefinitely, that
is, as long as the image is displayed. If the repeatCount attribute is omitted, then after the
animation runs once, the rectangle will jump back to its original position and remain there. If
repeatCount is replaced by fill=“freeze”, then after the animation runs, the rectangle will be
frozen in its final position, instead of jumping back to the starting position. The animation
begins when the image first loads. If you want the animation to start at a later time, you can
add a begin attribute whose value gives the time when the animation should start, as a number
of seconds after the image loads.
What if we want the rectangle to move back and forth between its initial and final position?
For that, we need something called keyframe animation, which is an important idea in its
own right. The from and to attributes allow you to specify values only for the beginning and
end of the animation. In a keyframe animation, values are specified at additional times in
the middle of the animation. For a keyframe animation in SVG, the from and to attributes
are replaced by keyTimes and values. Here is our moving rectangle example, modified to use
keyframes:
<rect x="0" y="210" width="40" height="40">
<animate attributeName="x"
keyTimes="0;0.5;1" values="0;430;0" dur="7s"
repeatCount="indefinite"/>
</rect>
The keyTimes attribute is a list of numbers, separated by semicolons. The numbers are in
the range 0 to 1, and should be in increasing order. The first number should be 0 and the
last number should be 1. A number specifies a time during the animation, as a fraction of the
complete animation. For example, 0.5 is a point half-way through the animation, and 0.75 is
three-quarters of the way. The values attribute is a list of values, with one value for each key
time. In this case, the value for x is 0 at the start of the animation, 430 half-way through the
animation, and 0 again at the end of the animation. Between the key times, the value for x
is obtained by interpolating between the values specified for the key times. The result in this
case is that the rectangle moves from left to right during the first half of the animation and
then back from right to left in the second half.
Transforms can also be animated, but you need to use the <animateTransform> tag instead
of <animate>, and you need to add a type attribute to specify which transform you are ani-
mating, such as “rotate” or “translate”. Here, for example, is a transform animation applied
to a group:
<g transform="scale(0,0)">
<animateTransform attributeName="transform" type="scale"
from="0,0" to="0.4,0.7"
begin="3s" dur="15s" fill="freeze"/>
Some details of the API are due to the special characteristics of arrays in the C language. My examples will follow the C syntax,
with a few notes about how things can be different in other languages. Since I’m following the C
API, I will refer to “functions” rather than “subroutines” or “methods.” Section 3.6 explains in
detail how to write OpenGL programs in C and in Java. You will need to consult that section
before you can do any actual programming. The live OpenGL 1.1 demos for this book are
written using a JavaScript simulator that implements a subset of OpenGL 1.1. That simulator
is also discussed in Section 3.6.
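The code being discussed here is a short fragment that draws a single triangle; a minimal sketch of such a fragment, with made-up vertex coordinates, looks like this:
   glBegin(GL_TRIANGLES);
   glVertex2f( -0.7, -0.5 );
   glVertex2f( 0.7, -0.5 );
   glVertex2f( 0, 0.7 );
   glEnd();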
Each vertex of the triangle is specified by a call to the function glVertex2f. Vertices must be
specified between calls to glBegin and glEnd. The parameter to glBegin tells which type of
primitive is being drawn. The GL_TRIANGLES primitive allows you to draw more than one
triangle: Just specify three vertices for each triangle that you want to draw.
(I should note that these functions actually just send commands to the GPU. OpenGL can
save up batches of commands to transmit together, and the drawing won’t actually be done
until the commands are transmitted. To ensure that that happens, the function glFlush() must
be called. In some cases, this function might be called automatically by an OpenGL API, but
you might well run into times when you have to call it yourself.)
For OpenGL, vertices have three coordinates. The function glVertex2f specifies the x and y
coordinates of the vertex, and the z coordinate is set to zero. There is also a function glVertex3f
that specifies all three coordinates. The “2” or “3” in the name tells how many parameters are
passed to the function. The “f” at the end of the name indicates that the parameters are of type
float. In fact, there are other “glVertex” functions, including versions that take parameters of
type int or double, with names like glVertex2i and glVertex3d. There are even versions that
take four parameters, although it won’t be clear for a while why they should exist. And, as we
will see later, there are versions that take an array of numbers instead of individual numbers
as parameters. The entire set of vertex functions is often referred to as “glVertex*”, with the
“*” standing in for the parameter specification. (The proliferation of names is due to the fact
that the C programming language doesn’t support overloading of function names; that is, C
distinguishes functions only by their names and not by the number and type of parameters that
are passed to the function.)
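For example, the following calls all specify the same vertex; the coordinates here are arbitrary, and the first two leave the z coordinate equal to zero:
   glVertex2f( 2.5, 1.0 );       // two float parameters
   glVertex2d( 2.5, 1.0 );       // the same vertex, with double parameters
   glVertex3f( 2.5, 1.0, 0.0 );  // the same vertex, with z given explicitly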
OpenGL 1.1 has ten kinds of primitive. Seven of them still exist in modern OpenGL; the
other three have been dropped. The simplest primitive is GL_POINTS, which simply renders
a point at each vertex of the primitive. By default, a point is rendered as a single pixel. The
size of point primitives can be changed by calling
glPointSize(size);
where the parameter, size, is of type float and specifies the diameter of the rendered point, in
pixels. By default, points are squares. You can get circular points by calling
glEnable(GL_POINT_SMOOTH);
The functions glPointSize and glEnable change the OpenGL “state.” The state includes
all the settings that affect rendering. We will encounter many state-changing functions. The
functions glEnable and glDisable can be used to turn many features on and off. In general, the
rule is that any rendering feature that requires extra computation is turned off by default. If you
want that feature, you have to turn it on by calling glEnable with the appropriate parameter.
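As a quick sketch of how these state-changing functions are used, the following fragment would draw three round points, five pixels in diameter:
   glPointSize(5);              // points will be 5 pixels across
   glEnable(GL_POINT_SMOOTH);   // render points as circles rather than squares
   glBegin(GL_POINTS);
   glVertex2f( -0.5, -0.5 );
   glVertex2f( 0, 0.5 );
   glVertex2f( 0.5, -0.5 );
   glEnd();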
There are three primitives for drawing line segments: GL_LINES, GL_LINE_STRIP, and
GL_LINE_LOOP. GL_LINES draws disconnected line segments; specify two vertices for each
segment that you want to draw. The other two primitives draw connected sequences of line
segments. The only difference is that GL_LINE_LOOP adds an extra line segment from the
final vertex back to the first vertex. Here is what you get if you use the same six vertices with
the four primitives we have seen so far:
[Illustration: the same six vertices, labeled A through F, rendered with GL_POINTS, GL_LINES, GL_LINE_STRIP, and GL_LINE_LOOP.]
The points A, B, C, D, E, and F were specified in that order. In this illustration, all the points
lie in the same plane, but keep in mind that in general, points can be anywhere in 3D space.
The width for line primitives can be set by calling glLineWidth(width). The line width is
always specified in pixels. It is not subject to scaling by transformations.
Let’s look at an example. OpenGL does not have a circle primitive, but we can approximate
a circle by drawing a polygon with a large number of sides. To draw an outline of the polygon,
we can use a GL_LINE_LOOP primitive:
int i;               // (declarations added so that the fragment is self-contained)
double angle, x, y;
glBegin( GL_LINE_LOOP );
for (i = 0; i < 64; i++) {
    angle = 6.2832 * i / 64;   // 6.2832 represents 2*PI
    x = 0.5 * cos(angle);      // cos and sin are from math.h
    y = 0.5 * sin(angle);
    glVertex2f( x, y );
}
glEnd();
This draws an approximation for the circumference of a circle of radius 0.5 with center at (0,0).
Remember that to learn how to use examples like this one in a complete, running program,
you will have to read Section 3.6. Also, you might have to make some changes to the code,
depending on which OpenGL implementation you are using.
The next set of primitives is for drawing triangles. There are three of them:
GL_TRIANGLES, GL_TRIANGLE_STRIP, and GL_TRIANGLE_FAN.
[Illustration: examples of GL_TRIANGLES (three separate triangles, nine vertices), GL_TRIANGLE_STRIP, and GL_TRIANGLE_FAN, with vertices labeled A, B, C, . . . in the order in which they are specified.]
The three triangles on the left make up one GL_TRIANGLES primitive, with nine vertices.
With that primitive, every set of three vertices makes a separate triangle. For a
GL_TRIANGLE_STRIP primitive, the first three vertices produce a triangle. After that,
every new vertex adds another triangle to the strip, connecting the new vertex to the two
previous vertices. Two GL_TRIANGLE_FAN primitives are shown on the right. Again for a
GL_TRIANGLE_FAN, the first three vertices make a triangle, and every vertex after that adds
another triangle, but in this case, the new triangle is made by connecting the new vertex to the
previous vertex and to the very first vertex that was specified (vertex “A” in the picture). Note
that GL_TRIANGLE_FAN can be used for drawing filled-in polygons. In this picture, by the
way, the dots and lines are not part of the primitive; OpenGL would only draw the filled-in,
green interiors of the figures.
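For example, the circle outline that was drawn earlier with GL_LINE_LOOP could instead be filled in by switching the primitive to GL_TRIANGLE_FAN; here is a sketch of that variation:
   int i;
   double angle;
   glBegin( GL_TRIANGLE_FAN );   // a filled disk instead of an outline
   for (i = 0; i < 64; i++) {
       angle = 6.2832 * i / 64;
       glVertex2f( 0.5*cos(angle), 0.5*sin(angle) );
   }
   glEnd();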
The three remaining primitives, which have been removed from modern OpenGL, are
GL_QUADS, GL_QUAD_STRIP, and GL_POLYGON. The name “quad” is short for quadrilat-
eral, that is, a four-sided polygon. A quad is determined by four vertices. In order for a quad
to be rendered correctly in OpenGL, all vertices of the quad must lie in the same plane. The
same is true for polygon primitives. Similarly, to be rendered correctly, quads and polygons
must be convex (see Subsection 2.2.3). Since OpenGL doesn’t check whether these conditions
are satisfied, the use of quads and polygons is error-prone. Since the same shapes can easily be
produced with the triangle primitives, they are not really necessary, but here for the record are
some examples:
[Illustration: examples of GL_QUADS, GL_QUAD_STRIP, and GL_POLYGON, with vertices labeled A, B, C, . . . in the order in which they are specified.]
The vertices for these primitives are specified in the order A, B, C, . . . . Note how the order
differs for the two quad primitives: For GL_QUADS, the vertices for each individual quad
should be specified in counterclockwise order around the quad; for GL_QUAD_STRIP, the
vertices should alternate from one side of the strip to the other.
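For example, a single unit square drawn with GL_QUADS, with its vertices given in counterclockwise order, might look like this sketch:
   glBegin(GL_QUADS);
   glVertex2f( -0.5, -0.5 );   // lower left
   glVertex2f( 0.5, -0.5 );    // lower right
   glVertex2f( 0.5, 0.5 );     // upper right
   glVertex2f( -0.5, 0.5 );    // upper left
   glEnd();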
The drawing color can be set with functions in the glColor* family, where the “*” stands for
a suffix that gives the number and type of the parameters. I should warn you now that for
realistic 3D graphics, OpenGL has a more complicated notion of color that uses a different
set of functions. You will learn about that in the next chapter, but for now we will stick to
glColor*.
For example, the function glColor3f has three parameters of type float. The parameters
give the red, green, and blue components of the color as numbers in the range 0.0 to 1.0. (In
fact, values outside this range are allowed, even negative values. When color values are used in
computations, out-of-range values will be used as given. When a color actually appears on the
screen, its component values are clamped to the range 0 to 1. That is, values less than zero are
changed to zero, and values greater than one are changed to one.)
You can add a fourth component to the color by using glColor4f (). The fourth component,
known as alpha, is not used in the default drawing mode, but it is possible to configure OpenGL
to use it as the degree of transparency of the color, similarly to the use of the alpha component in
the 2D graphics APIs that we have looked at. You need two commands to turn on transparency:
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
The first command enables use of the alpha component. It can be disabled by calling
glDisable(GL_BLEND). When the GL_BLEND option is disabled, alpha is simply ignored. The
second command tells how the alpha component of a color will be used. The parameters shown
here are the most common; they implement transparency in the usual way. I should note that
while transparency works fine in 2D, it is much more difficult to use transparency correctly in
3D.
If you would like to use integer color values in the range 0 to 255, you can use glColor3ub()
or glColor4ub to set the color. In these function names, “ub” stands for “unsigned byte.”
Unsigned byte is an eight-bit data type with values in the range 0 to 255. Here are some
examples of commands for setting drawing colors in OpenGL:
glColor3f(0,0,0); // Draw in black.
glColor3f(1,1,1); // Draw in white.
glColor3f(1,0,0); // Draw in full-intensity red.
glColor3ub(1,0,0); // Draw in a color just a tiny bit different from
// black. (The suffix, "ub" or "f", is important!)
glColor3ub(255,0,0); // Draw in full-intensity red.
glColor4f(1, 0, 0, 0.5); // Draw in transparent red, but only if OpenGL
// has been configured to do transparency. By
// default this is the same as drawing in plain red.
Using any of these functions sets the value of a “current color,” which is part of the OpenGL
state. When you generate a vertex with one of the glVertex* functions, the current color is
saved along with the vertex coordinates, as an attribute of the vertex. We will see that vertices
can have other kinds of attribute as well as color. One interesting point about OpenGL is
that colors are associated with individual vertices, not with complete shapes. By changing the
current color between calls to glBegin() and glEnd (), you can get a shape in which different
vertices have different color attributes. When you do this, OpenGL will compute the colors
of pixels inside the shape by interpolating the colors of the vertices. (Again, since OpenGL is
extremely configurable, I have to note that interpolation of colors is just the default behavior.)
For example, here is a triangle in which the three vertices are assigned the colors red, green,
and blue:
This image is often used as a kind of “Hello World” example for OpenGL. The triangle can be
drawn with the commands
glBegin(GL_TRIANGLES);
glColor3f( 1, 0, 0 ); // red
glVertex2f( -0.8, -0.8 );
glColor3f( 0, 1, 0 ); // green
glVertex2f( 0.8, -0.8 );
glColor3f( 0, 0, 1 ); // blue
glVertex2f( 0, 0.9 );
glEnd();
Note that when drawing a primitive, you do not need to explicitly set a color for each vertex,
as was done here. If you want a shape that is all one color, you just have to set the current
color once, before drawing the shape (or just after the call to glBegin()). For example, we can
draw a solid yellow triangle with
glColor3ub(255,255,0); // yellow
glBegin(GL_TRIANGLES);
glVertex2f( -0.5, -0.5 );
glVertex2f( 0.5, -0.5 );
glVertex2f( 0, 0.5 );
glEnd();
Also remember that the color for a vertex is specified before the call to glVertex* that generates
the vertex.
The on-line version of this section has an interactive demo that draws the basic OpenGL
triangle, with different colored vertices. That demo is our first OpenGL example. The demo
actually uses WebGL, so you can use it as a test to check whether your web browser supports
WebGL.
The sample program jogl/FirstTriangle.java draws the basic OpenGL triangle using Java.
The program glut/first-triangle.c does the same using the C programming language. And
glsim/first-triangle.html is a version that uses my JavaScript simulator, which implements just
the parts of OpenGL 1.1 that are covered in this book. Any of those programs could be used
to experiment with 2D drawing in OpenGL. And you can use them to test your OpenGL
programming environment.
∗ ∗ ∗
A common operation is to clear the drawing area by filling it with some background color.
It would be possible to do that by drawing a big colored rectangle, but OpenGL has a potentially
more efficient way to do it. The function
glClearColor(r,g,b,a);
sets up a color to be used for clearing the drawing area. (This only sets the color; the color
isn’t used until you actually give the command to clear the drawing area.) The parameters
are floating point values in the range 0 to 1. There are no variants of this function; you must
provide all four color components, and they must be in the range 0 to 1. The default clear color
is all zeros, that is, black with an alpha component also equal to zero. The command to do the
actual clearing is:
glClear( GL_COLOR_BUFFER_BIT );
The correct term for what I have been calling the drawing area is the color buffer , where
“buffer” is a general term referring to a region in memory. OpenGL uses several buffers in
addition to the color buffer. We will encounter the “depth buffer” in just a moment. The
glClear command can be used to clear several different buffers at the same time, which can
be more efficient than clearing them separately since the clearing can be done in parallel. The
parameter to glClear tells it which buffer or buffers to clear. To clear several buffers at once,
combine the constants that represent them with an arithmetic OR operation. For example,
glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );
This is the form of glClear that is generally used in 3D graphics, where the depth buffer plays
an essential role. For 2D graphics, the depth buffer is generally not used, and the appropriate
parameter for glClear is just GL_COLOR_BUFFER_BIT.
As an example of the “v” versions of the vertex functions, which take an array parameter, here
is C code that draws a square by passing pointers into an array of coordinates to glVertex2fv:
float coords[] = { -0.5, -0.5, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5 };
glBegin(GL_TRIANGLE_FAN);
glVertex2fv(coords); // Uses coords[0] and coords[1].
glVertex2fv(coords + 2); // Uses coords[2] and coords[3].
glVertex2fv(coords + 4); // Uses coords[4] and coords[5].
glVertex2fv(coords + 6); // Uses coords[6] and coords[7].
glEnd();
This example uses “pointer arithmetic,” in which coords + N represents a pointer to the N-th
element of the array. An alternative notation would be &coords[N], where “&” is the address
operator, and &coords[N] means “a pointer to coords[N]”. This will all seem very alien to
people who are only familiar with Java or JavaScript. In my examples, I will avoid using
pointer arithmetic, but I will occasionally use address operators.
As for Java, the people who designed JOGL wanted to preserve the ability to pull data
out of the middle of an array. However, it’s not possible to work with pointers in Java. The
solution was to replace a pointer parameter in the C API with a pair of parameters in the JOGL
API—one parameter to specify the array that contains the data and one to specify the starting
index of the data in the array. For example, here is how the square-drawing code translates
into Java:
float[] coords = { -0.5F, -0.5F, 0.5F, -0.5F, 0.5F, 0.5F, -0.5F, 0.5F };
gl2.glBegin(GL2.GL_TRIANGLE_FAN);
gl2.glVertex2fv(coords, 0); // Uses coords[0] and coords[1].
gl2.glVertex2fv(coords, 2); // Uses coords[2] and coords[3].
gl2.glVertex2fv(coords, 4); // Uses coords[4] and coords[5].
gl2.glVertex2fv(coords, 6); // Uses coords[6] and coords[7].
gl2.glEnd();
There is really not much difference in the parameters, although the zero in the first glVertex2fv
is a little annoying. The main difference is the prefixes “gl2” and “GL2”, which are required
by the object-oriented nature of the JOGL API. I won’t say more about JOGL here, but if you
need to translate my examples into JOGL, you should keep in mind the extra parameter that
is required when working with arrays.
For the record, here are the glVertex* and glColor* functions that I will use in this book.
This is not the complete set that is available in OpenGL:
glVertex2f( x, y ); glVertex2fv( xyArray );
glVertex2d( x, y ); glVertex2dv( xyArray );
glVertex2i( x, y ); glVertex2iv( xyArray );
glVertex3f( x, y, z ); glVertex3fv( xyzArray );
glVertex3d( x, y, z ); glVertex3dv( xyzArray );
glVertex3i( x, y, z ); glVertex3iv( xyzArray );
glColor3f( r, g, b );       glColor3fv( rgbArray );
glColor3d( r, g, b );       glColor3dv( rgbArray );
glColor3ub( r, g, b );      glColor3ubv( rgbArray );
glColor4f( r, g, b, a );    glColor4fv( rgbaArray );
glColor4d( r, g, b, a );    glColor4dv( rgbaArray );
glColor4ub( r, g, b, a );   glColor4ubv( rgbaArray );
For glColor*, keep in mind that the “ub” variations require integers in the range 0 to 255, while
the “f” and “d” variations require floating-point numbers in the range 0.0 to 1.0.
The depth test is turned on by calling glEnable(GL_DEPTH_TEST), and it can be turned off
by calling glDisable(GL_DEPTH_TEST). If you forget to enable the depth test when drawing
in 3D, the image that you get will likely be confusing and will make no sense physically. You
can also get quite a mess if you forget to clear the depth buffer, using the glClear command
shown earlier in this section, at the same time that you clear the color buffer.
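As a sketch of the two steps mentioned here, a program that draws in 3D would typically include something like:
   glEnable(GL_DEPTH_TEST);   // enable the depth test, typically once during initialization
   glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );   // clear both buffers before drawing each frame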
The demo c3/first-cube.html in the online version of this section lets you experiment with
the depth test. It also lets you see what happens when part of your geometry extends outside
the visible range of z -values.
Here are a few details about the implementation of the depth test: For each pixel,
the depth buffer stores a representation of the distance from the viewer to the point that is
currently visible at that pixel. This value is essentially the z -coordinate of the point, after any
transformations have been applied. (In fact, the depth buffer is often called the “z-buffer”.)
The range of possible z -coordinates is scaled to the range 0 to 1. The fact that there is only a
limited range of depth buffer values means that OpenGL can only display objects in a limited
range of distances from the viewer. A depth value of 0 corresponds to the minimal distance; a
depth value of 1 corresponds to the maximal distance. When you clear the depth buffer, every
depth value is set to 1, which can be thought of as representing the background of the image.
You get to choose the range of z -values that is visible in the image, by the transformations
that you apply. The default range, in the absence of any transformations, is -1 to 1. Points
with z -values outside the range are not visible in the image. It is a common problem to use
too small a range of z -values, so that objects are missing from the scene, or have their fronts
or backs cut off, because they lie outside of the visible range. You might be tempted to use a
huge range, to make sure that the objects that you want to include in the image are included
within the range. However, that’s not a good idea: The depth buffer has a limited number
of bits per pixel and therefore a limited amount of accuracy. The larger the range of values
that it must represent, the harder it is to distinguish between objects that are almost at the
same depth. (Think about what would happen if all objects in your scene have depth values
between 0.499999 and 0.500001—the depth buffer might see them all as being at exactly the
same depth!)
There is another issue with the depth buffer algorithm. It can give some strange results
when two objects have exactly the same depth value. Logically, it’s not even clear which object
should be visible, but the real problem with the depth test is that it might show one object
at some points and the second object at some other points. This is possible because numerical
calculations are not perfectly accurate. Here is an actual example:
In the two pictures shown here, a gray square was drawn, followed by a white square, followed
by a black square. The squares all lie in the same plane. A very small rotation was applied,
to force the computer to do some calculations before drawing the objects. The picture on the left
was drawn with the depth test disabled, so that, for example, when a pixel of the white square
was drawn, the computer didn’t try to figure out whether it lies in front of or behind the gray
square; it simply colored the pixel white. On the right, the depth test was enabled, and you
can see the strange result.
Finally, by the way, note that the discussion here assumes that there are no transparent ob-
jects. Unfortunately, the depth test does not handle transparency correctly, since transparency
means that two or more objects can contribute to the color of the pixel, but the depth test
assumes that the pixel color is the color of the object nearest to the viewer at that point. To
handle 3D transparency correctly in OpenGL, you pretty much have to resort to implementing
the painter’s algorithm by hand, at least for the transparent objects in the scene.
3.2.1 3D Coordinates
A coordinate system is a way of assigning numbers to points. In two dimensions, you need a
pair of numbers to specify a point. The coordinates are often referred to as x and y, although
of course, the names are arbitrary. More than that, the assignment of pairs of numbers to
points is itself arbitrary to a large extent. Points and objects are real things, but coordinates
are just numbers that we assign to them so that we can refer to them easily and work with
them mathematically. We have seen the power of this when we discussed transforms, which are
defined mathematically in terms of coordinates but which have real, useful physical meanings.
In three dimensions, you need three numbers to specify a point. (That’s essentially what
it means to be three dimensional.) The third coordinate is often called z. The z -axis is
perpendicular to both the x -axis and the y-axis.
This image illustrates a 3D coordinate system. The positive directions of the x, y, and z
axes are shown as big arrows. The x -axis is green, the y-axis is blue, and the z -axis is red. The
on-line version of this section has a demo version of this image in which you drag on the axes
to rotate the image.
This example is a 2D image, but it has a 3D look. (The illusion is much stronger if you can
rotate the image.) Several things contribute to the effect. For one thing, objects that are farther
away from the viewer in 3D look smaller in the 2D image. This is due to the way that the 3D
scene is “projected” onto 2D. We will discuss projection in the next section. Another factor
is the “shading” of the objects. The objects are shaded in a way that imitates the interaction
of objects with the light that illuminates them. We will put off a discussion of lighting until
Chapter 4. In this section, we will concentrate on how to construct a scene in 3D—what we
have referred to as modeling.
OpenGL programmers usually think in terms of a coordinate system in which the x - and
y-axes lie in the plane of the screen, and the z -axis is perpendicular to the screen with the
positive direction of the z -axis pointing out of the screen towards the viewer. Now, the default
coordinate system in OpenGL, the one that you are using if you apply no transformations at
all, is similar but has the positive direction of the z -axis pointing into the screen. This is not
a contradiction: The coordinate system that is actually used is arbitrary. It is set up by a
transformation. The convention in OpenGL is to work with a coordinate system in which the
positive z -direction points toward the viewer and the negative z -direction points away from the
viewer. The transformation into default coordinates reverses the direction of the z -axis.
This conventional arrangement of the axes produces a right-handed coordinate system.
This means that if you point the thumb of your right hand in the direction of the positive
z -axis, then when you curl the fingers of that hand, they will curl in the direction from the
positive x -axis towards the positive y-axis. If you are looking at the tip of your thumb, the curl
will be in the counterclockwise direction. Another way to think about it is that if you curl the
fingers of your right hand from the positive x-axis to the positive y-axis, then your thumb will point
in the direction of the positive z -axis. The default OpenGL coordinate system (which, again,
is hardly ever used) is a left-handed system. You should spend some time trying to visualize
right- and left-handed coordinates systems. Use your hands!
All of that describes the natural coordinate system from the viewer’s point of view, the
so-called “eye” or “viewing” coordinate system. However, these eye coordinates are not
necessarily the natural coordinates on the world. The coordinate system on the world—the
coordinate system in which the scene is assembled—is referred to as world coordinates.
Recall that objects are not usually specified directly in world coordinates. Instead, objects
are specified in their own coordinate system, known as object coordinates, and then modeling
transforms are applied to place the objects into the world, or into more complex objects. In
OpenGL, object coordinates are the numbers that are used in the glVertex* function to specify
the vertices of the object. However, before the objects appear on the screen, they are usually
subject to a sequence of transformations, starting with a modeling transform.
In OpenGL, translation is given by the command
glTranslatef( dx, dy, dz );
or by the command
glTranslated( dx, dy, dz );
The translation will affect any drawing that is done after the command is given. Note that
there are two versions of the command. The first, with a name ending in “f”, takes three float
values as parameters. The second, with a name ending in “d”, takes parameters of type double.
As an example,
glTranslatef( 0, 0, 1 );
would translate objects that are subsequently drawn by one unit in the positive z-direction,
that is, toward the viewer in the conventional OpenGL coordinate system.
Remember that transforms are applied to objects that are drawn after the transformation
function is called, and that transformations apply to objects in the opposite order of the order
in which they appear in the code.
Of course, OpenGL can draw in 2D as well as in 3D. For 2D drawing in OpenGL, you can
draw on the xy-plane, using zero for the z coordinate. When drawing in 2D, you will probably
want to apply 2D versions of rotation, scaling, and translation. OpenGL does not have 2D
transform functions, but you can just use the 3D versions with appropriate parameters, as in the sketch following this list:
• For translation by (dx,dy) in 2D, use glTranslatef (dx, dy, 0 ). The zero translation in the
z direction means that the transform doesn’t change the z coordinate, so it maps the
xy-plane to itself. (Of course, you could use glTranslated instead of glTranslatef.)
• For scaling by (sx,sy) in 2D, use glScalef (sx, sy, 1 ), which scales only in the x and y
directions, leaving the z coordinate unchanged.
• For rotation through an angle r about the origin in 2D, use glRotatef (r, 0, 0, 1 ). This
is rotation about the z -axis, which rotates the xy-plane into itself. In the usual OpenGL
coordinate system, the z -axis points out of the screen, and the right-hand rule says that
rotation by a positive angle will be in the counterclockwise direction in the xy-plane.
Since the x -axis points to the right and the y-axis points upwards, a counterclockwise
rotation rotates the positive x -axis in the direction of the positive y-axis. This is the same
convention that we have used previously for the positive direction of rotation.
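Combining these, a 2D rotation about a point (p,q) rather than about the origin can be sketched as follows, where p, q, and r are placeholder values:
   glTranslatef( p, q, 0 );     // applied last: move the center of rotation back to (p,q)
   glRotatef( r, 0, 0, 1 );     // rotate by r degrees about the origin
   glTranslatef( -p, -q, 0 );   // applied first: move the point (p,q) to the origin
   // ... draw the 2D objects that should be rotated about (p,q) ...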
Hierarchical modeling, as we saw in Chapter 2, can be implemented using a stack of transforms. Before drawing an object, push a copy of the current transform
onto the stack. After drawing the object and its sub-objects, using any necessary temporary
transformations, restore the previous transform by popping it from the stack.
OpenGL 1.1 maintains a stack of transforms and provides functions for manipulating that
stack. (In fact it has several transform stacks, for different purposes, which introduces some
complications that we will postpone to the next section.) Since transforms are represented as
matrices, the stack is actually a stack of matrices. In OpenGL, the functions for operating on
the stack are named glPushMatrix () and glPopMatrix ().
These functions do not take parameters or return a value. OpenGL keeps track of a current
matrix, which is the composition of all transforms that have been applied. Calling a function
such as glScalef simply modifies the current matrix. When an object is drawn, using the
glVertex* functions, the coordinates that are specified for the object are transformed by the
current matrix. There is another function that affects the current matrix: glLoadIdentity().
Calling glLoadIdentity sets the current matrix to be the identity transform, which represents
no change of coordinates at all and is the usual starting point for a series of transformations.
When the function glPushMatrix () is called, a copy of the current matrix is pushed onto
the stack. Note that this does not change the current matrix; it just saves a copy on the
stack. When glPopMatrix () is called, the matrix on the top of the stack is popped from the
stack, and that matrix replaces the current matrix. Note that glPushMatrix and glPopMatrix
must always occur in corresponding pairs; glPushMatrix saves a copy of the current matrix,
and a corresponding call to glPopMatrix restores that copy. Between a call to glPushMatrix
and the corresponding call to glPopMatrix, there can be additional calls of these functions, as
long as they are properly paired. Usually, you will call glPushMatrix before drawing an object
and glPopMatrix after finishing that object. In between, drawing sub-objects might require
additional pairs of calls to those functions.
As an example, suppose that we want to draw a cube. It’s not hard to draw each face
using glBegin/glEnd, but let’s do it with transformations. We can start with a function that
draws a square in the position of the front face of the cube. For a cube of size 1, the front face
would sit one-half unit in front of the screen, in the plane z = 0.5, and it would have vertices
at (-0.5, -0.5, 0.5), (0.5, -0.5, 0.5), (0.5, 0.5, 0.5), and (-0.5, 0.5, 0.5). Here is a function that
draws the square. The parameters are floating point numbers in the range 0.0 to 1.0 that give
the RGB color of the square:
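A sketch of such a function, using a triangle fan to fill the square with the vertices listed above, might look like this:
   void square( float r, float g, float b ) {
       glColor3f(r, g, b);         // set the color for this face
       glBegin(GL_TRIANGLE_FAN);   // the four vertices of the front face, in the plane z = 0.5
       glVertex3f(-0.5, -0.5, 0.5);
       glVertex3f(0.5, -0.5, 0.5);
       glVertex3f(0.5, 0.5, 0.5);
       glVertex3f(-0.5, 0.5, 0.5);
       glEnd();
   }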
To make a red front face for the cube, we just need to call square(1,0,0). Now, consider the
right face, which is perpendicular to the x -axis, in the plane x = 0.5. To make a right face, we
can start with a front face and rotate it 90 degrees about the y-axis. Think about rotating the
front face (red) to the position of the right face (green) in this illustration by rotating the front
face about the y-axis:
So, we can draw a green right face for the cube with
glPushMatrix();
glRotatef(90, 0, 1, 0);
square(0, 1, 0);
glPopMatrix();
The calls to glPushMatrix and glPopMatrix ensure that the rotation that is applied to the
square will not carry over to objects that are drawn later. The other four faces can be made in
a similar way, by rotating the front face about the coordinate axes. You should try to visualize
the rotation that you need in each case. We can combine it all into a function that draws a
cube. To make it more interesting, the size of the cube is a parameter:
void cube(float size) { // draws a cube with side length = size
glPushMatrix(); // Save a copy of the current matrix.
glScalef(size,size,size); // scale unit cube to desired size
square(1, 0, 0); // red front face
glPushMatrix();
glRotatef(90, 0, 1, 0);
square(0, 1, 0); // green right face
glPopMatrix();
glPushMatrix();
glRotatef(-90, 1, 0, 0);
square(0, 0, 1); // blue top face
glPopMatrix();
glPushMatrix();
glRotatef(180, 0, 1, 0);
square(0, 1, 1); // cyan back face
glPopMatrix();
glPushMatrix();
glRotatef(-90, 0, 1, 0);
square(1, 0, 1); // magenta left face
glPopMatrix();
glPushMatrix();
glRotatef(90, 1, 0, 0);
square(1, 1, 0); // yellow bottom face
glPopMatrix();
glPopMatrix(); // Restore matrix to its state before cube() was called.
}
The sample program glut/unlit-cube.c uses this function to draw a cube, and lets you rotate
the cube by pressing the arrow keys. A Java version is jogl/UnlitCube.java, and a web version
is glsim/unlit-cube.html. Here is an image of the cube, rotated by 15 degrees about the x -axis
and -15 degrees about the y-axis to make the top and right sides visible:
For a more complex example of hierarchical modeling with glPushMatrix and glPop-
Matrix, you can check out an OpenGL equivalent of the “cart and windmills” animation
that was used as an example in Subsection 2.4.1. The three versions of the example are:
glut/opengl-cart-and-windmill-2d.c, jogl/CartAndWindmillJogl2D.java, and glsim/opengl-cart-
and-windmill.html. This program is an example of hierarchical 2D graphics in OpenGL.
In the eye coordinate system, the viewer is at the origin, (0,0,0), looking in the direction of the negative z-axis, the
positive direction of the y-axis is pointing straight up, and the x -axis is pointing to the right.
This is a viewer-centric coordinate system. In other words, eye coordinates are (almost) the
coordinates that you actually want to use for drawing on the screen. The transform from world
coordinates to eye coordinates is called the viewing transformation.
If this is confusing, think of it this way: We are free to use any coordinate system that we
want on the world. Eye coordinates are the natural coordinate system for making a picture
of the world as seen by a viewer. If we used a different coordinate system (world coordinates)
when building the world, then we have to transform those coordinates to eye coordinates to
find out what the viewer actually sees. That transformation is the viewing transform.
Note, by the way, that OpenGL doesn’t keep track of separate modeling and viewing trans-
forms. They are combined into a single transform, which is known as the modelview trans-
formation. In fact, even though world coordinates might seem to be the most important and
natural coordinate system, OpenGL doesn’t have any representation for them and doesn’t use
them internally. For OpenGL, only object and eye coordinates have meaning. OpenGL goes
directly from object coordinates to eye coordinates by applying the modelview transformation.
We are not done. The viewer can’t see the entire 3D world, only the part that fits into the
viewport, which is the rectangular region of the screen or other display device where the image
will be drawn. We say that the scene is “clipped” by the edges of the viewport. Furthermore,
in OpenGL, the viewer can see only a limited range of z -values in the eye coordinate system.
Points with larger or smaller z -values are clipped away and are not rendered into the image.
(This is not, of course, the way that viewing works in the real world, but it’s required by the
use of the depth test in OpenGL. See Subsection 3.1.4.) The volume of space that is actually
rendered into the image is called the view volume. Things inside the view volume make it
into the image; things that are not in the view volume are clipped and cannot be seen. For
purposes of drawing, OpenGL applies a coordinate transform that maps the view volume onto
a cube. The cube is centered at the origin and extends from -1 to 1 in the x-direction, in
the y-direction, and in the z-direction. The coordinate system on this cube is referred to as
clip coordinates. The transformation from eye coordinates to clip coordinates is called the
projection transformation. At this point, we haven’t quite projected the 3D scene onto a
2D surface, but we can now do so simply by discarding the z-coordinate. (The z-coordinate,
however, is still needed to provide the depth information that is needed for the depth test.)
We still aren’t done. In the end, when things are actually drawn, there are device coordi-
nates, the 2D coordinate system in which the actual drawing takes place on a physical display
device such as the computer screen. Ordinarily, in device coordinates, the pixel is the unit of
measure. The drawing region is a rectangle of pixels. This is the rectangle that is called the
viewport. The viewport transformation takes x and y from the clip coordinates and scales
them to fit the viewport.
Let’s go through the sequence of transformations one more time. Think of a primitive, such
as a line or triangle, that is part of the scene and that might appear in the image that we want
to make of the scene. The primitive goes through the following sequence of operations:
[Diagram: the coordinate pipeline — object coordinates are mapped by the modelview transform to eye coordinates, by the projection transform to clip coordinates, and by the viewport transform to device coordinates.]
1. The points that define the primitive are specified in object coordinates, using methods
such as glVertex3f.
2. The points are first subjected to the modelview transformation, which is a combination of
the modeling transform that places the primitive into the world and the viewing transform
that maps the primitive into eye coordinates.
3. The projection transformation is then applied to map the view volume that is visible to
the viewer onto the clip coordinate cube. If the transformed primitive lies outside that
cube, it will not be part of the image, and the processing stops. If part of the primitive
lies inside and part outside, the part that lies outside is clipped away and discarded, and
only the part that remains is processed further.
4. Finally, the viewport transform is applied to produce the device coordinates that will
actually be used to draw the primitive on the display device. After that, it’s just a matter
of deciding how to color the individual pixels that are part of the primitive.
We need to consider these transforms in more detail and see how to use them in OpenGL 1.1.
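The viewport is set with glViewport(x, y, width, height), where (x,y) gives the lower left corner of the viewport within the drawing surface and the sizes are in pixels. The two calls that are discussed in the next paragraph could be sketched as:
   glViewport(0, 0, 300, 400);     // left half of a 600-by-400 pixel drawing surface
      // ... draw the scene ...
   glViewport(300, 0, 300, 400);   // right half of the drawing surface
      // ... draw the scene again ...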
The first glViewport command establishes a 300-by-400 pixel viewport with its lower left corner
at (0,0). That is, the lower left corner of the viewport is at the lower left corner of the drawing
surface. This viewport fills the left half of the drawing surface. Similarly, the second viewport,
with its lower left corner at (300,0), fills the right half of the drawing surface.
If you want to go back to working on the modelview matrix, you must call
glMatrixMode(GL_MODELVIEW);
In my programs, I generally set the matrix mode to GL_PROJECTION, set up the projection
transformation, and then immediately set the matrix mode back to GL_MODELVIEW.
This means that anywhere else in the program, I can be sure that the matrix mode is
GL_MODELVIEW.
∗ ∗ ∗
To help you to understand projection, remember that a 3D image can show only a part
of the infinite 3D world. The view volume is the part of the world that is visible in the
image. The view volume is determined by a combination of the viewing transformation and the
projection transformation. The viewing transform determines where the viewer is located and
what direction the viewer is facing, but it doesn’t say how much of the world the viewer can
see. The projection transform does that: It specifies the shape and extent of the region that
is in view. Think of the viewer as a camera, with a big invisible box attached to the front of
the camera that encloses the part of the world that that camera has in view. The inside of the
box is the view volume. As the camera moves around in the world, the box moves with it, and
the view volume changes. But the shape and size of the box don’t change. The shape and size
of the box correspond to the projection transform. The position and orientation of the camera
correspond to the viewing transform.
This is all just another way of saying that, mathematically, the OpenGL projection trans-
formation transforms eye coordinates to clip coordinates, mapping the view volume onto the
2-by-2-by-2 clipping cube that contains everything that will be visible in the image. To specify
a projection just means specifying the size and shape of the view volume, relative to the viewer.
There are two general types of projection, perspective projection and orthographic
projection. Perspective projection is more physically realistic. That is, it shows what you
would see if the OpenGL display rectangle on your computer screen were a window into an
actual 3D world (one that could extend in front of the screen as well as behind it). It shows
a view that you could get by taking a picture of a 3D world with a camera. In a perspective
view, the apparent size of an object depends on how far it is away from the viewer. Only things
that are in front of the viewer can be seen. In fact, ignoring clipping in the z -direction for the
moment, the part of the world that is in view is an infinite pyramid, with the viewer at the apex
of the pyramid, and with the sides of the pyramid passing through the sides of the viewport
rectangle.
However, OpenGL can’t actually show everything in this pyramid, because of its use of the
depth test to solve the hidden surface problem. Since the depth buffer can only store a finite
range of depth values, it can’t represent the entire range of depth values for the infinite pyramid
that is theoretically in view. Only objects in a certain range of distances from the viewer can
be part of the image. That range of distances is specified by two values, near and far. For a
perspective transformation, both of these values must be positive numbers, and far must be
greater than near. Anything that is closer to the viewer than the near distance or farther away
than the far distance is discarded and does not appear in the rendered image. The volume of
space that is represented in the image is thus a “truncated pyramid.” This pyramid is the view
volume for a perspective projection:
[Illustration: the perspective view volume — a pyramid with its apex at the eye, truncated by the near and far clipping planes.]
The view volume is bounded by six planes—the four sides plus the top and bottom of the
truncated pyramid. These planes are called clipping planes because anything that lies on the
wrong side of each plane is clipped away. The projection transformation maps the six sides of
the truncated pyramid in eye coordinates to the six sides of the clipping cube in clip coordinates.
In OpenGL, setting up the projection transformation is equivalent to defining the view
volume. For a perspective transformation, you have to set up a view volume that is a truncated
pyramid. A rather obscure term for this shape is a frustum. A perspective transformation
can be set up with the glFrustum command:
glFrustum( xmin, xmax, ymin, ymax, near, far );
The last two parameters specify the near and far distances from the viewer, as already discussed.
The viewer is assumed to be at the origin, (0,0,0), facing in the direction of the negative z-axis.
(This is the eye coordinate system.) So, the near clipping plane is at z = −near, and the far
clipping plane is at z = −far. (Notice the minus signs!) The first four parameters specify the
sides of the pyramid: xmin, xmax, ymin, and ymax specify the horizontal and vertical limits of
the view volume at the near clipping plane. For example, the coordinates of the upper-left
corner of the small end of the pyramid are (xmin, ymax, -near ). The x and y limits at the
far clipping plane are larger, usually much larger, than the limits specified in the glFrustum
command.
Note that x and y limits in glFrustum are usually symmetrical about zero. That is, xmin
is usually equal to the negative of xmax and ymin is usually equal to the negative of ymax.
However, this is not required. It is possible to have asymmetrical view volumes where the z-axis
does not point directly down the center of the view.
Since the matrix mode must be set to GL_PROJECTION to work on the projection
transformation, glFrustum is often used in a code segment of the form
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glFrustum( xmin, xmax, ymin, ymax, near, far );
glMatrixMode(GL_MODELVIEW);
The call to glLoadIdentity ensures that the starting point is the identity transform. This is
important since glFrustum modifies the existing projection matrix rather than replacing it, and
although it is theoretically possible, you don’t even want to try to think about what would
happen if you combine several projection transformations into one.
∗ ∗ ∗
Compared to perspective projections, orthographic projections are easier to understand:
In an orthographic projection, the 3D world is projected onto a 2D image by discarding the
z -coordinate of the eye-coordinate system. This type of projection is unrealistic in that it is not
what a viewer would see. For example, the apparent size of an object does not depend on its
distance from the viewer. Objects in back of the viewer as well as in front of the viewer can be
visible in the image. Orthographic projections are still useful, however, especially in interactive
modeling programs where it is useful to see true sizes and angles, undistorted by perspective.
In fact, it’s not really clear what it means to say that there is a viewer in the case of ortho-
graphic projection. Nevertheless, for orthographic projection in OpenGL, there is considered
to be a viewer. The viewer is located at the eye-coordinate origin, facing in the direction of the
negative z-axis. Theoretically, a rectangular corridor extending infinitely in both directions, in
front of the viewer and in back, would be in view. However, as with perspective projection, only
a finite segment of this infinite corridor can actually be shown in an OpenGL image. This finite
view volume is a parallelepiped—a rectangular solid—that is cut out of the infinite corridor by
a near clipping plane and a far clipping plane. The value of far must be greater than near, but
for an orthographic projection, the value of near is allowed to be negative, putting the “near”
clipping plane behind the viewer, as shown in the lower section of this illustration:
[Illustration: two orthographic view volumes — rectangular boxes bounded by the near and far clipping planes; in the second, a negative value of near places the near clipping plane behind the viewer.]
Note that a negative value for near puts the near clipping plane on the positive z -axis, which
is behind the viewer.
An orthographic projection can be set up in OpenGL using the glOrtho method, which
has the following form:
glOrtho( xmin, xmax, ymin, ymax, near, far );
The first four parameters specify the x - and y-coordinates of the left, right, bottom, and top
of the view volume. Note that the last two parameters are near and far, not zmin and zmax.
In fact, the minimum z-value for the view volume is −far and the maximum z-value is −near.
However, it is often the case that near = −far, and if that is true then the minimum and
maximum z-values turn out to be near and far after all!
As with glFrustum, glOrtho should be called when the matrix mode is GL_PROJECTION.
As an example, suppose that we want the view volume to be the box centered at the origin
containing x, y, and z values in the range from -10 to 10. This can be accomplished with
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho( -10, 10, -10, 10, -10, 10 );
glMatrixMode(GL_MODELVIEW);
Now, as it turns out, the effect of glOrtho in this simple case is exactly the same as the effect
of glScalef (0.1, 0.1, -0.1), since the projection just scales the box down by a factor of 10. But
it’s usually better to think of projection as a different sort of thing from scaling. (The minus
sign on the z scaling factor is there because projection reverses the direction of the z -axis,
transforming the conventionally right-handed eye coordinate system into OpenGL’s left-handed
default coordinate system.)
∗ ∗ ∗
The glFrustum method is not particularly easy to use. There is a library known as GLU
that contains some utility functions for use with OpenGL. The GLU library includes the method
gluPerspective as an easier way to set up a perspective projection. The command
gluPerspective( fieldOfViewAngle, aspect, near, far );
can be used instead of glFrustum. The fieldOfViewAngle is the vertical angle, measured in
degrees, between the upper side of the view volume pyramid and the lower side. Typical values
are in the range 30 to 60 degrees. The aspect parameter is the aspect ratio of the view, that
is, the width of a cross-section of the pyramid divided by its height. The value of aspect
should generally be set to the aspect ratio of the viewport. The near and far parameters in
gluPerspective have the same meaning as for glFrustum.
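For example, to avoid distortion, the aspect parameter can be computed from the current width and height of the viewport. Here is a sketch of a typical projection setup, written in Java/JOGL form using the GLU class that is discussed in Section 3.6; the field-of-view angle, the clipping distances, and the variables viewportWidth and viewportHeight are just illustrative:

GLU glu = new GLU();
gl2.glMatrixMode( GL2.GL_PROJECTION );
gl2.glLoadIdentity();
double aspect = (double)viewportWidth / viewportHeight;  // aspect ratio of the viewport
glu.gluPerspective( 45, aspect, 0.5, 100 );  // 45-degree vertical field of view
gl2.glMatrixMode( GL2.GL_MODELVIEW );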
A command such as glRotatef(90,0,1,0) represents either a modeling transformation that rotates an object by 90 degrees about the y-axis or a viewing transformation that rotates the viewer by -90 degrees about the y-axis.
Note that the effect on the viewer is the inverse of the effect on the object. Modeling and
viewing transforms are always related in this way. For example, if you are looking at an object,
you can move yourself 5 feet to the left (viewing transform), or you can move the object 5 feet
to the right (modeling transform). In either case, you end up with the same view of the object.
Both transformations would be represented in OpenGL as
glTranslatef(5,0,0);
This even works for scaling: If the viewer shrinks, it will look to the viewer exactly the same
as if the world is expanding, and vice-versa.
∗ ∗ ∗
Although modeling and viewing transformations are the same in principle, they remain very
different conceptually, and they are typically applied at different points in the code. In general
when drawing a scene, you will do the following: (1) Load the identity matrix, for a well-defined
starting point; (2) apply the viewing transformation; and (3) draw the objects in the scene,
each with its own modeling transformation. Remember that OpenGL keeps track of several
transformations, and that this must all be done while the modelview transform is current; if you
are not sure of that, then before step (1) you should call glMatrixMode(GL_MODELVIEW).
During step (3), you will probably use glPushMatrix () and glPopMatrix () to limit each modeling
transform to a particular object.
After loading the identity matrix, the viewer is in the default position, at the origin, looking
down the negative z -axis, with the positive y-axis pointing upwards in the view. Suppose, for
example, that we would like to move the viewer from its default location at the origin back
along the positive z-axis to the point (0,0,20). This operation has exactly the same effect as
moving the world, and the objects that it contains, 20 units in the negative direction along
the z-axis. Whichever operation is performed, the viewer ends up in exactly the same position
relative to the objects. Both operations are implemented by the same OpenGL command,
glTranslatef (0,0,-20). For another example, suppose that we use two commands
glRotatef(90,0,1,0);
glTranslatef(10,0,0);
to establish the viewing transformation. As a modeling transform, these commands would
first translate an object 10 units in the positive x -direction, then rotate the object 90 degrees
about the y-axis. (Remember that modeling transformations are applied to objects in the order
opposite to their order in the code.) What do these commands do as a viewing transformation?
The effect on the view is the inverse of the effect on objects. The inverse of “translate 10 then rotate 90” is “rotate -90 then translate -10.” That is, to do the inverse, you have to undo the
rotation before you undo the translation. The effect as a viewing transformation is first to
rotate the viewer by -90 degrees about the y-axis, then to translate the viewer by -10 along the
x -axis. (You should think about how the two interpretations affect the view of an object that
starts out at the origin.) Note that the order in which viewing transformations are applied is
the same as the order in which they occur in the code.
The on-line version of this section includes the live demo c3/transform-equivalence-3d.html
that can help you to understand the equivalence between modeling and viewing. This picture,
taken from that demo, visualizes the view volume as a translucent gray box. The scene contains
eight cubes, but not all of them are inside the view volume, so not all of them would appear in
the rendered image:
In this case, the projection is a perspective projection, and the view volume is a frustum.
This picture might have been made either by rotating the frustum towards the right (viewing
transformation) or by rotating the cubes towards the left (modeling transform). Read the help
text in the demo for more information.
It can be difficult to set up a view by combining rotations, scalings, and translations, so
OpenGL provides an easier way to set up a typical view. The command is not part of OpenGL
itself but is part of the GLU library.
The GLU library provides the following convenient method for setting up a viewing trans-
formation:
gluLookAt( eyeX,eyeY,eyeZ, refX,refY,refZ, upX,upY,upZ );
This method places the viewer at the point (eyeX,eyeY,eyeZ ), looking towards the point
(refX,refY,refZ ). The viewer is oriented so that the vector (upX,upY,upZ ) points upwards in
the viewer’s view. For example, to position the viewer on the positive x -axis, 10 units from the
origin, looking back at the origin, with the positive direction of the y-axis pointing up as usual,
use
gluLookAt( 10,0,0, 0,0,0, 0,1,0 );
∗ ∗ ∗
With all this, we can give an outline for a typical display routine for drawing an image of a
3D scene with OpenGL 1.1:
// possibly set clear color here, if not set elsewhere
glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT ); // clear the display
glMatrixMode( GL_MODELVIEW );
glLoadIdentity();
gluLookAt( eyeX,eyeY,eyeZ, refX,refY,refZ, upX,upY,upZ ); // apply the viewing transform
glPushMatrix();
.
. // apply modeling transform and draw an object
.
glPopMatrix();
glPushMatrix();
.
. // apply another modeling transform and draw another object
.
glPopMatrix();
.
.
.
In many cases, the default settings are sufficient. Note in particular how cameraLookAt and
cameraSetLimits work together to set up the view and projection. The parameters to camer-
aLookAt represent three points in world coordinates. The view reference point, (refX,refY,refZ ),
should be somewhere in the middle of the scene that you want to render. The parameters to
cameraSetLimits define a box about that view reference point that should contain everything
that you want to appear in the image.
∗ ∗ ∗
For use with JOGL in Java, the camera API is implemented as a class named Camera,
defined in the file jogl/Camera.java. The camera is meant for use with a GLJPanel or GLCanvas
that is being used as an OpenGL drawing surface. To use a camera, create an object of type
Camera as an instance variable:
camera = new Camera();
Then, at the start of the display method, apply the camera by calling
camera.apply(gl2);
where gl2 is the OpenGL drawing context of type GL2. (Note the presence of the parameter
gl2, which was not necessary in C; it is required because the OpenGL drawing context in
JOGL is implemented as an object.) As in the C version, this sets the viewing and projection
transformations and can replace any other code that you would use for that purpose. The
functions for configuring the camera are the same in Java as in C, except that they become
methods in the camera object, and true/false parameters are boolean instead of int:
camera.lookAt( eyeX,eyeY,eyeZ, refX,refY,refZ, upX,upY,upZ );
camera.setLimits( xmin,xmax, ymin,ymax, zmin,zmax );
camera.setScale( limit );
camera.setOrthographic( ortho ); // ortho is of type boolean
camera.setPreserveAspect( preserve ); // preserve is of type boolean
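For example, a camera for a scene that fits inside a 12-unit box centered at the origin might be configured like this (a sketch; the numbers are just illustrative). The call to camera.apply(gl2) would then go at the start of the display method, as described above:

camera = new Camera();
camera.lookAt( 10, 7, 20,    // Place the viewer at the point (10,7,20),
                0, 0, 0,     //    looking toward the origin,
                0, 1, 0 );   //    with the positive y-axis pointing up in the view.
camera.setLimits( -6,6, -6,6, -6,6 );  // Box around the origin that should be visible.
camera.apply(gl2);  // In display(): set the projection and viewing transforms.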
∗ ∗ ∗
The camera comes with a simulated “trackball.” The trackball allows the user to rotate the
view by clicking and dragging the mouse on the display. To use it with GLUT in C, you just
need to install a mouse function and a mouse motion function by calling
glutMouseFunc( trackballMouseFunction );
glutMotionFunc( trackballMotionFunction );
[Illustration: a polygonal mesh in the shape of a house, with its ten vertices labeled:]
   v0 = (2,-1,2)       v1 = (2,-1,-2)      v2 = (2,1,-2)       v3 = (2,1,2)       v4 = (1.5,1.5,0)
   v5 = (-1.5,1.5,0)   v6 = (-2,-1,2)      v7 = (-2,1,2)       v8 = (-2,1,-2)     v9 = (-2,-1,-2)
The order of the vertices is completely arbitrary. The purpose is simply to allow each vertex to
be identified by an integer.
To describe one of the polygonal faces of a mesh, we just have to list its vertices, in order
going around the polygon. For an IFS, we can specify a vertex by giving its index in the list.
For example, we can say that one of the triangular faces of the house is the polygon formed by vertex #3, vertex #2, and vertex #4. So, we can complete our data for the mesh by giving a list of vertex indices for each face. Here is the face data for the house. Remember that the numbers in parentheses are indices into the vertex list:
Face #0: (0, 1, 2, 3)
Face #1: (3, 2, 4)
Face #2: (7, 3, 4, 5)
Face #3: (2, 8, 5, 4)
Face #4: (5, 8, 7)
Face #5: (0, 3, 7, 6)
Face #6: (0, 6, 9, 1)
Face #7: (2, 1, 9, 8)
Face #8: (6, 7, 8, 9)
Again, the order in which the faces are listed is arbitrary. There is also some freedom in how
the vertices for a face are listed. You can start with any vertex. Once you’ve picked a starting
vertex, there are two possible orderings, corresponding to the two possible directions in which
you can go around the circumference of the polygon. For example, starting with vertex 0, the
first face in the list could be specified either as (0,1,2,3) or as (0,3,2,1). However, the first
possibility is the right one in this case, for the following reason. A polygon in 3D can be viewed
from either side; we can think of it as having two faces, facing in opposite directions. It turns
out that it is often convenient to consider one of those faces to be the “front face” of the polygon
and one to be the “back face.” For a polyhedron like the house, the front face is the one that
faces the outside of the polyhedron. The usual rule is that the vertices of a polygon should be
listed in counter-clockwise order when looking at the front face of the polygon. When looking
at the back face, the vertices will be listed in clockwise order. This is the default rule used by
OpenGL.
The vertex and face data for an indexed face set can be represented as a pair of two-
dimensional arrays. For the house, in a version for Java, we could use
double[][] vertexList =
{ {2,-1,2}, {2,-1,-2}, {2,1,-2}, {2,1,2}, {1.5,1.5,0},
{-1.5,1.5,0}, {-2,-1,2}, {-2,1,2}, {-2,1,-2}, {-2,-1,-2} };
int[][] faceList =
{ {0,1,2,3}, {3,2,4}, {7,3,4,5}, {2,8,5,4}, {5,8,7},
{0,3,7,6}, {0,6,9,1}, {2,1,9,8}, {6,7,8,9} };
In most cases, there will be additional data for the IFS. For example, if we want to color the
faces of the polyhedron, with a different color for each face, then we could add another array,
faceColors, to hold the color data. Each element of faceColors would be an array of three
double values in the range 0.0 to 1.0, giving the RGB color components for one of the faces.
With this setup, we could use the following code to draw the polyhedron, using Java and JOGL:
for (int i = 0; i < faceList.length; i++) {
gl2.glColor3dv( faceColors[i], 0 ); // Set color for face number i.
gl2.glBegin(GL2.GL_TRIANGLE_FAN);
for (int j = 0; j < faceList[i].length; j++) {
int vertexNum = faceList[i][j]; // Index for vertex j of face i.
double[] vertexCoords = vertexList[vertexNum]; // The vertex itself.
gl2.glVertex3dv( vertexCoords, 0 );
}
gl2.glEnd();
}
Note that every vertex index is used three or four times in the face data. With the IFS
representation, a vertex is represented in the face list by a single integer. This representation
uses less memory space than the alternative, which would be to write out the vertex in full each
time it occurs in the face data. For the house example, the IFS representation uses 64 numbers to represent the vertices and faces of the polygonal mesh (30 vertex coordinates plus 34 vertex indices in the face list), as opposed to 102 numbers for the alternative representation (34 face vertices written out in full, with 3 coordinates each).
Indexed face sets have another advantage. Suppose that we want to modify the shape of
the polygon mesh by moving its vertices. We might do this in each frame of an animation, as a
way of “morphing” the shape from one form to another. Since only the positions of the vertices
are changing, and not the way that they are connected together, it will only be necessary to
update the 30 numbers in the vertex list. The values in the face list will remain unchanged.
∗ ∗ ∗
There are other ways to store the data for an IFS. In C, for example, where two-dimensional
arrays are more problematic, we might use one-dimensional arrays for the data. In that case,
we would store all the vertex coordinates in a single array. The length of the vertex array would
be three times the number of vertices, and the data for vertex number N will begin at index
3*N in the array. For the face list, we have to deal with the fact that not all faces have the
same number of vertices. A common solution is to add a -1 to the array after the data for each
face. In C, where it is not possible to determine the length of an array, we also need variables
to store the number of vertices and the number of faces. Using this representation, the data for
the house becomes:
int vertexCount = 10; // Number of vertices.
double vertexData[] =
{ 2,-1,2, 2,-1,-2, 2,1,-2, 2,1,2, 1.5,1.5,0,
-1.5,1.5,0, -2,-1,2, -2,1,2, -2,1,-2, -2,-1,-2 };
int faceCount = 9; // Number of faces.
int faceData[] =
{ 0,1,2,3,-1, 3,2,4,-1, 7,3,4,5,-1, 2,8,5,4,-1, 5,8,7,-1,
0,3,7,6,-1, 0,6,9,1,-1, 2,1,9,8,-1, 6,7,8,9,-1 };
After adding a faceColors array to hold color data for the faces, we can use the following C
code to draw the house:
int i,j;
j = 0; // index into the faceData array
for (i = 0; i < faceCount; i++) {
glColor3dv( &faceColors[ i*3 ] ); // Color for face number i.
glBegin(GL_TRIANGLE_FAN);
while ( faceData[j] != -1) { // Generate vertices for face number i.
int vertexNum = faceData[j]; // Vertex number in vertexData array.
glVertex3dv( &vertexData[ vertexNum*3 ] );
j++;
}
j++; // increment j past the -1 that ended the data for this face.
glEnd();
}
Note the use of the C address operator, &. For example, &faceColors[i*3] is a pointer
to element number i*3 in the faceColors array. That element is the first of the three color
component values for face number i. This matches the parameter type for glColor3dv in C,
since the parameter is a pointer type.
∗ ∗ ∗
We could easily draw the edges of the polyhedron instead of the faces simply by using
GL_LINE_LOOP instead of GL_TRIANGLE_FAN in the drawing code (and probably leaving
out the color changes). An interesting issue comes up if we want to draw both the faces and
the edges. This can be a nice effect, but we run into a problem with the depth test: Pixels
along the edges lie at the same depth as pixels on the faces. As discussed in Subsection 3.1.4,
the depth test cannot handle this situation well. However, OpenGL has a solution: a feature
called “polygon offset.” This feature can adjust the depth, in clip coordinates, of a polygon, in
order to avoid having two objects exactly at the same depth. To apply polygon offset, you need
to set the amount of offset by calling
glPolygonOffset(1,1);
The second parameter gives the amount of offset, in units determined by the first parameter.
The meaning of the first parameter is somewhat obscure; a value of 1 seems to work in all cases.
You also have to enable the GL_POLYGON_OFFSET_FILL feature while drawing the faces.
An outline for the procedure is
glPolygonOffset(1,1);
glEnable( GL_POLYGON_OFFSET_FILL );
.
. // Draw the faces.
.
glDisable( GL_POLYGON_OFFSET_FILL );
.
. // Draw the edges.
.
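Concretely, using the Java vertexList, faceList, and faceColors arrays shown earlier, the faces and edges of the house might be drawn together like this (a sketch, not the exact code from the sample program):

gl2.glPolygonOffset(1, 1);
gl2.glEnable(GL2.GL_POLYGON_OFFSET_FILL);
for (int i = 0; i < faceList.length; i++) {   // Draw the faces, offset slightly in depth.
    gl2.glColor3dv(faceColors[i], 0);
    gl2.glBegin(GL2.GL_TRIANGLE_FAN);
    for (int j = 0; j < faceList[i].length; j++) {
        gl2.glVertex3dv(vertexList[faceList[i][j]], 0);
    }
    gl2.glEnd();
}
gl2.glDisable(GL2.GL_POLYGON_OFFSET_FILL);
gl2.glColor3f(0, 0, 0);                       // Draw the edges in black, at their true depth.
for (int i = 0; i < faceList.length; i++) {
    gl2.glBegin(GL2.GL_LINE_LOOP);
    for (int j = 0; j < faceList[i].length; j++) {
        gl2.glVertex3dv(vertexList[faceList[i][j]], 0);
    }
    gl2.glEnd();
}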
There is a sample program that can draw the house and a number of other polyhedra.
It uses drawing code very similar to what we have looked at here, including polygon offset.
The program is also an example of using the camera and trackball API that was discussed in
Subsection 3.3.5, so that the user can rotate a polyhedron by dragging it with the mouse. The
program has menus that allow the user to turn rendering of edges and faces on and off, plus
some other options. The Java version of the program is jogl/IFSPolyhedronViewer.java, and the
C version is glut/ifs-polyhedron-viewer.c. To get at the menu in the C version, right-click on the
display. The data for the polyhedra are created in jogl/Polyhedron.java and glut/polyhedron.c.
There is also a live demo version of the program in the on-line version of this section.
This function call corresponds to one use of glBegin/glEnd. The primitiveType tells which
primitive type is being drawn, such as GL_QUADS or GL_TRIANGLE_STRIP. The same ten primitive types that can be used with glBegin can be used here. The parameter firstVertex is the number of the first vertex that is to be used for drawing the primitive. Note that the position
is given in terms of vertex number; the corresponding array index would be the vertex number
times the number of coordinates per vertex, which was set in the call to glVertexPointer. The
vertexCount parameter is the number of vertices to be used, just as if glVertex* were called
vertexCount times. Often, firstVertex will be zero, and vertexCount will be the total number
of vertices in the array. The command for drawing the square in our example would be
glDrawArrays( GL_TRIANGLE_FAN, 0, 4 );
Often there is other data associated with each vertex in addition to the vertex coordinates.
For example, you might want to specify a different color for each vertex. The colors for the
vertices can be put into another array. You have to specify the location of the data by calling
void glColorPointer(int size, int type, int stride, void* array)
which works just like glVertexPointer. And you need to enable the color array by calling
glEnableClientState(GL_COLOR_ARRAY);
With this setup, when you call glDrawArrays, OpenGL will pull a color from the color array for
each vertex at the same time that it pulls the vertex coordinates from the vertex array. Later,
we will encounter other kinds of vertex data besides coordinates and color that can be dealt
with in much the same way.
Let’s put this together to draw the standard OpenGL red/green/blue triangle, which we
drew using glBegin/glEnd in Subsection 3.1.2. Since the vertices of the triangle have different
colors, we will use a color array in addition to the vertex array.
float coords[6] = { -0.9,-0.9, 0.9,-0.9, 0,0.7 }; // two coords per vertex.
float colors[9] = { 1,0,0, 0,1,0, 0,0,1 }; // three RGB values per vertex.
glVertexPointer( 2, GL_FLOAT, 0, coords ); // Set data type and location.
glColorPointer( 3, GL_FLOAT, 0, colors );
glEnableClientState( GL_VERTEX_ARRAY ); // Enable use of arrays.
glEnableClientState( GL_COLOR_ARRAY );
glDrawArrays( GL_TRIANGLES, 0, 3 ); // Use 3 vertices, starting with vertex 0.
In practice, not all of this code has to be in the same place. The function that does the actual
drawing, glDrawArrays, must be in the display routine that draws the image. The rest could
be in the display routine, but could also be done, for example, in an initialization routine.
∗ ∗ ∗
The function glDrawElements is similar to glDrawArrays, but it is designed for use with data
in a format similar to an indexed face set. With glDrawArrays, OpenGL pulls data from the
enabled arrays in order, vertex 0, then vertex 1, then vertex 2, and so on. With glDrawElements,
you provide a list of vertex numbers. OpenGL will go through the list of vertex numbers, pulling
data for the specified vertices from the arrays. The advantage of this comes, as with indexed
face sets, from the fact that the same vertex can be reused several times.
To use glDrawElements to draw a primitive, you need an array to store the vertex numbers.
The numbers in the array can be 8, 16, or 32 bit integers. (They are supposed to be unsigned
integers, but arrays of regular positive integers will also work.) You also need arrays to store
the vertex coordinates and other vertex data, and you must enable those arrays in the same
way as for glDrawArrays, using functions such as glVertexPointer and glEnableClientState. To actually draw the primitive, call the function
void glDrawElements( int primitiveType, int vertexCount, int dataType, void *array)
Here, primitiveType is one of the ten primitive types such as GL_LINES, vertexCount is the number of vertices to be drawn, dataType specifies the type of data in the array, and array is the array that holds the list of vertex numbers. The dataType must be given as one of the constants GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, or GL_UNSIGNED_INT to specify 8, 16, or 32 bit integers respectively.
As an example, we can draw a cube. We can draw all six faces of the cube as one primitive
of type GL_QUADS. We need the vertex coordinates in one array and the vertex numbers for
the faces in another array. I will also use a color array for vertex colors. The vertex colors will
be interpolated to pixels on the faces, just like the red/green/blue triangle. Here is code that
could be used to draw the cube. Again, this would not necessarily be all in the same part of a
program:
float vertexCoords[24] = { // Coordinates for the vertices of a cube.
1,1,1, 1,1,-1, 1,-1,-1, 1,-1,1,
-1,1,1, -1,1,-1, -1,-1,-1, -1,-1,1 };
float vertexColors[24] = { // An RGB color value for each vertex
1,1,1, 1,0,0, 1,1,0, 0,1,0,
0,0,1, 1,0,1, 0,0,0, 0,1,1 };
int elementArray[24] = { // Vertex numbers for the six faces.
0,1,2,3, 0,3,7,4, 0,4,5,1,
6,2,1,5, 6,5,4,7, 6,7,3,2 };
glVertexPointer( 3, GL_FLOAT, 0, vertexCoords );
glColorPointer( 3, GL_FLOAT, 0, vertexColors );
glEnableClientState( GL_VERTEX_ARRAY );
glEnableClientState( GL_COLOR_ARRAY );
glDrawElements( GL_QUADS, 24, GL_UNSIGNED_INT, elementArray );
Note that the second parameter is the number of vertices, not the number of quads.
The sample program glut/cubes-with-vertex-arrays.c uses this code to draw a cube. It draws
a second cube using glDrawArrays. The Java version is jogl/CubesWithVertexArrays.java, but
you need to read the next subsection before you can understand it. There is also a JavaScript
version, glsim/cubes-with-vertex-arrays.html.
An object of type FloatBuffer, for example, contains a numbered sequence of values of type float. There are subclasses of Buffer for all of Java's primitive data types except boolean.
Nio buffers are used in JOGL in several places where arrays are used in the C API. For
example, JOGL has the following glVertexPointer method in the GL2 class:
public void glVertexPointer(int size, int type, int stride, Buffer buffer)
Only the last parameter differs from the C version. The buffer can be of type FloatBuffer,
IntBuffer, or DoubleBuffer. The type of buffer must match the type parameter in the method.
Functions such as glColorPointer work the same way, and in glDrawElements the final array parameter likewise becomes a Buffer, which can be of type IntBuffer, ShortBuffer, or ByteBuffer to match the dataType GL_UNSIGNED_INT, GL_UNSIGNED_SHORT, or GL_UNSIGNED_BYTE.
The class com.jogamp.common.nio.Buffers contains static utility methods for working with
direct nio buffers. The easiest to use are methods that create a buffer from a Java array. For
example, the method Buffers.newDirectFloatBuffer (array) takes a float array as its parameter
and creates a FloatBuffer of the same length and containing the same data as the array. These
methods are used to create the buffers in the sample program jogl/CubesWithVertexArrays.java.
For example,
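a direct buffer holding the cube's vertex coordinates from the example above might be created and passed to glVertexPointer like this (an illustrative sketch; the variable names are not necessarily those used in the sample program):

float[] vertexCoords = { 1,1,1, 1,1,-1, 1,-1,-1, 1,-1,1,
                        -1,1,1, -1,1,-1, -1,-1,-1, -1,-1,1 };
FloatBuffer vertexCoordBuffer = Buffers.newDirectFloatBuffer( vertexCoords );
gl2.glVertexPointer( 3, GL2.GL_FLOAT, 0, vertexCoordBuffer );  // a Buffer instead of an array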
There are also methods such as Buffers.newDirectFloatBuffer (n), which creates a FloatBuffer
of length n. Remember that an nio Buffer, like an array, is simply a linear sequence of elements
of a given type. In fact, just as for an array, it is possible to refer to items in a buffer by their
index or position in that sequence. Suppose that buffer is a variable of type FloatBuffer, i is an
int and x is a float. Then
buffer.put(i,x);
copies the value of x into position number i in the buffer. Similarly, buffer.get(i ) can be used
to retrieve the value at index i in the buffer. These methods make it possible to work with
buffers in much the same way that you can work with arrays.
The return value of glGenLists is an int which will be the identifier for the list. The parameter to
glGenLists is also an int, which is usually 1. (You can actually ask for several list IDs at once;
the parameter tells how many you want. The list IDs will be consecutive integers, so that if
listA is the return value from glGenLists(3), then the identifiers for the three lists will be listA,
listA + 1, and listA + 2.)
Once you’ve allocated a list in this way, you can store commands into it. If listID is the ID
for the list, you would do this with code of the form:
glNewList(listID, GL_COMPILE);
... // OpenGL commands to be stored in the list.
glEndList();
The parameter GL_COMPILE means that you only want to store commands into the list, not execute them. If you use the alternative parameter GL_COMPILE_AND_EXECUTE, then the
commands will be executed immediately as well as stored in the list for later reuse.
Once you have created a display list in this way, you can call the list with the command
glCallList(listID);
The effect of this command is to tell the GPU to execute a list that it has already stored. You
can tell the graphics card that a list is no longer needed by calling
glDeleteLists(listID, 1);
The second parameter in this method call plays the same role as the parameter in glGenLists;
that is, it allows you to delete several sequentially numbered lists. Deleting a list when you are
through with it allows the GPU to reuse the memory that was used by that list.
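As a concrete sketch (written here in Java/JOGL form, where the functions become methods of the GL2 object gl2), a display list that draws a red square could be created once and then called whenever the square is needed:

int squareList = gl2.glGenLists(1);         // Allocate an ID for one list.
gl2.glNewList(squareList, GL2.GL_COMPILE);  // Store commands without executing them.
gl2.glColor3f(1, 0, 0);
gl2.glBegin(GL2.GL_TRIANGLE_FAN);
gl2.glVertex2f(-0.5f, -0.5f);
gl2.glVertex2f(0.5f, -0.5f);
gl2.glVertex2f(0.5f, 0.5f);
gl2.glVertex2f(-0.5f, 0.5f);
gl2.glEnd();
gl2.glEndList();
   .
   .  // Later, in the display routine:
   .
gl2.glCallList(squareList);  // Execute the stored commands.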
∗ ∗ ∗
Vertex buffer objects take a different approach to reusing information. They only store data,
not commands. A VBO is similar to an array. In fact, it is essentially an array that can be
stored on the GPU for efficiency of reuse. There are OpenGL commands to create and delete
VBOs and to transfer data from an array on the CPU side into a VBO on the GPU. You can
configure glDrawArrays() and glDrawElements() to take the data from a VBO instead of from
an ordinary array (in C) or from an nio Buffer (in JOGL). This means that you can send the
data once to the GPU and use it any number of times.
I will not discuss how to use VBOs here, since it was not a part of OpenGL 1.1. However,
there is a sample program that lets you compare different techniques for rendering a complex
image. The C version of the program is glut/color-cube-of-spheres.c, and the Java version is
jogl/ColorCubeOfSpheres.java. The program draws 1331 spheres, arranged in an 11-by-11-by-
11 cube. The spheres are different colors, with the amount of red in the color varying along
one axis, the amount of green along a second axis, and the amount of blue along the third.
Each sphere has 66 vertices, whose coordinates can be computed using the math functions sin
and cos. The program allows you to select from five different rendering methods, and it shows
the time that it takes to render the spheres using the selected method. (The Java version has
a drop-down menu for selecting the method; in the C version, right-click the image to get the
menu.) You can use your mouse to rotate the cube of spheres, both to get a better view and to
generate more data for computing the average render time. The five rendering techniques are:
• Direct Draw, Recomputing Vertex Data — A remarkably foolish way to draw 1331 spheres,
by recomputing all of the vertex coordinates every time a sphere is drawn.
• Direct Draw, Precomputed Data — The vertex coordinates are computed once and stored
in an array. The spheres are drawn using glBegin/glEnd, but the data used in the calls to
glVertex* are taken from the array rather than recomputed each time they are needed.
• Display List — A display list is created containing all of the commands and data needed
to draw a sphere. Each sphere can then be drawn by a single call of that display list.
• DrawArrays with Arrays — The data for the sphere is stored in a vertex array (or, for
Java, in an nio buffer), and each sphere is drawn using a call to glDrawArrays, using the
techniques discussed earlier in this section. The data has to be sent to the GPU every
time a sphere is drawn.
• DrawArrays with VBOs — Again, glDrawArrays is used to draw the spheres, but this time
the data is stored in a VBO instead of in an array, so the data only has to be transmitted
to the GPU once.
In my own experiments, I found, as expected, that display lists and VBOs gave the shortest
rendering times, with little difference between the two. There were some interesting differences
between the results for the C version and the results for the Java version, which seem to be due
to the fact that function calls in C are more efficient than method calls in Java. You should try
the program on your own computer, and compare the rendering times for the various rendering
methods.
Dividing a vector by its length (that is, dividing each of its coordinates by the length) is said to normalize the vector: The result is a unit vector that points in the same direction as the original vector.
Two vectors can be added. Given two vectors v1 = (x1,y1,z1 ) and v2 = (x2,y2,z2 ), their
sum is defined as
v1 + v2 = ( x1+x2, y1+y2, z1+z2 );
The sum has a geometric meaning: if the starting point of v2 is placed at the ending point of v1, then v1+v2 is the vector that extends from the starting point of v1 to the ending point of v2.
Multiplication is more complicated. The obvious definition of the product of two vectors,
similar to the definition of the sum, does not have geometric meaning and is rarely used.
However, there are three kinds of vector multiplication that are used: the scalar product, the
dot product, and the cross product.
If v = (x,y,z ) is a vector and a is a number, then the scalar product of a and v is defined
as
av = ( a*x, a*y, a*z );
Assuming that a is positive and v is not zero, av is a vector that points in the same direction as
v, whose length is a times the length of v. If a is negative, av points in the opposite direction
from v, and its length is |a| times the length of v. This type of product is called a scalar product
because a number like a is also referred to as a “scalar,” perhaps because multiplication by a
scales v to a new length.
Given two vectors v1 = (x1,y1,z1 ) and v2 = (x2,y2,z2 ), the dot product of v1 and v2 is
denoted by v1 ·v2 and is defined by
v1·v2 = x1*x2 + y1*y2 + z1*z2
Note that the dot product is a number, not a vector. The dot product has several very important
geometric meanings. First of all, note that the length of a vector v is just the square root of
v ·v. Furthermore, the dot product of two non-zero vectors v1 and v2 has the property that
cos(angle) = v1·v2 / (|v1|*|v2|)
where angle is the measure of the angle between v1 and v2. In particular, in the case of two unit vectors, whose lengths are 1, the dot product is simply the cosine of the angle between them. Furthermore, since the cosine of a 90-degree angle is zero, two
non-zero vectors are perpendicular if and only if their dot product is zero. Because of these
properties, the dot product is particularly important in lighting calculations, where the effect
of light shining on a surface depends on the angle that the light makes with the surface.
The scalar product and dot product are defined in any dimension. For vectors in 3D, there
is another type of product called the cross product, which also has an important geometric
meaning. For vectors v1 = (x1,y1,z1 ) and v2 = (x2,y2,z2 ), the cross product of v1 and v2 is
denoted v1 ×v2 and is the vector defined by
v1×v2 = ( y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2 )
If v1 and v2 are non-zero vectors, then v1 ×v2 is zero if and only if v1 and v2 point in the same
direction or in exactly opposite directions. Assuming v1 ×v2 is non-zero, then it is perpendicular
both to v1 and to v2 ; furthermore, the vectors v1, v2, v1 ×v2 follow the right-hand rule; that is,
if you curl the fingers of your right hand from v1 to v2, then your thumb points in the direction
of v1 ×v2. If v1 and v2 are perpendicular unit vectors, then the cross product v1 ×v2 is also a
unit vector, which is perpendicular both to v1 and to v2.
Finally, I will note that given two points P1 = (x1,y1,z1 ) and P2 = (x2,y2,z2 ), the difference
P2−P1 is defined by
P2 − P1 = ( x2 − x1, y2 − y1, z2 − z1 )
This difference is a vector that can be visualized as an arrow that starts at P1 and ends at P2.
Now, suppose that P1, P2, and P3 are vertices of a polygon. Then the vectors P1−P2 and
P3−P2 lie in the plane of the polygon, and so the cross product
(P3−P2) × (P1−P2)
is a vector that is perpendicular to the polygon.
This vector is said to be a normal vector for the polygon. A normal vector of length one is
called a unit normal . Unit normals will be important in lighting calculations, and it will be
useful to be able to calculate a unit normal for a polygon from its vertices.
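For example, here is a small Java method (a sketch, not code from the sample programs) that computes a unit normal for a triangle whose vertices p1, p2, and p3 are given as arrays of three coordinates, using the vector difference, cross product, and normalization operations described above:

static double[] unitNormal( double[] p1, double[] p2, double[] p3 ) {
    // The edge vectors P3-P2 and P1-P2 lie in the plane of the triangle.
    double ax = p3[0] - p2[0], ay = p3[1] - p2[1], az = p3[2] - p2[2];
    double bx = p1[0] - p2[0], by = p1[1] - p2[1], bz = p1[2] - p2[2];
    // Their cross product is perpendicular to the triangle.
    double nx = ay*bz - az*by;
    double ny = az*bx - ax*bz;
    double nz = ax*by - ay*bx;
    // Divide by the length to get a vector of length one.  (The length is zero
    // only for a degenerate triangle whose vertices lie on a single line.)
    double length = Math.sqrt( nx*nx + ny*ny + nz*nz );
    return new double[] { nx/length, ny/length, nz/length };
}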
Note that the i -th coordinate in the product Av is simply the dot product of the i -th row of
the matrix A and the vector v.
Using this definition of the multiplication of a vector by a matrix, a matrix defines a trans-
formation that can be applied to one vector to yield another vector. Transformations that
are defined in this way are linear transformations, and they are the main object of study in
linear algebra. A linear transformation L has the properties that for two vectors v and w,
L(v+w) = L(v) + L(w), and for a number s, L(sv) = sL(v).
Rotation and scaling are linear transformations, but translation is not a linear transformation. To include translations, we have to widen our view of transformation to include affine
transformations. An affine transformation can be defined, roughly, as a linear transformation
followed by a translation. Geometrically, an affine transformation is a transformation that
preserves parallel lines; that is, if two lines are parallel, then their images under an affine
transformation will also be parallel lines. For computer graphics, we are interested in affine
transformations in three dimensions. However—by what seems at first to be a very odd trick—
we can narrow our view back to the linear by moving into the fourth dimension.
Note first of all that an affine transformation in three dimensions transforms a vector
(x1,y1,z1 ) into a vector (x2,y2,z2 ) given by formulas
x2 = a1*x1 + a2*y1 + a3*z1 + t1
y2 = b1*x1 + b2*y1 + b3*z1 + t2
z2 = c1*x1 + c2*y1 + c3*z1 + t3
These formulas express a linear transformation, given by multiplication by the 3-by-3 matrix

     a1  a2  a3
     b1  b2  b3
     c1  c2  c3

followed by a translation by the amounts t1, t2, and t3 in the x, y, and z directions. The trick is to replace this 3-by-3 matrix with the 4-by-4 matrix

     a1  a2  a3  t1
     b1  b2  b3  t2
     c1  c2  c3  t3
     0   0   0   1

If the vector (x1,y1,z1,1) is multiplied by this 4-by-4 matrix, the result is precisely the vector (x2,y2,z2,1). That is, instead of applying an affine transformation to the 3D vector (x1,y1,z1), we can apply a linear transformation to the 4D vector (x1,y1,z1,1).
This might seem pointless to you, but nevertheless, that is what is done in OpenGL and
other 3D computer graphics systems: An affine transformation is represented as a 4-by-4 ma-
trix in which the bottom row is (0,0,0,1), and a three-dimensional vector is changed into a four
dimensional vector by adding a 1 as the final coordinate. The result is that all the affine trans-
formations that are so important in computer graphics can be implemented as multiplication
of vectors by matrices.
For example, here are the 4-by-4 matrices for the identity transformation, for translation by (tx,ty,tz), and for scaling by (sx,sy,sz):

     1  0  0  0        1  0  0  tx        sx  0   0   0
     0  1  0  0        0  1  0  ty        0   sy  0   0
     0  0  1  0        0  0  1  tz        0   0   sz  0
     0  0  0  1        0  0  0  1         0   0   0   1
It is even possible to use an arbitrary transformation matrix in OpenGL, using the function
glMultMatrixf (T ) or glMultMatrixd (T ). The parameter, T, is an array of numbers of type
float or double, representing a transformation matrix. The array is a one-dimensional array of
length 16. The items in the array are the numbers from the transformation matrix, stored in
column-major order, that is, the numbers in the first column, followed by the numbers in the
second column, and so on. These functions multiply the current matrix by the matrix T, on
the right. You could use them, for example, to implement a shear transform, which is not easy
to represent as a sequence of scales, rotations, and translations.
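For example, a shear that transforms (x,y,z) into (x+0.5*y, y, z) could be applied like this (a sketch in Java/JOGL form; the shear amount 0.5 is arbitrary, and JOGL adds an offset parameter to the array version of glMultMatrixf):

float[] shear = {      // The sixteen matrix entries, in column-major order.
    1,    0, 0, 0,     // first column
    0.5f, 1, 0, 0,     // second column (the shear amount appears here)
    0,    0, 1, 0,     // third column
    0,    0, 0, 1      // fourth column
};
gl2.glMultMatrixf( shear, 0 );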
A set of homogeneous coordinates (x,y,z,w), with a non-zero value for w, generates the 3D point (x/w,y/w,z/w). Fortunately, you will almost
never have to deal with homogeneous coordinates directly. The only real exception to this is
that homogeneous coordinates are used, surprisingly, when configuring OpenGL lighting, as
we’ll see in the next chapter.
The sample programs can also be run using glsim.js, a JavaScript library that I have written to simulate the subset of OpenGL 1.1 that is used in this book.
The “-o glutprog” tells the compiler to use “glutprog” as the name of its output file, which
can then be run as a normal executable file; without this option, the executable file would be
named “a.out”. The “-lglut” and “-lGL” options tell the compiler to link the program with
the GLUT and OpenGL libraries. (The character after the “-” is a lower case “L”.) Without
these options, the linker won’t recognize any GLUT or OpenGL functions. If the program also
uses the GLU library, compiling it would require the option “-lGLU”, and if it uses the math
library, it would need the option “-lm”. If a program requires additional .c files, they should
be included as well. For example, the sample program glut/color-cube-of-spheres.c depends on
camera.c, and it can be compiled with the Linux gcc compiler using the command:
gcc -o cubes color-cube-of-spheres.c camera.c -lGL -lglut -lGLU -lm
The sample program glut/glut-starter.c can be used as a starting point for writing programs
that use GLUT. While it doesn’t do anything except open a window, the program contains the
framework needed to do OpenGL drawing, including doing animation, responding to mouse
and keyboard events, and setting up a menu. The source code contains comments that tell you
how to use it.
∗ ∗ ∗
The GLUT library makes it easy to write basic OpenGL applications in C. GLUT uses
event-handling functions. You write functions to handle events that occur when the display
needs to be redrawn or when the user clicks the mouse or presses a key on the keyboard.
To use GLUT, you need to include the header file glut.h (or freeglut.h) at the start of any
source code file that uses it, along with the general OpenGL header file, gl.h. The header files
should be installed in a standard location, in a folder named GL. So, the program usually begins
with
#include <GL/gl.h>
#include <GL/glut.h>
In a GLUT program, after initialization, an event loop processes events by calling the functions that have been registered to handle them. The event loop runs until the program ends, which
happens when the user closes the window or when the program calls the standard exit() function.
To set up the event-handling functions, GLUT uses the fact that in C, it is possible to pass a
function name as a parameter to another function. For example, if display() is the function that
should be called to draw the content of the window, then the program would use the command
glutDisplayFunc(display);
to install this function as an event handler for display events. A display event occurs when the
contents of the window need to be redrawn, including when the window is first opened. Note
that display must have been previously defined, as a function with no parameters:
void display() {
.
. // OpenGL drawing code goes here!
.
}
Keep in mind that it’s not the name of this function that makes it an OpenGL display func-
tion. It has to be set as the display function by calling glutDisplayFunc(display). All of the
GLUT event-handling functions work in a similar way (except many of them do need to have
parameters).
There are a lot of possible event-handling functions, and I will only cover some of them
here. Let’s jump right in and look at a possible main() routine for a GLUT program that uses
most of the common event handlers:
int main(int argc, char** argv) {
glutInit(&argc, argv); // Required initialization!
glutInitDisplayMode(GLUT_DOUBLE | GLUT_DEPTH);
glutInitWindowSize(500,500); // size of display area, in pixels
glutInitWindowPosition(100,100); // location in screen coordinates
glutCreateWindow("OpenGL Program"); // parameter is window title
glutDisplayFunc(display); // called when window needs to be redrawn
glutReshapeFunc(reshape); // called when size of the window changes
glutKeyboardFunc(keyFunc); // called when user types a character
glutSpecialFunc(specialKeyFunc);// called when user presses a special key
glutMouseFunc(mouseFunc); // called for mousedown and mouseup events
glutMotionFunc(mouseDragFunc); // called when mouse is dragged
glutIdleFunc(idleFun); // called when there are no other events
glutMainLoop(); // Run the event loop! This function never returns.
return 0; // (This line will never actually be reached.)
}
The first five lines do some necessary initialization, the next seven lines install event handlers,
and the call to glutMainLoop() runs the GLUT event loop. I will discuss all of the functions that
are used here. The first GLUT function call must be glutInit, with the parameters as shown.
(Note that argc and argv represent command-line arguments for the program. Passing them to
glutInit allows it to process certain command-line arguments that are recognized by GLUT. I
won’t discuss those arguments here.) The functions glutInitWindowSize and glutInitWindow-
Position do the obvious things; size is given in pixels, and window position is given in terms
of pixel coordinates on the computer screen, with (0,0) at the upper left corner of the screen.
The function glutCreateWindow creates the window, but note that nothing can happen in that window until glutMainLoop() is called.
A reshape function, registered with glutReshapeFunc, is called when the size of the window changes; its parameters give the new width and height of the drawing area. For example, you might use the reshape function to set up the projection transform, if the projection depends only on the window size. A reshape function is not required, but if one is provided,
it should always set the OpenGL viewport, which is the part of the window that is used for
drawing. Do this by calling
glViewport(0,0,width,height);
Whenever you make any changes to the program’s data that require the display to be redrawn,
you should call glutPostRedisplay(). This is similar to calling repaint() in Java. It is better to
call glutPostRedisplay() than to call the display function directly. (I also note that it’s possible
to call OpenGL drawing commands directly in the event-handling functions, but it probably
only makes sense if you are using single buffering; if you do this, call glFlush() to make sure
that the drawing appears on the screen.)
glutSpecialFunc(specialKeyFunc) — The “special” function is called when the user
presses certain special keys, such as an arrow key or the Home key. The parameters are an
integer code for the key that was pressed, plus the mouse position when the key was pressed.
Once the menu has been created, commands are added to the menu by calling the function
glutAddMenuEntry(name,commandID). The first parameter is the string that will appear in
the menu. The second is an int that identifies the command; it is the integer that will be
passed to the menu-handling function when the user selects the command from the menu.
Finally, the function glutAttachMenu(button) attaches the menu to the window. The
parameter specifies which mouse button will trigger the menu. Possible values are
GLUT_LEFT_BUTTON, GLUT_MIDDLE_BUTTON, and GLUT_RIGHT_BUTTON. As far
as I can tell, if a mouse click is used to trigger the popup menu, then the same mouse click will
not also produce a call to the mouse-handler function.
Note that a call to glutAddMenuEntry doesn’t mention the menu, and a call to glutAttach-
Menu doesn’t mention either the menu or the window. When you call glutCreateMenu, the
menu that is created becomes the “current menu” in the GLUT state. When glutAddMenu-
Entry is called, it adds a command to the current menu. When glutAttachMenu is called, it
attaches the current menu to the current window, which was set by a call to glutCreateWindow.
All this is consistent with the OpenGL “state machine” philosophy, where functions act by
modifying the current state.
As an example, suppose that we want to let the user set the background color for the display.
We need a function to carry out commands that we will add to the menu. For example, we
might define
void doMenu( int commandID ) {
if ( commandID == 1)
glClearColor(0,0,0,1); // BLACK
else if ( commandID == 2)
glClearColor(1,1,1,1); // WHITE
else if ( commandID == 3)
glClearColor(0,0,0.5,1); // DARK BLUE
else if (commandID == 10)
exit(0); // END THE PROGRAM
glutPostRedisplay(); // redraw the display, with the new background color
}
We might have another function to create the menu. This function would be called in main(),
after calling glutCreateWindow :
void createMenu() {
glutCreateMenu( doMenu ); // Call doMenu() in response to menu commands.
glutAddMenuEntry( "Black Background", 1 );
glutAddMenuEntry( "White Background", 2 );
glutAddMenuEntry( "Blue Background", 3 );
glutAddMenuEntry( "EXIT", 10 );
glutAttachMenu(GLUT_RIGHT_BUTTON); // Show menu on right-click.
}
It’s possible to have submenus in a menu. I won’t discuss the procedure here, but you can
look at the sample program glut/ifs-polyhedron-viewer.c for an example of using submenus.
∗ ∗ ∗
In addition to window and event handling, GLUT includes some functions for drawing basic
3D shapes such as spheres, cones, and regular polyhedra. It has two functions for each shape,
a “solid” version that draws the shape as a solid object, and a wireframe version that draws
something that looks like it’s made of wire mesh. (The wireframe is produced by drawing just
the outlines of the polygons that make up the object.) For example, the function
void glutSolidSphere(double radius, int slices, int stacks)
draws a solid sphere with the given radius, centered at the origin. Remember that this is just an
approximation of a sphere, made up of polygons. For the approximation, the sphere is divided
by lines of longitude, like the slices of an orange, and by lines of latitude, like a stack of disks.
The parameters slices and stacks tell how many subdivisions to use. Typical values are 32 and
16, but the number that you need to get a good approximation for a sphere depends on the size
of the sphere on the screen. The function glutWireSphere has the same parameters but
draws only the lines of latitude and longitude. Functions for a cone, a cylinder, and a torus
(doughnut) are similar:
void glutSolidCone(double base, double height,
int slices, int stacks)
void glutSolidTorus(double innerRadius, double outerRadius,
int slices, int rings)
void glutSolidCylinder(double radius, double height,
int slices, int stacks)
// NOTE: Cylinders are available in FreeGLUT and in Java,
// but not in the original GLUT library.
For a torus, the innerRadius is the size of the doughnut hole. The function
void glutSolidCube(double size)
draws a cube of a specified size. There are functions for the other regular polyhedra that
have no parameters and draw the object at some fixed size: glutSolidTetrahedron(), glut-
SolidOctahedron(), glutSolidDodecahedron(), and glutSolidIcosahedron(). There is also glut-
SolidTeapot(size) that draws a famous object that is often used as an example. Here’s what
the teapot looks like:
Wireframe versions of all of the shapes are also available. For example, glutWireTeapot(size)
draws a wireframe teapot. Note that GLUT shapes come with normal vectors that are re-
quired for lighting calculations. However, except for the teapot, they do not come with texture
coordinates, which are required for applying textures to objects.
GLUT also includes some limited support for drawing text in an OpenGL drawing context. I
won’t discuss that possibility here. You can check the API documentation if you are interested,
and you can find an example in the sample program glut/color-cube-of-spheres.c.
It’s similar for Windows, except that the classpath uses a “;” instead of a “:” to separate the
items in the list:
javac -cp jogl-all.jar;gluegen-rt.jar;. MyOpenGLProg.java
There is an essential period at the end of the classpath, which makes it possible for Java to find
.java files in the current directory. If the jar files are not in the current directory, you can use
full path names or relative path names to the files.
Click “OK.” The user library has been created. You will only have to do this once, and then
you can use it in all of your JOGL projects.
Now, to use OpenGL in a project, create a new Java project as usual in Eclipse. Right-click
the project in the Project Explorer view, and select “Build Path” / “Configure Build Path”
from the menu. You will see the project Properties dialog, with “Build Path” selected on the
left. (You can also access this through the “Properties” command in the “Project” menu.)
Select the “Libraries” tab at the top of the window, and then click the “Add Library” button.
In the popup window, select “User Library” and click “Next.” In the next window, select your
JOGL User Library and click “Finish.” Finally, click “OK” in the main Properties window.
Your project should now be set up to do JOGL development. You should see the JOGL User
Library listed as part of the project in the Project Explorer. Any time you want to start a new
JOGL project, you can go through the same setup to add the JOGL User Library to the build
path in the project.
∗ ∗ ∗
With all that setup out of the way, it’s time to talk about actually writing OpenGL programs
with Java. With JOGL, we don’t have to talk about mouse and keyboard handling or animation,
since that can be done in the same way as in any Java program. You will only need to know
about a few classes from the JOGL API.
First, you need a GUI component on which you can draw using OpenGL. For that, you can
use GLJPanel, which is a subclass of JPanel. (GLJPanel is for use in programs based on the
Swing API; an alternative is GLCanvas, which is a subclass of the older AWT class Canvas.)
The class is defined in the package com.jogamp.opengl.awt. All of the other classes that we will
need for basic OpenGL programming are in the package com.jogamp.opengl.
JOGL uses Java’s event framework to manage OpenGL drawing contexts, and it defines
a custom event listener interface, GLEventListener, to manage OpenGL events. To draw on
a GLJPanel with OpenGL, you need to create an object that implements the GLEventListener
interface, and register that listener with your GLJPanel. The GLEventListener interface defines
the following methods:
public void init(GLAutoDrawable drawable)
public void display(GLAutoDrawable drawable)
public void dispose(GLAutoDrawable drawable)
public void reshape(GLAutoDrawable drawable,
int x, int y, int width, int height)
The drawable parameter in these methods tells which OpenGL drawing surface is involved. It
will be a reference to the GLJPanel. (GLAutoDrawable is an interface that is implemented by
GLJPanel and other OpenGL drawing surfaces.) The init() method is a place to do OpenGL
initialization. (According to the documentation, it can actually be called several times, if the
OpenGL context needs to be recreated for some reason. So init() should not be used to do
initialization that shouldn’t be done more than once.) The dispose() method will be called to
give you a chance to do any cleanup before the OpenGL drawing context is destroyed. The
reshape() method is called when the window first opens and whenever the size of the GLJPanel
changes. OpenGL’s glViewport() function is called automatically before reshape() is called, so
you won’t need to do it yourself. Usually, you won’t need to write any code in dispose() or
reshape(), but they have to be there to satisfy the definition of the GLEventListener interface.
The display() method is where the actual drawing is done and where you will do most of
your work. It should ordinarily clear the drawing area and completely redraw the scene. Take
a minute to study an outline for a minimal JOGL program. It creates a GLJPanel which also
serves as the GLEventListener :
import com.jogamp.opengl.*;
import com.jogamp.opengl.awt.GLJPanel;
import java.awt.Dimension;
import javax.swing.JFrame;
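// A minimal version of such a program might look like this sketch.  The class name,
// window title, and panel size are illustrative, not necessarily those used in the
// book's sample programs.
public class MinimalJoglApp extends GLJPanel implements GLEventListener {

    public static void main(String[] args) {
        JFrame window = new JFrame("JOGL Program");
        MinimalJoglApp panel = new MinimalJoglApp();
        window.setContentPane(panel);
        window.pack();
        window.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        window.setVisible(true);
    }

    public MinimalJoglApp() {
        setPreferredSize( new Dimension(500,500) );
        addGLEventListener(this);  // This panel handles its own OpenGL events.
    }

    public void init(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();
        // OpenGL initialization, such as setting the clear color, can be done here.
    }

    public void display(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();
        // Clear the drawing area and completely redraw the scene here.
    }

    public void reshape(GLAutoDrawable drawable, int x, int y, int width, int height) {
        // Called when the panel changes size.  glViewport has already been called.
    }

    public void dispose(GLAutoDrawable drawable) {
        // Cleanup before the OpenGL context is destroyed, if any is needed.
    }
}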
∗ ∗ ∗
At this point, the only other thing you need to know is how to use OpenGL functions in the
program. In JOGL, the OpenGL 1.1 functions are collected into an object of type GL2. (There
are different classes for different versions of OpenGL; GL2 contains OpenGL 1.1 functionality,
along with later versions that are compatible with 1.1.) An object of type GL2 is an OpenGL
graphics context, in the same way that an object of type Graphics2D is a graphics context for
ordinary Java 2D drawing. The statement
GL2 gl = drawable.getGL().getGL2();
in the above program obtains the drawing context for the GLAutoDrawable, that is, for the
GLJPanel in that program. The name of the variable could, of course, be anything, but gl or
gl2 is conventional.
For the most part, using OpenGL functions in JOGL is the same as in C, except that
the functions are now methods in the object gl. For example, a call to glClearColor (r,g,b,a)
becomes
gl.glClearColor(r,g,b,a);
The redundant “gl.gl” is a little annoying, but you get used to it. OpenGL constants such as
GL_TRIANGLES are static members of GL2, so that, for example, GL_TRIANGLES becomes GL2.GL_TRIANGLES in JOGL. Parameter lists for OpenGL functions are the same as in
the C API in most cases. One exception is for functions such as glVertex3fv () that take an
array/pointer parameter in C. In JOGL, the parameter becomes an ordinary Java array, and
an extra integer parameter is added to give the position of the data in the array. Here, for
example, is how one might draw a triangle in JOGL, with all the vertex coordinates in one
array:
float[] coords = { 0,0.5F, -0.5F,-0.5F, 0.5F,-0.5F };
gl.glBegin(GL2.GL_TRIANGLES);
gl.glVertex2fv(coords, 0); // first vertex data starts at index 0
gl.glVertex2fv(coords, 2); // second vertex data starts at index 2
gl.glVertex2fv(coords, 4); // third vertex data starts at index 4
gl.glEnd();
The biggest change in the JOGL API is the use of nio buffers instead of arrays in functions
such as glVertexPointer. This is discussed in Subsection 3.4.3. We will see in Subsection 4.3.9
that texture images also get special treatment in JOGL.
∗ ∗ ∗
The JOGL API includes a class named GLUT that makes GLUT’s shape-drawing functions
available in Java. (Since you don’t need GLUT’s window or event functions in Java, only the
shape functions are included.) Class GLUT is defined in the package com.jogamp.opengl.util.gl2.
To draw shapes using this class, you need to create an object of type GLUT. It’s only necessary
to make one of these for use in a program:
GLUT glut = new GLUT();
The methods in this object include all the shape-drawing functions from the GLUT C API,
with the same names and parameters. For example:
glut.glutSolidSphere( 2, 32, 16 );
glut.glutWireTeapot( 5 );
glut.glutSolidIcosahedron();
(I don’t know why these are instance methods in an object rather than static methods in a
class; logically, there is no need for the object.)
The GLU library is available through the class com.jogamp.opengl.glu.GLU, and it works
similarly to GLUT. That is, you have to create an object of type GLU, and the GLU functions
will be available as methods in that object. We have encountered GLU only for the functions
gluLookAt and gluPerspective, which are discussed in Section 3.3. For example,
GLU glu = new GLU();
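// The GLU functions can then be called as methods of that object.  For instance
// (with illustrative parameter values):
glu.gluLookAt( 10, 0, 0,  0, 0, 0,  0, 1, 0 );
glu.gluPerspective( 45, 1.5, 1, 100 );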
One of the goals of computer graphics is physical realism, that is, making images that
look like they could be photographs of reality. This is not the only goal. For example, for
scientific visualization, the goal is to use computer graphics to present information accurately
and clearly. Artists can use computer graphics to create abstract rather than realistic art.
However, realism is a major goal of some of the most visible uses of computer graphics, such as
video games, movies, and advertising.
One important aspect of physical realism is lighting: the play of light and shadow, the way
that light reflects from different materials, the way it can bend or be diffracted into a spectrum
as it passes through translucent objects. The techniques that are used to produce the most
realistic graphics can take all these factors and more into account.
However, another goal of computer graphics is speed. OpenGL, in particular, was designed
for real-time graphics, where the time that is available for rendering an image is a fraction
of a second. For an animated movie, it’s OK if it takes hours to render each frame. But a
video game is expected to render sixty frames every second. Even with the incredible speed of
modern computer graphics hardware, compromises are necessary to get that speed. And twenty
years ago, when OpenGL was still new, the compromises were a lot bigger.
In this chapter, we look at light and material in OpenGL 1.1. You will learn how to configure
light sources and how to assign material properties to objects. Material properties determine
how the objects interact with light. And you will learn how to apply an image to a surface as
a texture. The support for light, material, and texture in OpenGL 1.1 is relatively crude and
incomplete, by today’s standards. But the concepts that it uses still serve as the foundation
for modern real-time graphics and, to a significant extent, even for the most realistic computer
graphics.
The properties of a surface that determine how it interacts with light are referred to as the
material of the surface. A surface can have several different material properties. Before we
study the OpenGL API for light and material, there are a few general ideas about light and
material properties that you need to understand. Those ideas are introduced in this section.
We postpone discussion of how lighting is actually done in OpenGL 1.1 until the next section.
[Figure: diffuse and specular reflection. With diffuse reflection, incoming rays of light are scattered so that reflection from all points on the surface reaches the viewer; with specular reflection, the viewer sees a reflection at just one point.]
In perfect specular (“mirror-like”) reflection, an incoming ray of light is reflected from the
surface intact. The reflected ray makes the same angle with the surface as the incoming ray. A
viewer can see the reflected ray only if the viewer is in exactly the right position, somewhere
along the path of the reflected ray. Even if the entire surface is illuminated by the light source,
the viewer will only see the reflection of the light source at those points on the surface where
the geometry is right. Such reflections are referred to as specular highlights. In practice, we
think of a ray of light as being reflected not as a single perfect ray, but as a cone of light, which
can be more or less narrow.
[Figure: specular reflection from a surface. An incoming ray of light is reflected as a cone of light, with the viewer at the center of the cone.]
Specular reflection from a very shiny surface produces very narrow cones of reflected light;
specular highlights on such a material are small and sharp. A duller surface will produce wider
cones of reflected light and bigger, fuzzier specular highlights. In OpenGL, the material property
that determines the size and sharpness of specular highlights is called shininess. Shininess in
OpenGL is a number in the range 0 to 128. As the number increases, specular highlights get
smaller. This image shows eight spheres that differ only in the value of the shininess material
property:
For the sphere on the left, the shininess is 0, which leads to an ugly specular “highlight” that
almost covers an entire hemisphere. Going from left to right, the shininess increases by 16 from
one sphere to the next.
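The OpenGL functions for setting material properties are covered in the next section, but as a quick preview, shininess can be set with the glMaterialf function; the value 64 here is just an arbitrary example from the middle of the range:

gl.glMaterialf( GL2.GL_FRONT_AND_BACK, GL2.GL_SHININESS, 64 );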
In pure diffuse reflection, an incoming ray of light is scattered in all directions equally. A
viewer would see reflected light from all points on the surface. If the incoming light arrives in
parallel rays that evenly illuminate the surface, then the surface would appear to the viewer to
be evenly illuminated. (If different rays strike the surface at different angles, as they would if
they come from a nearby lamp or if the surface is curved, then the amount of illumination at
a point depends on the angle at which the ray hits the surface at that point.)
When light strikes a surface, some of the light can be absorbed, some can be reflected
diffusely, and some can be reflected specularly. The amount of reflection can be different for
different wavelengths. The degree to which a material reflects light of various wavelengths is
what constitutes the color of the material. We now see that a material can have two different
colors—a diffuse color that tells how the material reflects light diffusely, and a specular color
that tells how it reflects light specularly. The diffuse color is the basic color of the object. The
specular color determines the color of specular highlights. The diffuse and specular colors can
be the same; for example, this is often true for metallic surfaces. Or they can be different; for
example, a plastic surface will often have white specular highlights no matter what the diffuse
color.
(The demo c4/materials-demo.html in the on-line version of this section lets the user
experiment with the material properties that we have discussed so far.)
∗ ∗ ∗
OpenGL goes even further. In fact, there are two more colors associated with a material.
The third color is the ambient color of the material, which tells how the surface reflects
ambient light. Ambient light refers to a general level of illumination that does not come
directly from a light source. It consists of light that has been reflected and re-reflected so many
times that it is no longer coming from any particular direction. Ambient light is why shadows
are not absolutely black. In fact, ambient light is only a crude approximation for the reality of
multiply reflected light, but it is better than ignoring multiple reflections entirely. The ambient
color of a material determines how it will reflect various wavelengths of ambient light. Ambient
color is generally set to be the same as the diffuse color.
The fourth color associated with a material is an emission color , which is not really a
color in the same sense as the first three color properties. That is, it has nothing to do with
how the surface reflects light. The emission color is color that does not come from any external
source, and therefore seems to be emitted by the material itself. This does not mean that the
object is giving off light that will illuminate other objects, but it does mean that the object
can be seen even if there is no source of light (not even ambient light). In the presence of light,
the object will be brighter than can be accounted for by the light that illuminates it, and in
that sense it appears to glow. The emission color is usually black; that is, the object has no
emission at all.
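To make the four material colors more concrete, here is a sketch of how a red, plastic-like material might be set up using the glMaterialfv function that is covered in the next section. The values are only illustrative: the ambient and diffuse colors are set to red in a single call, the specular color is white, and the emission color is left at its default, black.

gl.glMaterialfv( GL2.GL_FRONT_AND_BACK, GL2.GL_AMBIENT_AND_DIFFUSE,
                        new float[] { 0.8F, 0.1F, 0.1F, 1 }, 0 );   // basic (and ambient) color: red
gl.glMaterialfv( GL2.GL_FRONT_AND_BACK, GL2.GL_SPECULAR,
                        new float[] { 1, 1, 1, 1 }, 0 );            // white specular highlights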
Each of the four material color properties is specified in terms of three numbers giving
the RGB (red, green, and blue) components of the color. Real light can contain an infinite
number of different wavelengths. An RGB color is made up of just three components, but
the nature of human color vision makes this a pretty good approximation for most purposes.
(See Subsection 2.1.4.) Material colors can also have alpha components, but the only alpha
component that is ever used in OpenGL is the one for the diffuse material color.
In the case of the red, blue, and green components of the ambient, diffuse, or specular
color, the term “color” really means reflectivity. That is, the red component of a color gives
the proportion of red light hitting the surface that is reflected by that surface, and similarly
for green and blue. There are three different types of reflective color because there are three
different types of light in OpenGL, and a material can have a different reflectivity for each type
of light.
A light can have color. In fact, in OpenGL, each light source has three colors: an ambient
color, a diffuse color, and a specular color. Just as the color of a material is more properly
referred to as reflectivity, color of a light is more properly referred to as intensity or energy.
More exactly, color refers to how the light’s energy is distributed among different wavelengths.
Real light can contain an infinite number of different wavelengths; when the wavelengths are
separated, you get a spectrum or rainbow containing a continuum of colors. Light as it is
usually modeled on a computer contains only the three basic colors, red, green, and blue. So,
just like material color, light color is specified by giving three numbers representing the red,
green, and blue intensities of the light.
The diffuse intensity of a light is the aspect of the light that interacts with diffuse material
color, and the specular intensity of a light is what interacts with specular material color. It is
common for the diffuse and specular light intensities to be the same.
The ambient intensity of a light works a little differently. Recall that ambient light is light
that is not directly traceable to any light source. Still, it has to come from somewhere and
we can imagine that turning on a light should increase the general level of ambient light in
the environment. The ambient intensity of a light in OpenGL is added to the general level of
ambient light. (There can also be global ambient light, which is not associated with any of the
light sources in the scene.) Ambient light interacts with the ambient color of a material, and
this interaction has no dependence on the position of the light sources or viewer. So, a light
doesn’t have to shine on an object for the object’s ambient color to be affected by the light
source; the light source just has to be turned on.
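As a sketch of how these three light intensities might be configured for light number 0 (glLightfv is part of the OpenGL 1.1 API for configuring lights; the specific values here are invented), the diffuse and specular intensities could be set to white and the ambient intensity to a dim gray:

float[] white = { 1, 1, 1, 1 };
float[] dim = { 0.2F, 0.2F, 0.2F, 1 };
gl.glLightfv( GL2.GL_LIGHT0, GL2.GL_DIFFUSE, white, 0 );   // diffuse intensity of the light
gl.glLightfv( GL2.GL_LIGHT0, GL2.GL_SPECULAR, white, 0 );  // specular intensity, often the same
gl.glLightfv( GL2.GL_LIGHT0, GL2.GL_AMBIENT, dim, 0 );     // added to the general ambient light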
I should emphasize again that this is all just an approximation, and in this case not one that
has a basis in the physics of the real world. Real light sources do not have separate ambient,
diffuse, and specular colors, and some computer graphics systems model light sources using just
one color.
The two objects in this picture are made up of bands of rectangles. The two objects have exactly
the same geometry, yet they look quite different. This is because different normal vectors are
used in each case. For the top object, the band of rectangles is supposed to approximate a
smooth surface. The vertices of the rectangles are points on that surface, and I really didn’t want
to see the rectangles at all—I wanted to see the curved surface, or at least a good approximation.
So for the top object, when I specified the normal vector at each of the vertices, I used a vector
that is perpendicular to the surface rather than one perpendicular to the rectangle. For the
object on the bottom, on the other hand, I was thinking of an object that really is a band
of rectangles, and I used normal vectors that were actually perpendicular to the rectangles.
Here’s a two-dimensional illustration that shows the normal vectors that were used for the two
pictures:
The thick blue lines represent the rectangles, as seen edge-on from above. The arrows represent
the normal vectors. Each rectangle has two normals, one at each endpoint.
In the bottom half of the illustration, two rectangles that meet at a point have different
normal vectors at that point. The normal vectors for a rectangle are actually perpendicular
to the rectangle. There is an abrupt change in direction as you move from one rectangle to
the next, so where one rectangle meets the next, the normal vectors to the two rectangles
are different. The visual effect on the rendered image is an abrupt change in shading that is
perceived as a corner or edge between the two rectangles.
In the top half, on the other hand, the vectors are perpendicular to a curved surface that
passes through the endpoints of the rectangles. When two rectangles share a vertex, they also
share the same normal at that vertex. Visually, this eliminates the abrupt change in shading,
resulting in something that looks more like a smoothly curving surface.
The two ways of assigning normal vectors are called “flat shading” and “smooth shading”.
Flat shading makes a surface look like it is made of flat sides or facets. Smooth shading makes
it look more like a smooth surface. The on-line demo c4/smooth-vs-flat.html can help you to
understand the difference. It shows a polygonal mesh being used to approximate a sphere, with
your choice of smooth or flat shading.
The upshot of all this is that you get to make up whatever normal vectors suit your purpose.
A normal vector at a vertex is whatever you say it is, and it does not have to be literally
perpendicular to the polygon. The normal vector that you choose should depend on the object
that you are trying to model.
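As a small illustration of this freedom, here is a sketch (with simplified, made-up geometry) of how normal vectors might be specified for two quads that meet along an edge, first with flat shading and then with smooth shading. For flat shading, each quad uses its own face normal at all of its vertices; for smooth shading, the quads share a single averaged normal along their common edge, as if they approximated a curved surface.

// Flat shading: one normal per quad, perpendicular to that quad.
gl.glBegin(GL2.GL_QUADS);
gl.glNormal3f(-0.707F, 0.707F, 0);   // normal of the left quad
gl.glVertex3f(-1, 0, -1);  gl.glVertex3f(-1, 0, 1);
gl.glVertex3f( 0, 1,  1);  gl.glVertex3f( 0, 1, -1);
gl.glNormal3f( 0.707F, 0.707F, 0);   // normal of the right quad
gl.glVertex3f( 0, 1, -1);  gl.glVertex3f( 0, 1, 1);
gl.glVertex3f( 1, 0,  1);  gl.glVertex3f( 1, 0, -1);
gl.glEnd();

// Smooth shading: the shared edge at x = 0 gets one averaged normal, (0,1,0).
gl.glBegin(GL2.GL_QUADS);
gl.glNormal3f(-0.707F, 0.707F, 0);
gl.glVertex3f(-1, 0, -1);  gl.glVertex3f(-1, 0, 1);
gl.glNormal3f(0, 1, 0);
gl.glVertex3f( 0, 1,  1);  gl.glVertex3f( 0, 1, -1);
gl.glNormal3f(0, 1, 0);
gl.glVertex3f( 0, 1, -1);  gl.glVertex3f( 0, 1, 1);
gl.glNormal3f( 0.707F, 0.707F, 0);
gl.glVertex3f( 1, 0,  1);  gl.glVertex3f( 1, 0, -1);
gl.glEnd();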
There is one other issue in choosing normal vectors: There are always two possible unit
normal vectors at a point on a surface, pointing in opposite directions. A polygon in 3D has
two faces, facing in opposite directions. OpenGL considers one of these to be the front face
and the other to be the back face. OpenGL tells them apart by the order in which the vertices
are specified. (See Subsection 3.4.1.) The default rule is that the order of the vertices is
counterclockwise when looking at the front face and is clockwise when looking at the back face.
When the polygon is drawn on the screen, this rule lets OpenGL tell whether it is the front
face or the back face that is being shown. When specifying a normal vector for the polygon,
the vector should point out of the front face of the polygon. This is another example of the
right-hand rule. If you curl the fingers of your right hand in the direction in which the vertices of
the polygon were specified, then the normal vector should point in the direction of your thumb.
Note that when you are looking at the front face of a polygon, the normal vector should be
pointing towards you. If you are looking at the back face, the normal vector should be pointing
away from you.
It can be a difficult problem to come up with the correct normal vectors for an object.
Complex geometric models often come with the necessary normal vectors included. This is
true, for example, for the solid shapes drawn by the GLUT library.
What does it actually mean to say that OpenGL performs “lighting calculations”? The goal of
the calculation is to produce a color, (r,g,b,a), for a point on a surface. In OpenGL 1.1, lighting
calculations are actually done only at the vertices of a primitive. After the color of each vertex
has been computed, colors for interior points of the primitive are obtained by interpolating the
vertex colors.
The alpha component of the vertex color, a, is easy: It’s simply the alpha component of the
diffuse material color at that vertex. The calculation of r, g, and b is fairly complex and rather
mathematical, and you don’t necessarily need to understand it. But here is a short description
of how it’s done. . .
Ignoring alpha components, let's assume that the ambient, diffuse, specular, and emission
colors of the material have RGB components (mar, mag, mab), (mdr, mdg, mdb), (msr, msg, msb),
and (mer, meg, meb), respectively. Suppose that the global ambient intensity, which represents
ambient light that is not associated with any light source in the environment, is (gar, gag, gab).
There can be several point and directional light sources, which we refer to as light number 0,
light number 1, light number 2, and so on. With this setup, the red component of the vertex
color will be:

red = mer + gar*mar + I0,r + I1,r + I2,r + ...
where I0,r is the contribution to the color that comes from light number 0; I1,r is the con-
tribution from light number 1; and so on. A similar equation holds for the green and blue
components of the color. This equation says that the emission color, mer , is simply added to
any other contributions to the color. And the contribution of global ambient light is obtained
by multiplying the global ambient intensity, gar , by the material ambient color, mar . This is
the mathematical way of saying that the material ambient color is the fraction of the ambient
light that is reflected by the surface.
The terms I0,r, I1,r, and so on, represent the contribution to the final color from the various
light sources in the environment. The contributions from the light sources are complicated.
Consider just one of the light sources. Note, first of all, that if a light source is disabled (that
is, if it is turned off), then the contribution from that light source is zero. For an enabled light
source, we have to look at the geometry as well as the colors:
[Figure: a ray of light striking a surface, showing the normal vector N at the point, the vector L pointing toward the light source, the vector V pointing toward the viewer, and the direction R of the reflected ray.]
In this illustration, N is the normal vector at the point whose color we want to compute. L is
a vector that points back along the direction from which the light arrives at the surface. V is
a vector that points in the direction of the viewer. And R is the direction of the reflected ray,
that is, the direction in which a light ray from the source would be reflected specularly when
it strikes the surface at the point in question. The angle between N and L is the same as the
angle between N and R; this is a basic fact about the physics of light. All of the vectors are
unit vectors, with length 1. Recall that for unit vectors A and B, the inner product A · B is
equal to the cosine of the angle between the two vectors. Inner products occur at several points
in the lighting equation, as the way of accounting for the angles between various vectors.
Now, let’s say that the light has ambient, diffuse, and specular color components (lar ,lag ,lab ),
(ldr ,ldg ,ldb ), and (lsr ,lsg ,lsb ). Also, let mh be the value of the shininess property of the mate-
rial. Then, assuming that the light is enabled, the contribution of this light source to the red
component of the vertex color can be computed as
Ir = lar *mar + f*( ldr *mdr *(L·N) + lsr *msr *max(0,V·R)mh )
with similar equations for the green and blu
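To make the formula concrete, here is a small sketch in Java (not part of any OpenGL API) that computes the red contribution of one enabled light according to this equation. The method and its parameter names are invented for illustration, and it takes the factor f to be 1 when the light strikes the front of the surface (that is, when L·N is greater than 0) and 0 otherwise.

// Red-component contribution of one enabled light, following the equation above.
// All vectors are assumed to be unit vectors, given as arrays of length 3.
static double lightContributionRed(
        double lar, double ldr, double lsr,   // ambient, diffuse, specular intensity of the light
        double mar, double mdr, double msr,   // ambient, diffuse, specular color of the material
        double mh,                            // shininess of the material
        double[] N, double[] L, double[] V, double[] R) {
    double f = dot(L, N) > 0 ? 1 : 0;  // the light must strike the front of the surface
    return lar*mar + f*( ldr*mdr*dot(L, N) + lsr*msr*Math.pow(Math.max(0, dot(V, R)), mh) );
}

static double dot(double[] A, double[] B) {   // the inner product A·B
    return A[0]*B[0] + A[1]*B[1] + A[2]*B[2];
}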