0% found this document useful (0 votes)

222 views864 pages

Advanced Graphics Game Programming

This document summarizes an article about using RenderMonkey, a shader authoring tool created by ATI, to implement modular HLSL shaders. It discusses how RenderMonkey allows shader programmers to focus on writing shaders without worrying about infrastructure integration. RenderMonkey uses XML files to define shaders and effects, allowing them to be imported and exported for use in game rendering loops. The document provides an overview of HLSL and how shaders fit into the graphics pipeline, and demonstrates how to get started using basic shaders with RenderMonkey.

Uploaded by

Proyec Yectos

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

222 views864 pages

Advanced Graphics Game Programming

Uploaded by

Proyec Yectos

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 864

Gama Network Presents:

Implementing Modular HLSL with

RenderMonkey
By Ron Fosner
Gamasutra
May 14, 2003

URL: http://www.gamasutra.com/features/20030514/fosner_01.shtml

One of the largest problems with getting shaders into a game seems to be the learning curve
associated with shaders. Simply stated, shaders are not something that your lead graphics
programmer can implement over the weekend. There are two main issues with getting shaders
implemented in your game:

1. Understanding what shaders can do and how they replace the existing graphics pipeline.
2. Getting the supporting code implemented into your game so that you can use shaders as a
resource.

In this article we're going to continue the series of Gamasutra articles about shaders by
examining how to make shaders work. The actual integration of shader support is the stuff for a
future article. (Note: You don't need a high-end video card to try your hand at writing shaders.
All you need is the DirectX 9.0 SDK installed. With that you can select the reference device
(REF). While this software driver will be slow, it'll still give you the same results as DirectX 9
capable video card.) RenderMonkey works on any hardware that supports shaders, not just
ATI's hardware.

If you have already read Wolfgang Engel's article, Mark Kilgard's and Randy Fernando's Cg
article or you've perused the DirectX 9 SDK documentation, then you've got a fairly good idea
of the capabilities of the High-Level Shader Language (HLSL) that's supported by DirectX 9.
HLSL, Cg, and the forthcoming OpenGL shading language are all attempts to make it as easy to
write shaders as possible. You no longer have to worry (as much) about allocating registers,
using scratch variables, or learning a new form of assembly language. Instead, once you've set
up your stream data format and associated your constant input registers with more user-
friendly labels, using shaders in a program is no more difficult than using a texture.

Rather than go through the tedious setup on how to use shaders in your program, I'll refer you
to the DirectX 9 documentation. Instead I'm going to focus on a tool ATI created called
RenderMonkey. While RenderMonkey currently works on DirectX high and low-level shader
languages, ATI and 3Dlabs are working to implement support for OpenGL 2.0's shader language
in RenderMonkey that we should see in the next few months. The advantage of a tool like
RenderMonkey is that it lets you focus on writing shaders, not worrying about infrastructure. It
has a nice hierarchical structure that lets you set up a default rendering environment and make
changes at lower levels as necessary. Perhaps the biggest potential advantage of using
RenderMonkey is that the RenderMonkey files are XML files. Thus by adding a RenderMonkey
XML importer to your code or an exporter plug-in to RenderMonkey you can use RenderMonkey
files in your rendering loop to set effects for individual passes. This gives RenderMonkey an
advantage over DirectX's FX files because you can use RenderMonkey as an effects editor.
RenderMonkey even supports an "artist's mode" where only selected items in a pass are
editable.

Using HLSL

While HLSL is very C-like in its semantics, there is the challenge of relating the input and output
of the shaders with what is provided and expected by the pipeline. While shaders can have
constants set prior to their execution, when a primitive is rendered (i.e. when some form of a
DrawPrimitive call is made) then the input for each vertex shader is the vertex values
provided in the selected vertex streams. After each vertex shader call the pipeline then breaks
that vertex call into individual pixel calls and uses the (typically) interpolated values as input to
the pixel shader, which then calculates the resulting color(s) as output from the pixel shader.
This is shown in Figure 1, where the path from application space, through vertex processing
then finally to a rendered pixel is shown. The application space shows where shaders and
constants are set in blue text. The blue boxes show where vertex and pixel shaders live in the
pipeline.
Figure 1. How shaders fit into the graphics pipe

The inputs to the vertex shader function contain the things you'd expect like position, normals,
colors, etc. HLSL can also use things like blend weights and indices (used for things like
skinning), and tangents and binormals (used for various shading effects). The following tables
show the inputs and output for vertex and pixel shaders. The [n] notation indicates an optional
index.
The output of vertex shaders hasn't changed from the DirectX 8.1 days. You can have up to two
output colors, eight output texture coordinates, the transformed vertex position, and a fog and
point size value.

The output from the vertex shader is used to calculate the input for the pixel shaders. Note
there is nothing preventing you from placing any kind of data into the vertex shader's color or
texture coordinate output registers and using them for some other calculations in the pixel
shader. Just keep in mind that the output registers might be clamped and range limited,
particularly on hardware that doesn't support 2.0 shaders.
DirectX 8 pixel shaders supported only a single color register to specify the final color of a pixel.
DirectX 9 has support for multiple render targets (for example, the back buffer and a texture
surface simultaneously) and multi-element textures (typically used to generate intermediate
textures used in a later pass). However you'll need to check the CAPS bits to see what's
supported by your particular hardware. For more information, check the DirectX 9
documentation. While RenderMonkey supports rendering to a texture on one pass and reading
it in another, I'm going to keep the pixel shader simple in the following examples.

Aside from the semantics of the input and output mapping, HLSL gives you a great deal of
freedom to create shader code. In fact, HLSL looks a lot like a version of "C" written for
graphics. (Which is why NVIDIA calls their "C" like shader language Cg, as in "C-for-Graphics").
If you're familiar with C (or pretty much any procedural programming language) you can pick
up HLSL pretty quickly. What is a bit intimidating if you're not expecting it is the graphics traits
of the language itself. Not only are there the expected variable types of boolean, integer and
float, but there's also native support for vectors, matrices, and texture samplers, as well as
swizzles and masks for floats, that allow you to selectively read, write, or replicate individual
elements of vectors and matrices.

This is due to the single-instruction multiple-data (SIMD) nature of the graphics hardware. An
operation such as;

results in an element-by-element multiplication since type vector is an array of four floats. This
is the same as:

where I've used the element selection swizzle and write masks to show the individual
operations. Since the hardware is designed to operate on vectors, performing an operation on a
vector is just as expensive as performing one on a single float. A ps_1_x pixel shader can
actually perform one operation on the red-green-blue elements of a vector while simultaneously
performing a different operation on the alpha element.

In addition to graphics oriented data types there is also a collection of intrinsic functions that
are oriented to graphics, such as dot product, cross product, vector length and normalization
functions, etc. The language also supports things like multiplication of vectors by matrices and
the like. Talking about it is one thing, but it's much easier to comprehend when you have an
example of in front of you, so let's start programming.

HLSL with RenderMonkey

When you first open RenderMonkey, you'll be greeted with a blank workspace. The first thing to
do is create an Effect Group. To do this, right-click on the Effect Workspace item in the
RenderMonkey Workspace view and select Add Effect Group. This will add a basic Effect Group
that will contain editable effects elements. If you have the same capabilities as the default
group (currently a RADEON 8500, GeForceFX or better) then you'll see a red teapot. If you're
running on older hardware (like a GeForce3) then you'll have to edit the pixel shader version in
the default effect from ps 1.4 to ps 1.1.

RenderMonkey automatically creates a vertex stream mapping for the positional data of the
model, places the view/projection matrix in a shader constant for you, and creates the high
level vertex and pixel shaders for you. The default vertex shader is shown below:

The default vertex shader in RenderMonkey.

Both the high-level vertex and pixel shader editor windows have three areas. The top area lets
you manage the interface between "external" parameters (either RenderMonkey supplied or
user-created variables) and the shader and lets you pick the target shader version. The middle
area is a read-only area that shows the parameter declaration block used by the HLSL. When
you add a parameter to an effect, it will become available as an external parameter, and the
parameter declaration block lets you see the association between these parameters and the
shader registers. The bottom area contains the actual shader code that you edit directly. In
Figure 1, you can see that the RenderMonkey supplied view/projection matrix is mapped to
shader constant c0 (c0 though c3 is implied by the float4x4 mapping), and this name is used
in the actual vertex shader. These variables can be considered global declarations. The input
variables from the vertex stream show up as the parameters to the entry point function,
typically called main.

As you can see in the Figure 1, RenderMonkey has provided the minimal shader as the default.
The default vertex shader transforms the incoming vertex position by the view/projection
matrix while the default pixel shader (not shown) sets the outgoing pixel color to red. You can
edit the shader code in the lower window till you get the shader you want. To see what the
shader looks like, click on the Commit Changes button on the main toolbar (or press F7) to
internally save and compile the shader. If the shader has any errors, there will be an
informative message displayed in the output pane at the bottom of the RenderMonkey window.
If the shader compiled successfully, then you'll immediately see the shader results in the
preview window.

And that's about all you need to know to edit shaders in RenderMonkey! The interface is very
intuitive - just about everything can be activated or edited by double-clicking. You can insert
nodes to add textures, set render state, or add additional passes with just a few clicks. The
documentation for RenderMonkey comes with the RenderMonkey download and is also available
at http://www.ati.com/developer/sdk/radeonSDK/html/Tools/RenderMonkey.html, along with a
number of documents on using RenderMonkey.

Finally, you'll need to know some internal variables that are available to RenderMonkey, shown
in Figure 2. If you add the RenderMonkey names (case sensitive) as variables they'll be
connected to the internal RenderMonkey variables. The time-based values are vectors, but all
elements are the same value. You can use these to vary values programmatically instead of
connecting a variable to a slider.

Writing Modular code in HLSL

If you've been writing low-level shader code, you probably haven't been thinking about writing
modular code. It's tough to think modularly when you don't have any support in the language
for any type of control statements. And surprisingly, there's still no actual support for modular
code. . A shader written in HLSL still compiles to a monolithic assembly shader. However, the
HLSL compiler does hide a lot of the details and does let you write like we can write a modular
shader. I mention this because it's easy to get lulled into thinking that you're working with a
mature language, not one that's less than a year old. You should be aware of these limitations.
There's no support (yet) for recursion. All functions are inlined. Function parameters are passed
by value. Statements are always evaluated entirely - there's no short-circuited evaluation as in
a C program.

Even with those limitations, it's surprisingly easy to write modular code. In Wolfgang Engel's
article, he discussed the lighting equation for computing the intensity of the light at a surface as
the contribution of the ambient light, the diffuse light and the specular light.

I've made a slight change by adding in a term for the light color and intensity, which multiplies
the contributions from the diffuse and specular terms and by using I for intensity and C for
color. Note that the color values are RGBA vectors, so there are actually four color elements
that will get computed by this equation. HLSL will automatically do the vector multiplication for
us. Wolfgang also created a HLSL shader for this basic lighting equation, so if you're new to
HLSL you might want to review what he wrote, since I'm going to build on his example.

Let's rewrite the basic shader, setting things up so that we can modularize our lighting
functions. If I add a color element to the output structure (calling it Color1), we can edit the
main function to add in the vertex normal as a parameter from the input stream and write the
output color. Insert two scalar variables, Iamb for ambient intensity and Camb for ambient color
(correspond the above equation) in the RenderMonkey workspace. This will allow us to
manipulate these variables from RenderMonkey's variable interface. RenderMonkey has a very
nice interface that supports vectors, scalars, and colors quite intuitively. To implement the
lighting equation we'll need to compute the lighting vector and the view vector, so I added
these calculations for later use. The ambient lighting values and light properties (position and
color) need to be provided to RenderMonkey by assigning them to variables. The basic vertex
shader computing the output color from the product of the ambient intensity and the ambient
color looks like this.
Note that vector is a HLSL native type for an array of four floats, it's the same as writing
float4. Also note the use of swizzles when calculating the normalized vectors - this leaves the
vector's w parameter out of the calculation. I also modified the default pixel shader to simply
pass along the color created in the vertex shader as shown below. This simple pixel shader
simply returns the (interpolated) color provided by the vertex shader.

Functions in HLSL

So let's start off by making the ambient calculation a function just to see how it's done in HLSL.
Making the ambient calculation a function is pretty simple.

The static inline attributes are optional at this point, but I've placed them there to emphasize
that currently all functions are inlined, so creating and using a function like this adds no
overhead to the shader. This Ambient() function just computes the ambient color and returns
it.

Creating the Diffuse function requires that we pass in the lighting vector and the normal vector.
In addition to the argument type description you'd expect to see in a C program, HLSL allows
you to specify if a value is strictly input, output or both through the in, out and inout
attributes. A parameter that is specified as out or inout will be copied back to the calling code,
allowing functions another way to return values. If not specified, in is assumed. Since this
diffuse equation is an implementation of what's called a Lambertian diffuse, I've named it as
such. The LambertianDiffuse() function looks like this.

Note the use of the HLSL intrinsic dot product function. The specular equation is taken from
Phong's lighting equation and requires calculation of the reflection vector. The reflection vector
is calculated from the normalized normal and light vectors.

The dot product of the reflection vector and the view vector is raised to a power that is
inversely proportional to the roughness of the surface. This is a more intuitive value than letting
a user specify a specular power value. To limit the specular contribution to only the times when
the angle between these vectors is less than 90 degrees, we limit the dot product to only
positive values. The specular color contribution becomes;

Implementing this in HLSL looks like the following:

Note the use of the intrinsic saturate function to limit the range from the dot product to [0,1].
Roughness is added to the RenderMonkey Effect Workspace and added in the shader editor as a
parameter.

Using these functions we can now implement our main shader function as follows:
The three functions that we added are either placed above the main function or below, in which
case you'd need to add a function prototype. As you can see, it's fairly easy to write functional
modules in HLSL code.

Finally, Modular Code

The real utility of this comes when we create modules that can replace other modules. For
example, suppose that you wanted to duplicate the original functionality of the fixed-function-
pipeline, which implemented a particular type of specular called Blinn-Phong. This particular
specular lighting equation is similar to Phong's but uses something called the half-angle vector
instead of the reflection vector. An implementation of it looks like this:

To change our shader to use Blinn-Phong, all we need to do is change the function we call in
main. The color computation would look like this;
Since all of these functions are inlined, any unused code is optimized out from the shader. As
long as there's no reference to a function from main or any of the functions that are called from
main, then we can pick which implementation we want in our shader code simply by selecting
the functions we want, and we don't have to worry about unused code since it's not included in
the compiled shader.

As we get more real-time programmability it becomes easier to implement features that have
been in the artist's domain for years. Suppose your art lead creates some really cool scenes
that look great in Maya™ or 3DS Max™, but don't look right because the Lambertian diffuse in
your engine makes everything look like plastic? Why can't you just render with the same
shading options that Maya has? Well, now you can! If your artist really has to have gentler
diffuse tones provided by Oren-Nayar diffuse shading, then you can now implement it.

Oren-Nayar Diffuse Diffuse Shading

One of the problems of the standard Lambertian model is that it considers the reflecting surface
as a smooth diffuse surface. Surfaces that are really rough, like stone, dirt, and sandpaper
exhibit much more of a backscattering effect, particularly when the light source and the view
direction are in the same direction.

The classic example is of a full moon shown in Figure 3. If you look at the picture of the moon,
it's pretty obvious that this doesn't follow the Lambertian distribution - if it did the edges of the
moon would be in near darkness. In fact the edges look as bright as the center of the moon.
This is because the moon's surface is rough - the surface is made of a jumble of dust and rock
with diffuse reflecting surfaces at all angles - thus the quantity of reflecting surfaces is uniform
no matter the orientation of the surface, hence no matter the orientation of the surface to the
viewer the amount of light reflecting off of any point on the surface is nearly the same.
Figure 3. On rough surfaces like that of the moon, the amount of
light reflecting off of any point on the surface is nearly the same.

In an effort to better model rough surfaces, Oren and Nayar came up with a generalized version
of a Lambertian diffuse shading model that tries to account for the roughness of the surface.
They took a theoretical model and simplified it to the terms that had the most significant
impact. The Oren-Nayar diffuse shading model looks like this;

Now this may look daunting, but it can be simplified to something we can appreciate if we
replace the original notation with the notation we've already been using. p is a surface
reflectivity property, which we can replace with our surface color. E0 is a light input energy
term, which we can replace with our light color. And the 0i term is just our familiar angle
between the vertex normal and the light direction. Making these exchanges give us;
which looks a little easier to compute. There are still some parameters to explain.

Note that if the roughness value is zero, the model is the same as the Lambertian diffuse
model. Since this model gives a closer visual representation to rough surfaces such as sand,
plaster, dirt, and unglazed clay than Labertian shading, it's become a popular shading model in
most 3D graphics modeling packages. With HLSL, it's fairly easy to write your own version of an
Oren-Nayar diffuse shader. The shader code below is based upon a RenderMan shader written
by Larry Gritz. Using this function will probably make the entire shader is so long it requires
that your hardware supports 2.0 shaders or you run on the reference rasterizer.

In most implementations this is paired up with a Phong or Blinn-Phong specular term.

I hope that you're getting the idea that it's pretty easy to write snippets of code for specific
purposes and place them in a library. When I was writing my book on shaders I focused more
on writing it such that I had a variety of shader subroutines rather than just a collection of
stand-alone shaders. As you can see this approach is very powerful and allows you to pick and
choose the pieces that make up the shader to customize the overall effect you want to realize.

Like C, HLSL supports the #include preprocessor directive, but only when compiling from a file
- currently RenderMonkey doesn't implement #include. The filename specified can be either an
absolute or relative path. If it's a relative path then it's assumed to be relative to the directory
of the file issuing the #include. Unlike C, there's no environmental variable support, so the
angle bracket include notation isn't supported, just the include file name in quotation marks.
It's easy to see that when function overloading gets implemented it's going to be very easy to
quickly write shader code that's easy to customize. For now you can use the preprocessor and
some #ifdef / #else / #endif directives to #define your own shading equations.

Shading Outside the Box

There's no reason to be stuck with the lighting equation that we've been working with. Shaders
give you the ability to create whatever shading effect you want and I encourage you to try
creating you own lighting equations, either by implementing academic models such as Oren-
Nayar, or creating your own. Cel shading is a simple example of non-photo-realistic (NPR)
rendering, but there are many, many artistic styles that are starting to show up in computer
graphics, just check out the SIGGRAPH proceedings since 1999. You can also look to the real
world for inspiration as well. There's a beautiful example of this type of shading done by ATI to
demonstrate the RADEON 9700. In order to duplicate the deep, color-shifting hues seen on
metallic paint jobs on cars, ATI created a demo that has (among other effects) a dual-specular
highlight term. This creates a highlight of one color surrounded by a highlight of a different
color as seen in a closeup of the car's side mirror in Figure 4.

Figure 4. An example of dual specular-highlight effects.

The metallic flakes are from a noise map and the environment mapping finishes off the effect.
As shading hardware becomes more powerful and commonplace you'll start to see more and
more creative shading show up in games and then in mainstream applications. The next release
of the Windows OS is rumored to be designed to natively support graphics hardware
acceleration for the entire desktop, and programmable shading is going to be a big part of that.
With the prices of DirectX 9 (and OpenGL 2.0) capable hardware continually dropping, if your
current project doesn't incorporate shaders, you haven't investigated HLSL, or the low-level
shader language intimidated you, I hope this article has shown you that not only is writing HLSL
easy, but with tools like RenderMonkey you can be writing shaders within minutes.

Article Reviewers

The author would like to thank the following individuals for reviewing this article prior to
publication: Wolfgang Engel, Randy Fernando, Tadej Fius, Muhammad Haggag, Callan McInally,
Jason Mitchell, Harald Nowak, Guimo Rodriguez, and Natasha Tatarchuk.

Resources

RenderMonkey
The executable and documentation for RenderMonkey can be found at
www.ati.com/developer/sdk/radeonSDK/html/Tools/RenderMonkey.html

Cg
While not HLSL, it's pretty close. You can learn more about it at
http://developer.nvidia.com/Cg, or http://www.cgshaders.org/.

DirectX 9
The Microsoft DirectX 9 documentation is pretty sparse on HLSL, but it's there for you to puzzle
out.

Shader Books
For DirectX shaders there's ShaderX by Engel, Real-Time Shader Programming by Fosner.
There's two ShaderX2 additional books coming out soon as well. Cg is covered by The Cg
Tutorial by Fernando and Kilgard. Real-Time Shading by Olano, et. al. is more about current
shader research, but it's a useful source of information if your interested in delving further into
the state-of-the-art.

Illumination Texts
Unfortunately most graphics texts gloss over all but the simplest shading models. Most of the
older ones can be found in Computer Graphics by Foley, van Dam, et. al., with the newer ones
in Principles of Digital Image Synthesis, by Glassner. Quite a few of the original papers can be
found online as well. The RenderMan Companion by Upstill and Advanced RenderMan by
Apodaca and Gritz are really useful sources of inspiration.

Copyright © 2003 CMP Media Inc. All rights reserved.

Gama Network Presents:

Animation With Cg
By Randima Fernando and Mark J. Kilgard
Gamasutra
March 25, 2003

URL: http://www.gamasutra.com/features/20030325/fernando_01.shtml

What is Cg? The Cg language makes it possible for you to control the shape, Excerpted
appearance, and motion of objects drawn using programmable graphics From:
hardware. It marries programmatic control of these attributes with the
incredible speed and capabilities of today's graphics processors. Never
before have computer graphics practitioners, whether artists or
programmers, had so much control over the real-time images they
generate.

Cg provides developers with a complete programming platform that is easy

to use and enables the fast creation of special effects and real-time
cinematic-quality experiences on multiple platforms. By providing a new
level of abstraction, Cg removes the need for developers to program directly [More info...]
to the graphics hardware assembly language, and 2 thereby more easily
target OpenGL, DirectX, Windows, Linux, Macintosh OS X, and console platforms such as the
Xbox. Cg was developed in close collaboration with Microsoft Corporation and is compatible with
both the OpenGL API and Microsoft's High-Level Shading Language (HLSL) for DirectX 9.0.

Cg stands for "C for graphics." The C programming language is a popular, general purpose
language invented in the 1970s. Because of its popularity and clean design, C provided the
basis for several subsequent programming languages. For example, C++ and Java base their
syntax and structure largely on C. The Cg language bases itself on C as well. If you are familiar
with C or one of the many languages derived from C, then Cg will be easy to learn.

On the other hand, if you are not familiar with C or even programming languages in general but
you enjoy computer graphics and want to learn something new, read on anyway. Cg programs
tend to be short and understandable.

A Language for Programming Graphics Hardware

Cg is different from C, C++, and Java because it is very specialized. No one will ever write a
spreadsheet or word processor in Cg. Instead, Cg targets the ability to programmatically control
the shape, appearance, and motion of objects rendered using graphics hardware. Broadly, this
type of language is called a shading language. However, Cg can do more than just shading. For
example, Cg programs can perform physical simulation, compositing, and other nonshading
tasks.

Think of a Cg program as a detailed recipe for how to render an object by using programmable
graphics hardware. For example, you can write a Cg program to make a surface appear bumpy
or to animate a virtual character. Later you will learn more about the history of shading
languages and where Cg fits into this history.
Cg's Data-Flow Model

In addition to being specialized for graphics, Cg and other shading languages are different from
conventional programming languages because they are based on a data- 3 flow computational
model. In such a model, computation occurs in response to data that flows through a sequence
of processing steps.

Cg programs operate on vertices and fragments (think "pixels" for now if you do not know what
a fragment is) that are processed when rendering an image. Think of a Cg program as a black
box into which vertices or fragments flow on one side, are somehow transformed, and then flow
out on the other side. However, the box is not really a black box because you get to determine,
by means of the Cg programs you write, exactly what happens inside.

Every time a vertex is processed or the rasterizer generates a fragment while rendering a 3D
scene, your corresponding vertex or fragment Cg program executes.

Most recent personal computers-and all recent game consoles-contain a graphics processing
unit (GPU) that is dedicated to graphics tasks such as transforming and rasterizing 3D models.
Your Cg programs actually execute within the GPU of your computer.

GPU Specialization and CPU Generalization

Whether or not a personal computer or game console has a GPU, there must be a CPU that runs
the operating system and application programs. CPUs are, by design, general purpose. CPUs
execute applications (for example, word processors and accounting packages) written in
general-purpose languages, such as C++ or Java.

Because of the GPU's specialized design, it is much faster at graphics tasks, such as rendering
3D scenes, than a general-purpose CPU would be. New GPUs process tens of millions of vertices
per second and rasterize hundreds of millions or even billions of fragments per second. Future
GPUs will be even speedier. This is overwhelmingly faster than the rate at which a CPU could
process a similar number of vertices and fragments. However, the GPU cannot execute the
same arbitrary, general-purpose programs that a CPU can.

The specialized, high-performance nature of the GPU is why Cg exists. Generalpurpose

programming languages are too open-ended for the specialized task of processing vertices and
fragments. In contrast, the Cg language is fully dedicated to this task. Cg also provides an
abstract execution model that matches the GPU's execution model. You will learn about the
unique execution model of GPUs in coming sections.

The Performance Rationale for Cg

To sustain the illusion of interactivity, a 3D application needs to maintain an animation rate of

15 or more images per second. Generally, we consider 60 or more frames per second to be
"real time," the rate at which interaction with applications appears to occur instantaneously.
The computer's display may have a million or more pixels that require redrawing. For 3D
scenes, the GPU typically processes every pixel on the screen many times to account for how
objects occlude each other, or to improve the appearance of each pixel. This means that real-
time 3D applications can require hundreds of millions of pixel updates per second. Along with
the required pixel processing, 3D models are composed of vertices that must be transformed
properly before they are assembled into polygons, lines, and points that will be rasterized into
pixels. This can require transforming tens of millions of vertices per second.

Moreover, this graphical processing happens in addition to the considerable amount of effort
required of the CPU to update the animation for each new image. The reality is that we need
both the CPU and the GPU's specialized graphics-oriented capabilities. Both are required to
render scenes at the interactive rates and quality standards that users of 3D applications and
games demand. This means a developer can write a 3D application or game in C++ and then
use Cg to make the most of the GPU's additional graphics horsepower.

Coexistence with Conventional Languages

In no way does Cg replace any existing general-purpose languages. Cg is an auxiliary language,

designed specifically for GPUs. Programs written for the CPU in conventional languages such as
C or C++ can use the Cg runtime to load Cg programs for GPUs to execute. The Cg runtime is a
standard set of subroutines used to load, compile, manipulate, and configure Cg programs for
execution by the GPU. Applications supply Cg programs to instruct GPUs on how to accomplish
the programmable rendering effects that would not otherwise be possible on a CPU at the
rendering rates a GPU is capable of achieving.

Cg enables a specialized style of parallel processing. While your CPU executes a conventional
application, that application also orchestrates the parallel processing of vertices and fragments
on the GPU, by programs written in Cg.

If a real-time shading language is such a good idea, why didn't someone invent Cg sooner? The
answer has to do with the evolution of computer graphics hardware. Prior 5 to 2001, most
computer graphics hardware-certainly the kind of inexpensive graphics hardware in PCs and
game consoles-was hard-wired to the specific tasks of vertex and fragment processing. By
"hard-wired," we mean that the algorithms were fixed within the hardware, as opposed to being
programmable in a way that is accessible to graphics applications. Even though these hard-
wired graphics algorithms could be configured by graphics applications in a variety of ways, the
applications could not reprogram the hardware to do tasks unanticipated by the designers of
the hardware. Fortunately, this situation has changed.

Graphics hardware design has advanced, and vertex and fragment processing units in recent
GPUs are truly programmable. Before the advent of programmable graphics hardware, there
was no point in providing a programming language for it. Now that such hardware is available,
there is a clear need to make it easier to program this hardware. Cg makes it much easier to
program GPUs in the same manner that C made it much easier to program CPUs.

Before Cg existed, addressing the programmable capabilities of the GPU was possible only
through low-level assembly language. The cryptic instruction syntax and manual hardware
register manipulation required by assembly languages-such as DirectX 8 vertex and pixel
shaders and some OpenGL extensions-made it a painful task for most developers. As GPU
technology made longer and more complex assembly language programs possible, the need for
a high-level language became clear. The extensive low-level programming that had been
required to achieve optimal performance could now be delegated to a compiler, which optimizes
the code output and handles tedious instruction scheduling. Figure 1-1 is a small portion of a
complex assembly language fragment program used to represent skin. Clearly, it is hard to
comprehend, particularly with the specific references to hardware registers.

In contrast, well-commented Cg code is more portable, more legible, easier to debug, and
easier to reuse. Cg gives you the advantages of a high-level language such as C while delivering
the performance of low-level assembly code.

Other Aspects of Cg

Cg is a language for programming "in the small." That makes it much simpler than a modern
general-purpose language such as C++. Because Cg specializes in transforming vertices and
fragments, it does not currently include many of the complex features required for massive
software engineering tasks. Unlike C++ and Java, Cg does not support classes and other
features used in object-oriented programming. Current Cg 6 implementations do not provide
pointers or even memory allocation (though future implementations may, and keywords are
appropriately reserved). Cg has absolutely no support for file input/output operations. By and
large, these restrictions are not permanent limitations in the language, but rather are indicative
of the capabilities of today's highest performance GPUs. As technology advances to permit more
general programmability on the GPU, you can expect Cg to grow appropriately. Because Cg is
closely based on C, future updates to Cg are likely to adopt language features from C and C++.

. . .
DEFINE LUMINANCE = {0.299, 0.587, 0.114, 0.0};
TEX H0, f[TEX0], TEX4, 2D;
TEX H1, f[TEX2], TEX5, CUBE;
DP3X H1.xyz, H1, LUMINANCE;
MULX H0.w, H0.w, LUMINANCE.w;
MULX H1.w, H1.x, H1.x;
MOVH H2, f[TEX3].wxyz;
MULX H1.w, H1.x, H1.w;
DP3X H0.xyz, H2.xzyw, H0;
MULX H0.xyz, H0, H1.w;
TEX H1, f[TEX0], TEX1, 2D;
TEX H3, f[TEX0], TEX3, 2D;
MULX H0.xyz, H0, H3;
MADX H1.w, H1.w, 0.5, 0.5;
MULX H1.xyz, H1, {0.15, 0.15, 1.0, 0.0};
MOVX H0.w, H1.w;
TEX H1, H1, TEX7, CUBE;
TEX H3, f[TEX3], TEX2, 1D;
MULX H3.w, H0.w, H2.w;
MULX H3.xyz, H3, H3.w;
. . .
Figure 1-1. A Snippet of Assembly Language Code

Cg provides arrays and structures. It has all the flow-control constructs of a modern language:
loops, conditionals, and function calls.

Cg natively supports vectors and matrices because these data types and related math
operations are fundamental to graphics and most graphics hardware directly supports vector
data types. Cg has a library of functions, called the Standard Library, that is well suited for the
kind of operations required for graphics. For example, the Cg Standard Library includes a reflect
function for computing reflection vectors. Cg programs execute in relative isolation. This means
that the processing of a particular vertex or fragment has no effect on other vertices or
fragments processed at the same time. There are no side effects to the execution of a Cg
program. This lack of interdependency among vertices and fragments makes Cg programs
extremely well suited for hardware execution by highly pipelined and parallel hardware.

The Limited Execution Environment of Cg Programs

When you write a program in a language designed for modern CPUs using a modern operating
system, you expect that a more-or-less arbitrary program, as long as it is correct, will compile
and execute properly. This is because CPUs, by design, execute general-purpose programs for
which the overall system has more than sufficient resources.

However, GPUs are specialized rather than general-purpose, and the feature set of GPUs is still
evolving. Not everything you can write in Cg can be compiled to execute on a given GPU. Cg
includes the concept of hardware "profiles," one of which you specify when you compile a Cg
program. Each profile corresponds to a particular combination of GPU architecture and graphics
API. Your program not only must be correct, but it also must limit itself to the restrictions
imposed by the particular profile used to compile your Cg program. For example, a given
fragment profile may limit you to no more than four texture accesses per fragment.

As GPUs evolve, additional profiles will be supported by Cg that correspond to more capable
GPU architectures. In the future, profiles will be less important as GPUs become more full-
featured. But for now Cg programmers will need to limit programs to ensure that they can
compile and execute on existing GPUs. In general, future profiles will be supersets of current
profiles, so that programs written for today's profiles will compile without change using future
profiles.

This situation may sound limiting, but in practice the Cg programs shown in this book work on
tens of millions of GPUs and produce compelling rendering effects. Another reason for limiting
program size and scope is that the smaller and more efficient your Cg programs are, the faster
they will run. Real-time graphics is often about balancing increased scene complexity,
animation rates, and improved shading. So it's always good to maximize rendering efficiency
through judicious Cg programming.

Keep in mind that the restrictions imposed by profiles are really limitations of current GPUs, not
Cg. The Cg language is powerful enough to express shading techniques that are not yet possible
with all GPUs. With time, GPU functionality will evolve far enough that Cg profiles will be able to
run amazingly complex Cg programs. Cg is a language for both current and future GPUs.

Vertices, Fragments, and the Graphics Pipeline

To put Cg into its proper context, you need to understand how GPUs render images. This
section explains how graphics hardware is evolving and then explores the modern graphics
hardware-rendering pipeline.

The Evolution of Computer Graphics Hardware

Computer graphics hardware is advancing at incredible rates. Three forces are driving this rapid
pace of innovation, as shown in Figure 1-2. First, the semiconductor industry has committed
itself to doubling the number of transistors (the basic unit of computer hardware) that fit on a
microchip every 18 months. This constant redoubling of computer power, historically known as
Moore's Law, means cheaper and faster computer hardware, and is the norm for our age.

Figure 1-2. Forces Driving Graphics Hardware

Innovation

The second force is the vast amount of computation required to simulate the world around us.
Our eyes consume and our brains comprehend images of our 3D world at an astounding rate
and with startling acuity. We are unlikely ever to reach a point where computer graphics
becomes a substitute for reality. Reality is just too real. Undaunted, computer graphics
practitioners continue to rise to the challenge. Fortunately, generating images is an
embarrassingly parallel problem. What we mean by "embarrassingly parallel" is that graphics
hardware designers can repeatedly split up the problem of creating realistic images into more
chunks of work that are smaller and easier to tackle. Then hardware engineers can arrange, in
parallel, the ever-greater number of transistors available to execute all these various chunks of
work.
Our third force is the sustained desire we all have to be stimulated and entertained visually.
This is the force that "connects" the source of our continued redoubling of computer hardware
resources to the task of approximating visual reality ever more realistically than before.

As Figure 1-2 illustrates, these insights let us confidently predict that computer graphics
hardware is going to get much faster. These innovations whet our collective appetite for more
interactive and compelling 3D experiences. Satisfying this demand is what motivated the
development of the Cg language.

Animation

Movement in Time

Animation is the result of an action that happens over time-for example, an object that
pulsates, a light that fades, or a character that runs. Your application can create these types of
animation using vertex programs written in Cg. The source of the animation is one or more
program parameters that vary with the passing of time in your application.

To create animated rendering, your application must keep track of time at a level above Cg and
even above OpenGL or Direct3D. Applications typically represent time with a global variable
that is regularly incremented as your application's sense of time advances. Applications then
update other variables as a function of time.

You could compute animation updates on the CPU and pass the animated data to the GPU.
However, a more efficient approach is to perform as much of the animation computation as
possible on the GPU with a vertex program, rather than require the CPU to do all the number-
crunching. Offloading animation work from the CPU can help balance the CPU and GPU
resources and free up the CPU for more involved computations, such as collision detection,
artificial intelligence, and game play.

A Pulsating Object

In this first example, you will learn how to make an object deform periodically so that it appears
to bulge. The goal is to take a time parameter as input and then modify the vertex positions of
the object geometry based on the time. More specifically, you need to displace the surface
position in the direction of the surface normal, as shown in Figure 6-1.

By varying the magnitude of the displacement over time, you create a bulging or pulsing effect.
Figure 6-2 shows renderings of this effect as it is applied to a character. The pulsating
animation takes place within a vertex program.

The Vertex Program

Example 6-1 shows the complete source code for the C6E1v_bulge vertex program, which is
intended to be used with the C2E2f_passthrough fragment program from Chapter 2. Only the
vertex position and normal are really needed for the bulging effect. However, lighting makes
the effect look more interesting, so we have included material and light information as well. A
helper function called computeLighting calculates just the diffuse and specular lighting (the
specular material is assumed to be white for simplicity).
Figure 6-1. Making an Object Bulge

Figure 6-2. A Pulsating Alien

float3 computeLighting ( float3 lightPosition,

float3 lightColor,
float3 Kd,
float shininess,
float3 P,
float3 N,
float3 eyePosition)

{
// Compute the diffuse lighting
float3 L = normalize(lightPosition - P);
float diffuseLight = max(dot(N, L), 0);
float3 diffuseResult = Kd * lightColor * diffuseLight;

// Compute the specular lighting

float3 V = normalize(eyePosition - P);
float3 H = normalize(L + V);
float3 specularLight = lightColor * pow(max(dot(N, H), 0),
shininess);
if (diffuseLight <= 0) specularLight = 0;
float3 specularResult = lightColor * specularLight;
return diffuseResult + specularResult;
}

void C6E1v_bulge(float4 position : POSITION,

float3 normal : NORMAL,

out float4 oPosition : POSITION,

out float4 color : COLOR,

uniform float4x4 modelViewProj,

uniform float time,
uniform float frequency,
uniform float scaleFactor,
uniform float3 Kd,
uniform float shininess,
uniform float3 eyePosition,
uniform float3 lightPosition,
uniform float3 lightColor)

{
float displacement = scaleFactor * 0.5 *
sin(position.y * frequency * time) + 1;
float4 displacementDirection = float4(normal.x, normal.y,
normal.z, 0);
float4 newPosition = position +
displacement * displacementDirection;
oPosition = mul(modelViewProj, newPosition);
color.xyz = computeLighting(lightPosition, lightColor,
Kd, shininess, newPosition.xyz, normal, eyePosition);
color.w = 1;
}
Example 6-1. The C6E1v_bulge Vertex Program

Displacement Calculation - Creating a Time-Based Function

The idea here is to calculate a quantity called displacement that moves the vertex position up or
down in the direction of the surface normal. To animate the program's effect, displacement has
to change over time. You can choose any function you like for this. For example, you could pick
something like this:

float displacement = time;

Of course, this behavior doesn't make a lot of sense, because displacement would always
increase, causing the object to get larger and larger endlessly over time. Instead, we want a
pulsating effect in which the object oscillates between bulging larger and returning to its normal
shape. The sine function provides such a smoothly oscillating behavior.

A useful property of the sine function is that its result is always between -1 and 1. In some
cases, such as in this example, you don't want any negative numbers, so you can scale and
bias the results into a more convenient range, such as from 0 to 1:

float displacement = 0.5 * (sin(time) + 1);

Did you know that the sin function is just as efficient as addition or multiplication in the CineFX
architecture? In fact, the cos function, which calculates the cosine function, is equally fast. Take
advantage of these features to add visual complexity to your programs without slowing down
their execution.

Adding Controls to the Program

To allow finer control of your program, you can add a uniform parameter that controls the
frequency of the sine wave. Folding this uniform parameter, frequency, into the displacement
equation gives:
float displacement = 0.5 * (sin(frequency * time) + 1);

You may also want to control the amplitude of the bulging, so it's useful to have a uniform
parameter for that as well. Throwing that factor into the mix, here's what we get:

float displacement = scaleFactor * 0.5 *

(sin(frequency * time) + 1);

As it is now, this equation produces the same amount of protrusion all over the model. You
might use it to show a character catching his breath after a long chase. To do this, you would
apply the program to the character's chest. Alternatively, you could provide additional uniform
parameters to indicate how rapidly the character is breathing, so that over time, the breathing
could return to normal. These animation effects are inexpensive to implement in a game, and
they help to immerse players in the game's universe.

Varying the Magnitude of Bulging

But what if you want the magnitude of bulging to vary at different locations on the model? To
do this, you have to add a dependency on a per-vertex varying parameter. One idea might be
to pass in scaleFactor as a varying parameter, rather than as a uniform parameter. Here we
show you an even easier way to add some variation to the pulsing, based on the vertex
position:

float displacement = scaleFactor * 0.5 *

sin(position.y * frequency * time) + 1;

This code uses the y coordinate of the position to vary the bulging, but you could use a
combination of coordinates, if you prefer. It all depends on the type of effect you are after.

Updating the Vertex Position

In our example, the displacement scales the object-space surface normal. Then, by adding the
result to the object-space vertex position, you get a displaced object-space vertex position:

float4 displacementDirection = float4(normal.x, normal.y,

normal.z, 0);

float4 newPosition = position +

displacement * displacementDirection

Precompute Uniform Parameters When Possible

The preceding example demonstrates an important point. Take another look at this line of code
from Example 6-1:

float displacement = scaleFactor * 0.5 * sin(position.y * frequency * time) + 1;

If you were to use this equation for the displacement, all the terms would be the same for each
vertex, because they all depend only on uniform parameters. This means that you would be
computing this displacement on the GPU for each vertex, when in fact you could simply
calculate the displacement on the CPU just once for the entire mesh and pass the displacement
as a uniform parameter. However, when the vertex position is part of the displacement
equation, the sine function must be evaluated for each vertex. And as you might expect, if the
value of the displacement varies for every vertex like this, such a per-vertex computation can
be performed far more efficiently on the GPU than on the CPU.

If a computed value is a constant value for an entire object, optimize your program by
precomputing that value on a per-object basis with the CPU. Then pass the precomputed value
to your Cg program as a uniform parameter. This approach is more efficient than recomputing
the value for every fragment or vertex processed.

Particle Systems

Sometimes, instead of animating vertices in a mesh, you want to treat each vertex as a small
object, or particle. A collection of particles that behave according to specific rules is known as a
particle system. This example implements a simple particle system in a vertex program. For
now, focus on how the system works; don't worry about its simplistic appearance. At the end of
this section, we will mention one easy method to enhance your particle system's appearance.
Figure 6-3 shows the particle system example progressing in time.

Figure 6-3. A Particle System

The example particle system behaves according to a simple vector kinematic equation from
physics. The equation gives the x, y, and z positions of each particle for any time. The basic
equation from which you will start is shown in Equation 6-1:

P final = P initial + vt + 1/2 at²

Equation 6-1. Particle Trajectory

where:

z P final is the particle's final position,

z P initial is the particle's initial position,
z v is the particle's initial velocity,
z a is the particle's acceleration, and
z t is the time taken.

The equation models the trajectory of a particle set in initial motion and under the influence of
gravity, but not otherwise interacting with other particles. This equation gives the position of a
particle for any value of time, assuming that you provide its initial position, initial velocity, and
constant acceleration, such as gravity.

Initial Conditions

The application must supply the initial position and initial velocity of each particle as varying
parameters. These two parameter values are known as initial conditions because they describe
the particle at the beginning of the simulation.

In this particular simulation, the acceleration due to gravity is the same for every particle.
Therefore, gravity is a uniform parameter.

To make the simulation more accurate, you could factor in effects such as drag and even spin-
we leave that as an exercise for you.
Vectorized Computations

Modern GPUs have powerful vector-processing capabilities, particularly for addition and
multiplication; they are well suited for processing vectors with up to four components.
Therefore, it is often just as efficient to work with such vector quantities as it is to work with
scalar (single-component) quantities.

Equation 6-1 is a vector equation because the initial position, initial velocity, constant
acceleration, and computed position are all three-component vectors. By implementing the
particle system equation as a vector expression when writing the Cg vertex program, you help
the compiler translate your program to a form that executes efficiently on your GPU.

Vectorize your calculations whenever possible, to take full advantage of the GPU's powerful
vector-processing capabilities.

The Particle System Parameters

Table 6-1 lists the variables used by the vertex program presented in the next section. Each
variable is a parameter to the vertex program, except for the relative time (t) and final position
(pFinal), which are calculated inside the vertex program. Note that the y component of the
acceleration is negative-because gravity acts downward, in the negative y direction. The
constant 9.8 meters per second squared is the acceleration of gravity on Earth. The initial
position, initial velocity, and uniform acceleration are object-space vectors.

Variable Type Description Source (Type)

Application
pInitial float3 Initial position
(Varying)
Application
vInitial float3 Initial velocity
(Varying)

tInitial float3 Time at which particle was Application

created (Varying)
Application
acceleration float3 Acceleration (0.0, -9.8, 0.0)
(Uniform)
Application
globalTime float Global time
(Uniform)
pFinal float3 Current position Internal
t float Relative time Internal
Table 6-1. Variables in the Particle Equation

void C6E2v_particle( float4 pInitial : POSITION,

float4 vInitial : TEXCOORD0,
float tInitial : TEXCOORD1,
out float4 oPosition : POSITION,
out float4 color : TEXCOORD0,
out float pointSize : PSIZE,
uniform float globalTime,
uniform float4 acceleration,
uniform float4x4 modelViewProj)
{
float t = globalTime - tInitial;
float4 pFinal = pInitial +
vInitial * t +
0.5 * acceleration * t * t;

oPosition = mul(modelViewProj, pFinal);

color = float4(t, t, t, 1);

pointSize = -8.0 * t * t +
8.0 * t +
0.1 * pFinal.y + 1;

}
Example 6-2. The C6E2v_particle Vertex Program

The Vertex Program

Example 6-2 shows the source code for the C6E2v_particle vertex program. This program is
meant to work in conjunction with the C2E2f_passthrough fragment program.

Computing the Particle Positions

In this program, the application keeps track of a "global time" and passes it to the vertex
program as the uniform parameter globalTime. The global time starts at zero when the
application initializes and is continuously incremented. As each particle is created, the particle's
time of creation is passed to the vertex program as the varying parameter tInitial. To find
out how long a particle has been active, you simply have to subtract tInitial from
globalTime:

float t = globalTime - tInitial;

Now you can plug t into Equation 6-1 to find the particle's current position:

float4 pInitial +
pFinal = vInitial * t +
0.5 * acceleration * t * t;

This position is in object space, so it needs to be transformed into clip space, as usual:

oPosition = mul(modelViewProj, pFinal);

Computing the Particle Color

In this example, time controls the particle color:

color = float4(t, t, t, 1);

This is a simple idea, but it produces an interesting visual variation. The color increases with
time linearly. Note that colors saturate to pure white (1, 1, 1, 1). You can try your own
alternatives, such as varying the color based on the particle's position, or varying the color
based on a combination of position and time.

Computing the Particle Size

C6E2v_particle uses a new vertex program output semantic called PSIZE. When you render a
point to the screen, an output parameter with this semantic specifies the width (and height) of
the point in pixels. This gives your vertex program programmatic control of the point size used
by the rasterizer.

The point size of each particle varies as time passes. The particles start out small, increase in
size, and then gradually shrink. This variation adds to the fireworks-like effect. As an extra
touch, we added a slight dependence on the particles' height, so that they get a little larger on
their way up. To accomplish all this, we use the following function for the point size:

pointSize -8.0 * t * t +
= 8.0 * t +
0.1 * pFinal.y + 1;

Figure 6-4. A Point Size Function

Figure 6-4 shows what the function looks like.

This function is nothing special-we merely created the formula to achieve the effect that we
wanted. In other words, the formula does not have any real physical meaning, aside from
attempting to mimic the effect we had in mind.

Dressing Up Your Particle System

Although the C6E2v_particle program produces interesting particle motion, the particles
themselves do not look very appealing-they are just solid-colored squares of different sizes.

However, you can improve the particle appearance by using point sprites. With point sprites,
the hardware takes each rendered point and, instead of drawing it as a single vertex, draws it
as a square made up of four vertices, as shown in Figure 6-5. Point sprites are automatically
assigned texture coordinates for each corner vertex. This allows you to alter the appearance of
the particles from a square to any texture image you want.
Figure 6-5. Converting Points to Point Sprites

By rendering the points as point sprites, you can use the assigned texture coordinates to
sample a texture that supplies the shape and appearance of each point vertex, instead of
simply rendering each point vertex as a square point. Point sprites can create the impression of
added geometric complexity without actually drawing extra triangles. Figure 6-6 shows a more
visually interesting example of a particle system, using point sprites. Both OpenGL and Direct3D
have standard interfaces for rendering point sprites.

Figure 6-6. A Particle System with Point Sprites

Key-Frame Interpolation

3D games often use a sequence of key frames to represent an animated human or creature in
various poses. For example, a creature may have animation sequences for standing, running,
kneeling, ducking, attacking, and dying. Artists call each particular pose that they create for a
given 3D model a key frame.

Key-Framing Background

The term key frame comes from cartoon animation. To produce a cartoon, an artist first quickly
sketches a rough sequence of frames for animating a character. Rather than draw every frame
required for the final animation, the artist draws only the important, or "key," frames. Later,
the artist goes back and fills in the missing frames. These in-between frames are then easier to
draw, because the prior and subsequent key frames serve as before-and-after references.

Computer animators use a similar technique. A 3D artist makes a key frame for each pose of an
animated character. Even a standing character may require a sequence of key frames that show
the character shifting weight from one foot to the other. Every key frame for a model must use
the exact same number of vertices, and every key frame must share the same vertex
connectivity. A vertex used in a given key frame corresponds to the same point on the model in
every other key frame of the model. The entire animation sequence maintains this
correspondence. However, the position of a particular vertex may change from frame to frame,
due to the model's animation.
Given such a key-framed model, a game animates the model by picking two key frames and
then blending together each corresponding pair of vertex positions. The blend is a weighted
average in which the sum of the weights equals 100 percent. Figure 6-7 shows an alien
character with several key frames. The figure includes two key frames, marked A and B, to be
blended into an intermediate pose by a Cg program.

An application can use a Cg vertex program to blend the two vertices together. This blending
may include further operations to illuminate the blended character appropriately for more
realism. Usually, an application specifies a single position for each vertex, but for key-frame
blending, each vertex has two positions, which are combined with a uniform weighting factor.

Key-frame interpolation assumes that the number and order of vertices are the same in all the
key frames for a given model. This assumption ensures that the vertex program is always
blending the correct pairs of vertices. The following Cg code fragment blends key-frame
positions:

blendedPosition = (1 - weight) * keyFrameA + weight * keyFrameB; 157

The keyFrameA and keyFrameB variables contain the (x, y, z) positions of the vertex being
processed at key frames A and B, respectively. Note that weight and (1 - weight) sum to 1.
If weight is 0.53, the Cg program adds 47 percent (1.0 - 0.53) of the position of key frame A to
53 percent of the position of key frame B. Figure 6-8 shows an example of this type of
animation.

Figure 6-7. Key Frames for an Alien

Figure 6-8. An Example of Key-Frame Blending

To maintain the appearance of continuous animation, the key-frame weight increases with each
rendered frame until it reaches 1.0 (100 percent). At this point, the existing key frame B
becomes the new key frame A, the weight resets to 0, and the game selects a new key frame B
to continue the animation. Animation sequences, such as walking, may loop repeatedly over the
set of key frames that define the character's walking motion. The game can switch from a
walking animation to a running animation just by changing over to the sequence of key frames
that define the running animation.

It is up to the game engine to use the key frames to generate convincing animated motion.
Many existing 3D games use this style of key-frame animation. When an application uses a Cg
vertex program to perform the key-frame blending operations, the CPU can spend time
improving the gameplay rather than continuously blending key frames. By using a Cg vertex
program, the GPU takes over the task of key-frame blending.

Interpolation Approaches

There are many types of interpolation. Two common forms for key-frame interpolation are
linear interpolation and quadratic interpolation.

Linear Interpolation

With linear interpolation, the transition between positions happens at a constant rate. Equation
6-2 shows the definition of linear interpolation:

blendedPosition = positionA x (1-f) + positionB x f

Equation 6-2. Linear Interpolation

As f varies from 0 to 1 in this equation, the intermediate position varies between positionA and
positionB. When f is equal to 0, the intermediate position is exactly positionA, the starting
position. When f is equal to 1, the intermediate position is positionB, the ending position. Once
again, you can use Cg's lerp function to accomplish the interpolation.

Using lerp, the interpolation between two positions can be written concisely as:

intermediatePosition = lerp(positionA, positionB, f);

Quadratic Interpolation

Linear interpolation is good for many situations, but sometimes you want the rate of transition
to change over time. For example, you might want the transition from positionA to positionB to
start out slowly and get faster as time passes. For this, you might use quadratic interpolation,
as in the following code fragment:

intermediatePosition = position1 * (1 - f * f) + position2 * f * f

Other functions that you might use are step functions, spline functions, and exponential
functions. Figure 6-9 shows several common types of interpolation functions.

Basic Key-Frame Interpolation

Example 6-3 shows the C6E3v_keyFrame vertex program. This program performs the object-
space blending of two positions, each from a different key frame. The lerp Standard Library
function linearly interpolates the two positions, and then the program transforms the blended
position into clip space. The program passes through a texture coordinate set and a color.

As indicated by the input semantics for positionA and positionB, the application is
responsible for configuring key frame A's position as the conventional position (POSITION) and
key frame B's position as texture coordinate set 1 (TEXCOORD1).

void C6E3v_keyFrame( float3 positionA : POSITION,

float3 positionB : TEXCOORD1,
float4 color : COLOR,
float2 texCoord : TEXCOORD0,

out float4 oPosition : POSITION,

out float2 oTexCoord : TEXCOORD0,
out float4 oColor : COLOR,

uniform float keyFrameBlend,

uniform float4x4 modelViewProj)
{
float3 position = lerp(positionA, positionB,
keyFrameBlend);
oPosition = mul(modelViewProj, float4(position, 1));
oTexCoord = texCoord;
oColor = color;
}

Example 6-3. The C6E3v_keyFrame Vertex Program

Figure 6-9. Various Interpolation Functions

The application is also responsible for determining the key-frame blending factor via the
uniform parameter keyFrameBlend. The value of keyFrameBlend should transition from 0 to 1.
Once 1 is reached, the application chooses another key frame in the animation sequence, the
old key frame B position input is then configured as the key frame A position input, and the new
key-frame position data feeds the key frame B position input.

Key-Frame Interpolation with Lighting

You often want to light a key-framed model. This involves not merely blending two positions
(the vertex in two different key frames), but also blending the two corresponding surface
normals. Then you can calculate lighting computations with the blended normal. Blending two
normals may change the length of the resulting normal, so you must normalize the blended
normal prior to lighting.

Example 6-4 shows the C6E4v_litKeyFrame vertex program that adds per-vertex

struct Light {
float3 eyePosition; // In object space
float3 lightPosition; // In object space
float4 lightColor;
float specularExponent;
float ambient;
};

float4 computeLighting( float4 computeLighting(Light light,

float3 position, // In object space
float3 normal) // In object space

{
float3 lightDirection = light.lightPosition - position;
float3 lightDirNorm = normalize(lightDirection);
float3 eyeDirection = light.eyePosition - position;
float3 eyeDirNorm = normalize(eyeDirection);
float3 halfAngle = normalize(lightDirNorm + eyeDirNorm);
float diffuse = max(0, dot(lightDirNorm, normal));
float specular = pow(max(0, dot(halfAngle, normal)),
light.specularExponent);
return light.lightColor * (light.ambient +
diffuse + specular);
}
void C6E4v_litKeyFrame(float3 positionA : POSITION,
float3 normalA : NORMAL,
float3 positionB : TEXCOORD1,
float3 normalB : TEXCOORD2,
float2 texCoord: TEXCOORD0,

out float4 oPosition : POSITION,

out float2 oTexCoord : TEXCOORD0,
out float4 color : COLOR,

uniform float keyFrameBlend,

uniform Light light,
uniform float4x4 modelViewProj)
{
float3 position = lerp(positionA, positionB,
keyFrameBlend);
float3 blendNormal = lerp(normalA, normalB,
keyFrameBlend);
float3 normal = normalize(blendNormal);
oPosition = mul(modelViewProj, float4(position, 1));
oTexCoord = texCoord;
color = computeLighting(light, position, normal);
}
Example 6-4. The C6E4v_litKeyFrame Vertex Program

lighting to the C6E3v_keyFrame example. In the updated example, each key frame also supplies
its own corresponding per-vertex surface normal.

The computeLighting internal function computes a conventional lighting model using object-
space lighting.

Vertex Skinning

Another approach to animating characters is vertex skinning. Many 3D modeling packages

author 3D content suitable for vertex skinning. The technique is also known as matrix palette
blending.

The Theory of Vertex Skinning

Rather than have key frames for each pose of a character, vertex skinning maintains a single
default pose and a large set of matrices that appropriately rotate and translate various
subregions of the default pose's polygonal mesh. For reasons that will become apparent, these
various matrix transforms are often called "bones."

One or more of these matrices control each vertex in the default pose's polygonal mesh. Each
matrix is assigned a weighting factor (from 0 to 100 percent), which indicates how much that
matrix affects each vertex. Only a small number of matrices usually control each vertex,
meaning that only these few matrices have positive and significant weighting factors for a given
vertex. We call this small set of matrices the bone set for each vertex. We assume that the
weighting factors for all the matrices in a vertex's bone set always sum to 100 percent.

When rendering this type of model, you first transform every vertex by each matrix in the
vertex's bone set, then weight the results of each matrix transform according to the matrix's
corresponding weighting factor, and finally sum the results. This new position is the skinned
vertex position.

When all the matrices are identity matrices (no rotation, no translation), the mesh is in the
default pose. 3D artists often pick a default pose in which the character is standing and facing
forward, with legs apart and arms outstretched.

Constructing Poses from Matrices

By controlling the matrices, you can create novel poses. For example, a vertex on a character's
forearm close to the elbow might use 67 percent of the forearm matrix, 21 percent of the elbow
matrix, and 12 percent of the upper arm matrix. The animator who creates a model for vertex
skinning must appropriately localize each matrix so that, for example, the matrix that controls
the left shoulder has no effect on vertices near the ankle. Often, the number of matrices
affecting any given vertex is limited to no more than four. For the 3D artist, once all the
weights and matrices are assigned to the model's default pose, constructing a new pose is a
matter of manipulating the matrices appropriately, rather than attempting to position each
individual vertex. Posing and animating the model is much simpler when it is authored for
vertex skinning.

For a character model, the most significant matrices represent the way rigid bones in the
character's body move and rotate; hence, the vertex-skinning matrices are called bones. The
vertices represent points on the skin. Vertex skinning simulates how bones, represented as
matrices, tug and reposition various points, represented as vertices, on the character's skin.

Lighting

For correct lighting, you can compute the same sort of transformed and weighted average used
for positions, except that you transform normals by the inverse transpose of each matrix rather
than by the matrix itself. Weighted normals may no longer be unit length, so normalization is
required.

Assuming that the bone matrices are merely rotations and translations simplifies the
transformation of the normals for lighting, because the inverse transpose of a matrix without
scaling or projection is the matrix itself.

Storage Requirements Compared with Key Frames

With the key frame approach, every pose requires a distinct set of vertex positions and
normals. This becomes unwieldy if huge numbers of poses are required.

However, with vertex skinning, each pose requires just the default pose-shared by all poses-
and the matrix values for the given pose. There are generally substantially fewer matrices per
character than vertices, so representing a pose as a set of bone matrices is more compact than
representing the pose with a key frame. With vertex skinning, you can also create novel poses
dynamically, either by blending existing bone matrices from different poses or by controlling
matrices directly. For example, if you know what matrices control an arm, you can wave the
arm by controlling those matrices.

In addition to requiring the matrices for each pose, the model's default pose needs each vertex
to have a default position, a default normal, some number of matrix indices to identify which
subset of matrices control the vertex, and the same number of weighting factors, corresponding
to each respective matrix.

This data for the default pose is constant for all other poses. Generating a new pose requires
only new matrices, not any changes to the default pose data. If the GPU can perform all the
vertex-skinning computations, this means that the CPU needs to update only the bone matrices
for each new pose, but not otherwise manipulate or access the default pose data.

Vertex skinning is quite amenable to storing and replaying motion-capture sequences. You can
represent each motion-capture frame as a set of bone matrices that you can then apply to
different models that share the same default pose and matrix associations. Inverse kinematics
solvers can also generate bone matrices procedurally. An inverse kinematics solver attempts to
find an incremental sequence of bone matrices that transition from one given pose to another
given pose in a realistic, natural manner.

Vertex Skinning in a Vertex Program

The C6E5v_skin4m vertex program in Example 6-5 implements vertex skinning, assuming that
no more than four bone matrices affect each vertex (a common assumption).

An array of 24 bone matrices, each a 3x4 matrix, represents each pose. The entire array is a
uniform parameter to the program. The program assumes that each bone matrix consists of a
translation and a rotation (no scaling or projection).

The per-vertex matrixIndex input vector provides a set of four bone-matrix indices for
accessing the boneMatrix array. The per-vertex weight input vector provides the four
weighting factors for each respective bone matrix. The program assumes that the weighting
factors for each vertex sum to 100 percent.

For performance reasons, the program treats boneMatrix as an array of float4 vectors rather
than an array of float3x4 matrices. The matrixIndex array contains floating-point values
instead of integers, and so the addressing of a single array of vectors is more efficient than
accessing an array of matrices. The implication of this is that the indices in the matrixIndex
vector should be three times the actual matrix index. So, the program assumes 0 is the first
matrix in the array, 3 is the second matrix, and so on. The indices are fixed for each vertex, so
you improve performance by moving this "multiply by 3" outside the vertex program.

A for loop, looping four times, transforms the default pose position and normal by each bone
matrix. Each result is weighted and summed.

The program computes both the weighted position and normal for the pose. The same
computeLighting internal function from Example 6-4 computes per-vertex object-space
lighting with the weighted position and normal.

Although this example is rather limited, you could generalize it to handle more bone matrices,
general bone matrices (for example, allowing scaling), and matrices influencing each vertex-
and to compute a better lighting model.

void C6E5v_skin4m(float3 position :

POSITION,
float3 normal :
NORMAL,
float2 texCoord :
TEXCOORD0,
float4 weight :
TEXCOORD1,
float4 matrixIndex :
TEXCOORD2,

out float4 oPosition : POSITION,

out float2 oTexCoord :
TEXCOORD0,
out float4 color : COLOR,

uniform Light light,

uniform float4 boneMatrix[72], // 24
matrices
uniform float4x4 modelViewProj)
{
float3 netPosition = 0, netNormal = 0;

for (int i = 0; i < 4; i++) {

float index = matrixIndex[i];
float3x4 model = float3x4(boneMatrix
[index + 0],
boneMatrix
[index + 1],
boneMatrix
[index + 2]);
float3 bonePosition = mul(model,
float4(position, 1));
// Assume no scaling in matrix, just
rotate & translate
float3x3 rotate = float3x3(model
[0].xyz,
model
[1].xyz,
model
[2].xyz);
float3 boneNormal = mul(rotate,
normal);
netPosition += weight[i] *
bonePosition;
netNormal += weight[i] * boneNormal;
}
netNormal = normalize(netNormal);

oPosition = mul(modelViewProj, float4

(netPosition, 1));
oTexCoord = texCoord;
color = computeLighting(light,
netPosition, netNormal);
}

Example 6-5. The C6E5v_skin4m Vertex

Program

The OpenGL Graphics System: A Specification documents the OpenGL 3D programming

interface. The best tutorial for learning OpenGL programming is the OpenGL Programming
Guide: The Official Guide to Learning OpenGL, Third Edition (Addison- Wesley, 1999), by Mason
Woo, Jackie Neider, Tom Davis, and Dave Shreiner. The http://www.opengl.org/ website serves
up much more information about OpenGL.

Documentation for the Direct3D programming interface is available from Microsoft's

msdn.microsoft.comWeb site. NVIDIA provides further information about the Cg runtime, CgFX,
and Cg itself on its Developer website at developer.nvidia.com/Cg.

If you are interested in the physics behind the particle system you created, you can learn more
by reviewing kinematics in any high school or college physics textbook.

Jeff Lander wrote a series of articles in 1998 and 1999 for Game Developer Magazine about
various animation techniques. You can find these articles on the
http://www.gamasutra.com/features/20030325/www.darwin3d.com website. For particle
systems, read "The Ocean Spray in Your Face." For vertex skinning, check out "Skin Them
Bones: Game Programming for the Web Generation."

The original volume of Game Programming Gems (Charles River Media, 2000), edited by Mark
DeLoura, contains several gems related to key-frame animation and vertex skinning. Check out
these articles: "Interpolated 3D Keyframe Animation," by Herbert Marselas; "A Fast and Simple
Skinning Technique," by Torgeir Hagland; and "Filling the Gaps-Advanced Animation Using
Stitching and Skinning," by Ryan Woodland.

John Vince's book 3-D Computer Animation (Addison-Wesley, 1992) covers many of the
techniques described in this chapter, as well as others, such as free-form deformation (FFD).

DirectX 8 added point sprites to Direct3D. OpenGL implementations from multiple hardware
vendors support the NV_point_sprite extension. The specification for this OpenGL extension is
available at the http://www.opengl.org/ website.

Copyright © 2003 CMP Media Inc. All rights reserved.

Gama Network Presents:

GDC 2003: Building an AI Sensory System:

Examining The Design of Thief: The Dark Project
By Tom Leonard
Gamasutra
March 7, 2003

URL: http://www.gamasutra.com/gdc2003/features/20030307/leonard_01.htm

The term "senses" in game development is a useful metaphor for understanding, designing, and
discussing that part of the AI that gathers information about items of interest in the simulated
environment of the game. Non-player characters visually presented as humans, animals, or
creatures with eyes and ears in a realistic three-dimensional space lend themselves well to the
metaphor.

This engineering metaphor is not best applied too literally. In spite of the seemingly physical
nature of the AIs in the game world, the analogy of game AI senses is not a physiological or
neurological one. The line between "sense" and "knowledge" in a game is a blurry one. Sense
incorporates the idea of awareness of another entity in the game, includes elements of value
and knowledge, and can have game-relevant logic wired directly in.

A game sensory system must be designed in a way that is subservient to the game design and
efficient in implementation. The senses need only be as sophisticated as is needed to be
entertaining and robust. The result of their work must be perceivable and understandable by
the player. Few game designs require AIs with a sense of taste, touch, or smell; thus senses
primarily are concerned with vision or hearing. Used wisely, senses can be an invaluable tool to
make simple state machines more interesting by providing them with a broad range of
environmental input.

This paper describes an approach to designing and implementing a high-fidelity sensory system
for a stealth-oriented first-person AI system. The techniques described are derived from
experience constructing the AI for Thief: The Dark Project, as well as familiarity with the code of
Half-Life. Initially, the basic concepts of AI senses are laid out using Half-Life as a motivating
example. The paper then examines the more stringent sensory requirements of a stealth game
design. Finally, the sensory system built for Thief is described.

An Introductory Example: Half-Life

Half-Life is not a game that centers on stealth and senses. With a strong tactical combat
element, however, it does require a reasonable sensory system. This makes it a perfect case to
explore the basics of AI sensory systems. AIs in Half-Life have sight and hearing, a system for
managing information about sensed entities, and present interesting examples of leveraging
basic senses into appealing behaviors.

In a simple sensory system, AIs periodically "look" at and "listen" to the world. Unlike real
vision and hearing where stimuli arrive at the senses whether desired or not, these are active
events. The AI examines the world based on its interest, and decides according to a set of rules
that it sees or hears another element in the game. These probes are designed to emulate real
senses while limiting the amount of work done. A greater amount of resources is dedicated to
the things that are important for the game mechanics.
For example, in Half-Life the core sensory logic that is run periodically is:

If I am close to player then...

Begin look
--Gather a list of entities within a specified distance
--For each entity found...
----If I want to look for them and
----If they are in my viewcone and
----If I can raycast from my eyes to their eyes then...
------If they are the player and
------If I have been told to not see the player until they see me and
------If they do not see me
--------End look
------Else
--------Set various signals depending on my relationship with the seen
--------entity
End look

Begin listen
--For each sound being played...
----If the sound is carrying to my ears...
------Add the sound to a list of heard sounds
------If the sound is a real sound...
--------Set a signal indicating heard something
------If the sound is a "smell" pseudo-sound
--------Set a signal indicating smelled something
End listen

The first concept illustrated by this pseudo-code is that the senses are closely tied to the
properties of the AI, its relationship with the subject, and the relevance of the AI to the player's
experience. This is in part motivated by optimization concerns, but made available by game
mechanics. In the Half-Life game design an AI that is not near the player is not relevant and
need not sense the world. Even when near the player, the AI needs only to look at things that
are known to produce reactions of fear or hatred later.

The logic also demonstrates the basic construction of vision as a view distance, a view cone,
line-of-sight, and eye position (Figure 1). Each AI has a length-limited two-dimensional field of
view within which it will cast rays to interesting objects. Unblocked ray casts indicate visibility.
Figure 1

There are two important things to note. First, the operations of sensing are ordered from least
expensive to most expensive. Second, for player satisfaction, vision is a game of peek-a-boo. In
a first-person game, the player's sense of body is weak, and the player seen by an opponent
they do not see often feels cheated.

Most interesting is the snippet that restrains the AI's ability to see the player until seen by the
player, which is purely for coordinating the player's entertainment. This is an example of how
higher-level game goals can be simply and elegantly achieved by simple techniques in lower
level systems.

The logic for hearing is much simpler than vision. The basic element of a hearing component is
the definition and tuning of what it means for a sound to carry to the AI audibly. In the case of
Half-Life, hearing is a straightforward heuristic of the volume of the sound multiplied by a
"hearing sensitivity" yielding a distance within which the AI hears the sound. More interesting is
the demonstration of the utility of hearing as a catchall for general world information gathering.
In this example, the AI "hears" pseudo-sounds, fictional smells emanating from nearby corpses.

Senses as Gameplay Focus: Thief

Thief: The Dark Project and its successors present a lightly scripted game world where the
central game mechanic, stealth, challenges the traditional form of the first-person 3D game.
The Thief player moves slowly, avoids conflict, is penalized for killing people, and is entirely
mortal. The gameplay centers on the ebb and flow AI sensory knowledge of the player as they
move through the game space. The player is expected to move through areas populated with
stationary, pacing, and patrolling AIs without being detected, creeping among shadows and
careful not to make alerting sounds. Though the game AI's senses are built on the same core
concepts as those of Half-Life, the mechanics of skulking, evading, and surprising require a
more sophisticated sensory system.

The primary requirement was creating a highly tunable sensory system that operated within a
wide spectrum of states. On the surface, stealth gameplay is about fictional themes of hiding,
evasion, surprise, quiet, light and dark. One of the things that makes that kind of experience
fun is broadening out the gray zone of safety and danger that in most first-person games is
razor thin. It's about getting the payer's heart pounding by holding them on the cusp of either
state, then letting loose once the zone is crossed. This demanded "broad-spectrum" senses that
didn't tend to polarize rapidly to the extremes of "player sensed" and "player not sensed."

A secondary requirement was that the sense system be active much more frequently and
operating on more objects than is typical of a first-person shooter. During the course of the
game, the player can alter the state of the world in ways that the AIs are supposed to take
notice of, even when the player is not around. These things, like body hiding, require reliable
sensing. Together with the first requirement, these created an interesting challenge when
weighed against the perennial requirement for game developers: performance.

Finally, it was necessary that both players and designers understand the inputs and outputs of
the sensory system, and that the outputs match learned expectations based on the inputs. This
suggested a solution with a limited number of player-perceivable inputs, and discrete valued
results.

Expanding the Senses

At heart, the sensory system described here is very similar to that found in Half-Life. It is a
viewcone and raycast based vision system and simple hearing system with hooks to support
optimization, game mechanics, and pseudo-sensory data. Like the Half-Life example, most of
the sense gathering is decoupled from the decision process that acts on that information. This
system expands some of these core ideas, and introduces a few new ones.

Figure 2, Basic components and relationships

The design of the system and the flow of data through it are derived from its definition as an
information gathering system that is customizable and tunable, but stable and intelligible in its
output.

In this system, AI senses are framed in terms of "awareness." Awareness is expressed as a

range of discrete states that represent an AI's certainty about the presence, location, and
identity of an object of interest. These discrete states are the only representation of the
internals of the system exposed to the designer, and are correlated by the higher-level AI to an
alertness state. In Thief's AI, the range of alertness states is similar to awareness states. The
alertness state of the AI is fed back into the sensory system in various ways to alter the
behavior of the system.

Awareness is stored in sense links that associate either a given AI to another entity in the
game, or to a position in space. These relations store game relevant details of the sensing
(time, location, line-of-sight, etc.), as well as cached values used to reduce calculations from
think cycle to think cycle. Sense links are, in effect, the primary memory of the AI. Through
verbalization and observation sense links can be propagated among peer AIs, with controls in
place to constrain knowledge cascades across a level. They may also be manipulated by game
logic after base processing.

Figure 3, Sense Links

Each object of interest in the game has an intrinsic visibility value independent of any viewer.
Depending on the state of the game and the nature of the object the level of detail of this value
and the frequency of update are scaled in order to keep the amount of processor time spent
deriving the value within budgets.

Visibility is defined as the lighting, movement, and exposure (size, separation from other
objects) of the entity. The meaning of these is closely tied to the game requirements. For
example, the lighting of the player is biased towards the lighting near the floor below the
player, as this provides the player with an objective, perceivable way to anticipate their own
safety. These values and their aggregate sum visibility are stored as 0..1 analog values.

Viewcones

Rather than having a single two-dimensional field of view, the Thief senses implement a set of
ordered three-dimensional viewcones described as an XY angle, a Z angle, a length, a set of
parameters describing both general acuity and sensitivity to types of stimuli (e.g., motion
versus light), and relevance given the alertness of the AI. The viewcones are oriented according
to the direction an AI's head is facing.

At any time for a given object being sensed, only the first view cone the object is in is
considered in sense calculations. For simplicity and gameplay tunability, each viewcone is
presumed to produce a constant output regardless of where in the viewcone the subject is
positioned.
For example, the AI represented in Figure 4 has five viewcones. An object at point A will be
evaluated using viewcone number 3. The viewcone used for calculating the vision sense
awareness for an entity at either point B and point C is viewcone number 1, where identical
visibility values for an object will yield the same result.

Figure 4, Viewcones,
Top-view

When probing interesting objects in the world, the senses first determine which viewcone, if
any, applies to the subject. The intrinsic visibility is then passed through a "look" heuristic along
with the viewcone to output a discrete awareness value.

The motivation for multiple viewcones is to enable the expression of such things as direct
vision, peripheral vision, or a distinction between objects directly forward and on the same Z
plane as opposed to forward but above and below. Cone number 5 in the diagram above is a
good example of leveraging the low-level to express a high level concept. This "false vision"
cone is configured to look backwards and configured to be sensitive to motion, giving the AI a
"spidey-sense" of being followed too closely even if the player is silent.

Information Pipeline

The sense management system is designed as a series of components each taking a limited and
well-defined set of data and outputting an even more limited value. Each stage is intended to
be independently scalable in terms of the processing demands based on relevance to game
play. In terms of performance, these multiple scalable layers can be made to be extremely
efficient.
Figure 5, Information Pipeline

The core sensory system implements heuristics for accepting visibility, sound events, current
awareness links, designer and programmer configuration data, and current AI state, and
outputting a single awareness value for each object of interest. These heuristics are considered
a black box tuned by the AI programmer continually as the game develops.

Vision is implemented by filtering the visibility value of an object through the appropriate
viewcone, modifying the result based on the properties of the individual AI. In mundane cases a
simple raycast for line-of-sight is used. In more interesting cases, like the player, multiple
raycasts occur to include the spatial relation of the AI to the subject in the weighing of the
subject's exposure.

Thief has a sophisticated sound system wherein sounds both rendered and not rendered were
tagged with semantic data and propagated through the 3D geometry of the world. When a
sound "arrived" at an AI, it arrived from the directions it should in the real world, tagged with
attenuated awareness values, possibly carrying information from other AIs if it was a spoken
concept. These sounds join other awareness inducing things (like the Half-Life smell example)
as awareness relations to positions in space.

Awareness Pulses

Once the look and listen operations are complete, their awareness results are passed to a
method responsible for receiving periodic pulses from the raw senses, and resolving them into a
single awareness relationship, storing all the details in the associated sense link. Unlike the
analog data used in the pipeline to this point, the data in this process is entirely discrete. The
result of this process is to create, update, or expire sense links with the correct awareness
value.
This is a three-step process. First, the sound and vision input values are compared, one
declared dominant, and that becomes the value for awareness. The accessory data each
produces is then distilled together into a summary of the sense event.

Second, if the awareness pulse is an increase from previous readings, it is passed through a
time-based filter that controls whether the actual awareness will increase. The time delay is a
property only of the current state, not the goal state. This is how reaction delays and player
forgiveness factors are implemented. Once the time threshold is passed, the awareness
advances to the goal state without passing through intermediate states.

Finally, if the new pulse value is below current readings, a capacitor is used to allow awareness
to degrade gradually and smoothly. Awareness decreases across some amount of time, passing
through all the intermediate states. This softens the behavior of the AI once the object of
interest is no longer actively sensed, but is not the mechanism by which the core AI's alertness
is controlled.

If an object of interest is no longer generating pulses, the senses incorporate a degree of free
knowledge which is scaled based on the state of the AI. This mechanism produces the
appearance of deduction on the part of the AI when an object has left the field of view without
overtly demonstrating cheating to the player.

Conclusion

The system described here was designed for a single-player software rendered game. Because
of this, all authoritative information about game entities was available to it. Unfortunately, in a
game engine with a client/server architecture and a hardware-only renderer, this may not be
true. Determining the lit-ness field of an object's visibility may not be straightforward. Thus
incorporating such a system as described here is something to do deliberately and with care, as
it will place information demands on other systems.

Furthermore, although efficient in what it does, it is designed for a game that in many ways
centers around the system's output. In Thief it consumes a non-trivial amount of the AI's CPU
budget. This will take time away from pathing, tactical analysis, and other decision processes.

However, there are benefits to be had for any game to invest in their sensing code. By
gathering and filtering more information about the environment and serving it up in a well-
defined manner, senses can be leveraged to produce engaging AI behaviors without
significantly increasing the complexity of the decision state machines. A robust sense system
also provides a clean hook for expressing "pre-conscious" behaviors by controlling and
manipulating the core knowledge inputs. Finally, a multi-state sense system provides the player
with an AI opponent or ally that exhibits varied and subtle reactions and behaviors without
adding complexity to the core decision machines.

Further Exploration

Because of the highly data-driven nature of the Dark Engine on which Thief was built, most of
the concepts presented in this paper and all of the configuration details may be explored first-
hand using a copy of the tools available at http://www.thief-thecircle.com/

Copyright © 2003 CMP Media Inc. All rights reserved.

Gama Network Presents:

Giving Life to Ratchet & Clank: Enabling Complex

Character Animations by Streamlining Processes
By John Lally
Gamasutra
February 11, 2003

URL: http://www.gamasutra.com/features/20030211/lally_01.htm

At first, we were thrilled. As character animators, we couldn't have asked for a better project.
There were two heroes, dozens of enemies, scores of NPCs, and more than 100 character-
driven cutscenes. Enthusiasm and artistic latitude made it all ours for the taking.

But staying true to our shared vision of Ratchet & Clank meant that our digital actors needed to
become more than mere cycling automatons. We regarded each character as an intermediary
through which we could reach out to players and draw them deeper into our universe. This
meant our characters needed to blend physically into their environments, emotionally into their
situations, and expressively into our narrative. It was on these principles that we based both
our objectives and our standard of success.

Our team acknowledged that a rift existed between the level of complexity we desired and the
time we had scheduled to implement it. In order to surmount this obstacle, we developed
several methods for using Maya, our artistic skills, and our time more effectively.

This article will discuss these methods both in terms of their functionality and their
implementation. To this end, it will provide technical details on our testing practices, our MEL
shortcuts, and our real-time animation procedures.

Furthermore, it will explain how each of these methods saved us valuable production time,
enabling us to achieve our artistic goals.

Testing with Prototypes: Why and How

Part of achieving our goal of tying our characters closely to their

environments and gameplay meant prototyping low-resolution
versions of our characters and their respective animations. Like
coalmine canaries, we sent proto-models into our new levels to
nose out potential animation, programming, and design problems.
We relied on prototyping throughout the course of our production
as a means of refining a character's move set. This process of
refinement was key to winnowing down unworkable ideas before
animating a character's high-resolution incarnation.

As a rule, our prototypes emphasized function over style. And

although we set the aesthetic threshold low, these previsualization
models still needed to be built and animated accurately enough to
function as valid test cases. For the animators, this meant that
prototype characters needed to jump to their correct heights,
attack to their design specifications, and run at their proper speeds.

Generally, we created prototypes using a character design sketch

as a guide. These proto-characters were constructed with primitive
objects and only roughly resembled their future incarnations, as
you can see in Figure 1. Since previsualization models were so
simple to construct, every animator could assist in building them,
regardless of their modeling experience. Accuracy was required
only in the representation of the character's height, proportions,
and posture.

For the most part, our prototypes had extremely simple skeletons:
all geometric components were assigned to a

single bone with no special deformation. Though such simplicity

made for blocky-looking models, in practice our animators had all
the flexibility they needed to test out a move set.
Figure 1. The Dog
Animating our proto-characters was similar to sketching a Charger prototype
traditional pencil test. Although animators were given a designer- was used to pretest
approved move set, it was understood that animations needed only the final character's
to be rendered into their roughest forms. One pass was often animations, including
sufficient, as polish and overlap were unnecessary. its walk, run, and
attack.
The areas where precision did count were timing, measurement,
and interaction with other characters. As they have the greatest direct impact on gameplay,
these attributes were considered critical to testing a new character's behavior accurately.

Timing has a major effect on both the readability of an animation and on gameplay. From a
distance, a poorly timed idle can look muddy. An attack animation can be too slow to make an
enemy a worthy opponent, or too fast to be registered. Emphasis or a lack thereof on just a few
frames can make or break any animation, especially within the short cycles of the real-time
universe we were creating. We discovered that by testing and fine-tuning our timings in the
prototype stage, we could often avoid reworking polished animations on final characters.

Making sure that proto-characters adhered to design

measurements was also important. For example, if the design
document called for an enemy to

attack at a range of 4 meters, animators would ensure that the

prototype did exactly this. Designers could then get an accurate
idea of whether an enemy traveled at the correct speed, was tuned
to the appropriate difficulty, and was scaled appropriately in
relation to the main characters.
Figure 2. The final
Prototyping also gave us a means of pretesting character behaviors Dog Charger model.
and interactions. Whether it was with Ratchet or Clank, with the
environment, or with another character, proto-models provided invaluable early glances at
interactive behavior. For artists, programmers, and designers, previsualization served to
telegraph character behaviors both in terms of their technical feasibility and their gameplay
value.

Ultimately we found that our previsualization process was beneficial not just to animators but to
our design and programming staff as well. It gave our programmers a head start on coding
gameplay, while designers could test, tune, and ask for changes at a very early stage, allowing
room for refinements.
Prototyping saved animators time and energy that otherwise would have been spent
painstakingly modifying or redoing final multi-pass animations. It provided a relatively simple
means for evaluating character behaviors with respect to their timing, specifications, and
interactivity. Moreover, it provided our animators with a practice run, complete with feedback,
before moving on to a high-resolution character (Figure 2).

MEL Shortcuts: Automating Our Setups

Maya Embedded Language (MEL) scripts were essential for bridging

the gap between the level of complexity we desired and the time
we had scheduled to implement it. Through MEL scripts, we were
able to streamline setup operations, customize animation
processes, and level our technological playing field.

Two such scripts (examined later in this article) allowed our team
to take advantage of driven key functionality that otherwise would
have been too cumbersome to animate or too tedious to rig by
hand. Another tool enabled our artists, regardless of technical Figure 3. This leg
experience, to fit characters with IK systems automatically. setup was used for
most bipendal
Most of our bipedal characters had leg setups like the one pictured characters, saving
in Figure 3. As seen in the hierarchy (Figure 4) our legs had tedious hand-setups
standard hip, knee, and ankle joints, a heel joint, and two to three for IK systems for
bones in the feet. (For clarity purposes, please note that we individual characters.
referred to our foot bones as "toes.")

Our IK-rig consisted of three to four RP (Rotate Plane) IK-handles. These connected hip-to-
ankle, ankle-to-toe, toe-to-toe and/or toe-to-null. All were configured into a hierarchy (Figure
5) that specified relationships between the IK-handles, a set of locators, and several NURBS
constraint objects.

Though relatively simple, setting this IK-system up by hand for every

NPC, enemy, and prototype would have taken more time than we
had. Moreover, we knew that this time would be better spent bringing
our characters to life.

An actual tools programmer might scoff at the artist-authored MEL

script we developed to make our leg chains. In the end, however, our
"IK Setup Tool" reduced an hourlong technical chore to a simple task
that took seconds. Furthermore, the script did not require setup
expertise, and our relatively simple code could be customized and
refined entirely from within the art department.

Figure 4. Standard Using the IK Setup Tool (Figure 6) was a three-step process. First,
hierarchy for a an artist checked their characters' leg joint names against the tool's
character's leg, as presets, making any necessary changes. Next, a scale factor for the
shown in the constraint objects was entered, based loosely on a character's size.
Hypergraph. The artist then hit nine buttons in sequence. These buttons would
auto-attach the IK handles and instantly build the constraint
hierarchy.

Dissecting the IK Setup Tool

MEL is a quirky and often inconsistent language. A good portion of the time we spent
developing our IK Setup Tool was used to track down the proper commands for the tasks we
needed to execute. Still, we managed to uncover the MEL commands we needed to actuate the
core tasks of each of our nine tool buttons.
The first button's purpose was to place IK handles on a character's
legs. It read the names of the bones from the top

text fields by using the textFieldGrp command in its query (-q) mode.
These string variables were then passed to the ikHandle command,
which in turn created the IK handles.

The second button placed NURBS cones on a character's hip, ankle,

and toe joints. These cones, created using MEL's cone command,
were the primary constraint objects an animator would use to
manipulate the legs. The xform command was used to query (-q) the
positions of the leg bones and store them as variables. The move
command then read these variables and moved the cones into place.
Finally, MEL's pointConstraint locked the hip cones to the character's Figure 5. The leg
hips.
constraint hierachy
viewed in the
Pressing the third button called CreateLocator to place a pair of Hypergraph,
locators in the scene. Next, the group command grouped the locators showing
to themselves. Then xform (-q) queried the positions of the connections
character's knees, and move translated the two new parent objects to between the IK
the knee joints and the locators to positions in front of the knees. handles, locator set,
and NURBS
Button number four configured the cones, constraint objects.
locators, IK handles, parent groups, and
constraints into a standardized hierarchy via the parent command.
Again, the new groups were translated into place using move. New
constraint relationships were created between the knee locators and
main leg IK handles, and the new constraint hierarchy and the
skeleton. These were implemented using the poleVectorConstraint
and scaleConstraint commands, respectively.

Button five added several expressions to the scene, saving us data-

entry drudgery. We added expression code for specifying both
constraint and skeletal behavior using the expression command,
Figure 6. The IK allowing us to automate both the creation and the specifications of
Setup Tool our setup expressions.
streamlined
repetitive, error- Number six altered the rotate order of the heel and toe NURBS cones
prone setup from XYZ to YXZ using setAttr. We had previously determined that
procedures and kept this rotate order produced the most reliable rotations in our quaint Z-
customization up environment.
within the artists'
hands. Buttons seven through nine performed some final housekeeping
tasks. Button seven grouped custom rotation guides to a character's
spine using the polyCube and parent commands. Button eight used setAttr to ensure Maya's
segment scale compensate was switched off for all of a character's joints. Finally, button nine
keyed a reference frame at -10 on the character's skeleton and constraint hierarchies using
setKeyframe. Listing 1 shows some of the MEL procedures we found most useful.

Automating this process with MEL both saved us time and eliminated the steps most prone to
human error. Furthermore, by enabling any artist, regardless of their setup experience, to fit a
prototype and/or character with a functioning IK system quickly, we alleviated bottlenecks. This
conservation of both time and human resources saved energy that could then be devoted to
artwork.

Listing 1. Some helpful MEL procedures.

// A method for querying a bone's position in world space:

xform -query -worldSpace -translation

my_joint_name;

// A method for querying the contents of a text field:

textFieldGrp -query -text my_text_field_name;

// A method for setting a keyframe at frame -10 on a

hierarchy:

setKeyframe -time -10 -hierarchy below

my_hierarchy_name;

// Basic transformation methods: translation, rotation,

and scaling:

// Moves an object to (0,0,5):

move -absolute 0 0 5 my_object_name;

// Rotates an object by 90 degrees on Z,

// relative to its current Rotation:

rotate -relative 0 0 90 my_object_name;

// Scales an object to 3 times its current size:

scale -relative 3 3 3 my_object_name;

// (Note: All flags are listed in their long forms.)

Low-Tech Animation Solutions

The shortcuts and prototypes I've described so far shared a

common purpose: to help us create better animation more
efficiently. Both of these methods accomplished this by either by
telegraphing problems or by saving time. Often, however, we would
spurn a high-tech solution due to its specificity, inefficiency, and/or
complexity. And still at other times, we embraced traditional CG
taboos.

We consistently and repeatedly translated and scaled our

characters' bones. True, most of us learned on our grandmothers'
knees never to do that to a CG character. "Use your constraints,"
she would say. "Rotate your bones if you must. But avoid scaling
them, and don't ever, ever let me catch you in a translation!" We all
love our grandmothers, but we found that the tenets of traditional
animation called for - nay, demanded - that we defy her.

The reason behind our disobedience was squash and stretch. We

found that by scaling our joints, and especially by translating them,
we could instill our animations with extra gravity and snap. Major
translations often lasted only a couple of frames and, borrowing an
idea from Disney, were "more felt than seen."

Since we had no IK setups to speak of on the spines and arms of

our characters, translating the bones in these body parts was quite
simple. If needed, we could key the leg IK solvers "off" in order to
manipulate these joints. Translation and scaling were effective
across the board and worked wonders on anything from walks to
attacks to facial animation (Figure 7).

Requiring no additional setup, these low-tech solutions saved us

time. Within limits, this method of animation provided animators
with a direct, tactile, and expedient method of sculpting their
characters' poses. Although unglamorous, this technique was as
effective as any in terms of preserving our resources and improving
our animations.
Figure 7. Joint scaling
Walks and the Walk Guide and translation
offered animators
Another device we used to aid our direct manipulation of
animation was called the Walk Guide. poses and gave
We used this tool help our characters' characters' moves
feet stick to the ground during walk and extra verve for the
run animations. Although foot slippage modest cost of a low-
is commonly forgiven in the world of tech solution.
games, we hoped that by eliminating it
we could add an extra dimension of believability to our characters'
locomotion.

The Walk Guide was an elongated cube with many smaller cuboids
attached to it. The smaller cuboids were identical to the polygonal
markers on our characters' ankles and toes, which were grouped to
their feet during setup.

By scaling a special parent node, the Guide's small cuboids could be

adjusted to match a character's foot size. Scaling the large cuboid
allowed an animator to accommodate for the character's stride
length. A set of constraints and locators ensured that as the stride
length changed, the preset foot size remained constant.

Since our walk cycles were animated in place, we needed a way in

which to simulate forward movement while keeping track of the
positions of a character's feet. The solution was to animate the
Walk Guide to the speed specified by the designer (2 meters per
second, for example). Once the Walk Guide was moving at the
proper speed and the small cuboids correctly scaled, an animator
could begin working on the character's walk cycle.
Figure 8. The Walk
Guide helped line up
the characters' feet on The trick to using the Walk Guide to eliminate foot sliding was to
the ground properly keep the character's foot markers lined up with the small cuboids
every frame to on the Guide. This applied for every frame in which the foot made
minimize unattractive contact with the ground (Figures 8).
foot sliding.
Upon a cycle's completion, a character could be put into a level and
moved at its preset speed with little or no foot slippage. Additionally, programmers could scale
the playback speed of the cycle relative to the character's velocity and still have the feet stay
grounded.

There were several gameplay situations that were not as clean as the test case I just described;
however, the Walk Guide did serve to plant our character's feet properly in most of our worlds.
Once accustomed to the Guide, we animators found that using it benefited both our schedule
and our artwork, as it kept track of the more technical aspects of locomotion for us.

Making Faces: Artistic Reasons and Technical Details

We knew from the start of developing Ratchet & Clank that facial expression would be an
important component not just to our cinematics but to our gameplay animations as well. Once
again, we were faced with the dueling goals of animation depth and scheduling efficiency. We
settled on two methods for making faces: one simple one for our enemies and one more
complex for our heroes. Expressions exaggerated the idles, intensified the attacks, and sealed
the deaths our of enemies and heroes alike.

When animating our enemies, we drew on a traditional animation dictum: A viewer of animation
is usually drawn to a character's face, particularly to the eyes. Attention paid to a character's
eyes and mouth was very important to making convincing actions, especially during our quick
gameplay cycles.

Most enemy characters had fairly simple face skeletons. However, these skeletons allowed for a
high degree of manipulation of the eyes and mouth. Each eye had between two and four bones
controlling its brow and lids. Mouths were generally simpler, using only one or two bones. In
most cases, this setup gave us all the flexibility we needed to exaggerate the enemy's features
and thus heighten the emotion of its actions (Figure 9).

Figure 9. With enemy face skeletons, less

was more. Bone detail was reserved for
the eyes and mouth to enable simple,
exaggerated expressions. Here during an
in-game animation the Robot
Paratrooper's face reacts to being knocked
down.

Our heroes' faces had a more sophisticated setup, which they shared with the NPCs. Though
NPC faces were manipulated mostly in our cinematics, Ratchet & Clank made heavy use of
expression during gameplay, as well.

Like the enemy setups, hero and NPC faces were manipulated via their face joints. Unlike the
enemies', these joints were animated though a driven key system instead of being transformed
directly. Since they clocked more screen time, hero and NPC faces tended to have a far greater
amount of bones - and hence expressive range - than their enemy counterparts.
Figure 10. With enemy face skeletons, less
was more. Bone detail was reserved for
the eyes and mouth to enable simple,
exaggerated expressions. Here during an
in-game animation the Robot
Paratrooper's face reacts to being knocked
down.

Figure 10 shows some of the range of expression Ratchet and Clank exhibit during gameplay.
He smiles when excited, grimaces when he's hit, grits his teeth during combat, chatters them
when he's cold, and drops his jaw when he dies. Clank's expressions change both while he's
strapped to Ratchet's back and when he's played independently.

As I mentioned earlier, hero and NPC expressions were animated by combining preset driven
key attributes via a MEL script slider interface. These presets allowed the animator to combine
and create a wide array of facial expression without having to build them from scratch. Like
color primaries, these attributes could be blended together to form new combinations.

About half of a character's 40 or so facial attributes were dedicated to producing a basic

expression, either on all or on parts of the face. These basic expressions included anger,
disgust, fear, happiness, sadness, and surprise, all of which would be easily recognizable to a
player. More subtle attributes were dedicated to animating phonemes and controlling individual
facial features. Unique and varied emotional ranges could then be achieved by combining
expression, phoneme, and feature attributes together.

Scripting Facial Presets

Assigning facial presets to our characters cost us some setup time.

However, we were able optimize some of the

processes with another MEL script. Like our other MEL tools, this
script automated some of the tedious steps, allowing a setup artist to
spend more time on the art of sculpting facial poses.

Facial presets were created in a separate animation file, where each

expression, phoneme, or feature pose was stored as a separate Figure 11. The
keyframe. Upon completing this file, a character artist would use our Driven Key
MEL Driven Key Generator (Figure 11) to set the driven keys Generator analyzed
automatically for each pose. a preset facial pose,
compared it to a
The Driven Key Generator worked by comparing the transformations neutral pose, and
of the keyframed pose to those of a default. When the script assigned driven keys
registered that a channel had changed from the default, it would set to the affected
a driven key on that channel based on its changed value. The script channels.
relied on MEL's arithmetic functions to identify value changes, and its
setAttr and setDrivenKeyframe commands to activate the drivers. Listing 2 shows some of the
Driven Key Generator's sample code.

The drivers for our facial animations were stored on a model called the Control Box, shown in
Figure 12. This hierarchy of cubes served as a visual outline of facial attributes, and could also
double as a second interface. For efficiency's sake, Ratchet, Clank and all of our NPC characters
had identical Control Boxes, though Ratchet's had many more active drivers.

We found our automated setup method to be advantageous for

three reasons. First, it saved a setup artist from

having to manually identify and key bones, channels, and drivers.

Second, it assigned driven keys to changed channels only, leaving
any non-affected channels free for animators to keyframe. Finally,
it circumvented Maya's built-in driven key interface, which we
found to be cumbersome and even unreliable when simultaneously
assigning multiple bones and channels to a driver.
Figure 12. Each NPC
Regardless of method, facial animation played a vital role in and hero had its own
breathing life into our gameplay characters. Again, MEL was Control Box on which
instrumental both in granting our artists access to an advanced its facial drivers were
Maya feature, and in optimizing our workflow. Whether a hero or an stored. Facial drivers
enemy, virtually every character personality in our game was were actually
strengthened through facial expressions. In turn, this enhanced attributes of the
interactions both with players as well as between the characters Control Box's cubes.
themselves.

End of Cycle

Like all character-driven projects, Ratchet & Clank presented our animation team with a unique
set of artistic and technical challenges. Our artistic philosophy was built on the understanding
that our characters were the instruments though which a player would experience our universe.
We knew that in meeting these challenges, our puppets would transcend mere game space and
become the entities that our players would identify with, vilify, and even personify.

However, this philosophy needed to be coupled with practical methodology if it was to see our
project to its conclusion. From this necessity grew our testing practices, MEL shortcuts, and
real-time animation procedures. Throughout production, these methods removed many of the
barriers that would otherwise have obstructed the artistic efforts of our animators.

As the Insomniac team cycles into our next project, we continue to refine and expand upon the
systems and procedures we developed during Ratchet & Clank. Though our procedures continue
to evolve, our underlying goals remain unchanged. For in the end, we can only prove a
technology's worth by an audience's response to our characters.

Listing 2. Sample code from the Driven Key Generator.

// The "if" gate checks for changed X-Translation values

// between the Default and Posed frames.

if ($txa != $txb)
{

// Sets the Driver Attribute and the Current

Joint's
// X-Translation to their Default Values;
// Sets a Driven Key Frame for the Default
Values.

setAttr $atnm $dr0;

setAttr ($current + ".tx") $txa;
setDrivenKeyframe -currentDriver $atnm -
attribute
translateX $current;

// Sets the Driver Attribute and the Current

Joint's
// X-Translation to their Posed Values;
// Sets a Driven Key Frame for the Posed Values.

setAttr $atnm $dr1;

setAttr ($current + ".tx") $txb;
setDrivenKeyframe - currentDriver $atnm -
attribute
translateX $current;

// Prints command summary to the Script Editor

for
// easy reference.

print ($current + ": TX has been keyed for

slider
value 0: " + $txa + " and slider value 10: " +
$txb); print " \n";

// In this loop segment, $current is the current joint,

// and $atmn is the attribute name. $dr0 and $dr1
represent
// Default and Posed Driver values. $txa & $txb are the
// Default and Posed X-translation values, respectively.

// (Note: All flags are listed in their long forms.)

Copyright © 2003 CMP Media Inc. All rights reserved.

Gama Network Presents:

What Designers Need to Know About Physics

By Fred Marcus
Gamasutra
January 21, 2003

URL: http://www.gamasutra.com/resource_guide/20030121/marcus_01

During my years as game design director at Angel Studios, there wasn’t a day where I didn’t
have to deal with physics in one or several of our products. Impressive physic demos have
always been one of Angel Studios ticket to get contracts, so we had to have some in our games.
And actually, we had a lot of physics in our games: from driving physics to ragdolls, to collisions
with stacking and more, the entire spectrum was covered. They all helped make our games look
and feel different from the rest of the crowd. That, and our ability to tame physics so our games
stayed playable and fun.

In this article, I will try to explain how to tune physics from a designer’s point of view with
some fundamentals you have to know and end with some classic traps to avoid. But first, do
you approach physics as a game designer?

A Hands-On Approach

Physics can make a game look and feel different. They can bring shocking realism to impacts,
to vehicle controls and really enhance the critical reactions a player’s action generate.
Physics can help pull a game into reality. The ultimate goal, however, is to enhance that reality.
If you get it right, you will give unique sensations to players and show them things they have
never experienced before. Unfortunately, there is a price to pay for that. Physics can take a lot
of CPU time and it requires a lot of tuning to keep a game fun to play.

Badly tuned physics can be a disaster: A vehicle can become impossible to control, objects can
get in the way of the player, things just don’t feel right and what should have made the game
fun just generates frustration. As a game designer, you are responsible for the fun of the game,
so it is your role to make sure physics are tuned properly and not left solely to a programmer’s
whim.

If you don't play a role designing and tuning the physics, your game might ship with you
unhappy with the tuning, wishing you had learned more sooner (also called regrets) and you
will miss a lot of opportunities to discover fun things you might have done with the physics. You
will have to get your hands dirty and tweak many variables many times before you get some
good results. There are no other ways around that. Besides, if you are a game designer, you
already have to tune a lot of variables for your controls and your cameras.

But before you get there though, you need to learn the theory behind physics.

What Are They Talking About?

As a game designer, it is also your role to sit down and talk to your team and create a
communication flow between all team members. Usually, you do the talking. You listen, yes,
but you also have to explain a lot of things to a lot of people.

Well, this time, it is your turn to listen and to listen well! It is essential for the designer to
understand the language of physics, to understand its limitations in real case scenarios, to
make sure you get a grasp of what is possible or not. Only then, after hours of questioning and
drawings (Paper and pencil are your best friends when sitting with a physics programmer) you
will start to grasp how you can control this beast and make it behave the way you want it to.

It is ok to ask questions, again and again. It’s time to take your game designer’s ego and throw
it out the door. If you don’t understand something, just ask and ask again, until you get it right
-- Everything has to be crystal clear. You might look dumb, that’s for sure, but you cannot
leave anything unexplained.

Understanding what the physics guy means is crucial if you plan to start to tweak things around
and understand what is happening in your game. It will also allow both of you to work on
solutions that solve these “special cases” occurrences. You know, it’s a game and sometimes
you don’t want physics to do this or that. If you don’t know what you are talking about, physics
wise, you will not be able to convince your programmer to tweak the code to make things work.
After all, he wants it to be real but you want it to be really FUN.

Before you start talking to your programmer (Yes, you have to!), here is a very simple to
understand primer on game physics as well as some basic techniques to help you tune your
game for fun.

Center of Gravity

The center of gravity is a crucial parameter in physics. It will determine how your object will
react to forces applied to it. The center of gravity is at the center of mass of an object (center
of mass is another name for center of gravity), where the object will likely be balanced. If you
move the position of the center of gravity of an object, you will change it’s behavior
dramatically: If a tall object has its center of gravity very low, it will be very stable and hard to
roll. If it is placed way up there though, the object will roll very easily. If you want a car to
make nice drifts, or roll a bit into turns, move your center of gravity around. You will get very
different driving feelings when you do so.

Control, It’s All About Control

Physics in games are about forces and impulsions (a force is a continuous impulse). You can’t
just move a physically based object by giving it a position in 3D space. You have to push it or
apply a torque to rotate it. Also, the heavier the object, the stronger the force or the impulsion
you need to apply to move it (except if your object and the ones you are colliding with have a
low friction…). Some physics engine allow you to specify x,y and z coordinates and will try to
calculate the forces needed to bring your object there, but it will still be an approximation.

To control all these forces and to prevent things to get too crazy (like an object spinning all
over the place, a car that falls on it’s side too easily in turns) you have a friend that can help
you. And this friend is called dampening. This is your Tylenol, your savior -- the magic word. If
in your game physics make things fly out of control, go to your programmer and say the magic
word. He will know what you are talking about. You should also ask him about dampening when
things feel too slow, sluggish, smooth, controlled. Then, he probably already has dampening
coded in and he did the tuning himself. Get these tuning parameters exposed and play with
them.

Translation and Rotations

You can dampen things in translation and in rotation, for each axis. In translation, damping can
make a box slow down progressively once pushed or it can slow down a car going too fast.
Think of it as brakes. It’s up to you if you want the brakes to be very soft and progressive or if
you want to bring objects to a brutal stop. Too much translation damping and the object won’t
move at all though.

In rotation, you can prevent an object to roll too much. Think of a car taking a turn too fast. If
you don’t want that car to roll over, you add damping on the z axis. If you put too much
damping though, the car won’t rotate at all and look very unrealistic.

Learn to Love Quadratics

So, damping is great, it helps you control things. The problem is, sometimes, you want an
object to behave differently at different velocities.

Maybe you want a vehicle to roll a lot at low speeds but if you keep it like that, it will definitely
roll over at high speeds and that’s a no no! Well, you can ask your programmer to give you
speed relative damping! Quadratics are great for that. You get a constant damping value C,
valid even if the object is static. On top of that, you have a damping value B for when the
object speeds up. And then, at really high speed, a third damping A value kicks in for these
special cases. The thing is, damping values can be updated every frame so it’s up to you to see
what your game needs and design a system that fits these needs.

Get Rid of It!

Of course, too much damping and your game doesn’t need physics anymore! That’s where the
difficulty in balancing all this comes from. If your physics is too controlled by damping in order
to make the game playable, you can get rid of it when you need to! For example, during a car
crash, remove damping on all axis at once or progressively to see the vehicle spin all over the
place in the most spectacular way. The difficulty for the designer and the programmer is to
determine and recognize when a vehicle is really crashing.

Once you've learned a bit about how the physics in your game works, you'll soon encounter a
number of common problems that you previously lacked the tools to tackle. This section offers
a list of traps you have to avoid at all cost. They are classic problems, I see them every single
time physics are implemented in a game. Make sure you read this before you start to tune.

Real Time Tuning

A key element in tuning physics is the possibility to tweak each value in real time while the
game plays. You have to ‘feel’ physics, just like you need to feel the controls and the cameras
in your game. Being able to tweak physics values in real time will allow you to increase your
iteration rate dramatically. Ask for this feature. Better: BEG for it!

Lock your frame rate!

Variable frame rate will change your physically based behaviors. Even if you are told that they
will stay the same once the code has been optimized, don’t trust believe it! It will not happen, it
will change and you will have to re-tune things (hopefully you will be trained by then and it
won’t take you that long).

Oversampling, a Chain Reaction Explained

Sloppy physics code can take a lot of CPU cycles. It can take so much time that your frame rate
will drop from 60 to 30 frames per seconds or worst. Variable frame rate disturbs a physic
simulation and usually, oversampling is used to solve the problem.

Oversampling physics means that the physics code is updated independently from the display.
If the game is displayed every other frame (30 fps), physics are still updated 60 times per
second. It means that the game runs through the physics code two times before an image is
displayed. It helps keep the physics stable.

But wait, if the game has to run the physics code two times before it displays something, then,
it eats even more CPU time. And the game can get even slower! And then you might drop to 20
fps and need to update physics three times before you display anything! This chain reaction is a
classic problem and trying to get a good, fast 60fps with no oversampling is the best you can
ask for. Beware of oversampling and low frame rates.

Physics LOD'd

Not everything needs to be physically based in your game. Especially when these things are far
from the player’s point of view. No need to compute suspensions if the vehicle is four pixels on
screen! Make sure your coder has physics LOD built in.

Another crucial thing to remember is that AI does not need to be physically based either. If you
want to have tight control over what the AI does in your game, then don’t make it physically
based. Physics will prevent programmers to move an object by specifying x, y and z
coordinates. They will need to move objects through impulsions and forces, a more ‘analogue’
and approximated way to do it.

Final Word

Not everyone is convinced that good physics are essential to a game -- "games don’t need it to
be fun” I hear every so often. It is true, absolutely.

But, each new platform brings us closer to realism graphically. Hardcore gamers complain when
the art is average, the AI dumb… and sooner or later they will reach a point where they will not
accept object behaviors that don’t look right. As for game designers, we have the opportunity to
make sure that game physics not only look and feel right, but that they play right -- balancing
realism with effective and fun tuning.

Copyright © 2000-2001 CMP Media Inc. All rights reserved.

Gama Network Presents:

Contact Physics

By Roderick Kennedy
Gamasutra
January 21, 2003

One of my first jobs in game physics was writing the flight models for the fighter sim EF2000.
Back in the mid-90's, the physics challenges were well suited to the PC's of the time, and
contact physics wasn't part of the picture. A plane can be modelled very accurately as a point
mass in the sky, and the challenge for the physics programmer is to get the right lift
coefficients, drag, and engine model. It's hard to believe now, but combat flight sims were one
of the biggest PC genres in 1996. Microprose's F15 Strike Eagle kicked the whole thing off,
Spectrum Holobyte responded with the classic Falcon. Meanwhile, British upstarts DID
challenged the big boys with TFX, and then EF2000. And a shareware game called Doom was
slowing up development time, taking over office networks at lunch and dinnertime.

Figure 1. From EF2000 to WRC

2, the physics challenges are a
world apart.

Today, flying and shooting is a niche market; todays games are much more "close-in", and it's
ground-based car simulations like World Rally Championship that occupy the hardcore sim niche
that flight sims once did. We now have the challenge of making games feel solid, creating an
illusion of tangible physical presence. With today's advanced graphics you really notice when
the physics are lagging behind. My colleagues at Evolution Studios (many of them DID alumni),
are looking to bring that all-important sense of solidity to new levels as they begin work on
their next WRC title.

In this article I will show how solid contact physics can be implemented, and describe some of
the problems the programmer will encounter. The article should be of help to physics
programmers, users of 3rd party engines, and decision makers who need to evaluate competing
technologies.

Figure 2. Oops! A car with a single

contact.

A Solid Contact

An example: A car has flipped over, and hits the ground, as in Figure 2. With a single contact
point and no friction, we do the math to calculate its motion. This simplest case has been
covered by other authors, so I'll be brief. The mass of the car is m, and it has a 3×3 moment of
inertia matrix J. We're looking for the force that the ground exerts on the car at the contact.
The vector equations for linear and angular motion are:

(1)

where x is the car's position, g is the gravitational acceleration; w is the car's angular velocity,
and q is the vector from x to the contact point. We've called the mystery contact force f, and N
is the surface normal, which is also the direction our force will act in. In Matrix form, this is our
"equation of motion":

(2)

I've put subscripts to describe the exact number of rows and columns. A 6 by 1 matrix and a 6-
vector are interchangeable. The "constraint equation" should complete the picture by specifying
that objects should not occupy the same space. We require that the car's contact point remains
exactly on the surface. Call this r1 in world space.

(3)
(4)

i.e. r2 is just the projection of r1 onto the surface. Our constraint is that r1 remains on the
surface:

(5)

and that works out as just . The 2nd order constraint is obtained by differentiating
twice:

(6)

(7)

So after a little re-arrangement, we have the constraint equation in matrix form:

(8)

The left-hand term is a "centripetal acceleration" - all points on a rotating solid accelerate
towards the centre. Note that the 1 by 6 matrix in the middle is the exact transpose of the one
in the equation of motion: this is true in general. I prefer to use a single vector for the
acceleration degrees of freedom, so:

(9)

(10)

Now, although I've used to describe the acceleration, we don't actually have a vector y,
because angular position can't be properly described with a 3-element vector. But as long as we
can obtain the change in w from one frame to the next, we can use quaternions or some other
method to describe angular position. Now invert the 6 by 6 mass matrix M, and substitute
equation. (9) into (10):

So we can define the scalars:

and :

(11)

This is our main equation. The solution is just f =- l/G. It's good to define l so it appears
negative in our expression, as l is the acceleration that would exist between the contact points
without the contact force. The force f acts in the opposite direction. If you calculate f and it
turns out to be negative, that means we're pulling, not pushing, and you should deactivate the
contact.
Now applying this acceleration over several timesteps will keep the car skidding along the
surface in the correct manner provided the initial velocity between the contact points was zero.
If it wasn't (e.g. when they first collided) we would need to apply an impulse to fix that.
Without going into details, the answer is: i = - v/G, where v is the velocity of r1 relative to r2,
normal to N and i is the impulse to be applied at the point of impact. You can apply this
correction every frame to prevent drift, alternatively add a heuristic term to l which is
proportional to v, so that the force will increase when the relative velocity is negative and
decrease when it's positive. Do the same for the position so the contact points line up nicely.

Multiple Solid Contacts

You can now handle a single contact between the car and the ground. Fine for many fast-
moving collisions, but eventually, the car will slow down and another corner will touch the
ground. Now we have to consider multiple-contact solutions.

Figure 3. Two points, or one edge, are now

touching the ground.

Figure 3 shows this situation, where a whole edge of the car is touching the ground. In reality
the force will be spread across the whole contacting edge. For our purposes, we can just
consider two forces at the two endpoints.

Our two contact points have normals Na and Nb (these might be just the same vertical vector
but let's keep it general). The forces are fa and fb. The equation of motion is

(12)

and as you might guess from Equation (8), the constraint equation is:

(13)

But we can still express this system as:

(14)

(15)

- and these two equations are like Equationns (4) and (5), except now f is a two-element
vector, as is a. Re-arranging,

(16)

Equation (16) is like (11), except of course, that Gamma is now a 2x2 matrix. Here's where
contact physics is different from the old-style video game collision. You can't apply the contact
corrections sequentially. You have to find the one solution for fa and fb which satisfies both
constraints. Now in this case, the matrix is only 2x2, and it's guaranteed to have an inverse
(unless contacts a and b are in the same place).

So:

(17)

Now by putting in two constraints, we've turned our car from a six degree-of-freedom system,
to four. Add another contact, and (usually) you'll take away another degree of freedom. With
three contacts, we've effectively got a whole surface of the car touching the ground, and if the
contact normals are vertical (i.e. we're on flat ground) none of the remaining degrees of
freedom is affected by gravity. So with a bit of friction, our car can come to a halt.

And G would be a 3×3 matrix. Once again, as long as the three contacts are not at the same
place, and don't fall in a line, you're guaranteed to have an invertible matrix G.

If you look in the literature, you'll find that this method isn't used much. That's because matrix
inversion and other standard linear algebra techniques can't guarantee that all the forces will be
positive. Suppose we require in advance that:

for all i (18)

This means all contact forces are positive or zero, and the accelerations they produce are
positive or zero (no impinging between solids). Now we have what's called a "linear
complementarity problem" (LCP), and an iterative method can give a solution where all forces
are either positive, or zero. The common solution method is Lemke's algorithm, which you will
find via [1]. A very good introduction to this approach is found in [2].

Redundancy

The problem is, our car may not have any triangular surfaces! When the next collision occurs,
we'll most likely have four contacts. Not only that, but if the contact normals are all the same,
the G matrix will now be singular - it has no unique inverse, and most matrix-inversion routines
will fail. Why? Because four contacts between two surfaces is more than we need to zero out
the relative motion at the contact points. As shown in Figure 4, there's an infinite number of
combinations of forces at the contacts which will produce the same effect.

…etc.

Figure 4. Redundancy - obviously the forces should be equal. But why?

Of course, in special cases, like a table sitting on the ground, the solution is obvious - the forces
should be equal. But that won't help us find the impulse to apply when the table rolls from two
legs onto four. For a general solution we have three options:

1. Never add the fourth contact.

This would work as long as the three contacts we have are spaced well. But if they're all on one
side of the object, it might start to tip over. Then one contact would vanish, and a new one
would appear on the other side. We could end up with an ugly oscillation.

2. Treat surface-surface contacts as a special case.

We could detect when more than two points on a surface are touching another surface, then
switch to a single "surface contact" which constrains three d.o.f. However, it would be awkward
to then calculate when to deactivate this special contact, and it would mean introducing a
different set of equations. This method would also not help when redundancy arose from
contacts which are not on the same surface.

3. Eliminate the redundancy.

It stands to reason that there should be a relationship between the contact points which would
allow us to find all the contact forces.

Redundancy is a different kind of issue if you're iterating to find a solution, because in most
cases, you don't care which of the infinitely many solutions you find - the behaviour will be the
same, provided you've already made sure of not finding negative forces.

Accuracy and Stabilty

It's quite common in contact physics applications to see objects which should move smoothly or
settle down quickly, instead shake, wobble or jump into motion. There are two common causes
for this behaviour.

1. The solution is wrong. Or not quite correct. This is common when the iterative scheme for
finding the forces either can't reach a solution, or stops too soon. The short answer is to do
more iterations - but there will be a speed issue.

2. The properties of the physical system can't be well-modelled at the framerate. One
way this happens is if you've put some unrealistic values in your mass matrix (having moments
of inertia too small for the object's size and mass is a frequent mistake). Friction can do this as
well, if your friction coefficients are large enough that they more-than reverse the object's
motion between one frame and the next. Physically realistic friction values will do this in many
simulations.

Sometimes, programmers will solve No. 2 by having multiple physics iterations for each game
frame. Please don't do this, there's almost always a better way.

Linked Objects
Figure 5: Modelling ragdolls needs joined-
up thinking.

We've so far covered contacts between single, 6-d.o.f. bodies and a static world. But a lot of
games need linked systems of bodies, for instance using a "ragdoll" model for realistic death
animations (see Figure 5). There are two approaches you can use here:

1. Treat each part of the hierarchy as a separate body. Then define special contacts
between the bodies at the joints. We would have a contact for, say, the hip-joint, which can
have positive or negative forces, limits rotation, and acts in all directions. Using this method,
you can use the techniques outlined above, but you will probably need a good iterative scheme,
as you will be solving for, well, a lot of forces.

2. Treat the whole hierarchy as a single object. The object will have, not 6, but maybe 26
degrees of freedom. Calculating the mass matrix will be a daunting task, but with this method
you won't have to worry about limbs stretching and detaching when too-big an impulse is
applied.

Friction

Friction should be applied between nearly all contacts. Generally, if there is a sideways velocity
between the contacting points r1 and r2, dynamic friction is:

(17)

where mdynamic is the coefficient of dynamic friction, Fnormal is the contact force Nf and is the
unit vector in the "scraping direction". Note how even if the relative velocity is small we can still
get a big force, and that can lead to the stability problem I mentioned earlier.
Static friction is different; it acts to prevent relative motion completely, and can do so provided
the necessary friction force is smaller than mstatic|Fnormal|. To model static friction, treat it as
two extra forces per contact to be solved for in f. It can also be considered as another
complementarity condition in the solution method - either static friction is smaller than the
maximum, or it is zero.

Generally, mstatic is larger than mdynamic - for example, a car's tyres on tarmac might have a
static friction coefficient of 1.5, and a dynamic friction of 1.2. So you'll get better turning force
if you stay in the static friction zone - or better braking if you don't lock the wheels. You can
see this effect in action in WRC - on tarmac you can keep the vehicle just on the edge of the
static limit, but on gravel or snow you'll be in the dynamic zone most of the time.
Many simulations either ignore static friction, or simulate it by having a larger coefficient when
relative velocity is low. This can lead, along with inaccurate coefficients or bad inertia values, to
a "floaty" behaviour - objects seem to slide too much, as though in slow-motion. Avoid this by
correctly modelling static friction whenever possible, and by ensuring your dynamic friction
coefficients are close to reality as you can get without causing instability.

Types of Contact

I've so far discussed only one geometric type of contact - of a point and a surface. Many games
only use this type, but to fully model your objects as polyhedra, you'll need edge-to-edge
contacts as well. Then the contact direction N is determined by the cross product of the edge
directions. Most of the derivations are a little more involved, but the same principles apply as
above. You can also implement curved surfaces. For each contact type (sphere-surface, point-
cylinder, I could go on) you'll need an expression for the line of the b matrix (or three lines if
you're including static friction). Rolling contacts are particularly tricky for all but the simplest
types.

Final Words

A good place to start could be an impulse-based system that only ever has one contact at a
time. Once you're happy with that, try multiple contacts. With only one or two, you will be able
to get away with using the matrix inverse. When you get to having fairly complex systems like
the ragdoll, it's worth trying an iteration scheme.

For anyone serious about game physics, the place to go next is David Baraff's page [1], where
you can download some of the major papers on the subject. Chris Hecker [3] offers a more
game-centric summary and a good overview of the field.

You should now be well on your way to some rock-solid contact physics, though it's a perilous
road. Some programmers have had good results with approximate methods, like the Verlet
particle systems described in [4]. Alternatively, there are several middleware packages which
effectively provide a plug-in solution for dynamics. Unless you find the cost prohibitive or really
want to do something new in physics, these are well worth a look. Game graphics are fairly
racing ahead, and if the physics we use can keep pace, creating new and captivating
experiences for gamers should be well within our grasp.

References

[1] David Baraff's homepage: http://www.pixar.com/companyinfo/research/deb/

[2] David Baraff, "Analytical Methods for Dynamic Simulation of Non-penetrating Rigid Bodies",
Computer Graphics, Volume 23, Number 3, July 1989.

[3] Chris Hecker's homepage: http://www.d6.com/users/checker/dynamics.htm

[4] Thomas Jakobsen, "Advanced Character Physics", Gamasutra Game Physics Resource Guide

Gama Network Presents:

Advanced Character Physics

By Thomas Jakobsen
Gamasutra
January 21, 2003

This article explains the basic elements of an approach to physically-based modeling which is
well suited for interactive use. It is simple, fast, and quite stable, and in its basic version the
method does not require knowledge of advanced mathematical subjects (although it is based on
a solid mathematical foundation). It allows for simulation of both cloth; soft and rigid bodies;
and even articulated or constrained bodies using both forward and inverse kinematics.

The algorithms were developed for IO Interactive’s game Hitman: Codename 47. There, among
other things, the physics system was responsible for the movement of cloth, plants, rigid
bodies, and for making dead human bodies fall in unique ways depending on where they were
hit, fully interacting with the environment (resulting in the press oxymoron “lifelike death
animations”).The article also deals with subtleties like penetration test optimization and friction
handling.

The use of physically-based modeling to produce nice-looking animation has been considered
for some time and many of the existing techniques are fairly sophisticated. Different
approaches have been proposed in the literature [Baraff, Mirtich, Witkin, and others] and much
effort has been put into the construction of algorithms that are accurate and reliable. Actually,
precise simulation methods for physics and dynamics have been known for quite some time
from engineering. However, for games and interactive use, accuracy is really not the primary
concern (although it’s certainly nice to have) – rather, here the important goals are believability
(the programmer can cheat as much as he wants if the player still feels immersed) and speed
of execution (only a certain time per frame will be allocated to the physics engine). In the case
of physics simulation, the word believability also covers stability; a method is no good if objects
seem to drift through obstacles or vibrate when they should be lying still, or if cloth particles
tend to “blow up”.

The methods demonstrated in this paper were created in an attempt to reach these goals. The
algorithms were developed and implemented by the author for use in IO Interactive’s computer
game Hitman: Codename 47, and have all been integrated in IO’s in-house game engine
Glacier. The methods proved to be quite simple to implement (compared to other schemes at
least) and have high performance.

The algorithm is iterative such that, from a certain point, it can be stopped at any time. This
gives us a very useful time/accuracy trade-off: If a small source of inaccuracy is accepted, the
code can be allowed to run faster; this error margin can even be adjusted adaptively at run-
time. In some cases, the method is as much as an order of magnitude faster than other existing
methods. It also handles both collision and resting contact in the same framework and nicely
copes with stacked boxes and other situations that stress a physics engine.
In overview, the success of the method comes from the right combination of several techniques
that all benefit from each other:

z A so-called Verlet integration scheme.

z Handling collisions and penetrations by projection.
z A simple constraint solver using relaxation.
z A nice square root approximation that gives a solid speed-up.
z Modeling rigid bodies as particles with constraints.
z An optimized collision engine with the ability to calculate penetration depths.

Each of the above subjects will be explained shortly. In writing this document, the author has
tried to make it accessible to the widest possible audience without losing vital information
necessary for implementation. This means that technical mathematical explanations and notions
are kept to a minimum if not crucial to understanding the subject. The goal is demonstrating
the possibility of implementing quite advanced and stable physics simulations without dealing
with loads of mathematical intricacies.

The content is organized as follows. First, in Section 2, a “velocity-less” representation of a

particle system will be described. It has several advantages, stability most notably and the fact
that constraints are simple to implement. Section 3 describes how collision handling takes
place. Then, in Section 4, the particle system is extended with constraints allowing us to model
cloth. Section 5 explains how to set up a suitably constrained particle system in order to
emulate a rigid body. Next, in Section 6, it is demonstrated how to further extend the system to
allow articulated bodies (that is, systems of interconnected rigid bodies with angular and other
constraints). Section 7 contains various notes and shares some experience on implementing
frictionetc. Finally, in Section 8 a brief conclusion.

In the following, bold typeface indicates vectors. Vector components are indexed by using
subscript, i.e., x=(x1, x2, x3).

Verlet integration

The heart of the simulation is a particle system. Typically, in implementations of particle

systems, each particle has two main variables: Its position x and its velocity v. Then in the
time-stepping loop, the new position x’ and velocity v’ are often computed by applying the
rules:

where ∆t is the time step, and a is the acceleration computed using Newton’s law f=ma (where
f is the accumulated force acting on the particle). This is simple Euler integration.

Here, however, we choose a velocity-less representation and another integration scheme:

Instead of storing each particle’s position and velocity, we store its current position x and its
previous position x*. Keeping the time step fixed, the update rule (or integration step) is then:

This is called Verlet integration (see [Verlet]) and is used intensely when simulating molecular
dynamics. It is quite stable since the velocity is implicitly given and consequently it is harder for
velocity and position to come out of sync. (As a side note, the well-known demo effect for
creating ripples in water uses a similar approach.) It works due to the fact that 2x-x*=x+(x-
x*) and x-x* is an approximation of the current velocity (actually, it’s the distance traveled last
time step). It is not always very accurate (energy might leave the system, i.e., dissipate) but
it’s fast and stable. By lowering the value 2 to something like 1.99 a small amount of drag can
also be introduced to the system.

At the end of each step, for each particle the current position x gets stored in the corresponding
variable x*. Note that when manipulating many particles, a useful optimization is possible by
simply swapping array pointers.

The resulting code would look something like this (the Vector3 class should contain the
appropriate member functions and overloaded operators for manipulation of vectors):

// Sample code for physics simulation

class ParticleSystem {
Vector3 m_x[NUM_PARTICLES]; // Current positions
Vector3 m_oldx[NUM_PARTICLES]; // Previous positions
Vector3 m_a[NUM_PARTICLES]; // Force accumulators
Vector3 m_vGravity; // Gravity
float m_fTimeStep;
public:
void TimeStep();
private:
void Verlet();
void SatisfyConstraints();
void AccumulateForces();
// (constructors, initialization etc. omitted)
};
// Verlet integration step
void ParticleSystem::Verlet() {
for(int i=0; i<NUM_PARTICLES; i++) {
Vector3& x = m_x[i];
Vector3 temp = x;
Vector3& oldx = m_oldx[i];
Vector3& a = m_a[i];
x += x-oldx+a*fTimeStep*fTimeStep;
oldx = temp;
}
}
// This function should accumulate forces for each particle
void ParticleSystem::AccumulateForces()
{
// All particles are influenced by gravity
for(int i=0; i<NUM_PARTICLES; i++) m_a[i] = m_vGravity;
}
// Here constraints should be satisfied
void ParticleSystem::SatisfyConstraints() {
// Ignore this function for now
}
void ParticleSystem::TimeStep() {
AccumulateForces();
Verlet();
SatisfyConstraints();
}

The above code has been written for clarity, not speed. One optimization would be using arrays
of float instead of Vector3 for the state representation. This might also make it easier to
implement the system on a vector processor.

This probably doesn’t sound very groundbreaking yet. However, the advantages should become
clear soon when we begin to use constraints and switch to rigid bodies. It will then be
demonstrated how the above integration scheme leads to increased stability and a decreased
amount of computation when compared to other approaches.
Try setting a=(0,0,1), for example, and use the start condition x=(1,0,0), x*=(0,0,0), then do
a couple of iterations by hand and see what happens.

Collision and contact handling by projection

So-called penalty-based schemes handle contact by inserting springs at the penetration points.
While this is very simple to implement, it has a number of serious drawbacks. For instance, it is
hard to choose suitable spring constants such that, on one hand, objects don’t penetrate too
much and, on the other hand, the resulting system doesn’t get unstable. In other schemes for
simulating physics, collisions are handled by rewinding time (by binary search for instance) to
the exact point of collision, handling the collision analytically from there and then restarting the
simulation – this is not very practical from a real-time point of view since the code could
potentially run very slowly when there are a lot of collisions.

Here, we use yet another strategy. Offending points are simply projected out of the obstacle.
By projection, loosely speaking, we mean moving the point as little as possible until it is free of
the obstacle. Normally, this means moving the point perpendicularly out towards the collision
surface.

Let’s examine an example. Assume that our world is the inside of the cube (0,0,0)-
(1000,1000,1000) and assume also that the particles’ restitution coefficient is zero (that is,
particles do not bounce off surfaces when colliding). To keep all positions inside the valid
interval, the corresponding projection code would be:

// Implements particles in a box

void ParticleSystem::SatisfyConstraints() {
for(int i=0; i<NUM_PARTICLES; i++) { // For all particles
Vector3& x = m_x[i];
x = vmin(vmax(x, Vector3(0,0,0)),
Vector3(1000,1000,1000));
}
}

(vmax operates on vectors taking the component-wise maximum whereas vmin takes the
component-wise minimum.) This keeps all particle positions inside the cube and handles both
collisions and resting contact. The beauty of the Verlet integration scheme is that the
corresponding changes in velocity will be handled automatically. In the following calls to
TimeStep(), the velocity is automatically regulated to contain no component in the normal
direction of the surface (corresponding to a restitution coefficient of zero). See Figure 1.

Figure 1: Ten timesteps and

two particles.

Try it out – there is no need to directly cancel the velocity in the normal direction. While the
above might seem somewhat trivial when looking at particles, the strength of the Verlet
integration scheme is now beginning to shine through and should really become apparent when
introducing constraints and coupled rigid bodies in a moment.
Solving several concurrent constraints by relaxation

A common model for cloth consists of a simple system of interconnected springs and particles.
However, it is not always trivial to solve the corresponding system of differential equations. It
suffers from some of the same problems as the penalty-based systems: Strong springs leads to
stiff systems of equations that lead to instability if only simple integration techniques are used,
or at least bad performance – which leads to pain. Conversely, weak springs lead to elastically
looking cloth.

However, an interesting thing happens if we let the stiffness of the springs go to infinity: The
system suddenly becomes solvable in a stable way with a very simple and fast approach. But
before we continue talking about cloth, let’s revisit the previous example. The cube considered
above can be thought of as a collection of unilateral (inequality) constraints (one for each side
of the cube) on the particle positions that should be satisfied at all times:

(C1)

In the example, constraints were satisfied (that is, particles are kept inside the cube) by simply
modifying offending positions by projecting the particles onto the cube surface. To satisfy (C1),
we use the following pseudo-code

// Pseudo-code to satisfy (C1)

for i=1,2,3
set xi=min{max{xi, 0}, 1000}

One may think of this process as inserting infinitely stiff springs between the particle and the
penetration surface – springs that are exactly so strong and suitably damped that instantly they
will attain their rest length zero.

We now extend the experiment to model a stick of length 100. We do this by setting up two
individual particles (with positions x1 and x2) and then require them to be a distance of 100
apart. Expressed mathematically, we get the following bilateral (equality) constraint:

Although the particles might be correctly placed initially, after one integration step the
separation distance between them might have become invalid. In order to obtain the correct
distance once again, we move the particles by projecting them onto the set of solutions
described by (C2). This is done by pushing the particles directly away from each other or by
pulling them closer together (depending on whether the erroneous distance is too small or too
large). See Figure 2.

Figure 2: Fixing an invalid distance by moving particles.

The pseudo-code for satisfying the constraint (C2) is

// Pseudo-code to satisfy (C2)
delta = x2-x1;
deltalength = sqrt(delta*delta);
diff = (deltalength-restlength)/deltalength;
x1 -= delta*0.5*diff;
x2 += delta*0.5*diff;

Note that delta is a vector so delta*delta is actually a dot product. With restlength=100 the
above pseudo-code will push apart or pull together the particles such that they once more
attain the correct distance of 100 between them. Again we may think of the situation as if a
very stiff spring with rest length 100 has been inserted between the particles such that they are
instantly placed correctly.

Now assume that we still want the particles to satisfy the cube constraints. By satisfying the
stick constraint, however, we may have invalidated one or more of the cube constraints by
pushing a particle out of the cube. This situation can be remedied by immediately projecting the
offending particle position back onto the cube surface once more – but then we end up
invalidating the stick constraint again.

Really, what we should do is solve for all constraints at once, both (C1) and (C2). This would be
a matter of solving a system of equations. However, we choose to proceed indirectly by local
iteration. We simply repeat the two pieces of pseudo-code a number of times after each other
in the hope that the result is useful. This yields the following code:

// Implements simulation of a stick in a box

void ParticleSystem::SatisfyConstraints() {
for(int j=0; j<NUM_ITERATIONS; j++) {
// First satisfy (C1)
for(int i=0; i<NUM_PARTICLES; i++) { // For all particles
Vector3& x = m_x[i];
x = vmin(vmax(x, Vector3(0,0,0)),
Vector3(1000,1000,1000));
}

// Then satisfy (C2)

Vector3& x1 = m_x[0];
Vector3& x2 = m_x[1];
Vector3 delta = x2-x1;
float deltalength = sqrt(delta*delta);
float diff = (deltalength-restlength)/deltalength;
x1 -= delta*0.5*diff;
x2 += delta*0.5*diff;
}
}

(Initialization of the two particles has been omitted.) While this approach of pure repetition
might appear somewhat naïve, it turns out that it actually converges to the solution that we are
looking for! The method is called relaxation (or Jacobi or Gauss-Seidel iteration depending on
how you do it exactly, see [Press]). It works by consecutively satisfying various local
constraints and then repeating; if the conditions are right, this will converge to a global
configuration that satisfies all constraints at the same time. It is useful in many other situations
where several interdependent constraints have to be satisfied at the same time.

The number of necessary iterations varies depending on the physical system simulated and the
amount of motion. It can be made adaptive by measuring the change from last iteration. If we
stop the iterations early, the result might not end up being quite valid but because of the Verlet
scheme, in next frame it will probably be better, next frame even more so etc. This means that
stopping early will not ruin everything although the resulting animation might appear somewhat
sloppier.
Cloth Simulation

The fact that a stick constraint can be thought of as a really hard spring should make apparent
its usefulness for cloth simulation as sketched in the beginning of this section. Assume, for
example, that a hexagonal mesh of triangles describing the cloth has been constructed. For
each vertex a particle is initialized and for each edge a stick constraint between the two
corresponding particles is initialized (with the constraint’s “rest length” simply being the initial
distance between the two vertices).

The function HandleConstraints() then uses relaxation over all constraints. The relaxation
loop could be iterated several times. However, to obtain nicely looking animation, actually for
most pieces of cloth only one iteration is necessary! This means that the time usage in the cloth
simulation depends mostly on the N square root operations and the N divisions performed
(where N denotes the number of edges in the cloth mesh). As we shall see, a clever trick makes
it possible to reduce this to N divisions per frame update – this is really fast and one might
argue that it probably can’t get much faster.

// Implements cloth simulation

struct Constraint {
int particleA, particleB;
float restlength;
};
// Assume that an array of constraints, m_constraints, exists
void ParticleSystem::SatisfyConstraints() {
for(int j=0; j<NUM_ITERATIONS; j++) {
for(int i=0; i<NUM_CONSTRAINTS; i++) {
Constraint& c = m_constraints[i];
Vector3& x1 = m_x[c.particleA];
Vector3& x2 = m_x[c.particleB];
Vector3 delta = x2-x1;
float deltalength = sqrt(delta*delta);
float diff=(deltalength-c.restlength)/deltalength;
x1 -= delta*0.5*diff;
x2 += delta*0.5*diff;
}

// Constrain one particle of the cloth to origo

m_x[0] = Vector3(0,0,0);
}
}

We now discuss how to get rid of the square root operation. If the constraints are all satisfied
(which they should be at least almost), we already know what the result of the square root
operation in a particular constraint expression ought to be, namely the rest length r of the
corresponding stick. We can use this fact to approximate the square root function.
Mathematically, what we do is approximate the square root function by its 1st order Taylor-
expansion at a neighborhood of the rest length r (this is equivalent to one Newton-Raphson
iteration with initial guess r). After some rewriting, we obtain the following pseudo-code:

// Pseudo-code for satisfying (C2) using sqrt approximation

delta = x2-x1;
delta*=restlength*restlength/(delta*delta+restlength*restlength)-0.5;
x1 -= delta;
x2 += delta;

Notice that if the distance is already correct (that is, if |delta|=restlength), then one gets
delta=(0,0,0) and no change is going to happen.

Per constraint we now use zero square roots, one division only, and the squared value
restlength*restlength can even be precalculated! The usage of time consuming operations is
now down to N divisions per frame (and the corresponding memory accesses) – it can’t be done
much faster than that and the result even looks quite nice. Actually, in Hitman, the overall
speed of the cloth simulation was limited mostly by how many triangles it was possible to push
through the rendering system.

The constraints are not guaranteed to be satisfied after one iteration only, but because of the
Verlet integration scheme, the system will quickly converge to the correct state over some
frames. In fact, using only one iteration and approximating the square root removes the
stiffness that appears otherwise when the sticks are perfectly stiff.

By placing support sticks between strategically chosen couples of vertices sharing a neighbor,
the cloth algorithm can be extended to simulate plants. Again, in Hitman only one pass through
the relaxation loop was enough (in fact, the low number gave the plants exactly the right
amount of bending behavior).

The code and the equations covered in this section assume that all particles have identical
mass. Of course, it is possible to model particles with different masses, the equations only get a
little more complex.

To satisfy (C2) while respecting particle masses, use the following code:

// Pseudo-code to satisfy (C2)

delta = x2-x1;
deltalength = sqrt(delta*delta);
diff = (deltalength-restlength)
/(deltalength*(invmass1+invmass2));
x1 -= invmass1*delta*diff;
x2 += invmass2*delta*diff;

Here invmass1 and invmass2 are the numerical inverses of the two masses. If we want a
particle to be immovable, simply set invmass=0 for that particle (corresponding to an infinite
mass). Of course in the above case, the square root can also be approximated for a speed-up.

Rigid Bodies

The equations governing motion of rigid bodies were discovered long before the invention of
modern computers. To be able to say anything useful at that time, mathematicians needed the
ability to manipulate expressions symbolically. In the theory of rigid bodies, this lead to useful
notions and tools such as inertia tensors, angular momentum, torque, quaternions for
representing orientations etc. However, with the current ability to process huge amounts of
data numerically, it has become feasible and in some cases even advantageous to break down
calculations to simpler elements when running a simulation. In the case of 3D rigid bodies, this
could mean modeling a rigid body by four particles and six constraints (giving the correct
amount of degrees of freedom, 4x3-6 = 6). This simplifies a lot of aspects and it’s exactly what
we will do in the following.

Consider a tetrahedron and place a particle at each of the four vertices. In addition, for each of
the six edges on the tetrahedron create a distance constraint like the stick constraint discussed
in the previous section. This is actually enough to simulate a rigid body. The tetrahedron can be
let loose inside the cube world from earlier and the Verlet integrator will let it move correctly.
The function SatisfyConstraints() should take care of two things: 1) That particles are kept
inside the cube (like previously), and 2) That the six distance constraints are satisfied. Again,
this can be done using the relaxation approach; 3 or 4 iterations should be enough with optional
square root approximation.

Now clearly, in general rigid bodies do not behave like tetrahedrons collision-wise (although
they might do so kinetically). There is also another problem: Presently, collision detection
between the rigid body and the world exterior is on a vertex-only basis, that is, if a vertex is
found to be outside the world it is projected inside again. This works fine as long as the inside
of the world is convex. If the world were non-convex then the tetrahedron and the world
exterior could actually penetrate each other without any of the tetrahedron vertices being in an
illegal region (see Figure 3 where the triangle represents the 2D analogue of the tetrahedron).
This problem is handled in the following.

Figure 3: A tetrahedron
pentrating the world.

We’ll first consider a simpler version of the problem. Consider the stick example from earlier
and assume that the world exterior has a small bump on it. The stick can now penetrate the
world exterior without any of the two stick particles leaving the world (see Figure 4). We won’t
go into the intricacies of constructing a collision detection engine since this is a science in itself.
Instead we assume that there is a subsystem available which allows us to detect the collision.
Furthermore we assume that the subsystem can reveal to us the penetration depth and identify
the penetration points on each of the two colliding objects. (One definition of penetration points
and penetration depth goes like this: The penetration distance dp is the shortest distance that
would prevent the two objects from penetrating if one were to translate one of the objects by
the distance dp in a suitable direction. The penetration points are the points on each object that
just exactly touch the other object after the aforementioned translation has taken place.)

Take a look again at Figure 4. Here the stick has moved through the bump after the Verlet step.
The collision engine has identified the two points of penetration, p and q. In Figure 4a, p is
actually identical to the position of particle 1, i.e., p=x1. In Figure 4b, p lies between x1 and
x2 at a position ¼ of the stick length from x1. In both cases, the point p lies on the stick and
consequently it can be expressed as a linear combination of x1 and x2, p=c1 x1+c2 x2 such
that c1+c2=1. In the first case, c1=1 and c2=0, in the second case, c1=0.75 and c2=0.25.
These values tell us how much we should move the corresponding particles.

To fix the invalid configuration of the stick, it should be moved upwards somehow. Our goal is
to avoid penetration by moving p to the same position as q. We do this by adjusting the
positions of the two particles x1 and x2 in the direction of the vector between p and q, ∆=q-p.

In the first case, we simply project x1 out of the invalid region like earlier (in the direction of q)
and that’s it (x2 is not touched). In the second case, p is still nearest to x1 and one might
reason that consequently x1 should be moved more than x2. Actually, since p=0.75 x1 + 0.25
x2, we will choose to move x1 by an amount of 0.75 each time we move x2 by an amount of
0.25. In other words, the new particle positions x1’ and x2’ are given by the expressions:

(*)

where λ is some unknown value. The new position of p after moving both particles is p’=c1
x1’+ c2 x2’.

Recall that we want p’=q, i.e., we should choose l exactly such that p’ ends up coinciding with
q. Since we move the particles only in the direction of ∆, also p moves in the direction of ∆ and
consequently the solution to the equation p’=q can be found by solving:

(**)

for λ. Expanding the left-hand side yields:

which together with the right-hand side of (**) gives

Plugging λ into (*) gives us the new positions of the particles for which p’ coincide with q.

Figure 5 shows the situation after moving the particles. We have no object penetration but now
the stick length constraint has been violated. To fix this, we do yet another iteration of the
relaxation loop (or several) and we’re finished.

The above strategy also works for the tetrahedron in a completely analogous fashion. First the
penetration points p and q are found (they may also be points interior to a triangle), and p is
expressed as a linear combination of the four particles p=c1 x1+c2 x2+c3 x3+c4 x4 such that
c1+c2+c3+c4=1 (this calls for solving a small system of linear equations). After finding ∆=q-p,
one computes the value:
and the new positions are then given by:

Here, we have collided a single rigid body with an immovable world. The above method
generalizes to handle collisions of several rigid bodies. The collisions are processed for one pair
of bodies at a time. Instead of moving only p, in this case both p and q are moved towards
each other.

Again, after adjusting the particle positions such that they satisfy the non-penetration
constraints, the six distance constraints that make up the rigid body should be taken care of
and so on. With this method, the tetrahedron can even be imbedded inside another object that
can be used instead of the tetrahedron itself to handle collisions. In Figure 6, the tetrahedron is
embedded inside a cube.

First, the cube needs to be ‘fastened’ to the tetrahedron in some way. One approach would be
choosing the system mass midpoint 0.25*(x1+x2+x3+x4) as the cube’s position and then
derive an orientation matrix by examining the current positions of the particles. When a
collision/penetration is found, the collision point p (which in this case will be placed on the
cube) is then treated exactly as above and the positions of the particles are updated
accordingly. As an optimization, it is possible to precompute the values of c1-c4 for all vertices
of the cube. If the penetration point p is a vertex, the values for c1-c4 can be looked up and
used directly. Otherwise, p lies on the interior of a surface triangle or one of its edges and the
values of c1-c4 can then be interpolated from the precomputed values of the corresponding
triangle vertices.

Embedding the
tetrahedron inside
another object.

Usually, 3 to 4 relaxation iterations are enough. The bodies will not behave as if they were
completely rigid since the relaxation iterations are stopped prematurely. This is mostly a nice
feature, actually, as there is no such thing as perfectly rigid bodies – especially not human
bodies. It also makes the system more stable.

By rearranging the positions of the particles that make up the tetrahedron, the physical
properties can be changed accordingly (mathematically, the inertia tensor changes as the
positions and masses of the particles are changed).

Other arrangements of particles and constraints than a tetrahedron are possible such as placing
the particles in the pattern of a coordinate system basis, i.e. at (0,0,0), (1,0,0), (0,1,0),
(0,0,1). Let a, b, and c be the vectors from particle 1 to particles 2, 3, and 4, respectively.
Constrain the particles’ positions by requiring vectors a, b, and c to have length 1 and the
angle between each of the three pairs of vectors to be 90 degrees (the corresponding dot
products should be zero). (Notice, that this again gives four particles and six constraints.)

Articulated Bodies

It is possible to connect multiple rigid bodies by hinges, pin joints, and so on. Simply let two
rigid bodies share a particle, and they will be connected by a pin joint. Share two particles, and
they are connected by a hinge. See Figure 7.

It is also possible to connect two rigid bodies by a stick constraint or any other kind of
constraint – to do this, one simply adds the corresponding ‘fix-up’ code to the relaxation loop.

This approach makes it possible to construct a complete model of an articulated human body.
For additional realism, various angular constraints will have to be implemented as well. There
are different ways to accomplish this. A simple way is using stick constraints that are only
enforced if the distance between two particles falls below some threshold (mathematically, we
have a unilateral (inequality) distance constraint, |x2-x1|>100). As a direct result, the two
particles will never come too close to each other. See Figure 8.

Figure 8: Two stick

constraints and an
inequality
constraint (dotted).

Another method for restraining angles is to satisfy a dot product constraint:

Particles can also be restricted to move, for example, in certain planes only. Once again,
particles with positions not satisfying the above-mentioned constraints should be moved –
deciding exactly how is slightly more complicated that with the stick constraints.

Actually, in Hitman corpses aren’t composed of rigid bodies modeled by tetrahedrons. They are
simpler yet, as they consist of particles connected by stick constraints in effect forming stick
figures. See Figure 9. The position and orientation for each limb (a vector and a matrix) are
then derived for rendering purposes from the particle positions using various cross products and
vector normalizations (making certain that knees and elbows bend naturally).
Figure 9: The particle/stick
configuration used in Hitman to
represetn human anatomy.

In other words, seen isolated each limb is not a rigid body with the usual 6 degrees of freedom.
This means that physically the rotation around the length axis of a limb is not simulated.
Instead, the skeletal animation system used to setup the polygonal mesh of the character is
forced to orientate the leg, for instance, such that the knee appears to bend naturally. Since
rotation of legs and arms around the length axis does not comprise the essential motion of a
falling human body, this works out okay and actually optimizes speed by a great deal.

Angular constraints are implemented to enforce limitations of the human anatomy. Simple self
collision is taken care of by strategically introducing inequality distance constraints as discussed
above, for example between the two knees – making sure that the legs never cross.

For collision with the environment, which consists of triangles, each stick is modeled as a
capped cylinder. Somewhere in the collision system, a subroutine handles collisions between
capped cylinders and triangles. When a collision is found, the penetration depth and points are
extracted, and the collision is then handled for the offending stick in question exactly as
described in the beginning of Section 5.

Naturally, a lot of additional tweaking was necessary to get the result just right.

Comments

This section contains various remarks that didn’t fit anywhere else.

Motion control
To influence the motion of a simulated object, one simply moves the particles correspondingly.
If a person is hit at the shoulder, move the shoulder particle backwards over a distance
proportional to the strength of the blow. The Verlet integrator will then automatically set the
shoulder in motion.

This also makes it easy for the simulation to ‘inherit’ velocities from an underlying traditional
animation system. Simply record the positions of the particles for two frames and then give
them to the Verlet integrator, which then automatically continues the motion. Bombs can be
implemented by pushing each particle in the system away from the explosion over a distance
inversely proportional to the square distance between the particle and the bomb center.

It is possible to constrain a specific limb, say the hand, to a fixed position in space. In this way,
one can implement inverse kinematics (IK): Inside the relaxation loop, keep setting the position
of a specific particle (or several particles) to the position(s) wanted. Giving the particle infinite
mass (invmass=0) helps making it immovable to the physics system. In Hitman, this strategy is
used when dragging corpses; the hand (or neck or foot) of the corpse is constrained to follow
the hand of the player.

Handling friction
Friction has not been taken care of yet. This means that unless we do something more,
particles will slide along the floor as if it were made of ice. According to the Coulomb friction
model, friction force depends on the size of the normal force between the objects in contact. To
implement this, we measure the penetration depth dp when a penetration has occurred (before
projecting the penetration point out of the obstacle). After projecting the particle onto the
surface, the tangential velocity vt is then reduced by an amount proportional to dp (the
proportion factor being the friction constant). This is done by appropriately modifying x*. See
the Figure 10. Care should be taken that the tangential velocity does not reverse its direction –
in this case one should simply be set it to zero since this indicates that the penetration point
has seized to move tangentially. Other and better friction models than this could and should be
implemented.

Figure 10: Collision handling with friction (projection and

modification of tangential velocity).

Collision detection
One of the bottlenecks in physics simulation as presented here lies in the collision detection,
which is potentially performed several times inside the relaxation loop. It is possible, however,
to iterate a different number of times over the various constraints and still obtain good results.

In Hitman, the collision system works by culling all triangles inside the bounding box of the
object simulated (this is done using a octtree approach). For each (static, background) triangle,
a structure for fast collision queries against capped cylinders is then constructed and cached.
This strategy gave quite a speed boost.

To prevent objects that are moving really fast from passing through other obstacles (because of
too large time steps), a simple test if performed. Imagine the line (or a capped cylinder of
proper radius) beginning at the position of the object’s midpoint last frame and ending at the
position of the object’s midpoint at the current frame. If this line hits anything, then the object
position is set to the point of collision. Though this can theoretically give problems, in practice it
works fine.

Another collision ‘cheat’ is used for dead bodies. If the unusual thing happens that a fast
moving limb ends up being placed with the ends of the capped cylinder on each side of a wall,
the cylinder is projected to the side of the wall where the cylinder is connected to the torso.

Miscellaneous
The number of relaxation iterations used in Hitman vary between 1 and 10 with the kind of
object simulated. Although this is not enough to accurately solve the global system of
constraints, it is sufficient to make motion seem natural. The nice thing about this scheme is
that inaccuracies do not accumulate or persist visually in the system causing object drift or the
like – in some sense the combination of projection and the Verlet scheme manages to distribute
complex calculations over several frames (other schemes have to use further stabilization
techniques, like Baumgarte stabilization). Fortunately, the inaccuracies are smallest or even
nonexistent when there is little motion and greatest when there is heavy motion – this is nice
since fast or complex motion somewhat masks small inaccuracies for the human eye.

A kind of soft bodies can also be implemented by using ‘soft’ constraints, i.e., constraints that
are allowed to have only a certain percentage of the deviation ‘repaired’ each frame (i.e., if the
rest length of a stick between two particles is 100 but the actual distance is 60, the relaxation
code could first set the distance to 80 instead of 100, next frame 90, 95, 97.5 etc.).

As mentioned, we have purposefully refrained from using heavy mathematical notation in order
to reach an audience with a broader background. This means that even though the methods
presented are firmly based mathematically, their origins may appear somewhat vague or even
magical.

For the mathematically inclined, however, what we are doing is actually a sort of time-stepping
approach to solving differential inclusions (a variant of differential equations) using a simple
sort of interior-point algorithm (see [Stewart] where a similar approach is discussed). When
trying to satisfy the constraints, we are actually projecting the system state onto the manifold
described by the constraints. This, in turn, is done by solving a system of linear equations. The
linear equations or code to solve the constraints can be obtained by deriving the Jacobian of the
constraint functions. In this article, relaxation has been discussed as an implicit way of solving
the system. Although we haven’t touched the subject here, it is sometimes useful to change the
relaxation coefficient or even to use over-relaxation (see [Press] for an explanation). Since
relaxation solvers sometimes converge slowly, one might also choose to explicitly construct the
equation system and use other methods to solve it (for example a sparse matrix conjugate
gradient descent solver with preconditioning using the results from the previous frame (thereby
utilizing coherence)).

Note that the Verlet integrator scheme exists in a number of variants, e.g., the Leapfrog
integrator and the velocity Verlet integrator. Accuracy might be improved by using these.

Singularities (divisions by zero usually brought about by coinciding particles) can be handled by
slightly dislocating particles at random.

As an optimization, bodies should time out when they have fallen to rest. To toy with the
animation system for dead characters in Hitman: Codename 47, open the Hitman.ini file and
add the two lines “enableconsole 1” and “consolecmd ip_debug 1” at the bottom. Pointing the
cursor at an enemy and pressing shift+F9 will cause a small bomb to explode in his vicinity
sending him flying. Press K to toggle free-cam mode (camera is controlled by cursor keys, shift,
and ctrl).

Note that since all operations basically take place on the particle level, the algorithms should be
very suitable for vector processing (Playstation 2 for example).

Conclusion

This paper has described how a physics system was implemented in Hitman. The underlying
philosophy of combining iterative methods with a stable integrator has proven to be successful
and useful for implementation in computer games. Most notably, the unified particle-based
framework, which handles both collisions and contact, and the ability to trade off speed vs.
accuracy without accumulating visually obvious errors are powerful features. Naturally, there
are still many specifics that can be improved upon. In particular, the tetrahedron model for rigid
bodies needs some work. This is in the works.

At IO Interactive, we have recently done some experiments with interactive water and gas
simulation using the full Navier-Stokes equations. We are currently looking into applying
techniques similar to the ones demonstrated in this paper in the hope to produce faster and
more stable water simulation.

Acknowledgements

The author wishes to thank Jeroen Wagenaar for fruitful discussions and the entire crew at IO
Interactive for cooperation and for producing such a great working environment.
References

[Baraff] Baraff, David, Dynamic Simulation of Non-Penetrating Rigid Bodies, Ph.D. thesis, Dept.
of Computer Science, Cornell University, 1992.
http://www.cs.cmu.edu/~baraff/papers/index.html

[Mirtich] Mirtich, Brian V., Impulse-base Dynamic Simulation of Rigid Body Systems, Ph.D.
thesis, University of California at Berkeley, 1996.
http://www.merl.com/people/mirtich/papers/thesis/thesis.html

[Press] Press, William H. et al, Numerical Recipes, Cambridge University Press, 1993.
http://www.nr.com/nronline_switcher.html

[Stewart] Stewart, D. E., and J. C. Trinkle, “An Implicit Time-Stepping Scheme for Rigid Body
Dynamics with Inelastic Collisions and Coulomb Friction”, International Journal of Numerical
Methods in Engineering, to appear.
http://www.cs.tamu.edu/faculty/trink/Papers/ijnmeStewTrink.ps

[Verlet] Verlet, L. "Computer experiments on classical fluids. I. Thermodynamical properties of

Lennard-Jones molecules", Phys. Rev., 159, 98-103 (1967).

[Witkin] Witkin, Andrew and David Baraff, "Physically Based Modeling: Principles and Practice",
Siggraph ’97 course notes, 1997.
http://www.cs.cmu.edu/~baraff/sigcourse/index.html

________________________________________________________

Collision and contact handling by projection

Gama Network Presents:

Creating an Event-Driven Cinematic Camera,

Part Two

By Brian Hawkins
Gamasutra
January 10, 2003

URL: http://www.gamasutra.com/features/20030110/hawkins_01.htm

Once upon a time, it was a death wish for a game to be based on a movie license. However,
things have changed considerably in recent years. There have been a number of well done and
successful game titles based on movies, and on the flip side there have been several movies
released that had games as their origin. With the crossover between movies and games finally
starting to show some success, it is time to revisit how Hollywood can actually be helpful to the
game industry.

In the past century, motion pictures have developed a visual language that enhances the
storytelling experience. Equally important, audiences have grown accustomed to certain
conventions used to tell these visual stories. Unfortunately, very little of this knowledge has
been translated for use in interactive storytelling.

Last month, in Part One of this two-part series, we looked at how to describe a cinematic
camera shot in general terms so that it could be automatically converted to camera position and
orientation within the game. To conclude, this month’s article brings it all together by
presenting a system that can choose the best shots and connect them together. Once finished,
these concepts can be joined to form a complete basis for a cinematic experience that improves
the interactive storytelling of games by giving players access to the action within a game in
ways that make sense to them instinctively.

Film Crew

Major motion pictures are made by hundreds of different people all working together in a huge
team effort. To transfer the cinematic experience to the world of games, we can take certain
established, key roles from the film industry and translate them into entities in the computer.
Objects in object-oriented languages such as C++ can conveniently represent these entities. In
this article, we will look at the three primary roles and describe their responsibilities as objects.
From this, you can build architectures to coordinate real-time cinematic camera displays. Before
going into detail about each role, let’s take a brief look at each in turn.

The first job belongs to the director. In films, the director controls the scene and actors to
achieve the desired camera shots that will then be edited later. However, because our director
object will have little or no control over the game world, this responsibility shifts to determining
where good camera shots are available and how to take advantage of them.
Once these possibilities are collected, they are passed on to the editor who must decide which
shots to use. Unlike in motion pictures, however, the editor object must do this in real time as
each previous shot comes to an end. The editor is also responsible for choosing how to
transition between shots.

Finally, once the shot and transition have been decided upon, it becomes the cinematographer
object’s task to transform that information into actual camera position and movement within
the game world. With this basic idea of how all the roles and responsibilities fit together, we can
move on to a closer look at each individual role.

Through the Viewfinder: The Director

As mentioned previously, the director’s role in the game world is to collect information on
available shots and their suitability for inclusion in the final display. This is the one place where
human intervention is necessary, after which no more human input is necessary. It is currently
impossible to create a system sophisticated enough to determine the priority of events within
the game world from a creative standpoint.

Instead, programmers and scripters are given the ability to provide information about priority
and layout of interesting events, hence the term used in this article — event-driven cinematic
camera, through a suggestShot method on the director object. This information will then be
used by the editor for a final decision on which shots to include. Following is a breakdown of the
information necessary to make these decisions.

The first and most important piece of information is the priority of the shot. The priority
represents how interesting a particular shot is compared to other shots available at the time.
Thus the value of priority is relative, which means there is no definitive meaning for any
particular number. You must therefore be careful to remain consistent within a single game in
order to give the priority levels meaning. For example, all other values being equal, a shot with
a priority of two is twice as interesting as a shot with a priority of one.

The second piece of information required is the timing of the shot. Timing is the most complex
part of the editing process, and the sooner an event can be predicted, the better choices the
editor can make. Timing breaks down into four values: start time, estimated length, decay rate,
and end condition. The start time is obviously the beginning of the event. The estimated length
is a best guess at how long the shot will last. The decay rate determines how quickly the
priority decays once the event begins. Finally, the end condition determines when the shot
should be terminated. Let’s look at decay rate and end conditions in more detail.

The decay rate is used to determine the actual priority at a given time t using the starting
priority p and a constant, k. The constant is provided as part of the decay rate information,
since it will differ from shot to shot. The other information for decay rate is the equation to use
for determining the actual priority. For maximum flexibility, this should be a function object that
takes t, p, k, and the start time, ts, and returns the priority for that time. Two useful functions
that should be predefined as function objects for this parameter are:

These functions should suffice for most circumstances. Notice that the second equation cubes
the value rather than squaring it. This is important, because it ensures that the priority remains
negative after a certain amount of time has passed, whereas squaring would have caused the
result to always remain positive. Figure 1 shows the resulting graphs of these functions as a
visual aid for understanding how decay rate affects priority.
Figure 1. Decay rate graph, showing how decay rate
affects shot priority.

The end condition is best specified as a function object that returns one of three values. The
first value indicates the shot cannot be terminated yet, the second value indicates the shot can
be terminated if another shot is ready, and the third value indicates that the shot must be
terminated. The reason for the middle value is that it gives the editor more flexibility in
choosing shots by allowing a choice of new shots within a certain time, rather than
instantaneously when the shot is terminated.

Next comes the shot information. This is all the information needed by the cinematographer to
change the shot from a suggestion into a real in-game shot. This includes information such as
the primary actor and secondary actor, if any. In addition, the shot size, emphasis, angle, and
height may be necessary. Refer to last month’s article for more information on determining this
information as well as the following scene information.

The scene information consists of the actors within the given scene and the current line of
action for that scene. Unfortunately, scene information can change dynamically as actors move
around and the cinematographer changes the line of action. Because of this fact, it is best to
store the scene as a reference through the primary actor of the shot that is being suggested.

The director’s other responsibilities are to provide the editor with a list of currently available
shots at any time and to ensure that this list is up-to-date. Keeping the list up-to-date primarily
involves removing shots that are no longer valid. A shot becomes invalid when the priority
modified by decay rate, as discussed previously, falls below zero. Once the editor chooses a
shot, it is also removed from the list of shots. This brings us to a discussion of how the editor
chooses a shot.

Slice and Dice: The Editor

The editor is responsible for choosing the next shot that will be shown as well as any transitions
between shots. First, let’s look at the process of choosing the next shot. The majority of the
information needed is provided with the shot suggestions from the director, but there are
parameters that can be used to give each editor its own style. The two parameters involved in
shot decisions are the desired shot length, lshot, and the desired scene length, lscene. By setting
these for different editors, the shots chosen will vary for the same set of circumstances. For
example, one editor could prefer short scenes filled with one or two long shots by setting the
shot time and the scene time to be relatively close values. On the other hand, another editor
could prefer longer scenes filled with short shots. This provides a number of options when
choosing an editor for a particular situation.

The time for choosing the next shot is determined by a change in the return value of the end
condition for the current shot. Once the current shot indicates that it can be terminated, the
editor must obtain the list of currently available shots from the director. From this list, the
editor object then filters out any shots whose start time is too far in the future. If the end
condition requires immediate termination, this excludes all shot suggestions whose start time is
not the current time or whose start time has not already passed. Otherwise, all shots whose
start time is no more than lshot beyond the current time are considered.

To choose the shot from this list, we must sort them based on a value that represents the
quality of each shot suggestion and then take the shot with the highest value. Before we can
compute this value, we need to introduce a few other values that will be used in its calculation.
First, we consider the desired shot length versus the estimated shot length, lestimated:

Then we look to see if the actors have any relation to those in the last shot:

Next, we check to see if the new scene matches the old scene. For this the editor must also
keep track of the time spent in the current scene, tscene:

Finally, the priority is modified by the decay rate discussed earlier if the shot event has already
commenced:

Once we have all this information, we can compute the quality value of each shot on the list:

Notice that the values cactor and cscene allow us to maintain consistency for our shots. This is a
very important property of good film directing and editing and should not be overlooked in
interactive cinematography, even though it is more difficult to maintain.

You may have also noticed that when calculating pω(t) that t can be before ts, thus it is possible
under some circumstances to choose a shot that has not started yet. In this case, we hold on to
the shot and wait for one of two events: either the shot start time occurs or the end condition
of the current shot forces us to terminate. Upon the occurrence of either event, we must once
again test to see which is the best shot, in case a better shot has come along or we are being
forced to move on before the shot we would like to display can start.

Now that an ordering exists that allows us to choose the next shot, the only remaining choice
necessary is the transition from the current shot to the new shot. If we are transitioning
between different scenes, the choice is easy, a cut or fade should be used. However, if the
transition is between two shots in the same scene, the logic becomes slightly more complex.
Within a scene it is important to maintain the line of action; in other words, to keep the camera
on one side of a plane defined for the scene so as not to confuse the viewer’s perception of
scene orientation.

Let’s consider the various permutations that can occur between shots and what type of
transition should be used. For now, we will break them into fading (think of cutting as an
instantaneous fade) and camera movement.We will go into more detail on moving the camera
later. First, if the actors remain the same between the shots, then we can preserve the line of
action and use a fade. Likewise, even if the actors change but the new line of action lies on the
same side of the line of action as the previous camera position, then a fade can still be used.

However, if the two lines of action differ significantly, then a camera move needs to be
performed. The camera move should allow the line of action to change without confusing the
viewer. To get a rough approximation of the distance the camera must travel, compare the
distances between the current and new camera positions and the current and new focal points.
Now compute how fast the camera must move to travel that distance in the time it would take
for the new shot to become uninteresting:

Where ∆c is the vector between camera positions, ∆f is the vector between focal points, and p(t)
is the priority decay formula for the shot.

If the camera move cannot be made at a reasonable speed, then a new shot must be chosen,
unless the actors from the last shot would not be visible in the pending shot. Otherwise, a new
shot should be chosen with preference for close-ups that include only one actor, thus making
the next transition easier. We can now move on to realizing the shot and transition. For the
decay formulas given earlier, t would be tstart + 1/k.

Lights, Camera, Action: The Cinematographer

Last month, we covered the math necessary to turn a description of a shot into actual camera
position and orientation. This month, we will build on that and flesh out the role of the
cinematographer by covering the handling of transitions.

The simplest transition is the cut, where we only need to change the camera position and
orientation to a new position and orientation. Only slightly more complex is the fade, which
provides a two-dimensional visual effect between two camera views. When fading, it is
important to decide whether to fade between two static images or allow the world to continue
to simulate while the fade occurs. Allowing continued simulation implies rendering two scenes
per frame but eliminates the need for pauses in gameplay. If you are able to handle the extra
rendering, interesting fade patterns can be achieved by using complementary masks when
rendering each scene. Depending on the hardware available for rendering, you may only be
able to do black and white masks, or you could go all the way to alpha-value masks.
Figure 2. Shot transition criteria, where re is the radius of
acceptable error.

The other group of transitions involves moving the camera. The three transitions we will
consider are pan, zoom, and crane. The decision of which move to make depends on the
camera and focal positions for the two shots. Figure 2 shows the various situations that lead to
the choice of a particular shot. The pan is used if the camera is in approximately the same
location for both shots and only the focal point moves. Though this happens rarely in an
interactive environment, when it does happen the old camera position can be kept and only the
orientation needs to be animated to the new orientation. Similarly, the conditions for zooming
are fairly uncommon, as both the camera positions and focal points must lie close to the same
line, but when it does occur the camera field-of-view can be used to allow a much more
interesting transition than a simple camera move.

Finally, we come to the most complex transition, the crane. The best method for creating a
crane move is often by borrowing the services of the AI’s path-planning algorithm in order to
avoid moving the camera through objects. It is best if the path planning also handles
orientation, as this will lead to better results than interpolating between the focal points.

Unfortunately, getting crane shots to look their best is a complex process for which this is only
a starting point. If you do not have the time to invest in making them work, you may wish to
leave them out altogether.

Beyond the Basics

You now have enough information to create your own basic cinematic system to include in your
game. There is plenty of room to go beyond this basic system. Research on some of these
areas has already been conducted in academic circles. For instance, events that involve
conversations between characters could be specified as a single suggestion rather than
manually suggesting each individual shot during the discourse. “The Virtual Cinematographer”
and “Real-time Cinematic Camera Control for Interactive Narratives” (see For More Information)
describe how director styles can be created to specify camera shots automatically for these
situations. This reduces human involvement, which is always important as it allows other
features to be added to the game.

Another important aspect of cinematography that is only now becoming possible with the power
of newer graphics h