Creating Input Layouts in Direct3D 10
Developers familiar with Direct3D 9 will discover a series of functional enhancements and performance
improvements in Direct3D 10, including:
- The ability to process entire primitives (with adjacency), and to amplify and de-amplify data, in the new geometry shader stage.
  - The ability to perform per-primitive material swapping and setup using a geometry shader.
- The ability to output pipeline-generated vertex data to memory using the stream output stage.
- New objects and paradigms provided to minimize CPU overhead spent on validation and processing in the runtime and driver.
- New resource types (including shader-indexable arrays of textures) and resource formats.
- A full set of required functionality: legacy hardware capability bits (caps) have been removed in favor of a rich set of guaranteed functionality. To enable this and other design improvements, the Direct3D 10 API only targets Direct3D 10-class hardware and later.
- Layered Runtime - The Direct3D 10 API is constructed with layers, starting with the basic functionality at the core and building optional and developer-assist functionality (debug, etc.) in outer layers.
- Full HLSL integration - All Direct3D 10 shaders are written in HLSL and implemented with the common shader core.
- An increase in the number of render targets, textures, and samplers. There is also no shader length limit.
- Multisample alpha-to-coverage support.
There are additional behavioral differences that Direct3D 9 developers should also be aware of. For a
more complete list, refer to Direct3D 9 to Direct3D 10 Considerations (Direct3D 10)
[http://msdn2.microsoft.com/en-us/library/bb205073.aspx].
State Objects (Direct3D 10)
In Direct3D 10, device state is grouped into state objects which greatly reduce the cost of state changes.
There are several state objects, and each one is designed to initialize a set of state for a particular
pipeline stage. You can create up to 4096 of each type of state object.
Input-Layout State
This group of state (see D3D10_INPUT_ELEMENT_DESC) dictates how the input assembler stage reads
data out of the input buffers and assembles it for use by the vertex shader. This includes state such as
the number of elements in the input buffer and the signature of the input data. The input-assembler
stage is a new stage in the pipeline whose job is to stream primitives from memory into the pipeline.
Rasterizer State
This group of state (see D3D10_RASTERIZER_DESC) initializes the rasterizer stage. This object includes
state such as fill or cull modes, enabling a scissor rectangle for clipping, and setting multisample
parameters. This stage rasterizes primitives into pixels, performing operations like clipping and mapping
primitives to the viewport.
Depth-Stencil State
This group of state (see D3D10_DEPTH_STENCIL_DESC) initializes the depth-stencil portion of the
output-merger stage. More specifically, this object initializes depth and stencil testing.
Blend State
This group of state (see D3D10_BLEND_DESC) initializes the blending portion of the output-merger
stage.
Sampler State
This group of state (see D3D10_SAMPLER_DESC) initializes a sampler object. A sampler object is used by
the shader stages to filter textures in memory.
In Direct3D 10, the sampler object is no longer bound to a specific texture - it just describes how to do
filtering given any attached resource.
Performance Considerations
Designing the API to use state objects creates several performance advantages. These include validating
state at object creation time, enabling caching of state objects in hardware, and greatly reducing the
amount of state that is passed during a state-setting API call (by passing a handle to the state object
instead of the state).
To achieve these performance improvements, you should create your state objects when your
application starts up, well before your render loop. State objects are immutable, that is, once they are
created, you cannot change them. Instead you must destroy and recreate them. To cope with this
restriction, you can create up to 4096 of each type of state object. For example, you could create
several sampler objects with various sampler-state combinations. Changing the sampler state is then
accomplished by calling the appropriate Set API which passes a handle to the object (as opposed to the
sampler state). This significantly reduces the amount of overhead during each render frame for changing
state since the number of calls and the amount of data are greatly reduced.
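For illustration, here is a minimal sketch of that pattern: two sampler-state objects created once at startup and then swapped with a single call in the render loop. The pd3dDevice and g_* names are hypothetical:

ID3D10SamplerState* g_pPointSampler = NULL;
ID3D10SamplerState* g_pLinearSampler = NULL;

D3D10_SAMPLER_DESC sampDesc;
ZeroMemory( &sampDesc, sizeof(sampDesc) );
sampDesc.Filter = D3D10_FILTER_MIN_MAG_MIP_POINT;
sampDesc.AddressU = D3D10_TEXTURE_ADDRESS_WRAP;
sampDesc.AddressV = D3D10_TEXTURE_ADDRESS_WRAP;
sampDesc.AddressW = D3D10_TEXTURE_ADDRESS_WRAP;
sampDesc.ComparisonFunc = D3D10_COMPARISON_NEVER;
sampDesc.MaxLOD = D3D10_FLOAT32_MAX;
pd3dDevice->CreateSamplerState( &sampDesc, &g_pPointSampler );   // point sampling

sampDesc.Filter = D3D10_FILTER_MIN_MAG_MIP_LINEAR;
pd3dDevice->CreateSamplerState( &sampDesc, &g_pLinearSampler );  // linear filtering

// In the render loop, changing sampler state costs only a handle swap.
pd3dDevice->PSSetSamplers( 0, 1, &g_pLinearSampler );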
Alternatively, you can choose to use the effect system which will automatically manage efficient
creation and destruction of state objects for your application.
API Layers (Direct3D 10)
The Direct3D 10 runtime is constructed with layers, starting with the basic functionality at the core and
building optional and developer-assist functionality in outer layers.
Core Layer
The core layer exists by default, providing a very thin mapping between the API and the device driver,
minimizing overhead for high-frequency calls. As the core layer is essential for performance, it only
performs critical validation.
The remaining layers are optional. As a general rule, layers add functionality, but do not modify existing
behavior. For example, core functions will have the same return values independent of the debug layer
being instantiated, although additional debug output may be provided if the debug layer is instantiated.
Create layers when a device is created by calling D3D10CreateDevice and supplying one or more
D3D10_CREATE_DEVICE_FLAG values.
Debug Layer
The debug layer provides extensive additional parameter and consistency validation (such as validating
shader linkage and resource binding, validating parameter consistency, and reporting error
descriptions). The output generated by the debug layer consists of a queue of strings which are
accessible using the ID3D10InfoQueue Interface (see Customize Debug Output with ID3D10InfoQueue
(Direct3D 10)). Errors generated by the core layer are highlighted with warnings by the debug layer.
To create a device that supports the debug layer, you must install the DirectX SDK (to get
D3D10SDKLayers.DLL), and then specify the D3D10_CREATE_DEVICE_DEBUG flag when calling
D3D10CreateDevice. Of course, running an application with the debug layer will be substantially slower.
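For example, creating a hardware device with the debug layer enabled might look like this sketch (the variable names are illustrative):

ID3D10Device* pd3dDevice = NULL;
HRESULT hr = D3D10CreateDevice(
    NULL,                       // default adapter
    D3D10_DRIVER_TYPE_HARDWARE, // hardware (HAL) device
    NULL,                       // no software rasterizer module
    D3D10_CREATE_DEVICE_DEBUG,  // enable the debug layer
    D3D10_SDK_VERSION,
    &pd3dDevice );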
Switch-to-Reference Layer
This layer enables an application to transition between a hardware device (HAL) and a reference or
software device (REF). To switch a device, you must first create a device that supports the switch-to-
reference layer (see D3D10CreateDevice) and then call ID3D10SwitchToRef::SetUseRef.
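As a sketch, assuming the device was created with the D3D10_CREATE_DEVICE_SWITCH_TO_REF flag, the transition could look like this:

ID3D10SwitchToRef* pSwitchToRef = NULL;
if ( SUCCEEDED( pd3dDevice->QueryInterface( __uuidof(ID3D10SwitchToRef),
                                            (void**)&pSwitchToRef ) ) )
{
    pSwitchToRef->SetUseRef( TRUE );   // switch to the reference (REF) device
    // ... render and inspect the results in software ...
    pSwitchToRef->SetUseRef( FALSE );  // switch back to hardware (HAL)
    pSwitchToRef->Release();
}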
All device state, resources and objects are maintained through this device transition. However, it is
sometimes not possible to exactly match resource data; this is especially true with multisampled
resources. This is due to the fact that some information is available to a HAL device that is not available
to a REF device.
Virtually all real-time applications use the HAL implementation of the pipeline. When the pipeline is
switched from a hardware device to a reference device, rendering operations are done simultaneously
in both hardware and software. As the software device is rendering, it will require that resources are
downloaded to system memory. This may require other resources cached in system memory to be
evicted to make room. In general, this is not a problem except in the case of multisampled resources.
Since multisampling can be hardware specific, it can be difficult to match exactly the results of
multisampled resources between a HAL and REF implementation.
Thread-Safe Layer
This layer is designed to allow multi-threaded applications to access the device from multiple threads.
Direct3D 10 enables an application to exercise explicit control over the device synchronization primitive
through device functions that can be invoked at any time over the lifetime of the device. These include
enabling and disabling the use of the critical section (temporarily enabling or disabling multithread
protection), and a means to take and release the critical-section lock, thereby holding the lock across
multiple Direct3D 10 API entry points.
This layer is enabled by default; when it is not present, it imposes no performance penalty on devices
accessed from a single thread.
Customize Debug Output with
ID3D10InfoQueue (Direct3D 10)
The information queue is managed by an interface (see ID3D10InfoQueue Interface) that stores,
retrieves, and filters debug messages. The queue consists of a message queue, an optional storage-filter
stack, and an optional retrieval-filter stack. The storage-filter stack can be used to filter the messages you
want stored; the retrieval-filter stack can be used to filter the messages you want retrieved. Once a
message passes the storage filter, it is printed to the debug window and stored in the message queue.
In general, call ID3D10InfoQueue::GetMessage to retrieve messages (those that pass an optional retrieval filter).
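For example, a debug build could drain and print the stored messages like this (error handling elided; pd3dDevice is assumed to have been created with the debug layer):

ID3D10InfoQueue* pInfoQueue = NULL;
if ( SUCCEEDED( pd3dDevice->QueryInterface( __uuidof(ID3D10InfoQueue),
                                            (void**)&pInfoQueue ) ) )
{
    UINT64 numMessages = pInfoQueue->GetNumStoredMessages();
    for ( UINT64 i = 0; i < numMessages; ++i )
    {
        // First call reports the required size; second call fills the message.
        SIZE_T msgLen = 0;
        pInfoQueue->GetMessage( i, NULL, &msgLen );
        D3D10_MESSAGE* pMsg = (D3D10_MESSAGE*)malloc( msgLen );
        if ( pMsg && SUCCEEDED( pInfoQueue->GetMessage( i, pMsg, &msgLen ) ) )
            OutputDebugStringA( pMsg->pDescription );
        free( pMsg );
    }
    pInfoQueue->Release();
}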
Registry Controls
Use registry keys to adjust filter settings, adjust breakpoints, and mute the debug output. The debug
layer will check these paths for registry keys; the first path found will be used.
1. HKCU\Software\Microsoft\Direct3D\<user-defined subkey>
2. HKLM\Software\Microsoft\Direct3D\<user-defined subkey>
3. HKCU\Software\Microsoft\Direct3D
Where:
DWORD Mute_ID_* - A message name or number can be used for * (just as for BreakOn_ID_*,
described below). Debug output for that message is disabled if this key is non-zero.
DWORD Unmute_SEVERITY_INFO - Debug output is ENABLED if this key is non-zero. By default,
when InfoQueueStorageFilterOverride is enabled, debug messages with severity INFO are
muted; this key allows INFO output to be turned back on.
These controls change whether a message is recorded or displayed; they do not affect whether an API
passes or fails.
BreakOn_CATEGORY_* - Break on any message passing through the storage filters. * is one of
the D3D10_MESSAGE_CATEGORY messages.
BreakOn_SEVERITY_* - Break on any message passing through the storage filters. * is one of the
D3D10_MESSAGE_SEVERITY_ messages.
BreakOn_ID_* - Break on any message passing through the storage filters. * is one of the
D3D10_MESSAGE_ID_ messages or can be the numerical value of the error enum. For example,
suppose the message with ID "D3D10_MESSAGE_ID_HYPOTHETICAL" had the value 123 in the
D3D10_MESSAGE_ID enum. In this case, creating the value BreakOn_ID_HYPOTHETICAL=1 or
BreakOn_ID_123=1 would both accomplish the same thing - break when a message having ID
D3D10_MESSAGE_ID_HYPOTHETICAL is encountered.
Reference Counting (Direct3D 10)
Direct3D 10 pipeline Set functions do not hold a reference to DeviceChild objects. This means that
each application must hold a reference to the DeviceChild object for as long as the object needs to be
bound to the pipeline. When the reference count of an object drops to zero, the object will be unbound
from the pipeline and destroyed. This style of reference holding is also known as weak-reference holding
because each pipeline binding location holds a weak reference to the interface/object that is bound to
it.
For example:
// Assume the application earlier created pRasterizerState, bound it with
// RSSetState, and holds the only remaining reference to it.
pDevice->RSGetState( &pCurRasterizerState );
// pCurRasterizerState is equal to pRasterizerState (RSGetState added a reference).
pCurRasterizerState->Release();
pRasterizerState->Release();
// Since the app released the final reference on this object, it is unbound and destroyed.
pDevice->RSGetState( &pCurRasterizerState );
// pCurRasterizerState will be equal to NULL.
In Direct3D 9, pipeline Set functions hold a reference to the device's objects; in Direct3D 10, pipeline Set
functions do not hold a reference to the DeviceChild objects.
Pipeline Stages (Direct3D 10)
The Direct3D 10 programmable pipeline is designed for generating graphics for real-time gaming
applications. The conceptual diagram below illustrates the data flow from input to output through each
of the programmable stages.
All of the stages are configurable via the Direct3D 10 API. Stages featuring common shader cores (the
rounded rectangular blocks) are programmable using the HLSL programming language. As you will see,
this makes the pipeline extremely flexible and adaptable. The purpose of each of the stages is listed
below.
Input Assembler Stage - The input assembler stage is responsible for supplying data (triangles,
lines and points) to the pipeline.
Vertex Shader Stage - The vertex shader stage processes vertices, typically performing
operations such as transformations, skinning, and lighting. A vertex shader always takes a single
input vertex and produces a single output vertex.
Geometry Shader Stage - The geometry shader processes entire primitives. Its input is a full
primitive (which is three vertices for a triangle, two vertices for a line, or a single vertex for a
point). In addition, each primitive can also include the vertex data for any edge-adjacent
primitives. This could include at most an additional three vertices for a triangle or an additional
two vertices for a line. The Geometry Shader also supports limited geometry amplification and
de-amplification. Given an input primitive, the Geometry Shader can discard the primitive, or
emit one or more new primitives.
Stream Output Stage - The stream output stage is designed for streaming primitive data from
the pipeline to memory on its way to the rasterizer. Data can be streamed out and/or passed
into the rasterizer. Data streamed out to memory can be recirculated back into the pipeline as
input data or read-back from the CPU.
Rasterizer Stage - The rasterizer is responsible for clipping primitives, preparing primitives for
the pixel shader and determining how to invoke pixel shaders.
Pixel Shader Stage - The pixel shader stage receives interpolated data for a primitive and
generates per-pixel data such as color.
Output Merger Stage - The output merger stage is responsible for combining various types of
output data (pixel shader values, depth and stencil information) with the contents of the render
target and depth/stencil buffers to generate the final pipeline result.
Input-Assembler Stage (Direct3D 10)
The Direct3D 10 API separates functional areas of the pipeline into stages; the first stage in the pipeline
is the input-assembler (IA) stage.
The purpose of the input-assembler stage is to read primitive data (points, lines and/or triangles) from
user-filled buffers and assemble the data into primitives that will be used by the other pipeline stages.
The IA stage can assemble vertices into several different primitive types (such as line lists, triangle strips,
or primitives with adjacency). New primitive types (such as a line list with adjacency or a triangle list
with adjacency) have been added to support the geometry shader.
While assembling primitives, a secondary purpose of the IA is to attach system-generated values to help
make shaders more efficient. System-generated values are text strings that are also called semantics. All
three shader stages are constructed from a common shader core, and the shader core uses system-
generated values (such as a primitive id, an instance id, or a vertex id) so that a shader stage can reduce
processing to only those primitives, instances, or vertices that have not already been processed.
As shown in the pipeline block diagram, once the IA stage reads data from memory (assembles the data
into primitives and attaches system-generated values), the data is output to the vertex shader stage.
Create Input Buffers - Create and initialize input buffers with input vertex data.
Create the Input-Layout Object - Define how the vertex buffer data will be streamed into the IA
stage using an input-layout object.
Binding Objects to the Input-Assembler Stage - Bind the created objects (input buffers and the
input-layout object) to the IA stage.
Specify the Primitive Type - Identify how the vertices will be assembled into primitives.
Draw APIs - Send the data bound to the IA stage through the pipeline.
Create Input Buffers
There are two types of input buffers: vertex buffers and index buffers. Vertex buffers supply vertex data
to the IA stage. Index buffers are optional; they provide indices to vertices from the vertex buffer. You
may create one or more vertex buffers, and optionally an index buffer.
Once the buffer resources are created, you need to create an input-layout object to describe the data
layout to the IA stage, and then you need to bind the buffer resources to the IA stage.
Creating and binding buffers is not necessary if your shaders do not use buffers. For an example of
simple vertex and pixel shaders that draw a single triangle without buffers, see Using the
Input-Assembler Stage without Buffers (Direct3D 10).
An input-layout object is created from an array of input-element descriptions as well as a pointer to the
compiled shader (see ID3D10Device::CreateInputLayout). The array contains one or more input
elements; each input element describes a single vertex-data element from a single vertex buffer. The
entire description describes all the vertex-data elements from all the vertex buffers that will be bound to
the IA stage. For example, the following layout description describes a single vertex buffer that contains
three vertex-data elements:
D3D10_INPUT_ELEMENT_DESC layout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,
      D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXTURE0", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 12,
      D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20,
      D3D10_INPUT_PER_VERTEX_DATA, 0 },
};
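With the description in hand, the input-layout object itself could be created as follows; pVSBlob is a hypothetical name for the compiled vertex shader whose input signature matches the layout:

ID3D10InputLayout* g_pVertexLayout = NULL;
HRESULT hr = pd3dDevice->CreateInputLayout(
    layout,                       // the input-element descriptions above
    ARRAYSIZE( layout ),          // number of elements in the array
    pVSBlob->GetBufferPointer(),  // compiled shader bytecode with input signature
    pVSBlob->GetBufferSize(),
    &g_pVertexLayout );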
Each input-element description describes a single element of vertex-buffer data, including its size, type,
location, and purpose. This declaration contains three elements, corresponding to three rows. Each row is
enclosed in curly braces to make it easier to read. The first row (or element) declares per-vertex position
data, the second declares per-vertex texture coordinates, and the third declares per-vertex normal data.
Each row identifies the next element to be read from a vertex buffer. In each row, you can identify the
type of data by using the semantic, the semantic index, and the data format. A semantic is a text string
that identifies how the data will be used. In this example, the first row identifies 3-component position
data (XYZ, for example); the second row identifies 2-component texture data (UV, for example); and the
third row identifies 3-component normal data.
In this example description, the semantic index (which is the second parameter) is set to zero for all
three rows. The semantic index helps distinguish between two rows that use the same semantics. Since
there are no similar semantics in this example declaration, the semantic index can be set to its default
value, zero.
The third parameter is the format. The format (see DXGI_FORMAT) specifies the number of components
per element, and the data type, which defines the size of the data for each element. The format can be
fully typed at resource creation time, or you may create a resource using a typeless format, which
identifies the number of components in an element, but leaves the data type undefined.
Input Slots
Data enters the IA stage through inputs called input slots. The IA stage has 16 input slots, which are
designed to accommodate up to 16 vertex buffers that provide input data. Each vertex buffer must be
assigned to a different slot; this information is stored in the input-layout declaration when the input-
layout object is created. Since the slots are zero-based, the first slot is slot 0. You may also specify an
offset, from the start of each buffer to the first element in the buffer to be read.
The next two parameters are the input slot and the input offset. The input slot is the index of the
IA-stage input slot where the buffer is bound. When you use multiple buffers, you can bind them to one
or more input slots. The input offset is the number of bytes from the start of the buffer to the beginning
of the data to be read.
UINT stride = sizeof( SimpleVertex );
UINT offset = 0;
g_pd3dDevice->IASetVertexBuffers(
    0,                // the first input slot for binding
    1,                // the number of buffers in the array
    &g_pVertexBuffer, // the array of vertex buffers
    &stride,          // array of stride values, one for each buffer
    &offset );        // array of offset values, one for each buffer
In the preceding example, a single vertex buffer was bound; however, multiple vertex buffers can be
bound with a single call to ID3D10Device::IASetVertexBuffers, and the following code shows such a call
to bind three vertex buffers:
UINT strides[3];
strides[0] = sizeof( SimpleVertex1 );
strides[1] = sizeof( SimpleVertex2 );
strides[2] = sizeof( SimpleVertex3 );

UINT offsets[3] = { 0, 0, 0 };

g_pd3dDevice->IASetVertexBuffers(
    0,                // first input slot for binding
    3,                // number of buffers in the array
    g_pVertexBuffers, // array of three vertex buffers
    strides,          // array of stride values, one for each buffer
    offsets );        // array of offset values, one for each buffer

g_pd3dDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
Draw APIs
After input resources have been bound to the pipeline, an application calls a draw API to render
primitives. There are several draw APIs, which are shown in the following table; some use index buffers,
some use instance data, and some reuse data from the streaming-output stage as input to the input-
assembler stage.
Draw API                            Description
ID3D10Device::Draw                  Draw non-indexed, non-instanced primitives.
ID3D10Device::DrawIndexed           Draw indexed, non-instanced primitives.
ID3D10Device::DrawInstanced         Draw non-indexed, instanced primitives.
ID3D10Device::DrawIndexedInstanced  Draw indexed, instanced primitives.
ID3D10Device::DrawAuto              Draw using a vertex count recorded by the stream-output stage.
Primitive Types (Direct3D 10)
The input-assembler stage reads data from vertex and index buffers, assembles the data into primitives,
and then sends the data to the remaining pipeline stages. The IA uses three pieces of information to
fully specify primitives: the number of vertices, a winding direction, and a leading vertex.
Winding Direction   Indicates the vertex order when assembling a primitive. The winding
                    direction can be either clockwise or counterclockwise; specify this to
                    Direct3D by calling ID3D10Device::CreateRasterizerState.
Leading Vertex      The first vertex in a sequence of vertices.
Number of Vertices  The number of vertices (n) in the assembled primitive.
The following diagram shows how these three pieces of information can be combined to assemble the
primitive types supported by Direct3D.
Use ID3D10Device::IASetPrimitiveTopology to specify the primitive type of the data that will be
streamed into the input-assembler stage.
Using the Input-Assembler Stage without
Buffers (Direct3D 10)
Creating and binding buffers is not necessary if your shaders do not use buffers. Here is an example of
simple vertex and pixel shaders that draw a single triangle.
Vertex Shader
For example, you could declare a vertex shader that creates its own vertices.
struct VSIn
{
uint vertexId : SV_VertexID;
};
struct VSOut
{
float4 pos : SV_Position;
float4 color : color;
};
VSOut VSmain( VSIn input )
{
    VSOut output;

    if (input.vertexId == 0)
        output.pos = float4(0.0, 0.5, 0.5, 1.0);
    else if (input.vertexId == 2)
        output.pos = float4(0.5, -0.5, 0.5, 1.0);
    else if (input.vertexId == 1)
        output.pos = float4(-0.5, -0.5, 0.5, 1.0);

    // Derive a per-vertex color from the position (an assumed choice).
    output.color = clamp( output.pos, 0, 1 );

    return output;
}
struct PSIn
{
float4 pos : SV_Position;
linear float4 color : color;
};
struct PSOut
{
float4 color : SV_Target;
};
PSOut PSmain( PSIn input )
{
    PSOut output;
    output.color = input.color;
    return output;
}
Technique
VertexShader vsCompiled = CompileShader( vs_4_0, VSmain() );
technique10 t0
{
pass p0
{
SetVertexShader( vsCompiled );
SetGeometryShader( NULL );
SetPixelShader( CompileShader( ps_4_0, PSmain() ));
}
}
Application Code
m_pD3D10Device->IASetInputLayout( NULL );
m_pD3D10Device->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST
);
m_pD3D10Device->Draw( 3, 0 );
Using System-Generated Values
System-generated values are generated by the IA stage (based on user-supplied input semantics) to
allow certain efficiencies in shader operations. By attaching data, such as an instance id (visible to VS), a
vertex id (visible to VS), or a primitive id (visible to GS/PS), a subsequent shader stage may look for these
system values to optimize processing in that stage. For instance, the VS stage may look for the instance
id to grab additional per-vertex data for the shader or to perform other operations; the GS and PS stages
may use the primitive id to grab per-primitive data in the same way.
VertexID
A VertexID is used by each shader stage to identify each vertex. It is a 32-bit unsigned integer whose
default value is 0. It is assigned to a vertex when the primitive is processed by the IA stage. Attach the
vertex-id semantic to the shader input declaration to inform the IA stage to generate a per-vertex id.
The IA will add a vertex id to each vertex for use by shader stages. For each draw call, the vertex id is
incremented by 1. Across indexed draw calls, the count resets back to the start value. For DrawIndexed
and DrawIndexedInstanced, the vertex id represents the index value. If the vertex id overflows (exceeds
2³² - 1), it wraps to 0.
For all primitive types, vertices have a vertex id associated with them (regardless of adjacency).
PrimitiveID
A PrimitiveID is used by each shader stage to identify each primitive. It is a 32-bit unsigned integer
whose default value is 0. It is assigned to a primitive when the primitive is processed by the IA stage. To
inform the IA stage to generate a primitive id, attach the primitive-id semantic to the shader input
declaration.
The IA stage will add a primitive id to each primitive for use by the geometry shader or the pixel shader
stage (whichever is the first stage active after the IA stage). For each indexed draw call, the primitive id
is incremented by 1; however, the primitive id resets to 0 whenever a new instance begins. All other
draw calls do not change the value of the primitive id. If the primitive id overflows (exceeds 2³² - 1), it
wraps to 0.
The pixel shader stage does not have a separate input for a primitive id; however, any pixel shader input
that specifies a primitive id uses a constant interpolation mode.
There is no support for automatically generating a primitive id for adjacent primitives. For primitive
types with adjacency, such as a triangle strip with adjacency, a primitive id is only maintained for the
interior primitives (the non-adjacent primitives), just like the set of primitives in a triangle strip without
adjacency.
InstanceID
An InstanceID is used by each shader stage to identify the instance of the geometry that is currently
being processed. It is a 32-bit unsigned integer whose default value is 0.
The IA stage will add an instance id to each vertex if the vertex shader input declaration includes the
instance id semantic. For each indexed draw call, instance id is incremented by 1. All other draw calls do
not change the value of the instance id. If the instance id overflows (exceeds 2³² - 1), it wraps to 0.
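For instance, the two-instance strip used in the example below could be drawn with a single instanced call along these lines (resource setup omitted; the names are hypothetical):

// 10 vertices per instance, 2 instances (instances U and V in the tables below),
// starting at vertex 0 and instance 0.
g_pd3dDevice->DrawInstanced( 10, 2, 0, 0 );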
Example
The following example shows how system values are attached to an instanced triangle strip in the IA
stage.
These tables show the system values generated for both instances of the same triangle strip. The first
instance (instance U) is shown in blue, the second instance (instance V) is shown in green. The solid lines
connect the vertices in the primitives, the dashed lines connect the adjacent vertices.
The following tables show the system-generated values for instance U.

Per-vertex values:

Vertex Data  C,U  D,U  E,U  F,U  G,U  H,U  I,U  J,U  K,U  L,U
VertexID     0    1    2    3    4    5    6    7    8    9
InstanceID   0    0    0    0    0    0    0    0    0    0

Per-primitive values:

PrimitiveID  0  1  2
InstanceID   0  0  0
The following tables show the system-generated values for instance V.

Per-vertex values:

Vertex Data  C,V  D,V  E,V  F,V  G,V  H,V  I,V  J,V  K,V  L,V
VertexID     0    1    2    3    4    5    6    7    8    9
InstanceID   1    1    1    1    1    1    1    1    1    1

Per-primitive values:

PrimitiveID  0  1  2
InstanceID   1  1  1
The input assembler generates each of the ids (vertex, primitive, and instance). Notice also that each
instance is given a unique instance id. The data ends with the strip cut, which separates each instance of
the triangle strip.
Shader Stages (Direct3D 10)
The Direct3D 10 pipeline contains 3 programmable-shader stages (shown as the rounded blocks in the
pipeline functional diagram): the vertex-shader, geometry-shader, and pixel-shader stages.
Each shader stage exposes its own unique functionality, built on the shader model 4.0 common shader
core.
Vertex-Shader Stage
The vertex-shader (VS) stage processes vertices from the input assembler, performing per-vertex
operations such as transformations, skinning, morphing, and per-vertex lighting. Vertex shaders always
operate on a single input vertex and produce a single output vertex. The vertex shader stage must
always be active for the pipeline to execute. If no vertex modification or transformation is required, a
pass-through vertex shader must be created and set to the pipeline.
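A pass-through vertex shader can be as small as the following HLSL sketch; the structure layout is illustrative, and positions are assumed to arrive already in clip space:

struct VSIn
{
    float4 pos   : POSITION;  // assumed to be pre-transformed to clip space
    float4 color : COLOR0;
};

struct VSOut
{
    float4 pos   : SV_Position;
    float4 color : COLOR0;
};

VSOut PassThroughVS( VSIn input )
{
    VSOut output;
    output.pos   = input.pos;    // no transformation, just forward the vertex
    output.color = input.color;
    return output;
}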
Each vertex shader input vertex can be comprised of up to 16 32-bit vectors (up to 4 components each)
and each output vertex can be comprised of as many as 16 32-bit 4-component vectors. All vertex
shaders must have a minimum of one input and one output, which can be as little as one scalar value.
The vertex-shader stage can consume two system generated values from the input assembler: VertexID
and InstanceID (see System Values and Semantics). Since VertexID and InstanceID are both meaningful
at a vertex level, and IDs generated by hardware can only be fed into the first stage that understands
them, these ID values can only be fed into the vertex-shader stage.
Vertex shaders are always run on all vertices, including adjacent vertices in input primitive topologies
with adjacency. The number of times that the vertex shader has been executed can be queried from the
CPU using the VSInvocations pipeline statistic.
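One way to read that statistic is through a pipeline-statistics query, sketched here without error handling:

ID3D10Query* pQuery = NULL;
D3D10_QUERY_DESC queryDesc;
queryDesc.Query = D3D10_QUERY_PIPELINE_STATISTICS;
queryDesc.MiscFlags = 0;
pd3dDevice->CreateQuery( &queryDesc, &pQuery );

pQuery->Begin();
// ... issue draw calls ...
pQuery->End();

D3D10_QUERY_DATA_PIPELINE_STATISTICS stats;
while ( pQuery->GetData( &stats, sizeof(stats), 0 ) == S_FALSE )
    ;  // in real code, do other work instead of spinning
// stats.VSInvocations now holds the number of vertex-shader executions.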
A vertex shader can perform load and texture sampling operations where screen-space derivatives are
not required (using the HLSL functions SampleLevel, SampleCmpLevelZero, and SampleGrad).
Geometry-Shader Stage
The geometry-shader (GS) stage runs application-specified shader code with vertices as input and the
ability to generate vertices on output. Unlike vertex shaders, which operate on a single vertex, the
geometry shader's inputs are the vertices for a full primitive (two vertices for lines, three vertices for
triangles, or a single vertex for a point). Geometry shaders can also bring in the vertex data for the edge-
adjacent primitives as input (an additional two vertices for a line, an additional three for a triangle). The
image below shows a triangle and a line with adjacent vertices.
(In the accompanying figure: TV = triangle vertex, AV = adjacent vertex, LV = line vertex.)
The geometry-shader stage can consume the SV_PrimitiveID system-generated value that is auto-
generated by the IA. This allows per-primitive data to be fetched or computed if desired.
The geometry-shader stage is capable of outputting multiple vertices forming a single selected topology
(GS stage output topologies available are: tristrip, linestrip, and pointlist). The number of primitives
emitted can vary freely within any invocation of the geometry shader, though the maximum number of
vertices that could be emitted must be declared statically. Strip lengths emitted from a geometry shader
invocation can be arbitrary, and new strips can be created via the RestartStrip HLSL intrinsic function.
Geometry shader output may be fed to the rasterizer stage and/or to a vertex buffer in memory via the
stream output stage. Output fed to memory is expanded to individual point/line/triangle lists (exactly as
they would be passed to the rasterizer).
When a geometry shader is active, it is invoked once for every primitive passed down or generated
earlier in the pipeline. Each invocation of the geometry shader sees as input the data for the invoking
primitive, whether that is a single point, a single line, or a single triangle. A triangle strip from earlier in
the pipeline would result in an invocation of the geometry shader for each individual triangle in the strip
(as if the strip were expanded out into a triangle list). All the input data for each vertex in the individual
primitive is available (i.e. 3 vertices for triangle), plus adjacent vertex data if applicable/available.
A geometry shader outputs data one vertex at a time by appending vertices to an output stream object.
The topology of the stream is determined by a fixed declaration, choosing one of the three templated
stream-object types, PointStream, LineStream, or TriangleStream, as the output for the GS stage. The
topology of the output is determined by the object type, while the format of the vertices appended to
the stream is determined by the template type. Execution of a geometry shader instance is atomic from
other invocations, except that data added to the streams is serial. The outputs of a given invocation of a
geometry shader are independent of other invocations (though ordering is respected). A geometry
shader generating triangle strips will start a new strip on every invocation.
Partially completed primitives could be generated by the geometry shader if the geometry shader ends
and the primitive is incomplete. Incomplete primitives are silently discarded. This is similar to the way
the IA treats partially completed primitives.
The geometry shader can perform load and texture sampling operations where screen-space derivatives
are not required (SampleLevel, SampleCmpLevelZero, SampleGrad).
Typical geometry-shader applications include:

Fur/fin generation.

Per-primitive material setup - including generation of barycentric coordinates as primitive data so that a
pixel shader can perform custom attribute interpolation (for an example of higher-order normal
interpolation, see the CubeMapGS Sample).
Pixel-Shader Stage
The pixel-shader (PS) stage is invoked by the rasterizer stage, to calculate a per-pixel value for each pixel
in a primitive that gets rendered. A pixel shader enables rich shading techniques such as per-pixel
lighting and post-processing. A pixel shader is a program that combines constant variables, texture
values, interpolated per-vertex values, and other data to produce per-pixel outputs. The stage preceding
the rasterizer stage (GS stage or the VS stage if the geometry shader is NULL) must output vertex
positions in homogenous clip space.
A pixel shader can input up to 32 32-bit 4-component elements of data for the current pixel location. It is only when
the geometry shader is active that all 32 inputs can be fed with data from above in the pipeline. In the
absence of the geometry shader, only up to 16 4-component elements of data can be input from
upstream in the pipeline.
Input data available to the pixel shader includes vertex attributes that can be chosen, on a per-element
basis, to be interpolated with or without perspective correction, or be treated as per-primitive
constants. In addition, declarations in a pixel shader can indicate which attributes to apply centroid
evaluation rules to. Centroid evaluation is relevant only when multisampling is enabled, since cases arise
where the pixel center may not be covered by the primitive (though subpixel center(s) are covered,
hence causing the pixel shader to run once for the pixel). Attributes declared with centroid mode must
be evaluated at a location covered by the primitive, preferably at a location as close as possible to the
(non-covered) pixel center.
A pixel shader can output up to 8 32-bit 4-component data for the current pixel location to be combined
with the render target(s), or no color (if the pixel is discarded). A pixel shader can also output an
optional 32-bit float scalar depth value for the depth test (SV_Depth).
For each primitive entering the rasterizer, the pixel shader is invoked once for each pixel covered by the
primitive. When multisampling, the pixel shader is invoked once per covered pixel, though depth/stencil
tests occur for each covered multisample, and multisamples that pass the tests are updated with the
pixel shader output color(s).
If there is no geometry shader, the IA stage is capable of producing one scalar per-primitive system-
generated value to the pixel shader, the SV_PrimitiveID, which can be read as input to the pixel shader.
The pixel shader can also retrieve the SV_IsFrontFace value, generated by the rasterizer stage.
One of the inputs to the pixel shader can be declared with the name SV_Position, which means it will be
initialized with the pixel's float32 xyzw position. Note that w is the reciprocal of the linearly interpolated
1/w value. When the rendertarget is a multisample buffer or a standard rendertarget, the xy
components of position contain pixel center coordinates (which have a fraction of 0.5f).
The pixel shader intrinsic functions produce or use derivatives of quantities with respect to screen space
x and y. The most common use for derivatives is to compute level-of-detail calculations for texture
sampling and in the case of anisotropic filtering, selecting samples along the axis of anisotropy. Typically,
hardware implementations run a pixel shader on multiple pixels (for example a 2x2 grid) simultaneously,
so that derivatives of quantities computed in the pixel shader can be reasonably approximated as deltas
of the values at the same point of execution in adjacent pixels.
Stream-Output Stage (Direct3D 10)
The stream-output stage (SO) is located in the pipeline right after the geometry-shader stage and just
before the rasterization stage.
The purpose of the SO stage is to write vertex data streamed out of the GS stage (or the VS stage if the
GS stage is inactive) to one or more buffer resources in memory. Data streamed out to memory can be
read back into the pipeline in a subsequent rendering pass, or can be copied to a staging resource for
readback to the CPU. Since variable amounts of data can be generated by a geometry shader, the
amount of data streamed out can vary. The ID3D10Device::DrawAuto API allows this variable amount of
data to be processed in a subsequent pass without the need to query (the GPU) about the amount of
data written to stream output.
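A second pass that redraws whatever was streamed out might look like the following sketch; it assumes the stream-output buffer (m_pBuffer, created later in this section) is rebound as a vertex buffer:

// Pass 2: reuse the stream-output buffer as a vertex buffer.
UINT stride = sizeof( GSPS_INPUT );  // must match the stream-output declaration
UINT offset = 0;
D3D10Device->SOSetTargets( 0, NULL, NULL );  // unbind it from stream output first
D3D10Device->IASetVertexBuffers( 0, 1, &m_pBuffer, &stride, &offset );

// DrawAuto uses the vertex count recorded during stream output, so the CPU
// never needs to query how much data was written.
D3D10Device->DrawAuto();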
When primitive types with adjacency are used as inputs to the geometry shader, the adjacent vertices
are discarded and not streamed out.
When a triangle or line strip is bound to the input assembler, the strips are always converted into lists
before they are streamed out. Vertices are always written out as complete primitives (e.g. 3 vertices at a
time for triangles); incomplete primitives are never streamed out.
There are two primary ways streaming-output data can be fed back into the pipeline. One is to generate
data that can be fed back into the input assembler, and the other is to generate data that can be read
into shaders using load functions. If only one buffer is bound, the buffer can capture up to 64 scalar
components of per-vertex data, as long as the total amount of data being output per-vertex is 256 bytes
or less. Vertex stride can be up to 2048 bytes. If more than one buffer is bound, each can only capture a
single element of per-vertex data. Stream output can write to up to 4 buffers simultaneously. If multiple
buffers are bound and have different sizes, as soon as one of the buffers can no longer hold any more
complete primitives, writing to all buffers is stopped.
struct GSPS_INPUT
{
float4 Pos : SV_POSITION;
float3 Norm : TEXCOORD0;
float2 Tex : TEXCOORD1;
};
[maxvertexcount(3)]
void GS( triangle GSPS_INPUT input[3], inout TriangleStream<GSPS_INPUT>
TriStream )
{
GSPS_INPUT output;
//
// Calculate the face normal
//
float3 faceEdgeA = input[1].Pos - input[0].Pos;
float3 faceEdgeB = input[2].Pos - input[0].Pos;
float3 faceNormal = normalize( cross(faceEdgeA, faceEdgeB) );
    for ( int v = 0; v < 3; v++ )
    {
        // View and Projection are assumed to be matrices declared elsewhere
        // in the effect file.
        output.Pos = mul( input[v].Pos, View );
        output.Pos = mul( output.Pos, Projection );
        output.Norm = faceNormal;  // output the computed face normal
        output.Tex = input[v].Tex;
TriStream.Append( output );
}
TriStream.RestartStrip();
}
This shader calculates a face normal for each triangle, and outputs position, normal and texture
coordinate data. A geometry shader looks just like a vertex or pixel shader, with the following
exceptions:
GS function declaration - The GS function returns void; the maximum number of vertices that
can be output per invocation is declared with an attribute, in this case
[maxvertexcount(3)]
The first parameter is an array of vertices (3 in this case) defined by a GSPS_INPUT structure
(which defines per-vertex data as a position, a normal and a texture coordinate). The first
parameter also uses the triangle keyword, which means the input assembler stage must output
data to the geometry shader as one of the triangle primitive types (triangle list or triangle strip).
Use the triangle and TriangleStream keywords to identify individual triangles or a stream of
triangles in a GS.
GS intrinsic function - The lines of code in the shader function use common-shader-core HLSL
intrinsic functions except the last two lines, which call Append and RestartStrip. These functions
are only available to a geometry shader. Append informs the geometry shader to append the
output to the current strip; RestartStrip creates a new primitive strip. A new strip is implicitly
created in every invocation of the GS stage.
The rest of the shader looks very similar to a vertex or pixel shader. The geometry shader uses a
structure to declare input parameters and marks the position member with the SV_POSITION semantic
to tell the hardware that this is position data. The input structure identifies the other two input
parameters as texture coordinates (even though one of them will contain a face normal). You could use
your own custom semantic for the face normal if you prefer.
Having designed the geometry shader, call D3D10CompileShader to compile it.
Just like vertex and pixel shaders, you will need a shader flag to tell the compiler how you want the
shader compiled (for debugging, optimized for speed, etc.), the entry-point function, and the shader
model to validate against. This example creates a geometry shader built from the Tutorial13.fx file, by
using the GS function. The shader is compiled for shader model 4.0.
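A compile call consistent with that description might look like the following sketch; loading the .fx source into memory is elided, and pFxSource/fxSourceSize are hypothetical names:

ID3D10Blob* pGSBlob = NULL;
ID3D10Blob* pErrors = NULL;
HRESULT hr = D3D10CompileShader(
    pFxSource, fxSourceSize, "Tutorial13.fx",
    NULL,                // no macro defines
    NULL,                // no include handler
    "GS",                // entry-point function
    "gs_4_0",            // shader model to validate against
    D3D10_SHADER_DEBUG,  // shader flag (compile for debugging, in this case)
    &pGSBlob, &pErrors );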
Next, you need to declare the SO-stage input signature. This signature matches or validates the GS
outputs against the SO inputs at object creation time. Here's an example of the SO declaration:
D3D10_SO_DECLARATION_ENTRY pDecl[] =
{
    // semantic name, semantic index, start component,
    // component count, output slot
    { "SV_POSITION", 0, 0, 4, 0 },  // output all four components of position
    { "TEXCOORD0",   0, 0, 3, 0 },  // output the first 3 components (the normal)
    { "TEXCOORD1",   0, 0, 2, 0 },  // output the first 2 texture coordinates
};
Create the geometry-shader object by calling ID3D10Device::CreateGeometryShaderWithStreamOutput,
which takes, among other parameters:

A pointer to the compiled geometry shader (or vertex shader if no geometry shader will be
present and data will be streamed out directly from the VS). To get this pointer see Getting a
Pointer to a Compiled Shader.
A pointer to an array of declarations that describe the input data for the stream output stage.
See D3D10_SO_DECLARATION_ENTRY. You can supply up to 64 declarations, one for each
different type of element to be output from the SO stage. The array of declaration entries
describes the data layout regardless of whether a single buffer or multiple buffers are to be
bound for stream output.
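Putting the pieces together, the creation call could look like this sketch (the stride assumes the 4 + 3 + 2 float components declared above):

ID3D10GeometryShader* pGS = NULL;
HRESULT hr = pd3dDevice->CreateGeometryShaderWithStreamOutput(
    pGSBlob->GetBufferPointer(),  // compiled GS (or VS) bytecode
    pGSBlob->GetBufferSize(),
    pDecl,                        // the stream-output declaration above
    3,                            // number of entries in pDecl
    9 * sizeof(float),            // output stream stride, in bytes
    &pGS );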
The stream output declaration defines the way data is written to a buffer resource. You can add as many
components as you want to the output declaration. Use the SO stage to write to a single buffer resource
or many buffer resources. For a single buffer, the SO stage can write many different elements per-
vertex. For multiple buffers, the SO stage can only write a single element of per-vertex data to each
buffer.
ID3D10Buffer *m_pBuffer;
int m_nBufferSize = 1000000;
D3D10_BUFFER_DESC bufferDesc =
{
m_nBufferSize,
D3D10_USAGE_DEFAULT,
D3D10_BIND_STREAM_OUTPUT,
0,
0
};
D3D10Device->CreateBuffer( &bufferDesc, NULL, &m_pBuffer );
Create a buffer by calling ID3D10Device::CreateBuffer. This example illustrates default usage, which is
typical for a buffer resource that is expected to be updated fairly frequently by the GPU. The binding flag
identifies the pipeline stage that the resource can be bound to. Any resource used by the SO stage must
also be created with the D3D10_BIND_STREAM_OUTPUT bind flag.
Once the buffer is successfully created, set it to the current device by calling
ID3D10Device::SOSetTargets:
UINT offset[1] = { 0 };
D3D10Device->SOSetTargets( 1, &m_pBuffer, offset );
This call takes the number of buffers, a pointer to the buffers, and an array of offsets (one offset into
each of the buffers that indicates where to begin writing data). Data will be written to these streaming-
output buffers when a Draw command is issued. An internal variable keeps track of the position at which
to begin writing data to the streaming-output buffers; that variable continues to increment until
SOSetTargets is called again and a new offset value is specified.
All data written out to the target buffers will be 32-bit values.
Rasterizer Stage (Direct3D 10)
This stage rasterizes primitives into pixels, interpolating specified vertex values across the primitive. To
do so, the stage clips vertices to the view frustum, sets up the primitives for mapping to the 2D
viewport, and determines how to invoke the pixel shader, if one is currently set to the pipeline. Some of
these features are optional (like pixel shaders); however, the rasterizer always performs clipping, a
perspective divide (transforming vertices from homogeneous clip space to normalized device
coordinates), and mapping of the vertices to the viewport.
Vertices (x,y,z,w) coming into the rasterizer stage are assumed to be in homogeneous clip space. In this
coordinate space, the X axis points right, Y points up, and Z points away from the camera.
You may disable rasterization by telling the pipeline there is no pixel shader (set the pixel shader stage
to NULL with ID3D10Device::PSSetShader), and disabling depth and stencil testing (set DepthEnable and
StencilEnable to FALSE in D3D10_DEPTH_STENCIL_DESC). While disabled, rasterization-related pipeline
counters will not update.
On hardware that implements hierarchical Z-buffer optimizations, you may enable preloading the z-
buffer by setting the pixel shader stage to NULL while enabling depth and stencil testing.
D3D10_VIEWPORT vp[1];
vp[0].Width = 640;
vp[0].Height = 480;
vp[0].MinDepth = 0;
vp[0].MaxDepth = 1;
vp[0].TopLeftX = 0;
vp[0].TopLeftY = 0;
g_pd3dDevice->RSSetViewports( 1, vp );
The viewport description specifies the size of the viewport, the range to map depth to (using MinDepth
and MaxDepth), and the placement of the top left of the viewport. MinDepth must be less than or equal
to MaxDepth; the range for both MinDepth and MaxDepth is between 0.0 and 1.0, inclusive. It is
common for the viewport to map to a render target but it is not necessary; additionally, the viewport
does not have to have the same size or position as the render target.
You can bind an array of viewports to the rasterizer stage, but only one viewport is applied to any given
primitive output from the geometry shader. The pipeline uses a default viewport (and scissor rectangle,
discussed in the next section) during rasterization; the default is always the first viewport (or scissor
rectangle) in the array. To perform per-primitive selection of the viewport in the geometry shader,
specify the SV_ViewportArrayIndex semantic on the appropriate GS output component in the GS output
signature declaration.
The maximum number of viewports (and scissorrects) that can be bound to the rasterizer stage at any
one time is 16 (specified with #define by
D3D10_VIEWPORT_AND_SCISSORRECT_OBJECT_COUNT_PER_PIPELINE).
To enable the scissor rectangle, use the ScissorEnable member (in D3D10_RASTERIZER_DESC). The
default scissor rectangle is an empty rectangle; that is, all rect values are 0. In other words, if you do not
set up the scissor rectangle and scissor is enabled, you will not send any pixels to the output-merger
stage. The most common setup is to initialize the scissor rectangle to the size of the viewport.
D3D10_RECT rects[1];
rects[0].left = 0;
rects[0].right = 640;
rects[0].top = 0;
rects[0].bottom = 480;
D3DDevice->RSSetScissorRects( 1, rects );
This method takes two parameters: (1) the number of rectangles in the array and (2) an array of
rectangles.
The pipeline uses a default scissor rectangle index during rasterization (the default is a zero-size
rectangle with clipping disabled). To override this, specify the SV_ViewportArrayIndex semantic to a GS
output component in the GS output signature declaration. This will cause the GS stage to mark this GS
output component as a system-generated component with this semantic. The rasterizer stage
recognizes this semantic and will use the parameter it is attached to as the scissor rectangle index to
access the array of scissor rectangles. Don't forget to tell the rasterizer stage to use the scissor rectangle
that you define by enabling the ScissorEnable value in the rasterizer description before creating the
rasterizer object.
ID3D10RasterizerState * g_pRasterState;
D3D10_RASTERIZER_DESC rasterizerState;
rasterizerState.FillMode = D3D10_FILL_SOLID;
rasterizerState.CullMode = D3D10_CULL_BACK;
rasterizerState.FrontCounterClockwise = true;
rasterizerState.DepthBias = 0;
rasterizerState.DepthBiasClamp = 0;
rasterizerState.SlopeScaledDepthBias = 0;
rasterizerState.DepthClipEnable = true;
rasterizerState.ScissorEnable = true;
rasterizerState.MultisampleEnable = false;
rasterizerState.AntialiasedLineEnable = false;
pd3dDevice->CreateRasterizerState( &rasterizerState, &g_pRasterState );
This example set of state accomplishes perhaps the most basic rasterizer setup:
Cull (that is, remove) back faces; treat counter-clockwise winding order as front-facing
Turn off depth bias, but enable depth clipping and the scissor rectangle
In addition, basic rasterizer operations always include the following: clipping (to the view frustum), the
perspective divide, and viewport scaling. After successfully creating the rasterizer state object, set it
to the device like this:
pd3dDevice->RSSetState(g_pRasterState);
Output-Merger Stage (Direct3D 10)
The output-merger (OM) stage generates the final rendered pixel color using a combination of pipeline
state, the pixel data generated by the pixel shaders, the contents of the render targets, and the contents
of the depth/stencil buffers. The OM stage is the final step for determining which pixels are visible (with
depth-stencil testing) and blending the final pixel colors.
Depth-Stencil Testing
A depth-stencil buffer, which is created as a texture resource, can contain both depth data and stencil
data. The depth data is used to determine which pixels lie closest to the camera, and the stencil data is
used to mask which pixels can be updated. Ultimately, both the depth and stencil values are used
by the output-merger stage to determine if a pixel should be drawn or not. This flow chart shows
conceptually how depth-stencil testing is done.
To configure depth-stencil testing, see Configuring Depth-Stencil Functionality (Direct3D 10). A depth-
stencil object encapsulates depth-stencil state. An application can specify depth-stencil state, or the OM
stage will use default values. Blending operations are performed on a per-pixel basis if multisampling is
disabled. If multisampling is enabled, blending occurs on a per-multisample basis.
The process of using the depth buffer to determine which pixel should be drawn is called depth
buffering, also sometimes called z-buffering.
Blending
Blending combines one or more pixel values to create a final pixel color. The following conceptual
diagram describes the process involved in blending pixel data.
Conceptually, you can visualize this flow chart implemented twice in the output-merger stage: the first
one blends RGB data, while in parallel, a second one blends alpha data. To see how to use the API to
create and set blend state, see Configuring Blending Functionality (Direct3D 10).
Configuring Depth-Stencil Functionality
(Direct3D 10)
This section covers the steps for setting up the depth-stencil buffer, and depth-stencil state for the
output-merger stage.
D3D10_DEPTH_STENCIL_DESC dsDesc;
dsDesc.FrontFace.StencilPassOp = D3D10_STENCIL_OP_KEEP;
dsDesc.FrontFace.StencilFunc = D3D10_COMPARISON_ALWAYS;
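The fragment above sets only two of the front-face stencil operations. The remaining fields must also be filled in before the state object can be created; the values below are an assumed, typical configuration, and pd3dDevice is a hypothetical device pointer:

// Depth test parameters.
dsDesc.DepthEnable = TRUE;
dsDesc.DepthWriteMask = D3D10_DEPTH_WRITE_MASK_ALL;
dsDesc.DepthFunc = D3D10_COMPARISON_LESS;

// Stencil test parameters.
dsDesc.StencilEnable = TRUE;
dsDesc.StencilReadMask = 0xFF;
dsDesc.StencilWriteMask = 0xFF;

// Remaining front-face stencil operations; mirror them for back faces.
dsDesc.FrontFace.StencilFailOp = D3D10_STENCIL_OP_KEEP;
dsDesc.FrontFace.StencilDepthFailOp = D3D10_STENCIL_OP_KEEP;
dsDesc.BackFace = dsDesc.FrontFace;

// Create the immutable state object and bind it with a stencil reference of 1.
ID3D10DepthStencilState* pDSState = NULL;
pd3dDevice->CreateDepthStencilState( &dsDesc, &pDSState );
pd3dDevice->OMSetDepthStencilState( pDSState, 1 );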
D3D10_DEPTH_STENCIL_VIEW_DESC descDSV;
descDSV.Format = DXGI_FORMAT_D32_FLOAT;
descDSV.ViewDimension = D3D10_DSV_DIMENSION_TEXTURE2D;
descDSV.Texture2D.MipSlice = 0;
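Given a depth texture created with the D3D10_BIND_DEPTH_STENCIL flag, the view is then created and bound alongside a render-target view; pDepthTexture and pRTV are hypothetical names:

ID3D10DepthStencilView* pDSV = NULL;
pd3dDevice->CreateDepthStencilView( pDepthTexture, &descDSV, &pDSV );

// Bind one render-target view together with the depth-stencil view.
pd3dDevice->OMSetRenderTargets( 1, &pRTV, pDSV );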
Rendertargets must all be the same type of resource. If multisample antialiasing is used, all bound
rendertargets and depth buffers must have the same sample counts.

When a buffer is used as a rendertarget, depth-stencil testing and multiple rendertargets are not
supported.

All rendertargets must have the same size in all dimensions (width and height, and depth for 3D
or array size for *Array types).

Write masks control what data gets written to a rendertarget. The output write masks control,
on a per-rendertarget, per-component level, what data gets written to the rendertarget(s).
Compositing
Your application can use the stencil buffer to composite 2D or 3D images onto a 3D scene. A mask in the
stencil buffer is used to occlude an area of the rendering target surface. Stored 2D information, such as
text or bitmaps, can then be written to the occluded area. Alternately, your application can render
additional 3D primitives to the stencil-masked region of the rendering target surface. It can even render
an entire scene.
Games often composite multiple 3D scenes together. For instance, driving games typically display a rear-
view mirror. The mirror contains the view of the 3D scene behind the driver. It is essentially a second 3D
scene composited with the driver's forward view.
Decaling
Direct3D applications use decaling to control which pixels from a particular primitive image are drawn to
the rendering target surface. Applications apply decals to the images of primitives to enable coplanar
polygons to render correctly.
For instance, when applying tire marks and yellow lines to a roadway, the markings should appear
directly on top of the road. However, the z values of the markings and the road are the same. Therefore,
the depth buffer might not produce a clean separation between the two. Some pixels in the back
primitive may be rendered on top of the front primitive and vice versa. The resulting image appears to
shimmer from frame to frame. This effect is called z-fighting or flimmering.
To solve this problem, use a stencil to mask the section of the back primitive where the decal will
appear. Turn off z-buffering and render the image of the front primitive into the masked-off area of the
render-target surface.
Outlines and Silhouettes
You can use the stencil buffer for more abstract effects, such as outlining and silhouetting.
If your application does two render passes - one to generate the stencil mask and second to apply the
stencil mask to the image, but with the primitives slightly smaller on the second pass - the resulting
image will contain only the primitive's outline. The application can then fill the stencil-masked area of
the image with a solid color, giving the primitive an embossed look.
If the stencil mask is the same size and shape as the primitive you are rendering, the resulting image
contains a hole where the primitive should be. Your application can then fill the hole with black to
produce a silhouette of the primitive.
Two-Sided Stencil
Shadow Volumes are used for drawing shadows with the stencil buffer. The application computes the
shadow volumes cast by occluding geometry, by computing the silhouette edges and extruding them
away from the light into a set of 3D volumes. These volumes are then rendered twice into the stencil
buffer.
The first render draws forward-facing polygons, and increments the stencil-buffer values. The second
render draws the back-facing polygons of the shadow volume, and decrements the stencil buffer values.
Normally, all incremented and decremented values cancel each other out. However, the scene was
already rendered with normal geometry causing some pixels to fail the z-buffer test as the shadow
volume is rendered. Values left in the stencil buffer correspond to pixels that are in the shadow. These
remaining stencil-buffer contents are used as a mask, to alpha-blend a large, all-encompassing black
quad into the scene. With the stencil buffer acting as a mask, the result is to darken pixels that are in the
shadows.
This means that the shadow geometry is drawn twice per light source, hence putting pressure on the
vertex throughput of the GPU. The two-sided stencil feature has been designed to mitigate this
situation. In this approach, there are two sets of stencil state (the FrontFace and BackFace members of
D3D10_DEPTH_STENCIL_DESC), one for front-facing triangles and the other for back-facing triangles.
This way, only a single pass is drawn per
shadow volume, per light.
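Under this scheme, a single depth-stencil state object carries both sets of operations through its
FrontFace and BackFace members. The following is a minimal sketch of such a state object; the
variable names are illustrative, and the increment/decrement choice assumes the depth-pass counting
described above.

D3D10_DEPTH_STENCIL_DESC dsDesc;
ZeroMemory( &dsDesc, sizeof(dsDesc) );
dsDesc.DepthEnable = TRUE;
dsDesc.DepthWriteMask = D3D10_DEPTH_WRITE_MASK_ZERO;  // test depth, but do not write it
dsDesc.DepthFunc = D3D10_COMPARISON_LESS;
dsDesc.StencilEnable = TRUE;
dsDesc.StencilReadMask = D3D10_DEFAULT_STENCIL_READ_MASK;
dsDesc.StencilWriteMask = D3D10_DEFAULT_STENCIL_WRITE_MASK;

// Front-facing triangles increment the stencil value when they pass the depth test.
dsDesc.FrontFace.StencilFunc = D3D10_COMPARISON_ALWAYS;
dsDesc.FrontFace.StencilPassOp = D3D10_STENCIL_OP_INCR;
dsDesc.FrontFace.StencilFailOp = D3D10_STENCIL_OP_KEEP;
dsDesc.FrontFace.StencilDepthFailOp = D3D10_STENCIL_OP_KEEP;

// Back-facing triangles decrement it, so unshadowed pixels cancel out to zero.
dsDesc.BackFace.StencilFunc = D3D10_COMPARISON_ALWAYS;
dsDesc.BackFace.StencilPassOp = D3D10_STENCIL_OP_DECR;
dsDesc.BackFace.StencilFailOp = D3D10_STENCIL_OP_KEEP;
dsDesc.BackFace.StencilDepthFailOp = D3D10_STENCIL_OP_KEEP;

ID3D10DepthStencilState* pTwoSidedStencil = NULL;
pd3dDevice->CreateDepthStencilState( &dsDesc, &pTwoSidedStencil );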
Configuring Blending Functionality
(Direct3D 10)
Blending operations are performed on every pixel shader output (RGBA value) before the output value is
written to a render target. If multisampling is enabled, blending is done on each multisample; otherwise,
blending is performed on each pixel.
For instance, here is a very simple example of blend-state creation that disables alpha blending and uses
no per-component pixel masking.
D3D10_BLEND_DESC BlendState;
ZeroMemory(&BlendState, sizeof(D3D10_BLEND_DESC));
BlendState.BlendEnable[0] = FALSE;
BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL;
pd3dDevice->CreateBlendState(&BlendState, &g_pBlendStateNoBlend);

// Bind the new state object to the output-merger stage.
FLOAT blendFactor[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
UINT sampleMask = 0xffffffff;
pd3dDevice->OMSetBlendState(g_pBlendStateNoBlend, blendFactor, sampleMask);
This API takes three parameters: the blend-state object, a four-component blend factor, and a sample
mask. You may pass in NULL for the blend-state object (which tells the runtime to use default blend
state - see ID3D10Device::OMSetBlendState) or pass in a blend-state object as shown here. The blend
factor gives you per-component control over blending the new per-pixel values with the existing value.
The sample mask is a user-defined mask that determines how to sample the existing rendertarget
before updating it. The default sampling mask is 0xffffffff which designates point sampling.
Advanced Blending Topics
Alpha-To-Coverage
Alpha-to-coverage is a multisampling technique that is most useful for situations such as dense foliage
where there are several overlapping polygons. It can be turned on by setting the
AlphaToCoverageEnable variable to true in the D3D10_BLEND_DESC. Alpha-to-coverage will work
regardless of whether or not multisampling is also turned on in the rasterizer state (by setting
MultisampleEnable to true or false).
Alpha-to-coverage works by taking the alpha component of an RGBA value after it is output from a pixel
shader and converting it into an n-step coverage mask (where n is the sample count). This n-step
coverage mask is then ANDed with the multisample coverage mask and the result is used to determine
which samples should get updated for all of the rendertargets currently bound to the output merger.
The original alpha value in the output of the pixel shader is not changed when that alpha value is used to
create the n-step coverage mask (alpha blending will still occur on a per-sample basis). Alpha-to-
coverage multisampling is essentially the same as regular multisampling except that the n-step coverage
mask is generated and ANDed with the multisample coverage mask.
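For reference, here is one way the flag might be set when creating the blend state (a minimal sketch;
the surrounding blend settings mirror the earlier example, and the variable name is illustrative):

D3D10_BLEND_DESC BlendState;
ZeroMemory( &BlendState, sizeof(D3D10_BLEND_DESC) );
BlendState.AlphaToCoverageEnable = TRUE;   // convert pixel shader output alpha to a coverage mask
BlendState.BlendEnable[0] = FALSE;
BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL;
pd3dDevice->CreateBlendState( &BlendState, &g_pBlendStateAlphaToCoverage );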
Additional options are available for the SrcBlend, DestBlend, SrcBlendAlpha or DestBlendAlpha terms in
the blend equation. The presence of any of the following choices in the blend equation means that dual
source color blending is enabled:
D3D10_BLEND_SRC1_COLOR
D3D10_BLEND_INV_SRC1_COLOR
D3D10_BLEND_SRC1_ALPHA
D3D10_BLEND_INV_SRC1_ALPHA
When dual source color blending is enabled, the pixel shader must have only a single rendertarget
bound, at slot 0, and must output both SV_TARGET0 and SV_TARGET1. Writing SV_Depth is still valid
when performing dual source color blending.
The configured blend equation and the rendertarget write mask at slot 0 imply exactly which
components from pixel shader outputs must be present. If the expected output components are not
present, results are undefined. If extra components are output, they are ignored.
Examples:
There are times when a shader computes two results that are useful in a single pass, but needs to
combine one into the destination with a multiply and the other with an add. This would look like:

SrcBlend = D3D10_BLEND_ONE;
DestBlend = D3D10_BLEND_SRC1_COLOR;

Next is a blend mode setup that takes pixel shader output color SV_TARGET0 as the source color, and
uses pixel shader output color SV_TARGET1 to blend with the destination color.

SrcBlend = D3D10_BLEND_SRC1_COLOR;
DestBlend = D3D10_BLEND_INV_SRC1_COLOR;

A third setup scales the source color by the alpha of the second shader output and modulates the
destination by the source color, writing only the red and alpha components:

SrcBlend = D3D10_BLEND_SRC1_ALPHA;
DestBlend = D3D10_BLEND_SRC_COLOR;
RenderTargetWriteMask[0] = ( D3D10_COLOR_WRITE_ENABLE_RED |
D3D10_COLOR_WRITE_ENABLE_ALPHA );
Resources (Direct3D 10)
A resource is an area in memory that can be accessed by the Direct3D pipeline. In order for the pipeline
to access memory efficiently, data that is provided to the pipeline (such as input geometry, shader
resources, textures, etc.) must be stored in a resource. There are two types of resources from which all
Direct3D resources derive: a buffer or a texture.
Each application will typically create many resources. Examples of resources include vertex buffers, index
buffers, constant buffers, textures, and shader resources. There are several options that determine how
resources can be used. You can create resources that are strongly typed or typeless; you can control
whether resources have both read and write access; you can make resources accessible to only the CPU,
GPU, or both. Naturally, there will be speed vs. functionality tradeoff - the more functionality you allow
a resource to have, the less performance you should expect.
Since an application often uses many textures, Direct3D also introduces the concept of a texture array to
simplify texture management. A texture array contains one or more textures (all of the same type and
dimensions) that can be indexed from within an application or by shaders. Texture arrays allow you to
use a single interface with multiple indexes to access many textures. You can create as many texture
arrays to manage different texture types as you need.
Once you have created the resources that your application will use, you connect or bind each resource
to the pipeline stages that will use them. This is accomplished by calling a bind API, which takes a
pointer to the resource. Since more than one pipeline stage may need access to the same resource,
Direct3D 10 introduces the concept of a resource view. A view identifies the portion of a resource that
can be accessed. You can create m views of a resource and bind them to n pipeline stages, assuming you
follow the binding rules for shared resources (the runtime will generate errors at compile time if you don't).
A resource view provides a general model for access to a resource (textures, buffers, etc.). Because you
can use a view to tell the runtime what data to access and how to access it, resource views allow you to
create typeless resources. That is, you can create a resource of a given size at compile time, and then
declare the data type within the resource when the resource gets bound to the pipeline. Views expose
many new capabilities for using resources, such as the ability to read back depth/stencil surfaces in the
shader, generating dynamic cubemaps in a single pass, and rendering simultaneously to multiple slices
of a volume.
Resource Types (Direct3D 10)
All resources used by the Direct3D pipeline derive from two basic resource types: buffers and textures. A
buffer is a collection of elements; a texture is a collection of texels.
There are two ways to fully specify the layout (or memory footprint) of a resource:
Typed - fully specify the type when the resource is created.
Typeless - fully specify the type when the resource is bound to the pipeline.
Buffer Resource
A buffer resource is a collection of data; each piece of data is called an element. Each element can be of
a different type, or even a different format. For example, an application could store both index and
vertex information in the same buffer. When the buffer is bound to the pipeline, an application specifies
the information necessary to read and interpret buffer data.
A buffer is created as an unstructured resource (in essence just a big chunk of memory) which is
interpreted when it is bound to a pipeline stage. Unlike a texture resource, a buffer cannot be filtered,
does not contain multiple subresources, and cannot be multisampled.
Buffer Element
An element is the smallest unit of memory that can be read or written by the pipeline. An element can
be read by a shader or set as state in a Direct3D device. Some examples of an element could be: a
position, a vertex normal, or a texture coordinate in a vertex buffer; an index in an index buffer; or a
single state, such as depth/stencil.
An element is made up of 1 to 4 components. Examples of an element include: a packed data value (like
R8G8B8A8), a single 8-bit integer, four 32-bit float values, etc. Specifically, an element is any one of the
DXGI formats.
Although this description of elements applies to both buffers and textures, the documentation will use
the term element when referring to buffers, and texel (short for texture element) when referring to
textures for the sake of clarity.
Vertex Buffer
Index Buffer
Shader-Constant Buffer
Vertex Buffer
A vertex buffer is a collection of elements. A vertex buffer which contains position data can be visualized
like this.
Figure 1. Single-Element Vertex Buffer
More often, a vertex buffer contains all the data needed to fully specify 3D vertices. This data is usually
organized as structures of elements, which can be visualized like this.
This vertex buffer contains eight vertices; the data for each vertex is made up of three elements
(position, normal, and texture coordinates). The position and normal are typically specified using three
32-bit floats (DXGI_FORMAT_R32G32B32_FLOAT) and the texture coordinates using two 32-bit floats
(DXGI_FORMAT_R32G32_FLOAT). Every element in a vertex buffer has an identical data structure to
every other element.
To access data from a vertex buffer you need to know which vertex to access and these other buffer
parameters:
Offset - the number of bytes from the start of the buffer to the data for the first vertex. The offset is
supplied to ID3D10Device::IASetVertexBuffers.
BaseVertexLocation - the index of the first vertex used by the appropriate draw call (see Draw APIs);
unlike the offset, it is measured in vertices, not bytes.
To create a buffer resource that can be used as a vertex buffer, see Create a Vertex Buffer.
Index Buffer
An index buffer contains a sequential set of indices; each index is used to identify a vertex in a vertex
buffer. Using an index buffer with one or more vertex buffers to supply data to the IA stage is called
indexing. An index buffer can be visualized like this.
Figure 3. Index Buffer
The sequential indices stored in an index buffer are located with the following parameters:
Offset - the number of bytes from the start of the buffer to the first index. The offset is supplied
to ID3D10Device::IASetIndexBuffer.
StartIndexLocation - the location of the first index used by the appropriate draw call (see Draw
APIs); it is measured in indices, not bytes.
An index buffer contains 16-bit or 32-bit indices. To create an index buffer, see Create an Index Buffer.
Shader-Constant Buffer
Direct3D 10 introduced a new buffer for supplying shader constants. It is called a shader-constant
buffer. Conceptually, it looks just like a vertex buffer; creating one is sketched below.
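Here is a minimal, hypothetical example of creating a constant buffer and binding it to the vertex
shader stage; the VSConstants structure and the variable names are illustrative:

struct VSConstants              // hypothetical constant layout (must be a multiple of 16 bytes)
{
    D3DXMATRIX WorldViewProj;
};

D3D10_BUFFER_DESC cbDesc;
cbDesc.ByteWidth = sizeof( VSConstants );
cbDesc.Usage = D3D10_USAGE_DYNAMIC;             // updated by the CPU each frame
cbDesc.BindFlags = D3D10_BIND_CONSTANT_BUFFER;
cbDesc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;
cbDesc.MiscFlags = 0;

ID3D10Buffer* pConstantBuffer = NULL;
hr = g_pd3dDevice->CreateBuffer( &cbDesc, NULL, &pConstantBuffer );

// Bind to slot 0 of the vertex shader stage.
g_pd3dDevice->VSSetConstantBuffers( 0, 1, &pConstantBuffer );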
Texture Resource
A texture resource is a structured collection of data designed to store texture data. The data in a texture
resource is made up of one or more subresources, which are themselves organized into arrays of texels.
Unlike buffers, textures can be filtered by texture samplers as they are read by shader units. The type of
texture impacts how the texture is filtered.
Texels
Texture3D Resource
Texels
A texel (or texture element) represents the smallest unit of a texture that can be read or written to by
the pipeline. Each texel contains 1 to 4 components, arranged in one of the DXGI formats.
Individual texel components cannot be fetched (read) directly. The entire texel must be fetched by the
runtime before an application can access a specific component. For example, a shader cannot fetch just
the green component of an R8G8B8A8 texture; it would need to fetch the entire texel first and then
access the green component.
Figure 2 illustrates a Texture1D resource with 3 mipmap levels. The top-most mipmap level is the largest
level; each successive level is a power of 2 (on each side) smaller. In this example, since the top-level
texture width is 5 elements, there are two mipmap levels before the texture width is reduced to 1. Each
element in each mipmap level is addressable by the u vector (which is commonly called a texture
coordinate).
Each element in each mipmap level contains a single texel, or texture value. The data type of each
element is defined by the texel format which is once again a DXGI_FORMAT value. A texture resource
may be typed or typeless at resource-creation time, but when bound to the pipeline, its interpretation
must be provided in a view.
This texture array contains three textures. Each of the three textures has a texture width of 5 (the
number of elements in the first mipmap level). Each texture also contains three mipmap levels.
All texture arrays in Direct3D are homogeneous arrays of textures; this means that every texture in a
texture array must have the same data format and size (including texture width and number of
miplevels). You may create texture arrays of different sizes, as long as all the textures in each array
match in size.
Subresources
One interesting feature of Direct3D texture resources (including textures and texture arrays) is that
they are made up of subresources. A subresource is a texture and mipmap-level combination. For a
single texture, a subresource is a single mipmap level. This 1D texture is made up of 3 subresources.
For a texture array, a subresource is an array slice and a particular mipmap level. Here is an example of
an array of subresources, within a 2D texture array.
This array of subresources contains the top mipmap levels of all three textures in the texture array
resource. Direct3D uses a resource view to access this array of texture subresources in a texture array.
Indexing Subresources
A texture array can contain multiple textures, each with multiple mipmap levels. Each subresource is a
mipmap-level in a single texture. When accessing a particular subresource, this is how the subresources
are indexed within the texture array.
The index (which is an unsigned integer) starts at zero in the first texture in the array, and increments
through the mipmap levels for that texture. Subresource indices in 2D texture arrays follow the same
pattern as 1D texture arrays.
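The helper function D3D10CalcSubresource (an inline function in d3d10.h) computes this index as
MipSlice + ArraySlice * MipLevels. For example:

// For a texture array where each texture has 3 mipmap levels (assumed),
// select mip level 1 of the third texture (array slice 2):
UINT mipLevels   = 3;
UINT subresource = D3D10CalcSubresource( 1,          // mip slice
                                         2,          // array slice
                                         mipLevels );
// subresource == 1 + 2 * 3 == 7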
Subresources are indexed somewhat differently in a 3D texture resource (also known as a volume
texture). Each subresource is a mipmap level in the Texture3D.
The index starts at zero in the first mipmap level in the Texture3D and increments through the mipmap
levels for the rest of the texture.
This texture resource contains a single 3x5 texture with three mipmap levels.
A Texture2DArray resource is a homogeneous array of 2D textures; that is, each texture has the same
data format and dimensions (including mipmap levels). It has a similar layout to the Texture1DArray
resource, except that the textures now contain 2D data; it therefore looks like this:
This texture array contains three textures; each texture is 3x5 with two mipmap levels.
A Texture2DArray that contains six textures may be read from within shaders with the cube-map intrinsic
functions, after it is bound to the pipeline with a cube-texture view. Texture cubes are addressed
from the shader with a 3D vector pointing out from the center of the texture cube.
Texture3D Resource
A Texture3D resource (also known as a volume texture) contains a 3D volume of texels. Since it is a
texture resource, it may contain mipmap levels. A fully populated Texture3D resource looks like this:
Figure 13. Texture3D Resource
When a Texture3D mipmap slice is bound as a rendertarget output, (by creating a rendertarget view),
the Texture3D behaves identically to a Texture2DArray with n array slices where n is the depth (3rd
dimension) of the Texture3D. The particular slice in the Texture3D to render is chosen from the
geometry shader stage, by declaring a scalar component of output data as the
SV_RenderTargetArrayIndex system-value.
Choosing a Resource (Direct3D 10)
A resource is a collection of data that is used by the 3D pipeline. Creating resources and defining their
behavior is the first step toward programming your application. This guide covers basic topics for
choosing the resources required by your application.
This table lists the types of resources that can be bound to each pipeline stage. It includes whether the
resource can be bound as an input or an output, as well as the bind API.
Pipeline Stage  | In/Out | Resource               | Resource Type                            | Bind API
--------------- | ------ | ---------------------- | ---------------------------------------- | --------
Input Assembler | In     | Vertex Buffer          | Buffer                                   | ID3D10Device::IASetVertexBuffers
Input Assembler | In     | Index Buffer           | Buffer                                   | ID3D10Device::IASetIndexBuffer
Shader Stages   | In     | Shader-Resource View   | Texture1D, Texture2D, Texture3D          | ID3D10Device::VSSetShaderResources, ID3D10Device::GSSetShaderResources, ID3D10Device::PSSetShaderResources
Shader Stages   | In     | Shader-Constant Buffer | Buffer                                   | ID3D10Device::VSSetConstantBuffers, ID3D10Device::GSSetConstantBuffers, ID3D10Device::PSSetConstantBuffers
Stream Output   | Out    | Buffer                 | Buffer                                   | ID3D10Device::SOSetTargets
Output Merger   | Out    | Rendertarget View      | Buffer, Texture1D, Texture2D, Texture3D  | ID3D10Device::OMSetRenderTargets
Identify How Each Resource Will Be Used
Once you have chosen the pipeline stages that your application will use (and therefore the resources
that each stage will require), the next step is to determine how each resource will be used, that is,
whether a resource can be accessed by the CPU or the GPU.
The hardware that your application is running on will have at least one CPU and one GPU.
To pick a usage value, consider which type of processor needs to read or write to the resource from the
following options (see D3D10_USAGE).
Default usage should be used for a resource that is expected to be updated by the CPU infrequently (less
than once per frame). Ideally, the CPU would never write directly to a resource with default usage so as
to avoid potential performance penalties.
Dynamic usage should be used for a resource that the CPU updates relatively frequently (once or more
per frame). A typical scenario for a dynamic resource would be to create dynamic vertex and index
buffers that would be filled at runtime with data about the geometry visible from the point of view of
the user for each frame. These buffers would be used to render only the geometry visible to the user for
that frame.
Staging usage should be used to copy data to and from other resources. A typical scenario would be to
copy data in a resource with default usage (which the CPU cannot access) to a resource with staging
usage (which the CPU can access).
Immutable resources should be used when the data in the resource will never change.
If you are unsure what usage to choose, start with the default usage as it is expected to be used most
often.
More specifically, a resource can be bound as an input and an output simultaneously, as long as the
same part of the resource is not read and written at the same time.
When binding a resource, think about how the GPU and the CPU will access the resource. Resources
that are designed for a single purpose (that is, that do not combine multiple usage, bind, and CPU-access
flags) will more than likely result in better performance.
For example, consider the case of a render target used as a texture multiple times. It may be faster to
have two resources: a render target and a texture used as a shader resource. Each resource would use
only one bind flag (D3D10_BIND_RENDER_TARGET or D3D10_BIND_SHADER_RESOURCE). The data
would be copied from the render-target texture to the shader texture using
ID3D10Device::CopyResource or ID3D10Device::CopySubresourceRegion. This may improve
performance by isolating the render-target write from the shader-texture read. Of course, the only way
to be sure is to implement both approaches and measure the performance difference in your particular
application.
Creating Buffer Resources (Direct3D 10)
A resource is a collection of data that is used by the 3D pipeline. The size of a resource is limited by
D3D10_REQ_RESOURCE_SIZE_IN_MEGABYTES. Creating resources and defining their behavior is the first
step towards the rendering of geometric data.
A Buffer Description
When creating a vertex buffer, a buffer description (see D3D10_BUFFER_DESC) is used to define how
data is organized within the buffer, how the pipeline can access the buffer, and how the buffer will be
used.
The following example demonstrates how to create a buffer description for a single triangle with
vertices that contain position and color values.
struct SimpleVertex
{
D3DXVECTOR3 Position;
D3DXVECTOR3 Color;
};
D3D10_BUFFER_DESC bufferDesc;
bufferDesc.Usage = D3D10_USAGE_DEFAULT;
bufferDesc.ByteWidth = sizeof( SimpleVertex ) * 3;
bufferDesc.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bufferDesc.CPUAccessFlags = 0;
bufferDesc.MiscFlags = 0;
In this example, the buffer description is initialized with almost all default settings for usage, CPU access
and miscellaneous flags. The other settings are for the bind flag which identifies the resource as a vertex
buffer only, and the size of the buffer.
The usage and CPU access flags are important for performance. These two flags together determine how
often a resource gets accessed, what type of memory the resource could be loaded into, and what
processor needs to access the resource. Default usage means that this resource will not be updated very
often. Setting CPU access to 0 means that the CPU will not need to either read or write the resource.
Taken in combination, this means that the runtime can load the resource into the highest-performing
memory for the GPU, since the resource does not require CPU access.
As expected, there is a tradeoff between best performance and any-time accessibility by either
processor. For example, the default usage with no CPU access means that the resource can be made
available to the GPU exclusively. This could include loading the resource into memory not directly
accessible by the CPU. The resource could only be modified with ID3D10Device::UpdateSubresource.
A Subresource Description
A resource is made up of subresources. Applications can optionally provide some initial data when the
buffer is created to both create it and initialize it at the same time. This is accomplished by using
D3D10_SUBRESOURCE_DATA. The subresource description points to the actual resource data, and also
contains information about the size and layout of that data.
Resources created with the D3D10_USAGE_IMMUTABLE flag (see D3D10_USAGE) must be initialized at
creation time; resources that use any of the other creation flags can be updated after initialization. This
can be accomplished using ID3D10Device::CopyResource, ID3D10Device::CopySubresourceRegion,
ID3D10Device::UpdateSubresource, or by accessing its underlying memory using the resource's Map
method.
A buffer is just a collection of elements and is laid out as a 1D array. As a result, the system-memory
pitch and system-memory slice pitch are both the same: the size of the vertex data declaration.
This description gives the pipeline a clear idea of how to access the elements in the resource.
The following code snippet demonstrates how to create a vertex buffer from an array of vertex data
declared by the application. First, a buffer description is created and filled with information about the
data to be stored in the vertex buffer and how that data will be used. Next, a subresource description is
created and filled with the actual information the vertex buffer will be initialized with. Finally, the vertex
buffer is created by calling ID3D10Device::CreateBuffer.
struct SimpleVertexCombined
{
D3DXVECTOR3 Pos;
D3DXVECTOR3 Col;
};
SimpleVertexCombined verticesCombo[] =
{
D3DXVECTOR3( 0.0f, 0.5f, 0.5f ),
D3DXVECTOR3( 0.0f, 0.0f, 0.5f ),
D3DXVECTOR3( 0.5f, -0.5f, 0.5f ),
D3DXVECTOR3( 0.5f, 0.0f, 0.0f ),
D3DXVECTOR3( -0.5f, -0.5f, 0.5f ),
D3DXVECTOR3( 0.0f, 0.5f, 0.0f ),
};
D3D10_BUFFER_DESC bufferDesc;
bufferDesc.Usage = D3D10_USAGE_DEFAULT;
bufferDesc.ByteWidth = sizeof( SimpleVertexCombined ) * 3;
bufferDesc.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bufferDesc.CPUAccessFlags = 0;
bufferDesc.MiscFlags = 0;
D3D10_SUBRESOURCE_DATA InitData;
InitData.pSysMem = verticesCombo;
InitData.SysMemPitch = 0;
InitData.SysMemSlicePitch = 0;
hr = g_pd3dDevice->CreateBuffer( &bufferDesc, &InitData, &g_pVertexBuffer[0] );
The following code snippet demonstrates how to create an index buffer from an array of index data.
First, a buffer description is created and filled with information about the data to be stored in the index
buffer and how that data will be used. Next, a subresource description is created and filled with the
actual information the index buffer will be initialized with.
// Create indices
unsigned int indices[] = { 0, 1, 2 };
D3D10_BUFFER_DESC bufferDesc;
bufferDesc.Usage = D3D10_USAGE_DEFAULT;
bufferDesc.ByteWidth = sizeof( unsigned int ) * 3;
bufferDesc.BindFlags = D3D10_BIND_INDEX_BUFFER;
bufferDesc.CPUAccessFlags = 0;
bufferDesc.MiscFlags = 0;
D3D10_SUBRESOURCE_DATA InitData;
InitData.pSysMem = indices;
InitData.SysMemPitch = 0;
InitData.SysMemSlicePitch = 0;
hr = g_pd3dDevice->CreateBuffer( &bufferDesc, &InitData, &g_pIndexBuffer );
if( FAILED( hr ) )
return hr;
Creating Texture Resources (Direct3D 10)
A texture resource is a structured collection of data. Typically, color values are stored in textures and
accessed during rendering by the pipeline at various stages for both input and output. Creating textures
and defining how they will be used is an important part of rendering interesting-looking scenes in
Direct3D 10.
Even though textures typically contain color information, creating textures using different formats
enables them to store different kinds of data. This data can then be leveraged by the Direct3D 10
pipeline in non-traditional ways.
All textures have limits on how much memory they consume, and how many texels they contain. These
limits are specified by resource constants.
A generic texture manager may implement something like the following to handle the different
texture types.
// Determine the dimension of an arbitrary resource, then cast accordingly.
D3D10_RESOURCE_DIMENSION type;
pResource->GetType( &type );

switch( type )
{
    case D3D10_RESOURCE_DIMENSION_TEXTURE1D:
    {
        ID3D10Texture1D* pTexture1D = (ID3D10Texture1D*)pResource;
        // ...
        break;
    }
    case D3D10_RESOURCE_DIMENSION_TEXTURE2D:
    {
        ID3D10Texture2D* pTexture2D = (ID3D10Texture2D*)pResource;
        // ...
        break;
    }
    case D3D10_RESOURCE_DIMENSION_TEXTURE3D:
    {
        ID3D10Texture3D* pTexture3D = (ID3D10Texture3D*)pResource;
        // ...
        break;
    }
    default:
    {
        // error
        break;
    }
}
Rendering to a texture
The most common case of creating an empty texture to be filled with data during runtime is the case
where an application wants to render to a texture and then use the results of the rendering operation in
a subsequent pass. Textures created with this purpose should specify default usage.
The following code sample creates an empty texture that the pipeline can render to and subsequently
use as an input to a shader.
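The texture description itself might be filled in along these lines (a sketch consistent with the
property choices described below; the four-component 32-bit float format is one possible choice):

D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width = 256;                                // width in texels
desc.Height = 256;                               // height in texels
desc.MipLevels = 1;                              // a single miplevel
desc.ArraySize = 1;                              // one render target
desc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;    // four 32-bit floats per texel
desc.SampleDesc.Count = 1;                       // one sample per pixel
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;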
ID3D10Texture2D* pRenderTarget = NULL;
pDevice->CreateTexture2D( &desc, NULL, &pRenderTarget );
Creating the texture requires the application to specify some information about the properties the
texture will have. The width and height of the texture in texels is set to 256. For this render target, a
single miplevel is all we need. Only one render target is required so the array size is set to 1. Each texel
contains four 32-bit floating point values, which can be used to store very precise information (see
DXGI_FORMAT). One sample per pixel is all that will be needed. The usage is set to default because this
allows for the most efficient placement of the render target in memory. Finally, the fact that the texture
will be bound as a render target and a shader resource at different points in time is specified.
Textures cannot be bound for rendering to the pipeline directly. Because of this, a render-target view
must first be created to describe to the pipeline how to access the render target texture. This can be
seen in the following code sample.
D3D10_RENDER_TARGET_VIEW_DESC rtDesc;
rtDesc.Format = desc.Format;
rtDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D;
rtDesc.Texture2D.MipSlice = 0;

ID3D10RenderTargetView* pRenderTargetView = NULL;
pDevice->CreateRenderTargetView( pRenderTarget, &rtDesc, &pRenderTargetView );
The format of the render-target view is simply set to the format of the original texture. The information
in the resource should be interpreted as a 2D texture, and we only want to use the first miplevel of the
render target.
Similarly to how a render-target view must be created so that the render target can be bound for output
to the pipeline, a shader-resource view must be created so that the render target can be bound to the
pipeline as an input. The following code sample demonstrates this.
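A sketch of that view creation, under the same assumptions as the render-target code above
(pShaderResView is an illustrative name):

D3D10_SHADER_RESOURCE_VIEW_DESC srDesc;
srDesc.Format = desc.Format;
srDesc.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2D;
srDesc.Texture2D.MostDetailedMip = 0;
srDesc.Texture2D.MipLevels = 1;

ID3D10ShaderResourceView* pShaderResView = NULL;
pDevice->CreateShaderResourceView( pRenderTarget, &srDesc, &pShaderResView );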
The parameters of shader-resource view descriptions are very similar to those of render-target view
descriptions and were chosen for the same reasons.
Filling Textures Manually
Sometimes applications would like to compute values at runtime, put them into a texture manually and
then have the graphics pipeline use this texture in later rendering operations. To do this, the application
must create an empty texture in such a way to allow the CPU to access the underlying memory. This is
done by creating a dynamic texture and gaining access to the underlying memory by calling a particular
method. The following code sample demonstrates how to do this.
D3D10_TEXTURE2D_DESC desc;
desc.Width = 256;
desc.Height = 256;
desc.MipLevels = desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D10_USAGE_DYNAMIC;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;

ID3D10Texture2D* pTexture = NULL;
pd3dDevice->CreateTexture2D( &desc, NULL, &pTexture );
Note that the format is set to 32 bits per pixel, where each component is defined by 8 bits. The usage
parameter is set to dynamic, while the bind flags are set to specify that the texture will be accessed by a
shader. The rest of the texture description is similar to creating a render target.
Calling ID3D10Texture2D::Map enables the application to access the underlying memory of the texture.
The pointer retrieved is then used to fill the texture with data. This can be seen in the following code
sample.
D3D10_MAPPED_TEXTURE2D mappedTex;
pTexture->Map( D3D10CalcSubresource(0, 0, 1), D3D10_MAP_WRITE_DISCARD, 0,
&mappedTex );
// ... write texels through mappedTex.pData here (see the sketch below) ...
pTexture->Unmap( D3D10CalcSubresource(0, 0, 1) );
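A typical fill loop between Map and Unmap walks the rows using the returned row pitch rather than
assuming tightly packed data; a sketch for the R8G8B8A8 texture above (the color values are arbitrary):

UCHAR* pTexels = (UCHAR*)mappedTex.pData;
for( UINT row = 0; row < desc.Height; row++ )
{
    UINT rowStart = row * mappedTex.RowPitch;     // the pitch may include padding
    for( UINT col = 0; col < desc.Width; col++ )
    {
        UINT colStart = col * 4;                  // 4 bytes per R8G8B8A8 texel
        pTexels[rowStart + colStart + 0] = 255;   // red
        pTexels[rowStart + colStart + 1] = 128;   // green
        pTexels[rowStart + colStart + 2] = 64;    // blue
        pTexels[rowStart + colStart + 3] = 32;    // alpha
    }
}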
Multiple Rendertargets
Up to eight rendertarget views can be bound to the pipeline at a time (with a call to
ID3D10Device::OMSetRenderTargets). For each pixel (or each sample if multisampling is enabled), the
blend process will be done on each rendertarget view bound to the output merger. Two of the blend
state variables - BlendEnable and RenderTargetWriteMask - are arrays of eight, with each member of
the arrays corresponding to each rendertarget view set to the output merger. When setting multiple
rendertargets to the output merger, each rendertarget must be the same type of resource (i.e. buffer,
texture1D[array], texture2D[array], texture3D) and must have the same size in all dimensions (width,
height, depth for texture3Ds, and array size for texture arrays). If the rendertargets are multisampled
textures, then they must all have the same number of samples per pixel.
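For example, binding two rendertarget views together with a depth-stencil view might look like this
(the view variables are illustrative and are assumed to have been created earlier):

ID3D10RenderTargetView* rtViews[2] = { pRTViewA, pRTViewB };  // views of same-sized resources
pd3dDevice->OMSetRenderTargets( 2, rtViews, pDepthStencilView );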
There can only be one depth/stencil buffer active, regardless of how many rendertargets are active.
If resource views of texture arrays are set as rendertargets, the depth/stencil resource view (if
bound) must have the same dimensions and array size. Note that this does not mean that the
resources themselves need to have the same dimensions (including array size), only that the views
used together must have the same effective dimensions.
Copying and Accessing Resource Data
(Direct3D 10)
It is no longer necessary to think about resources as being created in either video memory or system
memory, or about whether or not the runtime should manage the memory. Thanks to the architecture of the
new WDDM (Windows Display Driver Model), applications now create Direct3D 10 resources with
different usage flags to indicate how the application intends to use the resource data. The new driver
model virtualizes the memory used by resources; it then becomes the responsibility of the operating
system/driver/memory manager to place resources in the most performant area of memory possible
given the expected usage.
The default case is for resources to be available to the GPU. Of course, having said that, there are times
when the resource data needs to be available to the CPU. Copying resource data around so that the
appropriate processor can access it without impacting performance requires some knowledge of how
the API methods work.
Ideally, all resources would be located in video memory so that the GPU can have immediate access to
them. However, it is sometimes necessary for the CPU to read the resource data or for the GPU to
access resource data the CPU has written to. Direct3D 10 handles these different scenarios by
requesting the application specify a usage type, and then offers several methods for copying resource
data when necessary.
Depending on how the resource was created, it is not always possible to directly access the underlying
data. This may mean that the resource data must be copied from the source resource to another
resource that is accessible by the appropriate processor. In terms of Direct3D 10, default resources can
be accessed directly by the GPU, while dynamic and staging resources can be directly accessed by the CPU.
Once a resource has been created, its usage cannot be changed. Instead, copy the contents of one
resource to another resource that was created with a different usage. Direct3D 10 provides this
functionality with three different methods. The first two methods (ID3D10Device::CopyResource and
ID3D10Device::CopySubresourceRegion) are designed to copy resource data from one resource to
another. The third method (ID3D10Device::UpdateSubresource) is designed to copy data from memory
to a resource.
There are two main kinds of resources: mappable and non-mappable. Resources created with dynamic
or staging usages are mappable, while resources created with default or immutable usages are non-
mappable.
Copying data among non-mappable resources is very fast because this is the most common case and has
been optimized to perform well. Since these resources are not directly accessible by the CPU, they are
optimized so that the GPU can manipulate them quickly.
Copying data among mappable resources is more problematic because the performance will depend on
the usage the resource was created with. For example, the GPU can read a dynamic resource fairly
quickly but cannot write to it, and the GPU cannot read or write to staging resources directly.
Applications that wish to copy data from a resource with default usage to a resource with staging usage
(to allow the CPU to read the data -- i.e. the GPU readback problem) must do so with care. See Accessing
Resource Data for more details on this last case.
Performance can grind to a halt if the application tries to map a resource at the wrong time. If the
application tries to access the results of an operation before that operation is finished, a pipeline stall
will occur.
Performing a map operation at the wrong time could potentially cause a severe drop in performance by
forcing the GPU and the CPU to synchronize with each other. This synchronization will occur if the
application wants to access a resource before the GPU is finished copying it into a resource the CPU can
map.
The CPU can only read from resources created with the D3D10_USAGE_STAGING flag. Since resources
created with this flag cannot be set as outputs of the pipeline, if the CPU wants to read the data in a
resource generated by the GPU, the data must be copied to a resource created with the staging flag. The
application may do this by using the ID3D10Device::CopyResource or
ID3D10Device::CopySubresourceRegion methods to copy the contents of one resource to another. The
application can then gain access to this resource by calling the appropriate Map method. When access
to the resource is no longer needed, the application should then call the corresponding Unmap method.
For example, ID3D10Texture2D::Map and ID3D10Texture2D::Unmap. The different Map methods return
specific values depending on the input flags; see the Map Remarks section for details. A readback
sketch follows.
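Putting this together, a GPU-readback sketch might look like the following. The names
pRenderTargetTex and pStagingTex are illustrative; the staging texture is assumed to have been
created with D3D10_USAGE_STAGING, D3D10_CPU_ACCESS_READ, and a description matching the source.

// Queue a copy from the GPU-only resource to the CPU-readable staging resource.
pd3dDevice->CopyResource( pStagingTex, pRenderTargetTex );

// ... ideally wait a frame or two before mapping (see the discussion below) ...

D3D10_MAPPED_TEXTURE2D mapped;
if( SUCCEEDED( pStagingTex->Map( D3D10CalcSubresource(0, 0, 1),
                                 D3D10_MAP_READ, 0, &mapped ) ) )
{
    // Read the data through mapped.pData / mapped.RowPitch here.
    pStagingTex->Unmap( D3D10CalcSubresource(0, 0, 1) );
}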
Performance Considerations
It is best to think of a PC as a machine with a parallel architecture containing two main types of
processors: one or more CPUs and one or more GPUs. As in any parallel architecture, the best
performance is achieved when each processor is scheduled with enough tasks to prevent it from going
idle and when the work of one processor is not left waiting on the work of another.
The worst-case scenario for GPU/CPU parallelism is the need to force one processor to wait for the
results of work done by another. Direct3D 10 tries to remove this cost by making the
ID3D10Device::CopyResource and ID3D10Device::CopySubresourceRegion methods asynchronous; the
copy has not necessarily executed by the time the method returns. The benefit of this is that the
application does not pay the performance cost of actually copying the data until the CPU accesses the
data, which is when Map is called. If the Map method is called after the data has actually been copied,
no performance loss occurs. On the other hand, if the Map method is called before the data has been
copied, then a pipeline stall will occur.
Asynchronous calls in Direct3D 10 (which are the vast majority of methods, and especially rendering
calls) are stored in what is called a command buffer. This buffer is internal to the graphics driver and is
used to batch calls to the underlying hardware so that the costly switch from user mode to kernel mode
in Microsoft Windows occurs as rarely as possible.
The command buffer is flushed, thus causing a user/kernel mode switch, in one of four situations, which
are as follows.
1. IDXGISwapChain::Present is called.
2. ID3D10Device::Flush is called.
3. The command buffer is full; its size is dynamic and is controlled by the Operating System and the
graphics driver.
4. The CPU requires access to the results of a command waiting to execute in the command buffer.
Of the four situations above, number four is the most critical to performance. If the application issues a
ID3D10Device::CopyResource or ID3D10Device::CopySubresourceRegion call, this call is queued in the
command buffer. If the application then tries to map the staging resource that was the target of the
copy call before the command buffer has been flushed, a pipeline stall will occur because not only does
the Copy method call need to execute, but all other buffered commands in the command buffer must
execute as well. This will cause the GPU and CPU to synchronize because the CPU will be waiting to
access the staging resource while the GPU is emptying the command buffer and finally filling the
resource the CPU needs. Once the GPU finishes the copy, the CPU will begin accessing the staging
resource, but during this time, the GPU will be sitting idle.
Doing this frequently at runtime will severely degrade performance. For that reason, mapping a staging
resource that is the target of a copy from a default-usage resource should be done with care. The
application needs to wait long
enough for the command buffer to be emptied and thus have all of those commands finish executing
before it tries to map the corresponding staging resource. How long should the application wait? At
least two frames because this will enable parallelism between the CPU(s) and the GPU to be maximally
leveraged. The way the GPU works is that while the application is processing frame N by submitting calls
to the command buffer, the GPU is busy executing the calls from the previous frame, N-1.
So if an application wants to map a resource that originates in video memory and calls
ID3D10Device::CopyResource/ID3D10Device::CopySubresourceRegion at frame N, this call will actually
begin to execute at frame N+1, when the application is submitting calls for the next frame. The copy
should be finished when the application is processing frame N+2.
Frame | GPU activity
N+2   | GPU executing calls sent from the CPU during frame N+1. GPU finished executing calls sent from the CPU during frame N; results ready.
N+3   | GPU executing calls sent from the CPU during frame N+2. GPU finished executing calls sent from the CPU during frame N+1; results ready.
N+4   | ...
Memory Structure and Views (Direct3D 10)
Direct3D 10 enables memory to be allocated and accessed in very flexible ways. How the runtime
accesses resources is determined in large part by how the memory is allocated.
Memory Structure
Resources can be created with varying types of memory structures. Less restrictive memory structures
offer applications more flexibility in how they use the data stored in resources, but may not be optimal
for performance.
Unstructured Resources
Unstructured resources are essentially resources allocated with a single contiguous block of memory.
This memory is allocated without any knowledge of the type of data that will be stored in it. Only buffers
can be allocated as unstructured resources.
Structured Resources
Textures are created as structured resources. Structured resources are split into two groups:
Typeless
Typed
Typeless
In a structured, yet typeless resource, the data types of individual texels are not supplied when the
resource is first created. The application must choose from the available typeless formats. This allows
the runtime to allocate the appropriate amount of memory and generate a full mipmap chain. However,
the exact data format (whether the memory will be interpreted as integers, floating point values,
unsigned integers etc.) is not determined until the resource is bound to the pipeline with a view.
The format of each element in a typeless resource is specified when the resource is bound to a pipeline
stage using a view. Because the resource is weakly typed storage, limited reuse or reinterpretation of
the memory is enabled, as long as the component bit counts remain the same.
The same resource may be bound to multiple slots in the graphics pipeline, with a view of a different,
fully qualified format at each location. For example, a resource created with the format
DXGI_FORMAT_R32G32B32A32_TYPELESS could be used as a DXGI_FORMAT_R32G32B32A32_FLOAT
and a DXGI_FORMAT_R32G32B32A32_UINT at different locations in the pipeline simultaneously, as
sketched below. Please see binding resources for more details.
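A sketch of that reuse, creating one typeless texture and two fully typed shader-resource views of it
(the variable names are illustrative):

D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width = 256;
desc.Height = 256;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R32G32B32A32_TYPELESS;   // layout fixed, interpretation deferred
desc.SampleDesc.Count = 1;
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;

ID3D10Texture2D* pTex = NULL;
pd3dDevice->CreateTexture2D( &desc, NULL, &pTex );

// View the same storage as FLOAT in one slot and as UINT in another.
D3D10_SHADER_RESOURCE_VIEW_DESC srv;
srv.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2D;
srv.Texture2D.MostDetailedMip = 0;
srv.Texture2D.MipLevels = 1;

ID3D10ShaderResourceView* pFloatView = NULL;
ID3D10ShaderResourceView* pUintView = NULL;
srv.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
pd3dDevice->CreateShaderResourceView( pTex, &srv, &pFloatView );
srv.Format = DXGI_FORMAT_R32G32B32A32_UINT;
pd3dDevice->CreateShaderResourceView( pTex, &srv, &pUintView );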
Typed
Creating a fully typed resource restricts the resource to the format it was created with. This enables the
runtime to optimize access, especially if the resource is created with flags indicating that it cannot be
mapped by the application.
Since the resource is created with a specific type, any views used with this type of resource must have
identical formats as the resource itself. Resources created with a specific type cannot be reinterpreted
using the view mechanism.
Views
In Direct3D 10, texture resources are accessed with a view, which is a mechanism for hardware
interpretation of a resource in memory. A view allows a particular pipeline stage to access only the
subresources it needs, in the representation desired by the application.
A view supports the notion of a typeless resource - that is, the resource is created with a certain size, but
exactly how the memory is interpreted is determined when the resource is bound to the pipeline. See
typeless resources for more details.
Here is an example of binding a Texture2DArray resource containing six textures in two different ways,
through two different views. (Note: a subresource cannot be bound as both input and output to the pipeline
simultaneously.)
The Texture2DArray can be used as a shader resource by creating a shader resource view for it. The
resource is then addressed as an array of textures.
A Texture2DArray can also be used as a render target. The resource can then be viewed as an array of
2D textures (six in this case) with mipmap levels (three in this case).
Create a view object for a rendertarget by calling CreateRenderTargetView. Then call
OMSetRenderTargets to set the rendertarget view to the pipeline. Render into the rendertargets by
calling Draw, using the RenderTargetArrayIndex to index into the proper texture in the array. You
can use a subresource (a mipmap level, array index combination) to bind to any array of subresources.
So you could bind to the second mipmap level, and update only that particular mipmap level if you
wanted, like this:
Figure 2. Views can access an array of Subresources
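A render-target view description that selects the second mipmap level across all six array slices might
look like this sketch (pTexArray and pMip1View are illustrative names, and the format is assumed):

D3D10_RENDER_TARGET_VIEW_DESC rtDesc;
rtDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;            // assumed element format
rtDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2DARRAY;
rtDesc.Texture2DArray.MipSlice = 1;                    // second mipmap level
rtDesc.Texture2DArray.FirstArraySlice = 0;
rtDesc.Texture2DArray.ArraySize = 6;                   // view all six slices

ID3D10RenderTargetView* pMip1View = NULL;
pd3dDevice->CreateRenderTargetView( pTexArray, &rtDesc, &pMip1View );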
Render-Target Views
Creating a rendertarget view for a resource will enable the user to bind the resource to the pipeline as a
rendertarget. A rendertarget is a resource that will be written to by the output merger stage at the end
of a render pass; after a render pass the rendertargets contain the color information of a rendered
image. When creating a rendertarget view, only one mipmap level of a texture resource may be used as
a rendertarget. When binding a rendertarget view to the pipeline, there will be a corresponding depth
stencil view that will be used with the rendertarget in the output merger stage.
In Direct3D 10, you no longer bind a resource directly to the pipeline; you create a view of a resource,
and then set the view to the pipeline. This allows validation and mapping in the runtime and driver to
occur at view creation, minimizing type checking at bind time.
DXGI Overview (Direct3D 10)
DirectX Graphics Infrastructure (DXGI) recognizes that some parts of graphics evolve more slowly than
others. The primary goal of DXGI is to manage low-level tasks that can be independent of the DirectX
graphics runtime. DXGI provides a common framework for future graphics components; the first
component to take advantage of DXGI is Direct3D 10.
In previous versions of Direct3D, low-level tasks like enumeration of hardware devices, presenting
rendered frames to an output, controlling gamma, and managing a full-screen transition were included
in the 3D runtime. These tasks are now implemented in DXGI.
DXGI's purpose is to communicate with the kernel mode driver and the system hardware.
An application has the option of talking to DXGI directly, or calling the Direct3D APIs in D3D10Core
(which handles the communications with DXGI for you). You may want to work with DXGI directly if your
application needs to enumerate devices or control how data is presented to an output.
Enumerating Adapters
An adapter is an abstraction of the hardware and the software capability of your computer. There are
generally many adapters on your machine. Some devices are implemented in hardware (like your video
card) and some are implemented in software (like the Direct3D reference rasterizer). Adapters
implement functionality used by a graphics application. The following diagram shows a system with a
single computer, two adapters (video cards) and three output monitors.
When enumerating these pieces of hardware, DXGI creates an IDXGIOutput interface for each output (or
monitor) and an IDXGIAdapter interface for each video card (even if it is a video card built into a
motherboard). Enumeration is done by using an IDXGIFactory interface in two stages. First, call
IDXGIFactory::EnumAdapters to return a set of IDXGIAdapter interfaces that represent the video
hardware. Second, call IDXGIAdapter::CheckInterfaceSupport to see if the video hardware is associated
with a driver that supports the API you wish to use (such as Direct3D 10).
If your application is calling Direct3D, there is no need to enumerate devices since the DirectX runtime
will create a default adapter interface when D3D10CreateDevice is called.
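A sketch of that two-stage enumeration (error handling trimmed; the variable names are illustrative):

#include <dxgi.h>
#include <d3d10.h>

IDXGIFactory* pFactory = NULL;
CreateDXGIFactory( __uuidof(IDXGIFactory), (void**)&pFactory );

IDXGIAdapter* pAdapter = NULL;
for( UINT i = 0;
     pFactory->EnumAdapters( i, &pAdapter ) != DXGI_ERROR_NOT_FOUND;
     ++i )
{
    // Stage two: does this adapter's driver support Direct3D 10?
    LARGE_INTEGER umdVersion;
    if( SUCCEEDED( pAdapter->CheckInterfaceSupport( __uuidof(ID3D10Device),
                                                    &umdVersion ) ) )
    {
        // This adapter can be handed to D3D10CreateDevice.
    }
    pAdapter->Release();
}
pFactory->Release();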
Presentation
Your application's job is to render frames and ask DXGI to present those frames to the output. For
performance, this requires at least two buffers so that the application can render one buffer while
presenting another one. There may be more than two buffers required depending on the time it takes to
render a frame or the desired frame rate for presentation. The set of buffers created is called a swap
chain.
Figure 1. Swap Chain
A swap chain has one front buffer and one or more back buffers. Each application is responsible for
creating its own swap chain. To maximize the speed of the presentation of the data to an output, a swap
chain is almost always created in the memory in a display sub-system.
The display sub-system (which is often a video card but could be implemented on a motherboard)
contains a GPU, a digital to analog converter (DAC) and memory. The swap chain is allocated within this
memory to make presentation very fast. The display sub-system is responsible for presenting the data in
the front buffer to the output.
Direct3D 10 is the first graphics component to use DXGI. DXGI has some different swap chain behaviors.
In DXGI, a swap chain is tied to a window when the swap chain is created. This change improves
performance and saves memory. Previous versions of Direct3D allowed the swap chain to
change the window that the swap chain is tied to.
In DXGI, a swap chain is tied to a rendering device on creation. A change to the rendering device
requires the swap chain to be recreated.
A swap chain's buffers are created at a particular size and in a particular format. The application
specifies these values (or you can inherit the size from the target window) at startup, and can then
optionally modify them as the window size changes in response to user input or program events.
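For example, a windowed swap chain description might be filled in like this (a sketch; hWnd is
assumed to be the application's window handle):

DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory( &sd, sizeof(sd) );
sd.BufferCount = 1;                                 // one back buffer
sd.BufferDesc.Width = 640;
sd.BufferDesc.Height = 480;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferDesc.RefreshRate.Numerator = 60;
sd.BufferDesc.RefreshRate.Denominator = 1;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
sd.OutputWindow = hWnd;                             // the swap chain is tied to this window
sd.SampleDesc.Count = 1;
sd.SampleDesc.Quality = 0;
sd.Windowed = TRUE;

ID3D10Device* pD3D10Device = NULL;
IDXGISwapChain* pSwapChain = NULL;
D3D10CreateDeviceAndSwapChain( NULL, D3D10_DRIVER_TYPE_HARDWARE, NULL, 0,
                               D3D10_SDK_VERSION, &sd, &pSwapChain, &pD3D10Device );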
Once the swapchain has been created, you will typically want to render images into it. Here's a code
fragment that sets up a Direct3D10 context to render into a swapchain. This code extracts a buffer from
the swapchain, creates a render-target-view from that buffer, then sets it on the device:
ID3D10Resource * pBB;
ThrowFailure( pSwapChain->GetBuffer(0, __uuidof(pBB),
reinterpret_cast<void**>(&pBB)), "Couldn't get back buffer");
ID3D10RenderTargetView * pView;
D3D10_RENDER_TARGET_VIEW_DESC rtd;
rtd.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
rtd.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D;
rtd.Texture2D.MipSlice = 0;
ThrowFailure( pD3D10Device->CreateRenderTargetView(pBB, &rtd, &pView),
"Couldn't create view" );
pD3D10Device->OMSetRenderTargets(1, &pView, 0);
If you've previously called IDXGIFactory::MakeWindowAssociation, the user can press the Alt-Enter key
combination and DXGI will transition your application between windowed and fullscreen mode.
Alternatively, you can disable Alt-Enter by not calling IDXGIFactory::MakeWindowAssociation.
Calling IDXGIFactory::MakeWindowAssociation is recommended, because a standard control mechanism
for the user is strongly desired.
While you don't have to write any more code than has been described, a few simple steps can make
your application more responsive. The most important consideration is the resizing of the swap chain's
buffers in response to the resizing of the output window. Naturally, the application's best route is to
respond to WM_SIZE, and call IDXGISwapChain::ResizeBuffers, passing the size contained in the
message's parameters. This behavior obviously makes your application respond well to the user when
he or she drags the window's borders, but it is also exactly what enables a smooth fullscreen transition.
Your window will receive a WM_SIZE message whenever such a transition happens, and calling
IDXGISwapChain::ResizeBuffers is the swap chain's chance to re-allocate the buffers' storage for optimal
presentation. This is why the application is required to release any references it has on the existing
buffers before it calls IDXGISwapChain::ResizeBuffers.
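A window procedure fragment along these lines would handle the resize (a sketch; g_pSwapChain,
g_pD3D10Device, and g_pRenderTargetView are illustrative globals, and the view is recreated afterwards
as shown earlier):

case WM_SIZE:
    if( g_pSwapChain )
    {
        // Release all outstanding references to the swap chain's buffers.
        g_pD3D10Device->OMSetRenderTargets( 0, NULL, NULL );
        g_pRenderTargetView->Release();

        // Preserve the existing buffer count and format; pass the new size.
        g_pSwapChain->ResizeBuffers( 0, LOWORD(lParam), HIWORD(lParam),
                                     DXGI_FORMAT_UNKNOWN, 0 );

        // Re-acquire the back buffer and recreate the render-target view here.
    }
    return 0;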
IDXGISwapChain::Present will inform you if your output window is entirely occluded via
DXGI_STATUS_OCCLUDED. When this occurs, it is recommended that your application go into standby
mode (by calling IDXGISwapChain::Present with DXGI_PRESENT_TEST) since resources used to render
the frame are wasted. Using DXGI_PRESENT_TEST will prevent any data from being presented while still
performing the occlusion check. Once IDXGISwapChain::Present returns S_OK, you should exit standby
mode; do not use the return code to switch to standby mode as doing so can leave the swapchain
unable to relinquish fullscreen mode.
To resize the output while either fullscreen or windowed, it is advisable to call
IDXGISwapChain::ResizeTarget, since this method resizes the target window as well. Because the target
window is resized, WM_SIZE is sent, and your code will naturally call IDXGISwapChain::ResizeBuffers in
response. It is thus a waste of effort to resize your buffers and then subsequently resize the target.
Mode Switches
The DXGI swap chain might change the display mode of an output when making a full-screen transition.
To enable the automatic display mode change, you must specify
DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH in the swap-chain description. If the display mode
automatically changes, DXGI will choose the most modest mode (size and resolution will not change, but
the color depth may). Resizing swap chain buffers will not cause a mode switch. The swap chain makes
an implicit promise that if you choose a back buffer that exactly matches a display mode supported by
the target output, then it will switch to that display mode when entering fullscreen mode on that
output. Consequently, you choose a display mode by choosing your back buffer size and format.