Advanced Deferred Shading Techniques

The document discusses optimization techniques for deferred shading rendering pipelines. It describes a fully deferred pipeline that renders scene geometry to generate G-Buffer textures storing material properties, then performs lighting calculations by rendering light volumes. It also describes a light pre-pass method that renders normals and lighting to buffers in separate passes before combining them. The document notes pros and cons of each approach and discusses ways to reduce costs such as using fewer render targets, avoiding slow formats, and culling unnecessary lighting calculations.

Uploaded by

Nicolas Bertoa


Deferred Shading Optimizations
Nicolas Thibieroz, AMD
[email protected]

Fully Deferred Engine: G-Buffer Building Pass

Render a single scene geometry pass into the G-Buffer RTs
Store material properties (albedo, normal, specular, etc.)
Write to the depth buffer as normal

[Diagram: the building pass writes to the G-Buffer MRTs and the depth buffer]

Fully Deferred Engine: Shading Passes

Add lighting contributions into the accumulation buffer
Use G-Buffer RTs as inputs
Render geometries enclosing the light area

[Diagram: the shading passes read the G-Buffer MRTs and the depth buffer and write to the accumulation buffer]

Fully Deferred: Pros and Cons

Pros:
Scene geometry decoupled from lighting
Shading/lighting only applied to visible fragments
Reduction in render states
G-Buffer already produces data required for post-processing

Cons:
Significant engine rework
Requires more memory
Costly and complex MSAA
Forward rendering required for translucent objects

Light Pre-pass: Render Normals

Render 1st geometry pass into normal (and depth) buffer
Uses a single color RT
No Multiple Render Targets required

[Diagram: the first geometry pass writes to the normal buffer and the depth buffer]

Light Pre-pass: Lighting Accumulation

Perform all lighting calculations into the light buffer
Use normal and depth buffers as input textures
Render geometries enclosing the light area
Write LightColor * N.L * Attenuation in RGB, specular in A

[Diagram: the lighting pass reads the normal buffer and the depth buffer and writes to the light buffer]
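The per-light write described on this slide can be sketched on the CPU as follows. This is only an illustration in C++, not the deck's shader code: `Vec3`, `LightSample`, `shadeLightPrePass` and the Blinn-Phong-style specular term (with its `specPower` exponent) are assumptions, since the slide only states what goes into RGB and A.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot3(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One light's contribution to the light buffer (hypothetical layout).
struct LightSample { float r, g, b, a; };

// n: unit surface normal, l: unit direction to the light, h: unit half vector.
// specPower is an assumed Blinn-Phong exponent; the slide does not name a specular model.
inline LightSample shadeLightPrePass(const Vec3& lightColor, const Vec3& n, const Vec3& l,
                                     const Vec3& h, float attenuation, float specPower)
{
    const float nDotL = std::max(dot3(n, l), 0.0f);
    const float scale = nDotL * attenuation;              // N.L * Attenuation
    const float nDotH = std::max(dot3(n, h), 0.0f);
    const float spec  = std::pow(nDotH, specPower) * scale;
    // RGB = LightColor * N.L * Attenuation, A = specular term.
    return { lightColor.x * scale, lightColor.y * scale, lightColor.z * scale, spec };
}
```

Each light's result would then be additively blended into the light buffer, so the second geometry pass only needs one light-buffer fetch per pixel.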

Light Pre-pass: Combine Lighting with Materials

Render 2nd geometry pass using the light buffer as input
Fetch geometry material
Combine with light data

[Diagram: the second geometry pass reads the light buffer and the depth buffer and writes the output]

Light Pre-pass: Pros and Cons

Pros:
Scene geometry decoupled from lighting
Shading/lighting only applied to visible fragments
G-Buffer already produces data required for post-processing
One material fetch per pixel regardless of number of lights

Cons:
Significant engine rework
Costly and complex MSAA
Forward rendering required for translucent objects
Two scene geometry passes required
Unique lighting model

Semi-Deferred: Other Methods

Light-indexed Deferred Rendering
Store IDs of visible lights into the light buffer
Use stencil or blending to mark light IDs

Deferred Shadows
Most basic form of deferred rendering
Perform shadowing from a screen-sized depth buffer
Most graphics engines now employ deferred shadows

G-Buffer Building Pass (Fully Deferred)

G-Buffer Building Pass: Export Cost

GPUs can be bottlenecked by export cost
Export cost is the cost of writing PS outputs into RTs
Common scenario as the PS is typically short for this pass!

[Diagram: a pixel shader exporting to G-Buffer MRT #0, MRT #1, MRT #2 and MRT #3]

Reducing Export Cost

Render objects in front-to-back order
Use fewer render targets in your MRT config
This also means fewer fetches during shading passes
And less memory usage!
Avoid slow formats

Export Cost Rules

AMD GPUs
Each RT adds to export cost
Avoid slow formats: R32G32B32A32, R32G32, R32, R32G32B32A32f, R32G32f, R16G16B16A16 (plus R32F, R16G16, R16 on older GPUs)
Total export cost = (Num RTs) * (Slowest RT)

nVidia GPUs
Each RT adds to export cost
RT export cost proportional to bit depth, except:
<32bpp same speed as 32bpp
sRGB formats are slower
1010102 and 111110 slower than 8888
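The AMD rule above is simple enough to evaluate when comparing candidate MRT layouts. A minimal sketch, assuming each render target is given a relative cost in made-up units (for example 1 for a fast format and 2 for a slow one); `amdExportCost` is a hypothetical helper name, not part of any API.

```cpp
#include <algorithm>
#include <vector>

// rtCosts holds one relative export cost per render target (hypothetical units).
// Applies the slide's AMD rule: total = (number of RTs) * (cost of the slowest RT).
inline int amdExportCost(const std::vector<int>& rtCosts)
{
    if (rtCosts.empty()) return 0;
    const int slowest = *std::max_element(rtCosts.begin(), rtCosts.end());
    return static_cast<int>(rtCosts.size()) * slowest;
}
```

Under this model a single slow format penalizes the whole configuration: four targets costed {1, 1, 1, 2} come out at 8, against 4 when all four use a fast format.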

Reducing Export Cost: Depth Buffer as Texture Input

No need to store depth into a color RT
Simply re-use the depth buffer as texture input during shading passes
The same depth buffer can remain bound for depth rejection in DX11
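When the depth buffer is read as a texture, the shading pass usually has to turn the stored value back into a view-space distance. A minimal sketch of that step in C++, assuming a standard D3D-style perspective projection where depth is 0 at the near plane and 1 at the far plane (no reversed-Z); the function name and the convention are assumptions, not something the slides specify.

```cpp
// Recover view-space Z from a depth-buffer value d in [0, 1].
// Assumes d = zFar * (z - zNear) / (z * (zFar - zNear)), the usual D3D mapping.
inline float viewZFromDepth(float d, float zNear, float zFar)
{
    return (zNear * zFar) / (zFar - d * (zFar - zNear));
}
```

With this mapping a stored depth of 0 returns `zNear` and a stored depth of 1 returns `zFar`.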

Reducing Export Cost: Data Packing

Trade render target storage for a few extra ALU instructions
ALUs used to pack / unpack data
Example: normals with two components + sign
ALU cost is typically negligible compared to the performance saving of writing to and fetching from fewer textures
Aggressive packing may prevent filtering later on!
E.g. during post-process effects
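The "two components + sign" example can be sketched as below. This is a CPU-side C++ illustration of the arithmetic only; `PackedNormal` and both function names are hypothetical, and where the sign bit actually lives (for instance a spare bit of another channel) is left as an assumption.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct PackedNormal { float x, y; bool zNegative; };

// Keep x and y, plus one bit for the sign of z (n is assumed to be unit length).
inline PackedNormal packNormal(const Vec3& n)
{
    return { n.x, n.y, n.z < 0.0f };
}

// Rebuild z from the unit-length constraint x^2 + y^2 + z^2 = 1.
inline Vec3 unpackNormal(const PackedNormal& p)
{
    // Clamp guards against small negative values caused by storage precision.
    const float zSq = std::max(0.0f, 1.0f - p.x * p.x - p.y * p.y);
    const float z = std::sqrt(zSq);
    return { p.x, p.y, p.zNegative ? -z : z };
}
```

The unpack costs a couple of multiply-adds and a square root, which is the kind of ALU work the slide trades against an extra render target channel.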

Shading Passes (Full and Semi-Deferred)

Light Processing
Add light contributions to the accumulation buffer
Can use either:
Light volumes
Screen-aligned quads

In all cases:
Cull lights as needed before sending them to the GPU
Don't render lights on the skybox area
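One common way to cull lights on the CPU before submission is a bounding-sphere test against the view frustum. The sketch below assumes point lights, six frustum planes with unit normals pointing inward, and hypothetical type and function names; the slides do not prescribe a particular culling method.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with the unit normal n pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct PointLight { Vec3 center; float radius; };

// Conservative test: returns false only when the light's bounding sphere
// lies entirely on the outer side of at least one frustum plane.
inline bool lightTouchesFrustum(const PointLight& light, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum) {
        const float dist = p.n.x * light.center.x + p.n.y * light.center.y
                         + p.n.z * light.center.z + p.d;
        if (dist < -light.radius) return false;
    }
    return true;
}
```

Lights for which this returns false can be skipped entirely, so they never generate draw calls or blending work.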

Light Volume Rendering

Render light volumes corresponding to each light's range
Fullscreen tri/quad (ambient or directional light)
Sphere (point light)
Cone/pyramid (spot light)
Custom shapes (level editor)
Tight fit between light coverage and processed area
2D projection of the volume defines the shaded area
Additively blend each light contribution to the accumulation buffer
Use early depth/stencil culling optimizations

Light Volume Rendering

Full slides available in backup section

Light Volume Rendering: Geometry Optimization

Always make sure your light volumes are geometry-optimized!
For both index re-use (post VS cache) and sequential vertex reads (pre VS cache)
Common oversight for algorithmically generated meshes (spheres, cones, etc.)
Especially important when depth/stencil-only rendering is used!
No pixel shader = more likely to be VS fetch limited!

Screen-Aligned Quads

Alternative to light volumes: render a camera-facing quad for each light
Quad screen coordinates need to cover the extents of the light volume
Simpler geometry but coarser rendering
Not as simple as it seems
Spheres (point lights) project to ellipses in post-perspective space!
Can cause problems when close to the camera

[Diagram: camera, near plane, light and far plane]
[Figures: point lights as quads; incorrect sphere quad enclosure; correct sphere quad enclosure]
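The difference between a too-small and a correct enclosure can be seen by computing the horizontal extent of a projected sphere two ways. The sketch below is an illustration under stated assumptions: the camera sits at the origin looking down +z, the sphere is entirely in front of the camera (`cz > r`), and the results are x/z slopes before the projection scale is applied. Both function names are hypothetical.

```cpp
#include <cmath>

struct Extent { float lo, hi; };

// Extent from the lines through the camera that are tangent to the sphere.
// cx, cz: view-space center coordinates; r: radius. Requires cz > r.
inline Extent projectedSphereExtentX(float cx, float cz, float r)
{
    const float dist  = std::sqrt(cx * cx + cz * cz);
    const float theta = std::atan2(cx, cz);   // angle of the center off the view axis
    const float alpha = std::asin(r / dist);  // half-angle subtended by the sphere
    return { std::tan(theta - alpha), std::tan(theta + alpha) };
}

// Naive version: offsets the center by the radius at the center's depth.
inline Extent naiveSphereExtentX(float cx, float cz, float r)
{
    return { (cx - r) / cz, (cx + r) / cz };
}
```

For a unit sphere centered two units in front of the camera, the naive extent is plus or minus 0.5 while the tangent-based extent is about plus or minus 0.577, so a quad built from the naive values would cut off the edges of the light. The gap grows as the light gets closer to the camera.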

Screen-Aligned Quads 2

Additively render each quad onto the accumulation buffer
Process light equation as normal
Set quad Z coordinates to Min Z of light
Early Z will reject lights behind geometry with Z Mode = LESSEQUAL
Watch out for clipping issues
Need to clamp quad Z to near clip plane Z if: Light MinZ < Near Clip Plane Z < Light MaxZ
Saves on geometry cost but not as accurate as volumes

[Diagram: depth range of the light, from LMinZ to LMaxZ]
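The quad depth selection and near-plane clamp can be sketched as a small helper. It assumes view-space depths where larger means farther from the camera; `QuadDepth` and `lightQuadZ` are hypothetical names, and skipping lights that lie entirely closer than the near plane is an added assumption.

```cpp
#include <algorithm>

struct QuadDepth { bool draw; float z; };

// Picks the depth at which to draw a light's screen-aligned quad.
// lightMinZ / lightMaxZ: nearest and farthest view-space depth of the light volume.
inline QuadDepth lightQuadZ(float lightMinZ, float lightMaxZ, float nearClipZ)
{
    // The whole light is closer to the camera than the near plane: nothing to draw.
    if (lightMaxZ < nearClipZ) return { false, 0.0f };
    // Clamp so a light straddling the near plane is not clipped away.
    return { true, std::max(lightMinZ, nearClipZ) };
}
```

Drawing at the light's minimum depth with a LESSEQUAL test lets early Z reject the quad wherever scene geometry is in front of the light.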

DirectCompute Lighting

See Johan Andersson's presentation

Accessing Light Properties

Avoid using dynamic constant buffer indexing in the Pixel Shader
This generates redundant memory operations repeated for every pixel
Instead fetch light properties from the CB in the VS (or GS)
And pass them to the PS as interpolants
No actual interpolation needed
Use nointerpolation to reduce the number of shader instructions

// Light data stored in a constant buffer
struct LIGHT_STRUCT
{
    float4 vColor;
    float4 vPos;
};
cbuffer cbPointLightArray
{
    LIGHT_STRUCT g_Light[NUM_LIGHTS];
};

// Avoid: dynamic constant buffer indexing in the pixel shader
float4 PS_PointLight(PS_INPUT i) : SV_TARGET
{
    // ...
    uint uIndex = i.uPrimIndex/2;
    float4 vColor = g_Light[uIndex].vColor;
    float4 vLightPos = g_Light[uIndex].vPos;
    // ...
}

// Prefer: fetch light properties in the vertex shader and pass them to the PS
struct PS_QUAD_INPUT
{
    nointerpolation float4 vLightColor : LCOLOR;
    nointerpolation float4 vLightPos   : LPOS;
    float4 vPosition : SV_POSITION;
};

PS_QUAD_INPUT VS_PointLight(VS_INPUT i)
{
    PS_QUAD_INPUT Out = (PS_QUAD_INPUT)0;
    // Pass position
    Out.vPosition = float4(i.vNDCPosition, 1.0);
    // Pass light properties to PS (four vertices per light quad)
    uint uIndex = i.uVertexIndex/4;
    Out.vLightColor = g_Light[uIndex].vColor;
    Out.vLightPos = g_Light[uIndex].vPos;
    return Out;
}

Texture Read Costs

Shading passes fetch G-Buffer data for each sample
Make sure point sampling filtering is used!
AMD: point sampling filtering is fast for all formats
nVidia: prefer 16F over 32F

Post-processing passes may require filtering...
AMD: watch out for slow bilinear formats:
DXGI_FORMAT_R32G32_*
DXGI_FORMAT_R16G16B16A16_*
DXGI_FORMAT_R32G32B32[A32]_*
nVidia: no penalty for using bilinear over point sampling filtering for formats < 128 bpp

Blending Costs

Additively blending lights into accumulation buffer is not free

Higher blending cost when fatter color RT formats are used
Blending even more expensive when MSAA is enabled
Use discard to get rid of pixels not contributing any light
Use this regardless of the light processing method used
if ( dot(vColor.xyz, 1.0) == 0 ) discard;

Can result in a significant increase in performance!

MultiSampling Anti-Aliasing

MSAA with (semi-) deferred engines is more complex than just enabling MSAA
Deferred render targets must be multisampled
Increases memory cost considerably!
Each qualifying sample must be individually lit
Impacts performance significantly

MultiSampling Anti-Aliasing 2

Detecting pixel edges reduces processing cost
Per-pixel shading on non-edge pixels
Per-sample shading on edge pixels
Edge detection via centroid is a neat trick, but is not that useful
Produces too many edges that don't need to be shaded per sample
Especially when tessellation is used!
Doesn't detect edges from transparent textures
Better to detect edges by checking depth and normal discontinuities
Or consider alternative FSAA methods...

[Figure: MSAA edge detection]
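Edge detection from depth and normal discontinuities can be sketched as a per-pixel comparison of the MSAA samples. The version below is a CPU-side C++ illustration; the sample layout, the choice to compare every sample against sample 0, and both thresholds are assumptions that would need tuning in a real engine.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct GBufferSample { float depth; Vec3 normal; };  // normal assumed unit length

// Flags a pixel as an edge when any sample differs from sample 0 by more than
// the given thresholds. Both thresholds are tuning values assumed for illustration.
inline bool isEdgePixel(const std::vector<GBufferSample>& samples,
                        float depthThreshold, float normalDotThreshold)
{
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const GBufferSample& a = samples[0];
        const GBufferSample& b = samples[i];
        const float nDot = a.normal.x * b.normal.x + a.normal.y * b.normal.y
                         + a.normal.z * b.normal.z;
        if (std::fabs(a.depth - b.depth) > depthThreshold || nDot < normalDotThreshold)
            return true;
    }
    return false;
}
```

Pixels for which this returns false can be shaded once per pixel, leaving per-sample shading for the pixels flagged as edges.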

Conclusion

Questions?

[email protected]

Backup

Light Volume Rendering: Early Z Culling Optimizations 1

When camera is inside the light volume:
Set Z Mode = GREATER
Render volume's back faces
Only samples fully inside the volume get shaded
Optimal use of early Z culling
No need for stencil
High efficiency

[Diagram legend: depth test passes / depth test fails]

Light Volume Rendering: Early Z Culling Optimizations 2a

Previous optimization does not work if camera is outside the volume!
Back faces also pass the Z = GREATER test for objects in front of the volume
Those objects shouldn't be lit
This results in wasted processing!

[Diagram legend: depth test passes / depth test fails]

Light Volume Rendering: Early Z Culling Optimizations 2b

Alternative, when camera is outside the light volume:
Set Z Mode = LESSEQUAL
Render volume's front faces
Solves the case for objects in front of the volume

[Diagram legend: depth test passes / depth test fails]

Light Volume Rendering: Early Z Culling Optimizations 2c

Alternative, when camera is outside the light volume:
Set Z Mode = LESSEQUAL
Render volume's front faces
Solves the case for objects in front of the volume
But generates wasted processing for objects behind the volume!

[Diagram legend: depth test passes / depth test fails]
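The choice between the two early Z configurations above can be expressed as a small state-selection helper. This sketch assumes a spherical (point light) volume and uses hypothetical enum and function names; treating the camera as "inside" once it is within the radius plus the near clip distance is an added safety margin, not something the slides state.

```cpp
enum class DepthFunc { Greater, LessEqual };
enum class Faces { Back, Front };

struct VolumeDrawState { DepthFunc depthFunc; Faces faces; };

// cameraDistance: distance from the camera to the light center.
// nearClip is added as a margin so the near plane cannot cut through the front faces.
inline VolumeDrawState pickLightVolumeState(float cameraDistance, float lightRadius, float nearClip)
{
    const bool cameraInside = cameraDistance < lightRadius + nearClip;
    if (cameraInside)
        return { DepthFunc::Greater, Faces::Back };    // optimization 1
    return { DepthFunc::LessEqual, Faces::Front };     // optimization 2b/2c
}
```

The returned pair would map onto the engine's depth comparison function and cull mode when the light volume is drawn.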

Light Volume Rendering: Early Stencil Culling Optimizations

Stencil can be used to mark samples inside the light volume
Render volume with stencil-only pass:
Clear stencil to 0
Z Mode = LESSEQUAL
If depth test fails:
Increment stencil for back faces
Decrement stencil for front faces
Render some geometry where stencil !=

[Diagram: light volume with depth-test pass/fail regions and the resulting stencil values]
