1
So I found this Batman image on the net. I had to put it somewhere in the presentation. Here it is.
Nvidia has a page on its CUDA Zone showing applications and speedups; it's a bit silly and not so advertised anymore nowadays, but still useful as an introduction:
http://www.nvidia.com/object/cuda-apps-flash-new.html#
Intel's Knights Corner news:
http://www.intel.com/pressroom/archive/releases/2010/20100531comp.htm
http://en.wikipedia.org/wiki/Intel_MIC#Knights_Corner
A wild, and not even particularly creative, example... More computation could enable entirely novel scenarios...
The past couple of Unreal Engine demos were architected to show some of what could be possible. I like the idea of motion graphs:
http://cg.cis.upenn.edu/hms/research/wcMG/
Also, imperfect shadow maps are cool:
http://cg.ibds.kit.edu/english/publications.php
The two images at the top right are for megameshes (Lionhead) and megatextures (id).
Fight Night Round 4 and Fight Night Champion are the first games using EA's Real AI, which is trained from examples.
Naughty Dog famously hand-optimizes SPU kernels. How many games pack so much computation, even today, onto SPUs? And still, these are pretty uniform workloads, on a processor that can handle non-uniform loads better than GPU compute (lower SIMD width).
Heart bokeh image taken from http://www.augustinefou.com/2009/08/bokeh-filter-turns-blurry-lights-into.html. I like how the woman is pushed out of the frame to show the bokeh, as a metaphor for how we did some effects early on 360 and PS3.
* Data parallel is harder for more high-level tasks, but it's far from impossible. In fact, we still rely on lots of data; we just let this data be combed through with mostly handmade serial logic.
Use learning, classification, probabilistic and optimization methods for games... E.g. in animation we have data: go from blend trees to animation clips all the way down to working with individual poses, and use the animation data itself and its metrics to drive animation, not logic!
http://graphics.stanford.edu/projects/ccclde/ccclde.pdf
http://www.stanford.edu/~svlevine/
Another avenue: simulation, modeling, physics. The last image is from
http://www.creativeapplications.net/theory/meibach-and-posavec-data-visualization-poetry-and-sculpture/
Think about what you can do with all the data we already have.
Usually also superscalar, with multiple pipes and SIMD, but for now we don't care.
Ok, ok, the last Batman.
...but often it is not; it's surprising (though consider that the compiler often does not know whether a given loop is executed often or not) how much manual unrolling (and prefetching, on platforms without strong cache predictors) in tight loops still pays, allowing the compiler to then compute a good, dependency-free schedule for the generated assembly.
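A minimal sketch of the idea in C-like code (a hypothetical array sum, not taken from the slides): unrolling with independent accumulators exposes work the compiler can schedule without waiting on a single dependency chain.

    // Naive loop: every iteration depends on the one accumulator 's'.
    float sum_naive(const float* a, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    // Manually unrolled by four with independent accumulators: the four adds
    // per iteration have no dependencies between them, so the compiler can
    // schedule them back to back. (Assumes n is a multiple of 4 to keep the
    // sketch short.)
    float sum_unrolled(const float* a, int n) {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        for (int i = 0; i < n; i += 4) {
            s0 += a[i + 0];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }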
10
11
12
First insight we learned.
13
Again, same reasoning.
14
15
Fibers and coroutines are used, for example, to hide I/O and network access in web servers (a popular example is node.js).
16
For a sequential CPU loop, all this work is done *in the specified order*. The compiler may reschedule instructions without breaking the logical order of operations, and an out-of-order CPU (e.g. on PC, but not on PS3 or 360) will re-order instructions (well, micro-instructions) to get rid of dependencies, but the processing order is essentially preserved. By contrast, on the GPU all you're saying is that you want the whole array processed; you don't care about the order.
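To make the contrast concrete, a minimal sketch with hypothetical names (CUDA-style for the GPU side): the CPU loop fixes an order, while the kernel only states that every element must be processed, in whatever order the hardware likes.

    // Sequential CPU version: elements are visited in a specified order.
    void scale_cpu(float* data, int n, float k) {
        for (int i = 0; i < n; ++i)   // i = 0, 1, 2, ... exactly in this order
            data[i] *= k;
    }

    // GPU kernel: one logical thread per element, no ordering implied.
    // The hardware is free to process elements in any order, in parallel.
    __global__ void scale_gpu(float* data, int n, float k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= k;
    }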
17
Of course there is other stuff too, and never mistake a logical model for a physical one, but this is enough for everything we'll need.
18
Vectors are usually even larger than the number of ALUs: the same instruction is repeated a number of times to process an entire vector. This allows the ALUs to be pipelined; e.g. a vector of 16 float4 elements might be processed by four float4 ALUs issuing 4 at a time, where each ALU pipeline stage can compute a single float per cycle, and with four stages we achieve float4 processing with a throughput of one per cycle and a latency of four.
We say 'threads' here, but they are more similar to lightweight fibers. Also, confusingly, different GPU languages name the same concepts in different ways.
The more computational threads we can pack into the registers, the more independent work we'll have to hide latencies!
This lightweight threading mechanism is fundamental for performance on manycores; see for example http://www.linleygroup.com/newsletters/newsletter_detail.php?num=4747&year=2011&tag=3, which seems to be a derivative of the Tilera Tile64: http://www.tilera.com/products/processors/TILE64
19
Illustration stolen from 'Running Code at a Teraflop: Overview of GPU Architecture', Kayvon Fatahalian, Siggraph 2009.
Some GPUs do not support variable-length loops: everything gets unrolled and inlined...
20
E.g. Pixar's Reyes rendering algorithm had interpreted, very parallel shaders. The cost of the interpreter is zero when you repeat the same instruction over thousands of elements. Reyes shaders are essentially the same as the hardware shaders we have on modern GPUs.
Note that GPU shaders have some explicit SIMD instructions in HLSL and similar languages, supporting vectors of elements up to 4-wide. In practice, many modern GPUs process only scalar code, turning vector instructions into scalars. This is because the SIMD parallelism that GPUs implement is much wider than the 4-wide instructions of the language, coupling many (e.g. 16 or more) execution elements together into a big execution vector. This holds regardless of whether each element is scalar or 4-way; some GPUs even have a 4+1 configuration (one 4-way and one scalar instruction in parallel per execution element). In general, when we talk about GPU vectors we don't mean the actual vector instructions in the shader code, but the aggregation of many ALUs following the same code.
21
22
23
Skinning images from http://blog.wolfire.com/2009/11/volumetric-heat-diffusion-skinning/
If you've done rendering on the GPU, you might recognize that the pseudo-code looks nothing like a vertex shader, where you don't fetch data: it is passed in as the parameters of the kernel's main function. The data is laid out in GPU memory and bound via streams, constants and other means. I didn't include a real vertex shader code example, even though it would not have been harder to read, because it might even be misleading. The way DirectX or OpenGL expose programmable functionality to the user might be close to the hardware, but nowadays it most often is not; e.g. shaders might indeed compile to include explicit fetches...
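As a concrete flavor of that 'explicit fetch' style, here is a sketch in CUDA-like code (hypothetical names, not the slides' actual pseudo-code): a compute-style skinning kernel reads its vertex data from buffers in GPU memory, where a vertex shader would instead receive the same data as input parameters.

    // Hypothetical linear-blend skinning kernel (one bone per vertex, to keep
    // it short): every input is fetched explicitly from global GPU memory.
    struct Mat34 { float4 r0, r1, r2; };   // 3x4 bone matrix, stored as rows

    __device__ float dot4(float4 a, float4 b) {
        return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
    }

    __global__ void skin_positions(const float4* positions,   // x, y, z, 1
                                   const int*    boneIndex,   // one bone per vertex
                                   const Mat34*  bones,
                                   float4*       outPositions,
                                   int vertexCount) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= vertexCount) return;

        float4 p = positions[v];            // explicit fetch
        Mat34  m = bones[boneIndex[v]];     // explicit, indirect fetch

        outPositions[v] = make_float4(dot4(m.r0, p), dot4(m.r1, p), dot4(m.r2, p), 1.0f);
    }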
Pixel shaders can kill their output, so, to be pedantic, they can output one or zero elements. Also, pixel processing might be culled early; this is very important.
The independence that I assert is not entirely true. Pixel shaders need derivative information for all their variables; these derivatives are computed using finite differences from the neighboring pixels, which not only constrains pixels to always be processed in 2x2 units (quads) but also establishes, through the derivatives (ddx, ddy), a means of communication (see Eric Penner's 'Shader Amortization using Pixel Quad Message Passing', GPU Pro 2).
24
We'll see some more accurate pseudo-code later; for now let's still pretend to work with a generic in-parallel construct.
25
26
* At least in our representation of the GPU... In reality compilers can decide to spill some of the local state to the shared memory (we'll see this later) and allow for a stack and so on, but that is quite a different thing...
E.g. Nvidia Fermi has a limit of 64 temporary registers in kernel code (the number of registers per core is higher, but they will be used for multiple contexts).
27
28
ALL THE PSEUDO-CODE OMITS BOUNDS CHECKS ☺
29
http://en.wikipedia.org/wiki/Sorting_network
http://www.cs.brandeis.edu/~hugues/sorting_networks.html
http://www.sciencedirect.com/science/article/pii/0020019089901968
http://www.springerlink.com/content/g002019156467227/
Tangentially related, but interesting: http://www.pvk.ca/Blog/2012/08/27/tabasco-sort-super-optimal-merge-sort/
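Since the slide only shows links, a minimal sketch of what a sorting network is (a hypothetical 4-element example, in the same C-like style as the rest of the code): a fixed, data-independent sequence of compare-exchange operations, which is exactly the kind of branch-free pattern that maps well to SIMD execution.

    // Compare-exchange: afterwards, a holds the smaller value and b the larger.
    __host__ __device__ inline void cswap(float& a, float& b) {
        if (a > b) { float t = a; a = b; b = t; }
    }

    // Optimal 5-comparator sorting network for 4 elements: the sequence of
    // comparisons is fixed and does not depend on the input data.
    __host__ __device__ inline void sort4(float v[4]) {
        cswap(v[0], v[1]);  cswap(v[2], v[3]);   // layer 1
        cswap(v[0], v[2]);  cswap(v[1], v[3]);   // layer 2
        cswap(v[1], v[2]);                       // layer 3
    }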
30
31
Rendering is a non-uniform workload; fully automated scheduling has its benefits. Early-Z and early-stencil rejection perform a hardware, automated stream compaction! Fundamental for rendering...
32
At first it's easy to complain about bank rules and coalescing issues when trying to access shared and global memory... But we have to consider that the GPU offers the ability to access a different memory location per SIMD element, on many of them, on many cores... This is a rather impressive feat: on standard CPUs, to load data into SIMD vectors you have to do it serially, one element at a time. Memory has addressing limitations; it's a limit of the technology.
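A minimal sketch of that per-lane addressing (hypothetical kernel): every thread in the SIMD vector fetches through its own, arbitrary index in a single gather, where a CPU SIMD loop would typically assemble the vector one scalar load at a time.

    // Hypothetical gather kernel: each element reads through its own index.
    __global__ void gather(const float* src, const int* indices, float* dst, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        dst[i] = src[indices[i]];   // a different address per SIMD lane
    }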
SIMD communication via voting: all, any, ballot.
Atomics: Add, Sub, Inc, Dec, And, Or, Xor, Exch, Min, Max, CompareAndSwap.
Append (or) consume structured buffer views (DX11).
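For instance, a global atomic counter is enough to build an append buffer by hand; a minimal sketch (hypothetical kernel, not the DX11 API):

    // Threads that pass a predicate reserve a slot in the output with
    // atomicAdd and write their element there. Output order is not guaranteed.
    __global__ void append_positive(const float* in, int n,
                                    float* out, int* outCount) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float v = in[i];
        if (v > 0.0f) {
            int slot = atomicAdd(outCount, 1);   // unique slot for this thread
            out[slot] = v;
        }
    }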
33
If we didn't have control over scheduling, we obviously could not access core-local memory, as we would not know which elements are scheduled where.
34
For convenience, in compute, indices can be generated on 1D, 2D or 3D grids.
ResetCoreMemory: the memory is not really reset, but we can't rely on its contents; the initialization has to be done in the kernel, and any output has to go into GLOBAL_MEM, so I created this pseudo-function to illustrate this in a C-like way.
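In CUDA the same ideas might look like the following sketch (hypothetical kernel; __shared__ plays the role of the core-local memory and is indeed undefined on entry):

    __global__ void example_kernel(const float* in, float* out, int n) {
        __shared__ float tile[256];                    // core-local memory, undefined on entry
        int local  = threadIdx.x;                      // index within the core
        int global = blockIdx.x * blockDim.x + local;  // 1D grid index

        // Explicit initialization (the "ResetCoreMemory" idea): we cannot rely
        // on whatever a previous group left in shared memory.
        tile[local] = (global < n) ? in[global] : 0.0f;
        __syncthreads();                               // tile is now visible to the whole group

        // ...work on tile[] here...

        if (global < n)
            out[global] = tile[local];                 // output must go back to global memory
    }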
35
36
37
In the picture, the first Connection Machine, a 65536-processor supercomputer of the '80s.
38
Shared memory is not exposed to pixel and vertex shaders, as that would require manual scheduling of elements onto cores. It's still used internally by the graphics card to store the results of rasterization and other shared information.
39
40
Note how widening the number of outputs per element looks like unrolling the computation... Such scheduling decisions have an impact on latency hiding (switching contexts), on the ability to use registers as a cache (e.g. in this computation, as we average columns, going wide on columns will enable reusing fetches from registers), on the ability to synchronize different tasks in a single kernel (persistent threads), and so on...
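A minimal sketch of the register-reuse point on a hypothetical 1D 3-tap average (not the slides' exact computation): letting each thread produce two adjacent outputs keeps the overlapping fetches live in registers, at the cost of having fewer threads around to hide latency, which is exactly the trade-off above. Boundary handling is omitted.

    // One output per thread: 3 loads per output.
    __global__ void box3_narrow(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < 1 || i >= n - 1) return;
        out[i] = (in[i - 1] + in[i] + in[i + 1]) * (1.0f / 3.0f);
    }

    // Two outputs per thread: 4 loads for 2 outputs, b and c reused from registers.
    __global__ void box3_wide(const float* in, float* out, int n) {
        int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
        if (i < 1 || i + 2 >= n) return;
        float a = in[i - 1], b = in[i], c = in[i + 1], d = in[i + 2];
        out[i]     = (a + b + c) * (1.0f / 3.0f);
        out[i + 1] = (b + c + d) * (1.0f / 3.0f);
    }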
41
Also, there are many libraries worth checking out: Thrust, NVIDIA Performance Primitives, CUDPP, libCL, etc. Even more so for tasks that are inherently parallel: linear algebra, imaging and so on.
42
43
44
45
All the arrows here represent sums into the existing values; the 'self' part of each addition is not shown, so as not to clutter the figure with too many arrows.
http://en.wikipedia.org/wiki/Prefix_sum
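The diagram is the step-efficient (Hillis-Steele) flavor of scan; a minimal sketch of one block's worth of it in CUDA-like code (hypothetical kernel: a single block, power-of-two size, double-buffered in shared memory to keep it simple):

    #define N 256   // block size, power of two

    // Inclusive prefix sum of N elements: log2(N) steps, each step adding
    // the value 'offset' positions to the left.
    __global__ void scan_block(const float* in, float* out) {
        __shared__ float buf[2][N];
        int t = threadIdx.x;
        int cur = 0;

        buf[cur][t] = in[t];
        __syncthreads();

        for (int offset = 1; offset < N; offset *= 2) {
            int next = 1 - cur;
            buf[next][t] = (t >= offset) ? buf[cur][t] + buf[cur][t - offset]
                                         : buf[cur][t];
            cur = next;
            __syncthreads();
        }
        out[t] = buf[cur][t];
    }

Launched as scan_block<<<1, N>>>(d_in, d_out); real implementations scan per block and then combine the per-block totals.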
46
All the black arrows here represent sums into the existing values; the 'self' part of each addition is not shown, as in the previous slide. The red arrows mark element moves.
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
www.umiacs.umd.edu/~ramani/cmsc828e_gpusci/ScanTalk.pdf
47
The moderngpu website has lots of code, and it takes a scan-centric approach to parallel computation:
http://www.moderngpu.com/intro/scan.html
http://www.moderngpu.com/scan/segscan.html
48
The Blelloch paper references many application domains: www.cs.cmu.edu/~guyb/papers/Ble93.pdf
http://back40computing.googlecode.com/svn-history/r225/wiki/documents/RadixSortTR.pdf
49
50
http://www.cse.chalmers.se/~olaolss/papers/Efficient%20Stream%20Compaction%20on%20Wide%20SIMD%20Many-Core%20Architectures.pdf
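Stream compaction is one of the classic uses of scan; a minimal sketch of the idea (hypothetical kernels, assuming an exclusive scan of the flags into offsets is done in between, e.g. with Thrust or the scan sketched earlier):

    // Pass 1: one flag per element, 1 if it survives the predicate.
    __global__ void mark(const float* in, int* flags, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) flags[i] = (in[i] > 0.0f) ? 1 : 0;
    }

    // ...exclusive scan of flags[] into offsets[] happens here...

    // Pass 2: surviving elements scatter to their scanned offset,
    // producing a densely packed output.
    __global__ void scatter(const float* in, const int* flags,
                            const int* offsets, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && flags[i]) out[offsets[i]] = in[i];
    }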
51
52
Images from the original paper: 'On-the-fly Point Clouds through Histogram Pyramids', Gernot Ziegler, Art Tevs, Christian Theobalt, Hans-Peter Seidel.
http://www.mpi-inf.mpg.de/~gziegler/gpu_pointlist/paper17_gpu_pointclouds.pdf
http://www.mpi-inf.mpg.de/%7Egziegler/gpu_pointlist/slides_vmv2006.pdf
http://www.astro.lu.se/compugpu2010/resources/bonus_histopyramid.pdf
Marching cubes using HP: http://diglib.eg.org/EG/DL/CGF/volume27/issue8/v27i8pp2028-2039.pdf.abstract.pdf
53
Good rendering: https://graphics.stanford.edu/wikis/cs448s-10/FrontPage?action=AttachFile&do=get&target=08-GPUArchII.pdf
54
Some of the compromises; there are many differences in the implementations. See 'Debunking the 100x GPU vs CPU Myth' for more details: http://www.hwsw.hu/kepek/hirek/2010/06/p451-lee.pdf
55
There are many languages trying to target both GPUs and CPUs; OpenCL has CPU backends, without even considering CPUs and GPUs on the same chip like AMD Fusion (see e.g. synergy.cs.vt.edu/pubs/papers/daga-saahpc11-apu-efficacy.pdf), and there are languages that target the CPU for wide SIMD computing, like Intel's open-source SPMD compiler ISPC: http://ispc.github.com/
http://tbex.twbbs.org/~tbex/pad/CA_papers/Twin%20Peaks.pdf
Throughput computing is everywhere, not only for high-performance calculations but also servers, low power and so on. Amdahl for multicores: research.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf
56
57
Some bonus reads.
BASICS
Nvidia CUDA programming guide: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
N-Body Simulation with CUDA, from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html
ADVANCED
Efficient Primitives and Algorithms for Many-core Architectures: http://www.cs.ucdavis.edu/research/tech-reports/2011/CSE-2011-4.pdf
Persistent threads programming: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0157-GTC2012-Persistent-Threads-Computing.pdf
More on PT and Task Parallelism: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0138-GTC2012-Parallelism-Primitives-Applications.pdf
Registers or shared memory? http://mc.stanford.edu/cgi-bin/images/6/65/SC08_Volkov_GPU.pdf
Optimization tips: http://www.cs.virginia.edu/~skadron/Papers/cuda_tuning_bof_sc09_final.pdf
DATA STRUCTURES
TONS of spatial ones, very few general purpose.
NOTE: attributes of parallel data structures:
58