GPU utilisation and performance improvements

Drill deep into a GPU’s architecture and at its heart you will find a large number of SIMD units whose purpose is to read data, perform some vector or scalar ALU (VALU or SALU) operation on it and write the result out to a rendertarget or buffer. Those units can be found in what Nvidia calls Streaming Multiprocessors (SM) and AMD calls Workgroup Processors (WGP). Achieving good utilisation of the SIMD units and VALU throughput (i.e. keeping them busy with work) is critical for improving the performance of rendering tasks, especially in this era of increasingly wider GPUs with many SIMD units.

Continue reading “GPU utilisation and performance improvements”
GPU utilisation and performance improvements