There are a few optimizations that could still be made:

- enable the shaderc or spirv_tools optimizer passes
- replace the unrolled loop with a regular for loop
- store 4 bytes in each int instead of 1 and unpack them at runtime
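A minimal sketch of the byte-packing idea, written in Python for illustration. The helper names `pack_bytes` and `unpack_byte` are hypothetical, and little-endian order within each word (byte 0 in the lowest 8 bits) is an assumption. Packing this way makes the buffer a quarter of the size it would be with one byte per int. Unpacking needs only a shift and a mask, and the same arithmetic carries over to a shader operating on `uint` values.

```python
def pack_bytes(data: bytes) -> list[int]:
    """Pack 4 bytes into each 32-bit word (assumed little-endian: byte 0 in the low bits)."""
    # Pad with zero bytes to a multiple of 4 so every word is complete.
    padded = data + bytes((-len(data)) % 4)
    words = []
    for i in range(0, len(padded), 4):
        b0, b1, b2, b3 = padded[i:i + 4]
        words.append(b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
    return words


def unpack_byte(words: list[int], index: int) -> int:
    """Recover byte `index` at read time: select the word, then shift and mask."""
    word = words[index >> 2]      # index / 4 selects the word
    shift = (index & 3) * 8       # (index % 4) * 8 selects the byte within it
    return (word >> shift) & 0xFF


data = bytes([1, 2, 3, 4, 250, 6])
words = pack_bytes(data)
print([hex(w) for w in words])                          # ['0x4030201', '0x6fa']
print([unpack_byte(words, i) for i in range(len(data))])  # [1, 2, 3, 4, 250, 6]
```

The trade-off is a few extra integer operations per read in exchange for less memory and bandwidth. Whether that is a net win likely depends on how memory-bound the shader is.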