Batching means that multiple input streams are computed at the same time. The stacked inputs vectorize better on modern hardware, which speeds up inference per token.
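A minimal sketch of the idea, using NumPy and hypothetical sizes (a single linear layer of width 1024, batch of 8): instead of one matrix-vector product per input stream, the batched inputs are stacked into a matrix so all streams are computed in one matrix-matrix product, which the BLAS backend vectorizes far better.

```python
import numpy as np

hidden = 1024  # hypothetical layer width
batch = 8      # hypothetical number of input streams

rng = np.random.default_rng(0)
W = rng.standard_normal((hidden, hidden)).astype(np.float32)

# Unbatched: one matrix-vector product per stream.
x = rng.standard_normal(hidden).astype(np.float32)
y_single = W @ x  # shape: (hidden,)

# Batched: all streams stacked as columns, computed in one call.
X = rng.standard_normal((hidden, batch)).astype(np.float32)
Y_batch = W @ X   # shape: (hidden, batch)

# Each column of the batched result equals the per-stream computation.
assert np.allclose(Y_batch[:, 0], W @ X[:, 0], atol=1e-4)
```

The per-token result is identical either way; batching only changes how much arithmetic is issued per kernel call, trading a small amount of latency for much higher throughput.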