Currently the attention over heads runs in serial: https://github.com/certik/fastGPT/blob/01eb84b015d89a567245da0445c0abb7d53a8500/gpt2.f90#L101

Since each head's computation is independent, we should try parallelizing this loop and measure whether it yields any speedup.
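One option is an OpenMP directive on the head loop. A minimal self-contained sketch of the idea (the array names, dimensions, and stand-in per-head computation below are assumptions for illustration, not the actual code in `gpt2.f90`):

```fortran
program parallel_heads
   ! Sketch only: each attention head touches a disjoint slice of the
   ! output, so the head loop has no loop-carried dependence and can be
   ! parallelized with OpenMP. Names/dimensions here are hypothetical.
   implicit none
   integer, parameter :: n_head = 12, head_dim = 64, n_seq = 8
   real :: x(n_head*head_dim, n_seq), y(n_head*head_dim, n_seq)
   integer :: i
   call random_number(x)
   !$omp parallel do default(shared) private(i)
   do i = 1, n_head
      ! stand-in for the per-head attention computation; each iteration
      ! writes only its own head's rows, so iterations are independent
      y((i-1)*head_dim+1:i*head_dim, :) = 2.0 * x((i-1)*head_dim+1:i*head_dim, :)
   end do
   !$omp end parallel do
   print *, sum(y)
end program parallel_heads
```

Compile with `gfortran -fopenmp` and set `OMP_NUM_THREADS` to control the thread count. Whether this helps in practice depends on per-head work size versus thread-spawn overhead, and on whether the underlying BLAS calls are already multithreaded (nested parallelism could oversubscribe cores), so benchmarking both configurations is worthwhile.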