The reason in this MRE is actually obvious…
At each iteration of the outer loop on `i`:
- In the fast version, the inner loop on `j` is executed only `N_samples=1000` times, which means that only 1000 elements of `pack_tot` are updated.
- In the slow version, all the `N_grid_pack=800000` elements of `pack_tot` are updated (and the whole `line_pack` is set to zero, although you really need only 1000 elements).
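
To make the difference concrete, here is a rough Python sketch of the two access patterns. It is not the code from the question: the function names, the number of outer iterations `N_OUTER`, and the randomly chosen indices are assumptions made up for illustration. Only `N_samples`, `N_grid_pack`, `pack_tot`, and `line_pack` come from the MRE. It counts how many elements of `pack_tot` each variant writes.

```python
import random

N_samples = 1000        # value from the MRE
N_grid_pack = 800_000   # value from the MRE
N_OUTER = 3             # hypothetical number of outer iterations on i

# Hypothetical sample positions: N_samples grid indices per outer iteration.
rng = random.Random(0)
indices = [[rng.randrange(N_grid_pack) for _ in range(N_samples)]
           for _ in range(N_OUTER)]

def fast_version(indices):
    """Accumulate directly into pack_tot: N_samples writes per outer iteration."""
    pack_tot = [0.0] * N_grid_pack
    writes = 0
    for idx in indices:              # outer loop on i
        for j in idx:                # inner loop on j
            pack_tot[j] += 1.0
            writes += 1
    return pack_tot, writes

def slow_version(indices):
    """Fill a temporary line_pack, then add the whole array to pack_tot."""
    pack_tot = [0.0] * N_grid_pack
    writes = 0
    for idx in indices:              # outer loop on i
        line_pack = [0.0] * N_grid_pack   # the whole array is zeroed
        for j in idx:
            line_pack[j] += 1.0
        for k in range(N_grid_pack):      # every element of pack_tot is touched
            pack_tot[k] += line_pack[k]
        writes += N_grid_pack
    return pack_tot, writes

fast_tot, fast_writes = fast_version(indices)
slow_tot, slow_writes = slow_version(indices)

print(fast_tot == slow_tot)          # both variants produce the same result
print(fast_writes, slow_writes)      # writes to pack_tot in each variant
print(slow_writes // fast_writes)    # ratio of the two write counts
```

Under these assumptions both variants give the same `pack_tot`, but the slow one writes `N_grid_pack / N_samples = 800` times as many elements per outer iteration, on top of zeroing `line_pack` each time.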