A quick glance over the code suggests that the C code generation currently has no support for instruction-level parallelism.
To my great surprise, even the "simplest" thing is not being used: the `restrict` keyword is missing everywhere. Without it, benchmarks involving pointers are less meaningful than they might seem (https://github.com/vlang/v/blob/master/compiler/tests/bench/val_vs_ptr.c ). And that is before getting to the very important language-level decisions, such as "copy the value instead of passing it by reference", which I've seen somewhere in the discussions.
IMHO the path to instruction-level parallelism should start with properly implementing the actual V semantics for values passed without an explicit `&` (and later also for some cases where `&`, i.e. references, are passed around). In these cases `restrict` should be used nearly everywhere. `restrict` can improve performance significantly: without it the C compiler has to assume that a store through one pointer may change the data behind any other pointer of a compatible type, so it must reload values from memory after each store and often cannot reorder or vectorize the accesses. That can make pointer-heavy code much slower in the worst case.
IMHO the very first step should be to use `restrict` basically everywhere pointers are used. The few places where pointers really do overlap (are "aliased"), and where `restrict` therefore can't be used, should probably be changed so that they no longer overlap wherever that is feasible.
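To illustrate the difference, here is a minimal sketch in plain C99. It is not taken from the V code generator; the function names `scale_may_alias`, `scale_no_alias` and the helper `ranges_overlap` are made up for this example. The first function leaves the compiler free to assume that `dst` may overlap `src` or `factor`, the second one promises via `restrict` that it does not, and checks that promise in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug helper: nonzero if the two byte ranges overlap. */
static int ranges_overlap(const void *a, size_t a_bytes,
                          const void *b, size_t b_bytes) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    if (a_bytes == 0 || b_bytes == 0) {
        return 0; /* an empty range overlaps nothing */
    }
    return pa < pb + b_bytes && pb < pa + a_bytes;
}

/* No restrict: dst, src and factor may alias, so the compiler has to be
   conservative, e.g. reload *factor after each store to dst[i]. */
void scale_may_alias(float *dst, const float *src,
                     const float *factor, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * *factor;
    }
}

/* With restrict: the caller promises that dst does not overlap src or
   factor, so *factor can stay in a register and the loop is easier to
   vectorize. Breaking the promise is undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    const float *restrict factor, size_t n) {
    assert(!ranges_overlap(dst, n * sizeof(float), src, n * sizeof(float)));
    assert(!ranges_overlap(dst, n * sizeof(float), factor, sizeof(float)));
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * *factor;
    }
}
```

Both functions compute the same result for non-overlapping arguments; the only difference is how much the C compiler is allowed to assume. Whether a given compiler actually generates faster code for the second variant has to be measured.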
The second step could be some small loop preparation (inlining, padding, etc.) to enable vectorization. I don't mean that V should do the vectorization (loop unrolling, etc.) itself, but that it should use V semantics to generate C code that the C compiler can vectorize more easily.
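As a rough sketch of what such preparation could look like in generated C (again not actual V output; `LANES`, `padded_len`, `alloc_padded` and `add_padded` are hypothetical names): buffers are padded to a multiple of an assumed vector width and zero-filled, so the loop has a simple trip count, no tail handling, no calls and no early exits, and the vectorization itself is left to the C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { LANES = 8 }; /* assumed padding granularity, in elements */

/* Round n up to the next multiple of LANES. */
static inline size_t padded_len(size_t n) {
    return (n + LANES - 1) / LANES * LANES;
}

/* Allocate zero-filled storage for n floats, padded to a multiple of LANES.
   The caller releases it with free(). */
float *alloc_padded(size_t n) {
    return calloc(padded_len(n), sizeof(float));
}

/* All three buffers must come from alloc_padded(n) and must not overlap.
   The loop runs over the padded length, so the padding elements are
   processed too (0 + 0) and no separate tail loop is needed. */
void add_padded(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n) {
    size_t m = padded_len(n);
    assert(m == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < m; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The trade-off is a few wasted elements per buffer in exchange for a loop shape that auto-vectorizers generally handle well.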
Last but not least, there should be some portable API for SIMD - see e.g. how Rust does it: https://github.com/rust-lang/project-portable-simd/blob/master/CHARTER.md .
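For a feel of what a portable SIMD API could lower to in the generated C, here is a small sketch based on the GCC/Clang vector extensions. This is a compiler extension, not standard C, and the names `f32x4`, `f32x4_load`, `f32x4_store` and `f32x4_mul_add` are invented for this example; they are neither an existing V API nor the Rust one.

```c
#include <assert.h>
#include <string.h>

/* Four packed floats; vector_size is a GCC/Clang extension. */
typedef float f32x4 __attribute__((vector_size(16)));

static_assert(sizeof(f32x4) == 4 * sizeof(float),
              "f32x4 must hold exactly four floats");

/* Load four floats from possibly unaligned memory. */
static inline f32x4 f32x4_load(const float *p) {
    f32x4 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store four floats to possibly unaligned memory. */
static inline void f32x4_store(float *p, f32x4 v) {
    memcpy(p, &v, sizeof v);
}

/* Element-wise a * b + c; the compiler picks the instructions for the
   target (SSE, NEON, ...) or falls back to scalar code. */
static inline f32x4 f32x4_mul_add(f32x4 a, f32x4 b, f32x4 c) {
    return a * b + c;
}
```

The same source compiles for different targets without naming any particular instruction set, which is the property a portable SIMD API in V would need as well.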