Apropos of #3 and webmachinelearning/webmachinelearning-ethics#22, an efficient matmul implementation can be fingerprinted to determine hardware capabilities.
On pre-VNNI Intel CPUs, the only efficient way to implement 8-bit multiplication is pmaddubsw, which multiplies unsigned 8-bit by signed 8-bit values and sums adjacent pairs of products into a 16-bit result with signed saturation. One can construct probe matrices whose products trigger this saturation; observing it indicates a pre-VNNI Intel CPU, whereas ARM and NVIDIA hardware multiply signed * signed and accumulate into 32 bits, so no 16-bit saturation occurs.
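A minimal sketch of the probe, modelled in plain Python rather than actual SIMD code: one function mimics pmaddubsw's pairwise 16-bit saturating sums, the other a widening 32-bit path (VNNI, ARM sdot). The byte values below are illustrative choices, picked so each product pair overflows int16.

```python
def dot_pmaddubsw(u8s, s8s):
    """Model of SSSE3 pmaddubsw: unsigned*signed byte products,
    adjacent pairs summed into an int16 lane with signed saturation."""
    lanes = []
    for i in range(0, len(u8s), 2):
        pair = u8s[i] * s8s[i] + u8s[i + 1] * s8s[i + 1]
        lanes.append(max(-32768, min(32767, pair)))  # int16 saturation
    return sum(lanes)  # later horizontal sum is widened, no more clipping

def dot_widening(u8s, s8s):
    """Model of a VNNI/ARM-style path: products accumulate straight
    into 32 bits, nowhere near overflow for a short probe."""
    return sum(u * s for u, s in zip(u8s, s8s))

# Probe chosen so each byte pair overflows int16:
u = [255, 255]   # unsigned 8-bit activations
s = [127, 127]   # signed 8-bit weights
print(dot_pmaddubsw(u, s))  # saturates: 32767
print(dot_widening(u, s))   # exact:     64770
```

The two paths disagree on this input, so a single dot product in the output matrix distinguishes the hardware families.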
Saturating accumulation, which an implementation should use for accuracy lest overflow produce large sign errors, makes the result depend on the order of operations. So the saturation behaviour of vpdpbusds reveals the order in which the matmul accumulated its products.
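To illustrate why a saturating accumulator is order-dependent, here is a hedged sketch, again as a Python model: the `terms` are hypothetical per-step contributions (scaled up for brevity; a real probe would build them from many u8*s8 quads), and the final accumulator value differs depending on which order they are added in.

```python
I32_MAX, I32_MIN = 2**31 - 1, -(2**31)

def sat_add(acc, x):
    """Signed-saturating 32-bit add, as in the vpdpbusds accumulator."""
    return max(I32_MIN, min(I32_MAX, acc + x))

# Hypothetical per-step dot-product contributions:
terms = [I32_MAX, 1, -1]

def reduce_in_order(order):
    acc = 0
    for i in order:
        acc = sat_add(acc, terms[i])
    return acc

print(reduce_in_order([0, 1, 2]))  # +1 clips at the max, then -1: 2147483646
print(reduce_in_order([0, 2, 1]))  # -1 first, then +1 restores the max: 2147483647
```

Because the clipped intermediate is irrecoverable, the final value encodes which partial sum saturated first, i.e. the accumulation order.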
The clock-frequency reduction some CPUs apply when executing AVX-512 instructions is likely detectable through timing.
In floating point, one can also infer the order of operations from rounding, since float addition is not associative. This would reveal the SIMD register length and possibly variations in the compiler used to build the user agent. A cache-efficient (tiled) matmul implementation likewise reveals cache sizes via the floating-point order of operations.
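A small sketch of the float case, using Python's `struct` to round to binary32 as an fp32 matmul would (the 2-way pairwise tree stands in for a SIMD lane sum followed by a horizontal add; the input values are illustrative):

```python
import struct

def f32(x):
    """Round a Python float to binary32, as an fp32 SIMD matmul would."""
    return struct.unpack('f', struct.pack('f', x))[0]

vals = [0.1] * 8  # one row-times-column of a hypothetical probe matrix

# Scalar, strictly sequential accumulation:
seq = 0.0
for v in vals:
    seq = f32(seq + f32(v))

# Pairwise tree reduction, as wide SIMD lanes summed then reduced:
def pairwise(xs):
    if len(xs) == 1:
        return f32(xs[0])
    mid = len(xs) // 2
    return f32(pairwise(xs[:mid]) + pairwise(xs[mid:]))

pair = pairwise(vals)
print(seq == pair)  # False: the rounding pattern leaks the reduction order
```

The two reductions round differently, so comparing the output against precomputed values for each plausible reduction shape reveals the vector width (and, transitively, cache-blocking choices) of the implementation.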