(de)quantization behaviors on CoreML #822

@philloooo

Description

Hi folks! I've implemented some version of quantize/dequantize on CoreML, here are the findings:

CoreML constexpr_blockwise_shift_scale fully supports WebNN's dequantize, but only for constant input, scale, and zero_point.

  • It also states: "Although all parameters of this op are constants, this op is not constant-folded to a single const op at the time of model serialization. The unquantized output will be decompressed later, based on the implementation detail (either at model load time or runtime)." This suggests CoreML is deliberate about when it decompresses the constants; I assume it would want to defer decompressing until actual computation happens on some devices.
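For concreteness, the blockwise shift-and-scale that constexpr_blockwise_shift_scale performs can be sketched like this (a minimal numpy model of the semantics; the function name and block layout here are illustrative, not CoreML's actual API):

```python
import numpy as np

def dequantize_blockwise(data, scale, zero_point, block_size):
    """Dequantize int8 data with one (scale, zero_point) pair per block
    along the last axis: out = (data - zero_point) * scale."""
    # Expand each per-block parameter across its block of elements.
    scale_full = np.repeat(scale, block_size, axis=-1)
    zp_full = np.repeat(zero_point, block_size, axis=-1)
    return (data.astype(np.float32) - zp_full.astype(np.float32)) * scale_full

data = np.array([[10, 20, 30, 40]], dtype=np.int8)
scale = np.array([[0.5, 2.0]], dtype=np.float32)   # 2 blocks of size 2
zero_point = np.array([[0, 10]], dtype=np.int8)
print(dequantize_blockwise(data, scale, zero_point, block_size=2))
# → [[ 5. 10. 40. 60.]]
```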

CoreML supports dequantize and quantize for non-constant inputs, although they're quite limited:

  • These two ops fail at compile time when scale is negative: Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type quantize: For operator: quantize, scale must be positive, but get -4.61708
  • It still requires scale and bias to be constants.
  • No blockwise (de)quantization.
  • Only (u)int8 is supported, no (u)int4 - and (u)int4 can't be emulated, because cast doesn't support (u)int4.
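For the emulation path, dequantize decomposes into elementwise cast → sub → mul; a hypothetical sketch with numpy stand-ins for the graph ops, which also shows why the missing (u)int4 in cast blocks emulation - the very first step needs a cast from the quantized storage type:

```python
import numpy as np

def emulated_dequantize(x_quant, scale, zero_point):
    """Emulate dequantize with elementwise graph ops: cast -> sub -> mul.
    Only possible when cast supports the quantized dtype (int8 here)."""
    x = x_quant.astype(np.float32)       # cast: no (u)int4 dtype exists to cast from
    zp = zero_point.astype(np.float32)   # cast the zero point too
    return (x - zp) * scale              # sub, then mul

x = np.array([100, -100], dtype=np.int8)
print(emulated_dequantize(x, np.float32(0.1), np.int8(4)))  # ≈ [9.6 -10.4]
```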

So for non-constant inputs, we would fall back to emulation when:

  • scale and zero-point are non-constant, or
  • it's blockwise.

The same fallback applies once we need to support dynamic quantization.

And (u)int4 will just be unsupported on CoreML for non-constant inputs until CoreML evolves.
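Putting those conditions together, the backend choice might look roughly like this (a hypothetical helper to illustrate the decision, not actual Chromium code):

```python
def coreml_can_use_native_quantize(scale_is_const, zp_is_const,
                                   is_blockwise, bits):
    """Decide how to lower (de)quantize for a NON-constant input on CoreML,
    per the limitations above. Returns True for the native ops, False for
    emulation, None for unsupported."""
    if bits == 4:
        return None  # no native (u)int4, and cast can't emulate it either
    if is_blockwise or not (scale_is_const and zp_is_const):
        return False  # fall back to emulation
    return True       # native quantize/dequantize

print(coreml_can_use_native_quantize(True, True, False, 8))  # → True
```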

I do wonder - is it necessary to support negative scale? Supporting it means we would need to manually inspect the tensor in C++ to decide whether to fall back to emulation.

If we do need to support negative scale, then since CoreML requires scale to be constant, we have the following options:
a. In the Chromium C++ code, manually scan the scale tensor for negative values; if any are present, emulate, otherwise use dequantize or quantize.
b. Don't scan the scale tensor; always emulate. 🤔
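Option (a) would amount to a one-time scan of the constant scale tensor at graph-build time; a minimal sketch of the check (the real version would walk the raw weight bytes in C++):

```python
def needs_emulation_for_scale(scale_values):
    """Option (a): scan the constant scale tensor once and fall back to
    emulation if any value is not positive, since CoreML's quantize/
    dequantize reject it at compile time ("scale must be positive")."""
    return any(s <= 0 for s in scale_values)

print(needs_emulation_for_scale([0.5, -4.61708]))  # → True
```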

Would love to get opinions from this group.
