Support block-wise quantization

[Block-wise quantization](https://arxiv.org/abs/2110.02861) divides an input tensor into smaller blocks that are quantized independently, resulting in faster optimization and higher-precision quantization. It is used by popular language models, such as the [Phi-3 mini int4 quantized model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

### Native ML API support
DirectML: `DML_OPERATOR_QUANTIZE` and `DML_OPERATOR_DEQUANTIZE`, introduced in [Feature Level 6.3](https://learn.microsoft.com/en-us/windows/ai/directml/dml-feature-level-history#dml_feature_level_6_3)
CoreML: [constexpr_blockwise_shift_scale](https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS18.compression.constexpr_blockwise_shift_scale)
TFLite: ?

### Proposal
No API signature changes are needed relative to @fdwr's [proposal](https://github.com/webmachinelearning/webnn/issues/375#issuecomment-2292466613) for the `dequantizeLinear` and `quantizeLinear` ops.
```js
MLOperand dequantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});
MLOperand quantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});
```

The `block_size` along each dimension is an integer that is not passed explicitly but implied by the shapes: `block_size = input_size / scale_size`, where `input_size % scale_size == 0`. `zeroPoint` and `scale` should have the same shape.
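
To illustrate how the block size falls out of the shapes, here is a minimal plain-JavaScript sketch of a block-wise dequantize on row-major flat arrays. It is not part of the WebNN API: the helper names `impliedBlockShape` and `dequantizeBlockwise` are hypothetical, the formula `(input - zeroPoint) * scale` is assumed to follow the usual linear dequantization convention, and `scale` is assumed to have the same rank as `input`.

```javascript
// Hypothetical helper: derive the per-dimension block size from the shapes.
function impliedBlockShape(inputShape, scaleShape) {
  if (inputShape.length !== scaleShape.length) {
    throw new Error("input and scale must have the same rank (assumption of this sketch)");
  }
  return inputShape.map((inputSize, axis) => {
    const scaleSize = scaleShape[axis];
    if (inputSize % scaleSize !== 0) {
      throw new Error(`dimension ${axis}: ${inputSize} is not divisible by ${scaleSize}`);
    }
    return inputSize / scaleSize;
  });
}

// Hypothetical reference implementation; scale and zeroPoint share scaleShape.
function dequantizeBlockwise(input, inputShape, scale, zeroPoint, scaleShape) {
  const blockShape = impliedBlockShape(inputShape, scaleShape);
  const rank = inputShape.length;
  const output = new Float32Array(input.length);
  const index = new Array(rank).fill(0);
  for (let flat = 0; flat < input.length; ++flat) {
    // Decompose the row-major flat index into an N-D index.
    let remainder = flat;
    for (let axis = rank - 1; axis >= 0; --axis) {
      index[axis] = remainder % inputShape[axis];
      remainder = Math.floor(remainder / inputShape[axis]);
    }
    // Find the flat index of the block that this element belongs to.
    let scaleFlat = 0;
    for (let axis = 0; axis < rank; ++axis) {
      scaleFlat = scaleFlat * scaleShape[axis] + Math.floor(index[axis] / blockShape[axis]);
    }
    output[flat] = (input[flat] - zeroPoint[scaleFlat]) * scale[scaleFlat];
  }
  return output;
}

// A [2, 4] input with [2, 2] scale and zeroPoint implies blocks of shape [1, 2].
const input = new Int8Array([1, 2, 3, 4, 5, 6, 7, 8]);
const scale = new Float32Array([0.5, 1, 2, 4]);
const zeroPoint = new Int8Array([0, 1, 2, 3]);
const output = dequantizeBlockwise(input, [2, 4], scale, zeroPoint, [2, 2]);
console.log(Array.from(output)); // [0.5, 1, 2, 3, 6, 8, 16, 20]
```

With this reading, per-tensor quantization (every `scale` dimension is 1) and per-axis quantization (`scale_size == input_size` along one axis) are special cases of the same shape rule.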


