dnn: mixed-precision inference and quantization #16633

@YashasSamaga

Description

System information (version)
  • OpenCV => 4.2.0
Detailed description

The CUDA backend could support mixed-precision inference across several numeric types: FP32, FP16, INT32, (U)INT8 and possibly INT4 and INT1. It would be fairly easy to implement, as cuDNN already provides convolution primitives for many of these types and the existing CUDA backend codebase is fully template-based.
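To make the INT8 part of the proposal concrete, here is a minimal sketch of symmetric per-tensor quantization of the kind cuDNN's INT8 convolution paths consume. The helper names (`computeScale`, `quantize`, `dequantize`) are hypothetical illustrations, not part of cv::dnn or cuDNN:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not existing cv::dnn API): symmetric per-tensor
// INT8 quantization, mapping the observed weight range onto [-127, 127].
float computeScale(const std::vector<float>& w) {
    float maxAbs = 0.f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs > 0.f ? maxAbs / 127.f : 1.f;
}

std::vector<int8_t> quantize(const std::vector<float>& w, float scale) {
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] / scale);
        q[i] = static_cast<int8_t>(std::clamp(r, -127.f, 127.f));
    }
    return q;
}

float dequantize(int8_t q, float scale) { return q * scale; }
```

The same scale/round/clamp structure generalizes to the other integer widths mentioned above; only the quantization range changes.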

Before this can be implemented, some issues need to be sorted out:

  1. Is it required?
  2. APIs to configure mixed-precision
    • APIs to allow the user to control the mixed-precision configuration
    • ability to import quantized models and use the quantized weights during inference
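One possible shape for the configuration API in item 2 is a per-layer precision override on top of a network-wide default, which the backend queries when compiling each layer. This is purely a sketch under that assumption; `Precision`, `PrecisionConfig` and their methods are hypothetical names, not an existing cv::dnn interface:

```cpp
#include <map>
#include <string>

// Hypothetical user-facing mixed-precision configuration (not cv::dnn API).
enum class Precision { FP32, FP16, INT8 };

class PrecisionConfig {
public:
    explicit PrecisionConfig(Precision def = Precision::FP32) : default_(def) {}

    // Override the precision of a single layer by name.
    void setLayerPrecision(const std::string& layer, Precision p) {
        overrides_[layer] = p;
    }

    // The backend queries the effective precision when building a layer.
    Precision precisionFor(const std::string& layer) const {
        auto it = overrides_.find(layer);
        return it != overrides_.end() ? it->second : default_;
    }

private:
    Precision default_;
    std::map<std::string, Precision> overrides_;
};
```

A map-plus-default keeps the common case (one policy for the whole network) to a single call while still allowing fine-grained control.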

Other ideas:

  1. Default mixed-precision policies
    • example: FP16 for convolutions and FP32 for the rest?
  2. Automatic mixed precision (AMP)
    • the user provides a representative dataset and the AMP system automatically figures out a good configuration
      • a similar mechanism would be needed to implement in-house quantization
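Both the AMP idea and in-house quantization would need range statistics gathered from the representative dataset; the simplest building block is a per-tensor calibrator that observes activations and derives a symmetric INT8 scale. A minimal sketch, with a hypothetical `RangeCalibrator` name:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical calibrator: feed it activations from a representative
// dataset, then read back a symmetric INT8 scale. Real systems refine
// this with histograms or percentile clipping; max-abs is the baseline.
class RangeCalibrator {
public:
    void observe(const std::vector<float>& activations) {
        for (float v : activations)
            maxAbs_ = std::max(maxAbs_, std::fabs(v));
    }
    // Scale mapping the observed range onto [-127, 127].
    float scale() const { return maxAbs_ > 0.f ? maxAbs_ / 127.f : 1.f; }
private:
    float maxAbs_ = 0.f;
};
```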

The Inference Engine (OpenVINO) backend already supports mixed-precision inference. A generic cv::dnn API for mixed precision could be shared by all backends that support it.
