dnn: mixed-precision inference and quantization #16633

@YashasSamaga

Description

System information (version)
  • OpenCV => 4.2.0
Detailed description

The CUDA backend could support mixed-precision inference across several numeric types: FP32, FP16, INT32, (U)INT8 and possibly INT4 and INT1. It would be fairly easy to implement, as cuDNN already provides convolution primitives for many of these types and the existing CUDA backend codebase is fully template-based.
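To make the INT8 part of the proposal concrete, here is a minimal sketch of symmetric per-tensor quantization of the kind cuDNN's INT8 convolution paths consume. The helper names (`computeScale`, `quantize`, `dequantize`) are hypothetical illustrations, not part of cv::dnn or cuDNN:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not existing cv::dnn API): symmetric per-tensor
// INT8 quantization, mapping the observed weight range onto [-127, 127].
float computeScale(const std::vector<float>& w) {
    float maxAbs = 0.f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs > 0.f ? maxAbs / 127.f : 1.f;
}

std::vector<int8_t> quantize(const std::vector<float>& w, float scale) {
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] / scale);
        q[i] = static_cast<int8_t>(std::clamp(r, -127.f, 127.f));
    }
    return q;
}

float dequantize(int8_t q, float scale) { return q * scale; }
```

The same scale/round/clamp structure generalizes to the other integer widths mentioned above; only the quantization range changes.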

Before this can be implemented, some issues need to be sorted out:

  1. Is it required?
  2. APIs to configure mixed-precision
    • APIs to allow the user to control the mixed-precision configuration
    • ability to import quantized models and use the quantized weights during inference
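One possible shape for the configuration API in item 2 is a per-layer precision override on top of a network-wide default, which the backend queries when compiling each layer. This is purely a sketch under that assumption; `Precision`, `PrecisionConfig` and their methods are hypothetical names, not an existing cv::dnn interface:

```cpp
#include <map>
#include <string>

// Hypothetical user-facing mixed-precision configuration (not cv::dnn API).
enum class Precision { FP32, FP16, INT8 };

class PrecisionConfig {
public:
    explicit PrecisionConfig(Precision def = Precision::FP32) : default_(def) {}

    // Override the precision of a single layer by name.
    void setLayerPrecision(const std::string& layer, Precision p) {
        overrides_[layer] = p;
    }

    // The backend queries the effective precision when building a layer.
    Precision precisionFor(const std::string& layer) const {
        auto it = overrides_.find(layer);
        return it != overrides_.end() ? it->second : default_;
    }

private:
    Precision default_;
    std::map<std::string, Precision> overrides_;
};
```

A map-plus-default keeps the common case (one policy for the whole network) to a single call while still allowing fine-grained control.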

Other ideas:

  1. Default mixed-precision policies
    • example: FP16 for convolutions and FP32 for the rest?
  2. Automatic mixed precision (AMP)
    • the user provides a representative dataset and the AMP system automatically figures out a good configuration
      • a similar mechanism would be needed to implement in-house quantization
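Both the AMP idea and in-house quantization would need range statistics gathered from the representative dataset; the simplest building block is a per-tensor calibrator that observes activations and derives a symmetric INT8 scale. A minimal sketch, with a hypothetical `RangeCalibrator` name:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical calibrator: feed it activations from a representative
// dataset, then read back a symmetric INT8 scale. Real systems refine
// this with histograms or percentile clipping; max-abs is the baseline.
class RangeCalibrator {
public:
    void observe(const std::vector<float>& activations) {
        for (float v : activations)
            maxAbs_ = std::max(maxAbs_, std::fabs(v));
    }
    // Scale mapping the observed range onto [-127, 127].
    float scale() const { return maxAbs_ > 0.f ? maxAbs_ / 127.f : 1.f; }
private:
    float maxAbs_ = 0.f;
};
```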

The Inference Engine (OpenVINO) backend already supports mixed-precision inference. A generic cv::dnn API for mixed precision could be shared by all backends that support it.
