System information (version)
- OpenCV => 4.2.0
Detailed description
The CUDA backend can support mixed-precision inference with various types: FP32, FP16, INT32, (U)INT8 and possibly INT4 and INT1. It's fairly easy to implement as cuDNN already has convolution primitives for many of these types and the existing CUDA backend codebase is fully template-based.
Before this can be implemented, some issues need to be sorted out:
- Is it required?
- APIs that let the user configure and control mixed-precision inference
- ability to import quantized models and use the quantized weights during inference
Other ideas:
- Default mixed-precision policies
  - example: FP16 for convolutions and FP32 for the rest?
- Automatic Mixed-Precision (AMP)
  - the user provides a representative dataset and the AMP system automatically figures out a good configuration
  - a similar mechanism would be required to implement in-house quantization
The Inference Engine (OpenVINO) backend already supports mixed-precision inference. A generic cv::dnn API for mixed precision could be shared by all backends that support it.