
Quantized model cannot run inference with CUDA? #27729


Description

@dongfangduoshou123

I see that PyTorch 1.3 is now available and quantization has been added, great! But the release notes say:

"This currently experimental feature includes support for post-training quantization, dynamic quantization, and quantization-aware training. It leverages the FBGEMM and QNNPACK state-of-the-art quantized kernel back ends, for x86 and ARM CPUs, respectively, which are integrated with PyTorch and now share a common API."
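As I understand it, the "common API" part means the two kernel libraries are selected through a single global engine setting rather than separate function calls. A minimal sketch (which engines are listed depends on how PyTorch was built):

```python
import torch

# The quantized kernel backends share one API; the library that actually
# runs is picked by a global engine flag: 'fbgemm' targets x86 servers,
# 'qnnpack' targets ARM/mobile.
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm']
torch.backends.quantized.engine = "fbgemm"  # select the x86 backend
```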

Does this mean a quantized model can only run on the FBGEMM and QNNPACK (CPU) backends, and cannot be deployed with a CUDA backend?
Does it mean that all of PyTorch's quantization ops are currently implemented only for CPUs such as ARM and x86?
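To make the question concrete, here is a minimal dynamic-quantization sketch; the toy model and sizes are my own for illustration, and I am assuming the quantized modules only have CPU kernels:

```python
import torch
import torch.nn as nn

# Minimal sketch: dynamic post-training quantization of a small float model.
float_model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(quantized_model(x).shape)  # runs on CPU via the FBGEMM/QNNPACK kernels

# Moving the quantized model to the GPU is where I expect it to fail,
# since no quantized CUDA kernels appear to ship with this release:
# quantized_model.to("cuda")  # raises an error about the quantized backend
```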

Thank you!

cc @jerryzh168 @jianyuh @dzhulgakov

Metadata

Labels

module: cuda, module: docs, oncall: quantization, triaged
