
Conversation

@mklimenk commented Jul 14, 2025

A follow-up to #740 with changed logic: instead of relying on an external configuration key, the bfloat16->float16 conversion is performed whenever the model contains at least one bfloat16 tensor.

https://jira.devtools.intel.com/browse/CVS-170592

@mklimenk changed the title Add self-detecting on-the-fly bfloat16->float16 conversion pass [Draft] Add self-detecting on-the-fly bfloat16->float16 conversion pass Jul 14, 2025
@sfatimar left a comment

I am more aligned with this change.

@mklimenk force-pushed the private/mklimenk/bfloat16_fix_implicit branch from a02a919 to c594c4d on July 29, 2025 12:27
@mklimenk marked this pull request as ready for review July 29, 2025 15:33
@mklimenk changed the title [Draft] Add self-detecting on-the-fly bfloat16->float16 conversion pass Add self-detecting on-the-fly bfloat16->float16 conversion pass Jul 29, 2025
The review below refers to this diff hunk:

```cpp
  DumpOpenVINOEPModel(onnx_model_path_name, model_proto.get(), fused_node);
  ORT_ENFORCE(status.IsOK(), status.ErrorMessage());
  return model_proto;
} else if (HasBf16(subgraph)) {
```
@sfatimar commented Jul 30, 2025

Is a check needed for enable_qdq_optimizer? Should you check for GPU here? Please let me know if you support EP context graphs.

@mklimenk (Author) commented Jul 30, 2025

Not necessarily; this is a universal pass that works for all the IPs.
UPD (to the edited comment): this is the else condition for all the qdq_scales-related graph modifications. Overall, qdq_scales and bfloat16 are mutually exclusive, so the current logic is the following: if the qdq_scaling pass is requested, we take that path, with two different branches for NPU and GPU. Otherwise, if the model has bfloat16 initializers, we convert them to fp16 in this pass. Failing both, we transfer the model directly to OpenVINO.
Regarding EP context graphs: no, they're not supported, since they're basically an encapsulated OV IR and we can only redirect it to OV, nothing more. So if a customer asks to work with bfloat16 EP context models, we'd need to solve it on the OV side.
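
For readers following along, a rough sketch of the dispatch described above; the variable names are assumptions taken from this thread and the quoted diff, not the PR's exact code:

```cpp
// Assumed shape of the control flow described in the comment above.
if (enable_qdq_optimizer) {
  // qdq_scales path, with separate handling for NPU and GPU
} else if (HasBf16(subgraph)) {
  // rewrite bfloat16 initializers/tensors to float16 before handing off
} else {
  // pass the model through to OpenVINO unchanged
}
```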

@sfatimar left a comment

Changes look good. Please look at the review comments and see whether you have followed the coding style.

@sfatimar

Please update the branch.

@sfatimar sfatimar requested a review from vthaniel July 30, 2025 14:14
@vthaniel

@mklimenk
Can you please rebase this branch?

@sfatimar commented Aug 1, 2025

@vthaniel please approve.

@vthaniel commented Aug 1, 2025

Reviewer has approved and internal_ci has passed.
Merging the changes.

@vthaniel merged commit ed9e425 into intel:ovep-develop Aug 1, 2025
3 of 5 checks passed
vthaniel added a commit that referenced this pull request Aug 28, 2025
* Add on-the-fly bfloat16->float16 conversion pass

* Fix undetected bfloat16 initializers

* Remove the option and make the logic implicit

* Add tests

* Rename detection function

* Fix CI for strict aliasing rules (see the sketch after the commit message)

---------

Co-authored-by: Vishnudas Thaniel S <[email protected]>
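
As background on the "Fix CI for strict aliasing rules" item above: bfloat16 is the upper 16 bits of a float32 bit pattern, so a conversion typically widens the bits to float32 and then narrows to float16, and doing the bit reinterpretation through memcpy (rather than a pointer cast) keeps it within defined behavior. A minimal sketch under those assumptions, not the PR's actual code:

```cpp
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float without violating strict aliasing:
// bfloat16 is the high 16 bits of the equivalent float32 representation.
inline float Bf16BitsToFloat(uint16_t bf16_bits) {
  const uint32_t f32_bits = static_cast<uint32_t>(bf16_bits) << 16;
  float result;
  std::memcpy(&result, &f32_bits, sizeof(result));  // defined, unlike reinterpret_cast
  return result;
}
```

The resulting float can then be narrowed to float16 with whatever half-precision helper the codebase already provides.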
@mklimenk deleted the private/mklimenk/bfloat16_fix_implicit branch September 10, 2025 11:06