expose graph node name returning non-zero status code#714
expose graph node name returning non-zero status code#714hariharans29 merged 23 commits intomasterfrom
Conversation
| if (!compute_status.IsOK()) { | ||
| LOGS(logger, ERROR) << "Non-zero status code returned while running Node: ", | ||
| p_op_kernel->Node().Name(), " Status Message: ", compute_status.ErrorMessage(); | ||
| return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Non-zero status code returned while running Node: ", |
There was a problem hiding this comment.
Not sure if anything external depends on the failure category or code. You may want to maintain those instead of always overwriting with ONNXRUNTIME and FAIL. #Closed
There was a problem hiding this comment.
That makes sense, thanks, fixed. #Closed
|
what if the name is empty? |
@snnn - I actually assumed handling was already done while resolving the graph (my bad), I have a light-weight mechanism to generate a node name if empty (based on op type). If op type is empty, it should break elsewhere as kernel resolving cannot happen, although it might be worthwhile to check missing op type at graph resolution time as well perhaps |
The scope of this PR is to print the node name when available. |
onnxruntime/core/graph/graph.cc
Outdated
|
|
||
| std::string node_name = node_proto.name(); | ||
| if (node_name.empty()) | ||
| node_name = GenerateNodeName("unnamed_" + op_type + "_" + |
There was a problem hiding this comment.
Why do we need this? This name won't be visible on Netron any way when debugging.
There was a problem hiding this comment.
maybe not, but atleast it helps if we can visualize the serialized proto (not sure if this is possible) and the information here would atleast tell us that a specific op type (offset by the specific count) malfunctioned (as opposed to having no information at all)
pranavsharma
left a comment
There was a problem hiding this comment.
LGTM cc @linkerzhang
onnxruntime/core/graph/graph.cc
Outdated
| else | ||
| current_op_type_count = ++iter->second; | ||
|
|
||
| std::string node_name = node_proto.name(); |
onnxruntime/core/graph/graph.cc
Outdated
|
|
||
| return AddNode(node_proto.name(), | ||
| size_t current_op_type_count = 1; | ||
| const auto op_type = node_proto.op_type(); |
| Node& AddNode(const ONNX_NAMESPACE::NodeProto& node_proto, | ||
| const ArgNameToTypeMap& name_to_type); | ||
| const ArgNameToTypeMap& name_to_type, | ||
| TypeToCountMap& type_to_count_map); |
There was a problem hiding this comment.
TypeToCountMap& type_to_count_map [](start = 16, length = 33)
Not a huge fan of the API taking this in order to create a hopefully unique name for the node as it's a bit obtuse. Possibly not expected behaviour for the name to be changed during AddNode either.
Why do we need this new approach?
There was a problem hiding this comment.
I understand your concern regarding this. But don't you think it adds a little bit of value in terms of node debuggability in case of missing node names ? -
-
It gives information regarding the malfunctioning node (in terms of op_type + chronological order in the serialized proto).
-
It is almost always guaranteed to be unique as the name is a function of op_type + order it is seen.
-
As far as API change goes, correct me if I am wrong - I think the graph building methods are not currently being exposed to end-users anyway
There was a problem hiding this comment.
Reverted the this aspect of the change
|
|
||
| const auto& compute_status = p_op_kernel->Compute(&op_kernel_context); | ||
| if (!compute_status.IsOK()) { | ||
| std::string msg_string = |
There was a problem hiding this comment.
it's better to use ostringstream when you're constructing a string from multiple strings. the + operators will make a copy. even if it doesn't matter so much in this case since we're doing it in the error scenario, it's a good habit.
…icrosoft#714) * Added support for 2025.2 and SimplifiedLayerNormalization op * [OVEP] Update OV version to 2025.2.0 * Revert "[OVEP] Update OV version to 2025.2.0" This reverts commit d129250.
Helps debugging if the graph node name that returns non-zero status code is available.
In case of exceptions from within node kernels (by using ORT_THROW), the stack trace should contain atleast the kernel name it failed at (if not the graph node name), so avoiding the (potentially) costly try...catch for now