@tohtana (Collaborator) commented on Oct 2, 2025

This PR improves the error message shown when a DeepCompile test fails.

DeepCompile tests occasionally fail ([example](https://github.com/deepspeedai/DeepSpeed/actions/runs/18160078309/job/51688736712?pr=7604)) because the loss values do not match.
To check that these failures are not caused by a synchronization bug that produces `nan` loss values, this PR makes the error message show the mismatching values. Once we confirm that the mismatch is within a reasonable range, we can consider increasing the tolerances.
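
For readers who want a concrete picture of the idea, here is a minimal sketch of a loss comparison that reports the mismatching values. It is not the code changed in this PR: the helper name `assert_losses_close`, its parameters, and the default tolerances are all hypothetical, and it uses plain Python floats rather than tensors.

```python
import math


def assert_losses_close(baseline_loss, target_loss, rtol=1e-2, atol=1e-3, step=None):
    """Hypothetical helper: compare two scalar losses and report both on failure."""
    prefix = f"step {step}: " if step is not None else ""

    # Report NaN separately so that a possible synchronization bug is not
    # mistaken for an ordinary tolerance miss.
    if math.isnan(baseline_loss) or math.isnan(target_loss):
        raise AssertionError(
            f"{prefix}NaN loss detected: "
            f"baseline={baseline_loss!r}, target={target_loss!r}"
        )

    diff = abs(baseline_loss - target_loss)
    allowed = atol + rtol * abs(baseline_loss)
    if diff > allowed:
        # Include both values and the allowed difference so the failure log
        # shows how far off the run was.
        raise AssertionError(
            f"{prefix}loss mismatch: baseline={baseline_loss:.6f}, "
            f"target={target_loss:.6f}, abs_diff={diff:.3e}, "
            f"allowed={allowed:.3e} (rtol={rtol}, atol={atol})"
        )
```

With a message of this shape, a failing CI log would show directly whether the loss was `nan` or only slightly outside the tolerance, which is the information needed before deciding whether to relax the tolerances.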

@sfc-gh-truwase sfc-gh-truwase merged commit 82a9db7 into deepspeedai:master Oct 3, 2025
12 checks passed
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
This PR improves error message when DeepCompile test fails.

Tests of DeepCompile occasionally fail
([example](https://github.com/deepspeedai/DeepSpeed/actions/runs/18160078309/job/51688736712?pr=7604))
because of mismatching loss values.
To make sure this is not a synchronization bug that causes `nan` loss
values, the change in this PR shows the mismatching values. We can
consider increasing the tolerances once we confirm the mismatch is
reasonable.

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
Liangliang-Ma pushed a commit to Liangliang-Ma/DeepSpeed that referenced this pull request Oct 13, 2025

Signed-off-by: Masahiro Tanaka <[email protected]>
Signed-off-by: Ma, Liangliang <[email protected]>