[ci] don't run sana layerwise casting tests in CI. by sayakpaul · Pull Request #12551 · huggingface/diffusers

sayakpaul · 2025-10-27T15:43:08Z

What does this PR do?

Tests pass perfectly fine locally on different GPUs (RTX 4090, A100, H100).

Example failures:

dg845

Thanks, LGTM! I've observed that some other test_layerwise_casting_inference tests seem to fail in the CI as well. For example, on the CI job when PRX was merged, both Qwen Image (https://github.com/huggingface/diffusers/actions/runs/18701189489/job/53330184168):

FAILED tests/pipelines/qwenimage/test_qwenimage_edit.py::QwenImageEditPipelineFastTests::test_layerwise_casting_inference - torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacity of 14.74 GiB of which 14.03 GiB is free. Process 19357 has 724.00 MiB memory in use. Of the allocated memory 465.60 MiB is allocated by PyTorch, and 132.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
FAILED tests/pipelines/qwenimage/test_qwenimage_edit_plus.py::QwenImageEditPlusPipelineFastTests::test_layerwise_casting_inference - torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacity of 14.74 GiB of which 14.05 GiB is free. Process 19357 has 708.00 MiB memory in use. Of the allocated memory 484.53 MiB is allocated by PyTorch, and 97.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

and Sana (https://github.com/huggingface/diffusers/actions/runs/18701189489/job/53330184161):

FAILED tests/pipelines/sana/test_sana.py::SanaPipelineFastTests::test_layerwise_casting_inference - RuntimeError: GET was unable to find an engine to execute this computation
FAILED tests/pipelines/sana/test_sana_controlnet.py::SanaControlNetPipelineFastTests::test_layerwise_casting_inference - RuntimeError: GET was unable to find an engine to execute this computation
FAILED tests/pipelines/sana/test_sana_sprint.py::SanaSprintPipelineFastTests::test_layerwise_casting_inference - RuntimeError: GET was unable to find an engine to execute this computation
FAILED tests/pipelines/sana/test_sana_sprint_img2img.py::SanaSprintImg2ImgPipelineFastTests::test_layerwise_casting_inference - RuntimeError: GET was unable to find an engine to execute this computation

test_layerwise_casting_inference tests failed.
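One way to read the Qwen Image OOM lines above is to compare the requested allocation with the GPU's total capacity. The following is a minimal sketch, not part of the PR: the `parse_oom` helper and its regular expression are hypothetical, and they assume the message reports all three sizes in GiB, as the quoted log does.

```python
import re

# Hypothetical pattern for the PyTorch OOM message quoted in this thread.
# Assumption: all three sizes are reported in GiB.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB\. "
    r"GPU \d+ has a total capacity of (?P<capacity>[\d.]+) GiB "
    r"of which (?P<free>[\d.]+) GiB is free"
)


def parse_oom(message):
    """Return requested/capacity/free sizes in GiB, or None if the text does not match."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


# Excerpt copied from the first Qwen Image failure above.
message = (
    "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. "
    "GPU 0 has a total capacity of 14.74 GiB of which 14.03 GiB is free."
)

info = parse_oom(message)
# True when the single allocation is larger than the whole device.
exceeds_capacity = info["requested"] > info["capacity"]
```

For the quoted failure, the single 16.00 GiB request is larger than the 14.74 GiB device, so that allocation could not succeed on this runner even with all memory free.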

sayakpaul · 2025-10-28T07:15:30Z

@dg845 please check now.

Regarding the Qwen failures, I am not too sure, but OOMs in our CI can also be triggered by a prior test failure in the mix. I printed torch.cuda.max_memory_allocated() in GB for the QwenImage tests in question, and it came out to 0.4499330520629883 GB.

So, I suspect these OOMs are caused by previous test failures. But LMK if you have another perspective.
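The measurement described above can be sketched roughly as follows. This is an illustration rather than the code that was actually run: `bytes_to_gib` and `measure_peak_cuda_gib` are hypothetical helpers, and the comment does not say whether the reported figure was divided by 1024**3 or 10**9, so the sketch assumes 1024**3.

```python
import importlib.util

BYTES_PER_GIB = 1024 ** 3  # assumption: "GB" in the comment means bytes / 1024**3


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / BYTES_PER_GIB


def measure_peak_cuda_gib(fn):
    """Run fn() and return its peak allocated CUDA memory in GiB.

    Returns None when torch or a CUDA device is not available.
    """
    if importlib.util.find_spec("torch") is None:
        fn()
        return None

    import torch

    if not torch.cuda.is_available():
        fn()
        return None

    # Reset the peak counter so earlier tests in the same process are not counted.
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return bytes_to_gib(torch.cuda.max_memory_allocated())
```

Resetting the peak statistics before the call matters here: without it, `torch.cuda.max_memory_allocated()` reports the peak since the start of the process, which could include memory used by earlier tests.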

dg845

LGTM :)

don't run sana layerwise casting tests in CI.

3441f7f
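The diff for this commit is not shown in this thread. As a rough sketch of one way to skip a test only in CI, the example below uses the standard library's `unittest.skipIf` together with the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true` on its runners. The helper `running_in_github_actions` and the class `ExamplePipelineFastTests` are hypothetical and are not taken from diffusers.

```python
import os
import unittest


def running_in_github_actions(env=None):
    """Return True when the GITHUB_ACTIONS variable is set to 'true'."""
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS", "").lower() == "true"


class ExamplePipelineFastTests(unittest.TestCase):
    # Hypothetical test class; the real tests live under tests/pipelines/sana/.
    @unittest.skipIf(
        running_in_github_actions(),
        "Skipped in CI: passes locally but fails on the CI runner.",
    )
    def test_layerwise_casting_inference(self):
        # Placeholder body standing in for the real inference check.
        self.assertTrue(True)
```

With this pattern the test still runs on a local machine, where the thread reports it passing, and is reported as skipped rather than failed on the CI runner.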

sayakpaul requested review from DN6 and dg845 October 27, 2025 15:43

Merge branch 'main' into sana-autoencoderdc-ci-fix

014dff4

dg845 approved these changes Oct 28, 2025


up

51b5461

sayakpaul requested a review from dg845 October 28, 2025 07:13

Merge branch 'main' into sana-autoencoderdc-ci-fix

651a684

dg845 approved these changes Oct 28, 2025


sayakpaul merged commit 55d49d4 into main Oct 28, 2025
10 of 11 checks passed

sayakpaul deleted the sana-autoencoderdc-ci-fix branch October 28, 2025 08:06

sayakpaul merged 4 commits into main from sana-autoencoderdc-ci-fix

2 participants
