
Conversation

@remi-or (Collaborator) commented Jun 3, 2025

The tests:

tests/models/internvl/test_video_processor_internvl.py::test_can_compile_fast_video_processor
tests/models/qwen2_vl/test_video_processing_qwen2_vl.py::test_can_compile_fast_video_processor

are failing on AMD because the version of torch.compile we are using does not properly graph the F.resize function when the input is uint8. For now, we fix this by wrapping calls to F.resize in a compile-friendly function.

This fixes the first test right off the bat; for the second, we redirected the .resize call in the qwen2 processor to use the parent class's version.
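For reference, here is a minimal sketch of such a compile-friendly wrapper. It is an illustration only, not the exact code merged in the PR; F is assumed to be torchvision's functional resize:

import torch
import torchvision.transforms.functional as F

def compile_friendly_resize(image: torch.Tensor, size, interpolation, antialias: bool = True) -> torch.Tensor:
    # Sketch only: upcast uint8 inputs to float before resizing so torch.compile
    # does not mis-graph the uint8 path, then bring values back into [0, 255]
    # explicitly before casting, so the clamping cannot be dropped from the graph.
    if image.dtype == torch.uint8:
        out = F.resize(image.float() / 256, size, interpolation=interpolation, antialias=antialias) * 256
        out = torch.where(out > 255.0, torch.full_like(out, 255.0), out)
        out = torch.where(out < 0.0, torch.zeros_like(out), out)
        return out.round().to(torch.uint8)
    return F.resize(image, size, interpolation=interpolation, antialias=antialias)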


@mht-sharma mht-sharma self-requested a review June 4, 2025 10:25
@remi-or remi-or force-pushed the img-process-compile-fix branch from 2feef65 to 9dfb902 on June 4, 2025 12:19
@remi-or remi-or marked this pull request as ready for review June 4, 2025 12:35
@mht-sharma (Contributor) commented:
@zucchini-nlp for review

"Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
f" {size}."
)
# This is a workaround to avoid a bug in torch.compile when dealing with uint8
Contributor:

Is there a link to the issue? And does it affect both NVIDIA and AMD?

Collaborator Author:

It did not happen on the A100 when I tested it. There is no issue opened, but I do have a reproducible script: https://gist.github.com/remi-or/33359e5435c4de74d8146d85ba50e485

Collaborator Author:

The fact that it didn't happen on NVIDIA might be due to a different torch version. From what I have seen, when the compile graph sees (clamp to uint8 range) -> (round) -> (convert to uint8), it skips the clamping (probably assuming it is done during the conversion), so values like -2 end up converted to 254. Hence the use of masked_fill to avoid this behavior.
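To make the failure mode concrete, here is a simplified sketch of the pattern that gets mis-compiled and the masked_fill-based workaround described above (toy values, not the exact code in the PR):

import torch

resized = torch.tensor([-2.0, 0.4, 255.7])  # toy float output of a resize

# Pattern that can be mis-graphed by torch.compile on ROCm: the clamp may be
# dropped before the uint8 cast, so -2.0 wraps around to 254.
buggy = resized.clamp(0, 255).round().to(torch.uint8)

# Workaround: push out-of-range values back into [0, 255] with masked_fill
# before the cast, so there is no clamp left for the graph to skip.
safe = resized.round()
safe = safe.masked_fill(safe < 0, 0)
safe = safe.masked_fill(safe > 255, 255)
safe = safe.to(torch.uint8)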

Member:

Could you specify in the comment that, as far as we know, the error occurs on AMD only? It could be useful for future debugging.

@mht-sharma (Contributor) commented Jun 5, 2025:

Since this issue is specific to AMD and occurs on both the latest stable and nightly releases (which we're likely using since we're building from source), I would suggest:

  • File an issue in the PyTorch repo or with the AMD team, and link it here for future reference.
  • Apply the change only for the ROCm path to avoid potential issues with NVIDIA compatibility upstream.

Member:

+1 on the above comment. I cannot reproduce this on NVIDIA with the latest torch. An issue in the torch repo, kept as a reference, will remind us to revert this workaround once the torch team ships a fix.

Collaborator Author:

Added an is_rocm_platform check to the condition and an "on AMD ..." note in the comment; here is the issue: pytorch/pytorch#155209
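For context, one plausible implementation of such a check (the helper name comes from the comment above; the actual code in the PR may differ) relies on torch.version.hip, which is only set on ROCm builds:

import torch

def is_rocm_platform() -> bool:
    # torch.version.hip is a version string on ROCm builds of PyTorch and None otherwise.
    return torch.version.hip is not None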

@yonigozlan (Member) left a comment:

Happy to merge this if it can fix compile errors with AMD; let's just make sure that it doesn't break the compile equivalence tests (test_can_compile_fast_image_processor/test_can_compile_fast_video_processor) for the other fast image/video processors.

You can run this utility from your transformers folder to check:

import argparse
import concurrent.futures
import os
import subprocess

COLOR_RED = "\033[91m"
COLOR_GREEN = "\033[92m"
COLOR_YELLOW = "\033[93m"
COLOR_CYAN = "\033[96m"
COLOR_RESET = "\033[0m"


def colored(text, color_code):
    return f"{color_code}{text}{COLOR_RESET}"


def run_test_file(test_file):
    command = f"RUN_SLOW=1 python -m pytest {test_file} -k 'test_can_compile_fast_image_processor or test_can_compile_fast_video_processor'"
    print(colored(f"Worker running: {command}", COLOR_CYAN))
    process = subprocess.run(command, shell=True, capture_output=True, text=True)
    failed_in_file = False
    for line in process.stdout.splitlines():
        if "FAILED" in line:
            failed_in_file = True
            break

    if failed_in_file:
        print(colored(f"Test file {test_file} finished: {COLOR_RED}FAILED{COLOR_RESET}", COLOR_YELLOW))
        return [
            colored(f"{test_file}: {line.strip()}", COLOR_RED)
            for line in process.stdout.splitlines()
            if "FAILED" in line
        ]
    else:
        print(colored(f"Test file {test_file} finished: {COLOR_GREEN}PASSED{COLOR_RESET}", COLOR_YELLOW))
        return []


def find_test_files(search_dir="tests/models"):
    test_files = []
    for root, _, files in os.walk(search_dir):
        for file in files:
            if (
                file.startswith("test_image_processing") or file.startswith("test_video_processing")
            ) and file.endswith(".py"):
                test_files.append(os.path.join(root, file))
    return test_files


def main():
    parser = argparse.ArgumentParser(description=colored("Run image processing tests in parallel.", COLOR_GREEN))
    parser.add_argument(
        "-w",
        "--workers",
        type=int,
        default=8,
        help=colored("Maximum number of concurrent test workers (default: 8).", COLOR_YELLOW),
    )
    args = parser.parse_args()
    max_workers = args.workers

    test_files = find_test_files()

    if not test_files:
        print(colored("No test_image_processing files found.", COLOR_YELLOW))
        return

    failed_tests_all = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_test_file, test_file) for test_file in test_files]
        for future in concurrent.futures.as_completed(futures):
            failed_tests_all.extend(future.result())

    if failed_tests_all:
        print(colored("\n--- Summary of Failed Tests ---", COLOR_RED))
        for failed_test in failed_tests_all:
            print(failed_test)
    else:
        print(colored("\nAll tests passed!", COLOR_GREEN))


if __name__ == "__main__":
    main()
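For example, if saved as check_compile_tests.py (a hypothetical name), the script can be run from the transformers root with python check_compile_tests.py --workers 4.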

Comment on lines +299 to +302
image = image.float() / 256
image = F.resize(image, new_size, interpolation=interpolation, antialias=antialias)
image = image * 256
Member:

Just curious, why divide by 256 and not 255?

Collaborator Author:

I tried with 255 and got a numerical difference that I did not get with 256:
Script: https://gist.github.com/remi-or/eb8936ca093d54c186fb5b67f15334eb
Output:

Max difference with 255: 1.0
Max difference with 256: 0.0
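The linked gist compares both scalings against the plain uint8 resize; a rough sketch of that comparison (exact values depend on the torch/torchvision versions in use) could look like this:

import torch
import torchvision.transforms.functional as F

image = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
size = [112, 112]

reference = F.resize(image, size)  # plain uint8 resize in eager mode

for scale in (255.0, 256.0):
    out = F.resize(image.float() / scale, size) * scale
    out = out.round().clamp(0, 255).to(torch.uint8)
    diff = (out.int() - reference.int()).abs().max().item()
    print(f"Max difference with {int(scale)}: {diff}")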

Comment on lines +146 to +149
stacked_videos = self.resize(
image=stacked_videos,
size=SizeDict(height=resized_height, width=resized_width),
interpolation=interpolation,
Member:

Thanks for fixing that

Collaborator Author:

Thanks! It will be fixed in a side branch though, not yet in main

Member:

Oh, didn't catch that, thanks.

Member:

btw, we can fix qwen2-vl image processor as well, it has the same issue

"Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
f" {size}."
)
# This is a workaround to avoid a bug in torch.compile when dealing with uint8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you specify that the error occurs on AMD only as far as we know in the comment? Could be useful for future debugging

@zucchini-nlp (Member) left a comment:

LGTM, as long as we make it an AMD-specific workaround and add a comment with the reference issue.

"Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
f" {size}."
)
# This is a workaround to avoid a bug in torch.compile when dealing with uint8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the above comment. I cannot reproduce this on NVIDIA with the latest torch. An issue with torch repo as a reference will remind us to revert this workaround once the torch team makes a fix


@remi-or (Collaborator Author) commented Jun 5, 2025:

@yonigozlan just ran your script and got "All tests passed!" (added two commits to fix llava_next and bridgetower).

@remi-or remi-or force-pushed the img-process-compile-fix branch from 9dfb902 to e4bae98 on June 5, 2025 12:15
@mht-sharma (Contributor) left a comment:

LGTM! Thanks

@mht-sharma mht-sharma merged commit 18f810f into huggingface:amd-hf-ci-branch Jun 6, 2025
20 checks passed
@remi-or remi-or mentioned this pull request Jun 23, 2025
remi-or added a commit that referenced this pull request Jun 26, 2025
* Image processor compile fix (#38540)

* Added a compile-friendly version of resize to BaseImgProcessorFast

* Changed qwen2 processor to use its parent class .resize

* Style

* underlined issue only happens on AMD w/ comment and bool check

* Fixed some utils functions

* Fixed the same issue for bridgetower

* Fixed the same issue for llava_next

* Repo consistency for llava onevision

* Update src/transformers/image_processing_utils_fast.py

Co-authored-by: Mohit Sharma <[email protected]>

---------

Co-authored-by: Mohit Sharma <[email protected]>

* Added an Expectation to an internvl test

* Made qwen2_vl use the resize method of its parent class

* Changed to torch.where

---------

Co-authored-by: Mohit Sharma <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025