Add Fast Image Processor for Chameleon #37140
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
yonigozlan left a comment
Hi @farrosalferro, thanks a lot for your great work! I left comments on a few things you can simplify, and on the Lanczos issue.
Thanks again, and I'll ping this PR if something changes on the Lanczos issue.
src/transformers/models/chameleon/image_processing_chameleon_fast.py: 3 outdated review comments (resolved)
```python
def resize(
    self,
    image: "torch.Tensor",
    size: SizeDict,
    interpolation: "F.InterpolationMode" = None,
    **kwargs,
) -> "torch.Tensor":
    """
    Resize an image to `(size["height"], size["width"])`.

    Args:
        image (`torch.Tensor`):
            Image to resize.
        size (`SizeDict`):
            Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
        interpolation (`InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
            `InterpolationMode` filter to use when resizing the image, e.g. `InterpolationMode.BICUBIC`.

    Returns:
        `torch.Tensor`: The resized image.
    """
    interpolation = interpolation if interpolation is not None else F.InterpolationMode.BILINEAR
    pil_torch_interpolation_mapping_inverse = {v: k for k, v in pil_torch_interpolation_mapping.items()}
    if isinstance(interpolation, F.InterpolationMode):
        interpolation = pil_torch_interpolation_mapping_inverse[interpolation]
    if size.shortest_edge and size.longest_edge:
        # Resize the image so that the shortest edge or the longest edge is of the given size
        # while maintaining the aspect ratio of the original image.
        new_size = get_size_with_aspect_ratio(
            image.size()[-2:],
            size.shortest_edge,
            size.longest_edge,
        )
    elif size.shortest_edge:
        new_size = get_resize_output_image_size(
            image,
            size=size.shortest_edge,
            default_to_square=False,
            input_data_format=ChannelDimension.FIRST,
        )
    elif size.max_height and size.max_width:
        new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
    elif size.height and size.width:
        new_size = (size.height, size.width)
    else:
        raise ValueError(
            "Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
            f" {size}."
        )
    # Resize the images one by one, as torchvision does not support batched resizing with LANCZOS interpolation.
    device = image.device
    image_stack = []
    for img in image:
        img = img.cpu().numpy()
        img = resize(
            img,
            new_size,
            resample=interpolation,
            input_data_format=ChannelDimension.FIRST,
            **kwargs,
        )
        img = torch.from_numpy(img).contiguous().to(device)
        image_stack.append(img)
    return torch.stack(image_stack, dim=0)
```
Thanks for making this work. This is a good way to solve the Lanczos issue, but as you said, it slows down the processing a lot, so we might as well fall back to the slow processor.
There is an ongoing discussion about what the best solution is here, but basically the options are to sacrifice either speed or a bit of accuracy. The alternative to falling back to slow processing would be to use bicubic resampling, which is quite close to the Lanczos one.
I think we are probably leaning towards the latter, mostly because we want all fast image processing to use only torch/torchvision functions, so that it is torch-compilable and can run fully on GPU.
So what you could do for now is still override resize, check whether the resampling is Lanczos, and if it is, manually change it to bicubic and log a warning (with logger.warning_once) telling users to fall back to the slow processor if they need full consistency with the original model, then call the parent resize.
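A minimal sketch of what this suggestion could look like (class name, import paths, and the parent resize signature are assumptions based on how other fast processors in the library are structured, not necessarily the code merged in this PR):

```python
from torchvision.transforms import functional as F  # the PR may use transforms.v2 instead
from transformers.image_processing_utils_fast import BaseImageProcessorFast
from transformers.utils import logging

logger = logging.get_logger(__name__)


class ChameleonImageProcessorFast(BaseImageProcessorFast):
    def resize(self, image, size, interpolation=None, **kwargs):
        # Chameleon's original (slow) processor defaults to LANCZOS, but torchvision's
        # tensor resize path does not implement it, so swap it for BICUBIC and warn once.
        if interpolation == F.InterpolationMode.LANCZOS:
            logger.warning_once(
                "LANCZOS interpolation is not supported for torch tensors; falling back to "
                "BICUBIC. Use the slow image processor for exact equivalence with the original model."
            )
            interpolation = F.InterpolationMode.BICUBIC
        # Delegate the actual resizing to the parent fast processor.
        return super().resize(image, size, interpolation=interpolation, **kwargs)
```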
Thanks for the solution! I followed #37045 for the warning. However, it still fails the speed test; how do you think we should solve it? Thanks!
Also, sorry for the messy commits; I'm still familiarizing myself with GitHub.
yonigozlan left a comment
Hi @farrosalferro, thanks for iterating and sorry for the delay. Looks great! Just pushed some small updates to keep up with the changes in the library, but otherwise everything looks good to be merged. Thanks!
yonigozlan left a comment
Sorry, I just saw there are issues with the equivalence tests, probably due to using bicubic vs. Lanczos. Could you override both equivalence tests and force the slow processor to use bicubic in the tests? Thanks!
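For reference, a rough sketch of such an override (the test name, random input, and tolerances here are placeholders, not necessarily what ends up in the PR):

```python
import numpy as np
import torch
from PIL import Image

from transformers import ChameleonImageProcessor
from transformers.image_utils import PILImageResampling
from transformers.models.chameleon.image_processing_chameleon_fast import ChameleonImageProcessorFast


def test_slow_fast_equivalence_bicubic():
    image = Image.fromarray(np.random.randint(0, 256, (64, 48, 3), dtype=np.uint8))

    # Force BICUBIC on the slow (PIL-based) processor so that both processors
    # resample the same way; the fast processor falls back from LANCZOS to BICUBIC.
    slow = ChameleonImageProcessor(resample=PILImageResampling.BICUBIC)
    fast = ChameleonImageProcessorFast()

    slow_out = slow(image, return_tensors="pt").pixel_values
    fast_out = fast(image, return_tensors="pt").pixel_values
    torch.testing.assert_close(slow_out, fast_out, atol=1e-1, rtol=1e-3)
```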
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Add Fast Image Processor for Chameleon
* add warning to resize and move blend_rgba to convert_to_rgb
* Remove unrelated files
* Update image_processing_chameleon_fast to use auto_docstring
* fix equivalence test

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
What does this PR do?
Adds a Fast Image Processor for Chameleon (issue #36978).
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@yonigozlan
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Note
I found a problem during the resizing operation: the `InterpolationMode.LANCZOS` resampling, which is the default resampling for Chameleon, is not supported by the torchvision resizing method. For instance, when running the resize, I got this error:

NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 4D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got lanczos)

My workaround is to convert the images to NumPy arrays first and then apply the resizing function, similar to Chameleon's slow image processor. However, this results in a longer processing time than the slow image processor, so it does not pass the `test_fast_is_faster_than_slow` test. Do you have any suggestions on how to solve this problem? Any advice would be appreciated. Thank you.
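The snippet the author originally ran was not preserved on this page, but a minimal reproduction of the same error would look roughly like this (the exact exception text can vary with the torchvision version):

```python
import torch
from torchvision.transforms.functional import InterpolationMode, resize

# Batched image tensor in (batch, channels, height, width) layout.
images = torch.rand(2, 3, 512, 512)

# LANCZOS is only implemented for PIL images in torchvision; for tensors the call is
# forwarded to torch.nn.functional.interpolate, which has no "lanczos" mode and raises
# the NotImplementedError quoted above.
resized = resize(images, [256, 256], interpolation=InterpolationMode.LANCZOS)
```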