Skip to content

Conversation

@farrosalferro
Copy link
Contributor

What does this PR do?

Add Fast Image Processor for Chameleon (Issue #36978

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@yonigozlan

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Note

I found a problem during the resizing operation as it seems the InterpolationMode.LANCZOS resampling, which is the default resampling for Chameleon, is not supported within the torchvision resizing method. For instance, when running this code:

tensor = torch.randn(4, 3, 224, 224)
interpolation = PIL.Image.LANCZOS
size = (112, 112)
antialias = True

new_tensor = F.resize(
    tensor,
    size=size,
    interpolation=interpolation,
    antialias=antialias,
)

I got this error
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 4D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got lanczos)

My workaround is to transform it to Numpy array first, then apply the resizing function similar to the Chameleon's slow image processor. However, this results in a longer processing time than the slow image processor, which does not pass the test_fast_is_faster_than_slow test.

Do you have any suggestion on how to solve this problem? Any advice would be appreciated. Thank you.

@github-actions
Copy link
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@github-actions github-actions bot marked this pull request as draft March 31, 2025 13:36
@farrosalferro farrosalferro marked this pull request as ready for review March 31, 2025 13:41
Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @farrosalferro , thanks a lot for your great work! Left a comment on a few things you can simplify, and on the Lanczos issue.
Thanks again and I'll ping this PR if something changes for the Lanczos issue

Comment on lines 145 to 208
def resize(
self,
image: "torch.Tensor",
size: SizeDict,
interpolation: "F.InterpolationMode" = None,
**kwargs,
) -> "torch.Tensor":
"""
Resize an image to `(size["height"], size["width"])`.

Args:
image (`torch.Tensor`):
Image to resize.
size (`SizeDict`):
Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
resample (`InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
`InterpolationMode` filter to use when resizing the image e.g. `InterpolationMode.BICUBIC`.

Returns:
`torch.Tensor`: The resized image.
"""
interpolation = interpolation if interpolation is not None else F.InterpolationMode.BILINEAR
pil_torch_interpolation_mapping_inverse = {v: k for k, v in pil_torch_interpolation_mapping.items()}
if isinstance(interpolation, F.InterpolationMode):
interpolation = pil_torch_interpolation_mapping_inverse[interpolation]
if size.shortest_edge and size.longest_edge:
# Resize the image so that the shortest edge or the longest edge is of the given size
# while maintaining the aspect ratio of the original image.
new_size = get_size_with_aspect_ratio(
image.size()[-2:],
size.shortest_edge,
size.longest_edge,
)
elif size.shortest_edge:
new_size = get_resize_output_image_size(
image,
size=size.shortest_edge,
default_to_square=False,
input_data_format=ChannelDimension.FIRST,
)
elif size.max_height and size.max_width:
new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
elif size.height and size.width:
new_size = (size.height, size.width)
else:
raise ValueError(
"Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
f" {size}."
)
# resize the image one by one as torchvision does not support batch resizing with LANCZOS interpolation
device = image.device
image_stack = []
for img in image:
img = img.cpu().numpy()
img = resize(
img,
new_size,
resample=interpolation,
input_data_format=ChannelDimension.FIRST,
**kwargs,
)
img = torch.from_numpy(img).contiguous().to(device)
image_stack.append(img)
return torch.stack(image_stack, dim=0)
Copy link
Member

@yonigozlan yonigozlan Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this work. This is a good way to solve the Lanczos issue, but as you said, this slow down the processing a lot, so we might as well fall back to the slow processor.

There is an ongoing discussion about what's the best solution here, but basically the possibilities are to either sacrifice speed or a bit of accuracy. The alternative to falling back to slow processing would be to use bilinear resampling, which is quite close to Lanczos one.

I think we are probably leaning towards the latter alternative, mostly because we want all fast image processing to only use torch/torchvision functions, so that they are torch compilable and can perform processing fully on GPU.

So what you could do for now is still override resize, check if the resampling is Lanczos, if it is manually change it to Bicubic and log a warning (with logger.warning_once) to fall back to slow processing for full consistency with the original model, then call the parent resize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the solution! I followed #37045 for the warning. However, it still fails the speed test, how do you think should we solve it? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also sorry for the messy commits, I'm still familiarizing myself with github.

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @farrosalferro, thanks for iterating and sorry for the delay. Looks great! Just pushed some small updates to keep up with the changes in the library, but otherwise everything looks good to be merged. Thanks!

@yonigozlan yonigozlan enabled auto-merge (squash) May 13, 2025 16:19
Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just saw there are issue with equivalence tests, probably due to using Bicubic vs Lanczos. Could you override both equivalence tests and force the slow processor to use Bicubic in the tests? Thanks

@yonigozlan yonigozlan disabled auto-merge May 13, 2025 16:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan enabled auto-merge (squash) June 27, 2025 15:02
@yonigozlan yonigozlan merged commit dd7dc4a into huggingface:main Jun 27, 2025
20 checks passed
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants