Add Fast Image Processor for Chameleon #37140
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
yonigozlan left a comment
Hi @farrosalferro, thanks a lot for your great work! I left comments on a few things you can simplify, and on the Lanczos issue.
Thanks again, and I'll ping this PR if something changes on the Lanczos issue.
src/transformers/models/chameleon/image_processing_chameleon_fast.py: 3 outdated review comments (resolved)
```python
def resize(
    self,
    image: "torch.Tensor",
    size: SizeDict,
    interpolation: "F.InterpolationMode" = None,
    **kwargs,
) -> "torch.Tensor":
    """
    Resize an image to `(size["height"], size["width"])`.

    Args:
        image (`torch.Tensor`):
            Image to resize.
        size (`SizeDict`):
            Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
        interpolation (`InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
            `InterpolationMode` filter to use when resizing the image, e.g. `InterpolationMode.BICUBIC`.

    Returns:
        `torch.Tensor`: The resized image.
    """
    interpolation = interpolation if interpolation is not None else F.InterpolationMode.BILINEAR
    pil_torch_interpolation_mapping_inverse = {v: k for k, v in pil_torch_interpolation_mapping.items()}
    if isinstance(interpolation, F.InterpolationMode):
        interpolation = pil_torch_interpolation_mapping_inverse[interpolation]
    if size.shortest_edge and size.longest_edge:
        # Resize the image so that the shortest edge or the longest edge is of the given size
        # while maintaining the aspect ratio of the original image.
        new_size = get_size_with_aspect_ratio(
            image.size()[-2:],
            size.shortest_edge,
            size.longest_edge,
        )
    elif size.shortest_edge:
        new_size = get_resize_output_image_size(
            image,
            size=size.shortest_edge,
            default_to_square=False,
            input_data_format=ChannelDimension.FIRST,
        )
    elif size.max_height and size.max_width:
        new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
    elif size.height and size.width:
        new_size = (size.height, size.width)
    else:
        raise ValueError(
            "Size must contain 'height' and 'width' keys, or 'max_height' and 'max_width', or 'shortest_edge' key. Got"
            f" {size}."
        )
    # Resize the images one by one, as torchvision does not support batched resizing with LANCZOS interpolation.
    device = image.device
    image_stack = []
    for img in image:
        img = img.cpu().numpy()
        img = resize(
            img,
            new_size,
            resample=interpolation,
            input_data_format=ChannelDimension.FIRST,
            **kwargs,
        )
        img = torch.from_numpy(img).contiguous().to(device)
        image_stack.append(img)
    return torch.stack(image_stack, dim=0)
```
Thanks for making this work. This is a good way to solve the Lanczos issue, but as you said, it slows down the processing a lot, so we might as well fall back to the slow processor.
There is an ongoing discussion about what the best solution is here, but basically the options are to sacrifice either speed or a bit of accuracy. The alternative to falling back to slow processing would be to use bicubic resampling, which is quite close to the Lanczos one.
I think we are probably leaning towards the latter, mostly because we want all fast image processing to use only torch/torchvision functions, so that it is torch-compilable and can run fully on GPU.
So what you could do for now is still override resize, check whether the resampling is Lanczos, and if it is, manually change it to bicubic and log a warning (with logger.warning_once) telling users to fall back to the slow processor if they need full consistency with the original model, then call the parent resize.
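A minimal sketch of what this suggestion could look like (class name, import paths, and the parent resize signature are assumptions based on how other fast processors in the library are structured, not necessarily the code merged in this PR):

```python
from torchvision.transforms import functional as F  # the PR may use transforms.v2 instead
from transformers.image_processing_utils_fast import BaseImageProcessorFast
from transformers.utils import logging

logger = logging.get_logger(__name__)


class ChameleonImageProcessorFast(BaseImageProcessorFast):
    def resize(self, image, size, interpolation=None, **kwargs):
        # Chameleon's original (slow) processor defaults to LANCZOS, but torchvision's
        # tensor resize path does not implement it, so swap it for BICUBIC and warn once.
        if interpolation == F.InterpolationMode.LANCZOS:
            logger.warning_once(
                "LANCZOS interpolation is not supported for torch tensors; falling back to "
                "BICUBIC. Use the slow image processor for exact equivalence with the original model."
            )
            interpolation = F.InterpolationMode.BICUBIC
        # Delegate the actual resizing to the parent fast processor.
        return super().resize(image, size, interpolation=interpolation, **kwargs)
```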
Thanks for the solution! I followed #37045 for the warning. However, it still fails the speed test; how do you think we should solve it? Thanks!
Also, sorry for the messy commits; I'm still familiarizing myself with GitHub.
yonigozlan left a comment
Hi @farrosalferro, thanks for iterating and sorry for the delay. Looks great! Just pushed some small updates to keep up with the changes in the library, but otherwise everything looks good to be merged. Thanks!
yonigozlan left a comment
Sorry, I just saw there are issues with the equivalence tests, probably due to using bicubic vs. Lanczos. Could you override both equivalence tests and force the slow processor to use bicubic in the tests? Thanks!
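For reference, a rough sketch of such an override (the test name, random input, and tolerances here are placeholders, not necessarily what ends up in the PR):

```python
import numpy as np
import torch
from PIL import Image

from transformers import ChameleonImageProcessor
from transformers.image_utils import PILImageResampling
from transformers.models.chameleon.image_processing_chameleon_fast import ChameleonImageProcessorFast


def test_slow_fast_equivalence_bicubic():
    image = Image.fromarray(np.random.randint(0, 256, (64, 48, 3), dtype=np.uint8))

    # Force BICUBIC on the slow (PIL-based) processor so that both processors
    # resample the same way; the fast processor falls back from LANCZOS to BICUBIC.
    slow = ChameleonImageProcessor(resample=PILImageResampling.BICUBIC)
    fast = ChameleonImageProcessorFast()

    slow_out = slow(image, return_tensors="pt").pixel_values
    fast_out = fast(image, return_tensors="pt").pixel_values
    torch.testing.assert_close(slow_out, fast_out, atol=1e-1, rtol=1e-3)
```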
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Add Fast Image Processor for Chameleon
* add warning to resize and move blend_rgba to convert_to_rgb
* Remove unrelated files
* Update image_processing_chameleon_fast to use auto_docstring
* fix equivalence test

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
What does this PR do?
Adds a Fast Image Processor for Chameleon (issue #36978).
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@yonigozlan
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Note
I found a problem during the resizing operation: the `InterpolationMode.LANCZOS` resampling, which is the default resampling for Chameleon, is not supported by the torchvision resizing method. For instance, when running the resize, I got this error:

NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 4D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got lanczos)

My workaround is to convert the images to NumPy arrays first and then apply the resizing function, similar to Chameleon's slow image processor. However, this results in a longer processing time than the slow image processor, so it does not pass the `test_fast_is_faster_than_slow` test. Do you have any suggestions on how to solve this problem? Any advice would be appreciated. Thank you.
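The snippet the author originally ran was not preserved on this page, but a minimal reproduction of the same error would look roughly like this (the exact exception text can vary with the torchvision version):

```python
import torch
from torchvision.transforms.functional import InterpolationMode, resize

# Batched image tensor in (batch, channels, height, width) layout.
images = torch.rand(2, 3, 512, 512)

# LANCZOS is only implemented for PIL images in torchvision; for tensors the call is
# forwarded to torch.nn.functional.interpolate, which has no "lanczos" mode and raises
# the NotImplementedError quoted above.
resized = resize(images, [256, 256], interpolation=InterpolationMode.LANCZOS)
```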