Refactor ViT-like models #39816
Conversation
run-slow: vit

This comment contains run-slow, running the specified jobs: models: ['models/vit']

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

run-slow: vit

This comment contains run-slow, running the specified jobs: models: ['models/vit']
ArthurZucker left a comment
🧼 Very clean! The only nit is that we should not remove the hidden states collection when it's core for the model.
😢 it's so nice!
Really unbloats the code, thanks for working on this!
| """ | ||
| ) | ||
| class BitBackbone(BitPreTrainedModel, BackboneMixin): | ||
| has_attentions = False |
It is not that bad TBH! It makes explicit that there is no attention.
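A toy illustration of that point, using made-up classes (`ToyOutput`, `ToyConvBackbone`) rather than the actual Transformers plumbing: a class-level `has_attentions = False` flag documents that the architecture has no attention layers, so attention outputs are never produced or requested.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from torch import nn


@dataclass
class ToyOutput:
    last_hidden_state: torch.Tensor
    attentions: Optional[Tuple[torch.Tensor, ...]] = None


class ToyConvBackbone(nn.Module):
    # Purely convolutional architecture: there are no attention layers,
    # so the flag makes that explicit instead of silently returning None.
    has_attentions = False

    def __init__(self, channels: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, pixel_values: torch.Tensor, output_attentions: bool = False) -> ToyOutput:
        if output_attentions and not self.has_attentions:
            raise ValueError("This model has no attention layers to return.")
        return ToyOutput(last_hidden_state=self.conv(pixel_values))


print(ToyConvBackbone()(torch.randn(1, 4, 8, 8)).last_hidden_state.shape)  # torch.Size([1, 4, 8, 8])
```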
```python
@check_model_inputs
def _forward_with_additional_outputs(
    self, pixel_values: torch.Tensor, **kwargs: Unpack[TransformersKwargs]
) -> BaseModelOutput:
    """Additional forward to capture intermediate outputs by `check_model_inputs` decorator"""
    embedding_output = self.embeddings(pixel_values)
    output = self.encoder(embedding_output)
    return output
```
Well, here is a place where it does not make sense to hide the collection of hidden_states, because it would always be set to True since you NEED them for the feature maps, right?
```diff
     test_resize_embeddings = False
     test_head_masking = False
-    test_torch_exportable = True
+    test_torch_exportable = False  # broken by output recording refactor
```
Indeed, we can leave this as a TODO, but we could also weed out the problematic part so that the decorator supports export. The thing that is not exportable should be easy to remove, for example by forcing output_hidden_states!
A TODO is fine, let's add # FIXME:
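A rough sketch of the workaround suggested above, using a made-up `ToyEncoder` rather than the real ViT code: if hidden-state collection is written directly into the forward graph (effectively forcing `output_hidden_states`) instead of being recorded through runtime hooks, the module stays traceable by `torch.export` (PyTorch 2.1+).

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    def __init__(self, hidden_size: int = 8, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))

    def forward(self, x: torch.Tensor):
        # Hidden states are collected in-graph rather than via hooks,
        # which keeps the module export-friendly.
        hidden_states = (x,)
        for layer in self.layers:
            x = layer(x)
            hidden_states += (x,)
        return x, hidden_states


exported = torch.export.export(ToyEncoder(), (torch.randn(2, 8),))
print(type(exported).__name__)  # ExportedProgram
```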
ArthurZucker left a comment
Thanks a lot for addressing my comments and cleaning all this up!
```python
    pixel_values, output_hidden_states=True, **kwargs
)
embedding_output = self.embeddings(pixel_values)
output: BaseModelOutput = self.encoder(embedding_output, output_hidden_states=True)
```
I am not seeing (from the diff at least) a place where this would be false! If false is never an option, we need to hardcode always returning them!
Yeah, the idea is to always return them from the encoder (to make the feature maps), but not always in the model output, so there are 2 options:
- feature_maps and hidden_states
- only feature_maps (if output_hidden_states=False; in this case, we still need hidden_states from the encoder to build feature_maps, but we do not put them in the ModelOutput. Not sure if there are any benefits, but the tests are designed this way.)
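A minimal sketch of that second option, with made-up toy classes (`ToyBackboneOutput`, `ToyBackbone`) rather than the real backbone code: hidden states are always collected internally because the feature maps are built from them, but they are only surfaced in the output when `output_hidden_states=True`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from torch import nn


@dataclass
class ToyBackboneOutput:
    feature_maps: Tuple[torch.Tensor, ...]
    hidden_states: Optional[Tuple[torch.Tensor, ...]] = None


class ToyBackbone(nn.Module):
    def __init__(self, hidden_size: int = 8, num_layers: int = 3, out_indices: Tuple[int, ...] = (1, 3)):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))
        self.out_indices = out_indices

    def forward(self, x: torch.Tensor, output_hidden_states: bool = False) -> ToyBackboneOutput:
        # Hidden states are always collected: the feature maps are built from them.
        hidden_states = (x,)
        for layer in self.layers:
            x = layer(x)
            hidden_states += (x,)
        feature_maps = tuple(hidden_states[i] for i in self.out_indices)
        # ...but they are only exposed to the caller when explicitly requested.
        return ToyBackboneOutput(
            feature_maps=feature_maps,
            hidden_states=hidden_states if output_hidden_states else None,
        )


out = ToyBackbone()(torch.randn(2, 8))
print(len(out.feature_maps), out.hidden_states)  # 2 None
```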
[For maintainers] Suggested jobs to run (before merge): run-slow: audio_spectrogram_transformer, bit, convnext, convnextv2, deepseek_vl_hybrid, deit, depth_anything, depth_pro, dinov2, dinov2_with_registers, dpt, eomt, focalnet, got_ocr2, hgnet_v2, ijepa
What does this PR do?
Refactor ViT and dependent models to use the `@check_model_inputs` and `@can_return_tuple` decorators to remove all the logic for intermediate `hidden_states` and `attentions` capture.
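For context, here is a rough, simplified sketch of the capture idea, using a made-up `capture_hidden_states` decorator and `ToyEncoder` (the real `check_model_inputs` / `can_return_tuple` decorators in Transformers are considerably more involved): the decorator records each layer's output via forward hooks, so individual `forward` methods no longer need to thread `output_hidden_states` bookkeeping through their layer loops.

```python
from functools import wraps

import torch
from torch import nn


def capture_hidden_states(forward):
    """Toy stand-in for a capture decorator: record per-layer outputs with hooks."""

    @wraps(forward)
    def wrapper(self, *args, output_hidden_states: bool = False, **kwargs):
        if not output_hidden_states:
            return forward(self, *args, **kwargs), None
        captured = []
        # Assumes the module exposes its sub-layers as `self.layers`.
        hooks = [layer.register_forward_hook(lambda m, i, o: captured.append(o)) for layer in self.layers]
        try:
            out = forward(self, *args, **kwargs)
        finally:
            for hook in hooks:
                hook.remove()
        return out, tuple(captured)

    return wrapper


class ToyEncoder(nn.Module):
    def __init__(self, hidden_size: int = 8, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))

    @capture_hidden_states
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No manual hidden-state bookkeeping here: the decorator handles it.
        for layer in self.layers:
            x = layer(x)
        return x


last, hidden = ToyEncoder()(torch.randn(2, 8), output_hidden_states=True)
print(last.shape, len(hidden))  # torch.Size([2, 8]) 2
```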