Standardize semantic segmentation models outputs #15469

sgugger · 2022-02-01T21:23:48Z

What does this PR do?

This PR standardizes the model outputs for semantic segmentation models and creates an AutoModelForSemanticSegmentation class. As discussed internally, models for semantic segmentation should return logits of shape (batch_size, num_labels, height, width), i.e. one logit per label for each pixel.
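As a minimal sketch of what this layout means in practice (assuming PyTorch; the sizes are arbitrary), a per-pixel label map is obtained by taking the argmax over the label dimension:

```python
import torch

# Illustrative sizes only
batch_size, num_labels, height, width = 2, 5, 64, 48

# Logits in the standardized layout: one score per label for every pixel
logits = torch.randn(batch_size, num_labels, height, width)

# Reduce over the label dimension (dim=1) to get one predicted label per pixel
predicted_labels = logits.argmax(dim=1)
print(predicted_labels.shape)  # torch.Size([2, 64, 48])
```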

Breaking change: after this PR, the logits of BeitForSemanticSegmentation and SegformerForSemanticSegmentation have the same height and width as the input (instead of height/4 and width/4).
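A minimal sketch of the resizing step, assuming bilinear upsampling with `torch.nn.functional.interpolate` (the exact interpolation settings used inside the models may differ), turns quarter-resolution logits into input-resolution ones:

```python
import torch
import torch.nn.functional as F

batch_size, num_labels = 1, 3
height, width = 128, 96  # input image size (illustrative)

# Old-style logits at a quarter of the input resolution
low_res_logits = torch.randn(batch_size, num_labels, height // 4, width // 4)

# Upsample so that every input pixel gets its own vector of logits
logits = F.interpolate(
    low_res_logits, size=(height, width), mode="bilinear", align_corners=False
)
print(low_res_logits.shape)  # torch.Size([1, 3, 32, 24])
print(logits.shape)          # torch.Size([1, 3, 128, 96])
```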

To maintain some level of backward compatibility, SemanticSegmentationModelOutput has a legacy_logits field that users can read to get the old logits.

Another, less breaking, option would be to create new BeitForPixelClassification and SegformerForPixelClassification classes while deprecating the current ones, and then use the name "pixel classification" instead of "semantic segmentation" everywhere. This has the benefit of being more understandable to beginners and of mirroring our "ForTokenClassification" classes, but it might surprise experts more used to the "semantic segmentation" name.

HuggingFaceDocBuilder · 2022-02-01T21:24:13Z

The documentation is not available anymore as the PR was closed or merged.

NielsRogge · 2022-02-03T08:35:26Z

src/transformers/modeling_outputs.py

+
+    Args:
+        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
+            Classification (or regression if config.num_labels==1) loss.


Do we also want to tackle regression tasks with this output class?

Depth estimation is an example of per-pixel regression, but that's a different task.

We can add it to the docstring once it's supported. For now, the models don't use problem_type to infer the proper loss function.
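If per-pixel regression were supported later, loss selection could follow the same `problem_type` idea as the sequence-classification heads. The helper below is a hypothetical sketch (its name and structure are assumptions, not the library's API): cross-entropy over the label dimension for classification, and mean squared error on a single output channel for regression such as depth estimation.

```python
import torch
import torch.nn.functional as F


def per_pixel_loss(logits: torch.Tensor, labels: torch.Tensor, problem_type: str) -> torch.Tensor:
    """Hypothetical loss selection for (batch, num_labels, height, width) logits."""
    if problem_type == "single_label_classification":
        # labels: (batch, height, width) integer class ids;
        # cross_entropy treats dim=1 of the logits as the class dimension
        return F.cross_entropy(logits, labels)
    if problem_type == "regression":
        # A single output channel per pixel, e.g. a depth value
        if logits.shape[1] != 1:
            raise ValueError("regression expects num_labels == 1")
        # labels: (batch, height, width) float targets
        return F.mse_loss(logits.squeeze(1), labels)
    raise ValueError(f"unsupported problem_type: {problem_type}")


# Usage: 4-class classification and single-channel regression
cls_logits = torch.randn(2, 4, 8, 8)
cls_labels = torch.randint(0, 4, (2, 8, 8))
reg_logits = torch.randn(2, 1, 8, 8)
reg_labels = torch.rand(2, 8, 8)
print(per_pixel_loss(cls_logits, cls_labels, "single_label_classification"))
print(per_pixel_loss(reg_logits, reg_labels, "regression"))
```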

NielsRogge

Looks good to me (wondering why the PR is called "standardize instance segmentation models outputs"?).

xxxForPixelClassification => both semantic and panoptic segmentation are pixel classification tasks, right? However, in semantic segmentation one predicts only a single label per pixel, whereas in panoptic segmentation one predicts two labels per pixel (a class label and an instance ID). If we went with xxxForPixelClassification and xxxForPanopticSegmentation, people would be confused, and it wouldn't really be consistent. So I would stick to xxxForSemanticSegmentation.

There's also depth estimation, which is per-pixel regression (if I understand correctly; I'm not entirely familiar with it). Would we then have xxxForPixelRegression? On the other hand, we currently tackle both regression and classification with the xxxForSequenceClassification models.

NielsRogge · 2022-02-03T08:45:12Z

src/transformers/modeling_outputs.py

+        legacy_logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels, height/4, width/4)`):
+            For backward compatibility, old logits returned by `BeitForSemanticSegmentation` and
+            `SegformerForSemanticSegmentation`.


Not sure if we should keep this. We could also call it raw_logits (which are the raw logits coming out of the model without any interpolation).

We might have to treat MaskFormer as an exception (in the sense that we will need the raw output to have both logits and prediction masks to be able to compute the segmentation), the same way we handle the model for QA with beam search in XLNet. That class would then not be part of AutoModelForSemanticSegmentation since it doesn't share the same API, but we would also need a version of MaskFormerForSemanticSegmentation that does (with some fine-tuning, it might perform as well as this fancier one).

FrancescoSaverioZuppichini

Thanks! I've added a comment about logits in the output and why it may be challenging for MaskFormer to use it

FrancescoSaverioZuppichini · 2022-02-03T09:20:01Z

src/transformers/modeling_outputs.py

+        legacy_logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels, height/4, width/4)`):
+            For backward compatibility, old logits returned by `BeitForSemanticSegmentation` and
+            `SegformerForSemanticSegmentation`.


I have a doubt about the naming. logits usually refers to unnormalized values used to compute a downstream task (e.g. classification after a softmax layer). If that is the case, returning this output class is infeasible for some models, e.g. MaskFormer.

In MaskFormer, we compute the final segmentation masks from two tensors:

    def forward(self, *args, **kwargs):
        outputs: MaskFormerOutput = self.model(*args, **kwargs)
        # mask classes has shape [BATCH, QUERIES, CLASSES + 1]
        # remove the null class `[..., :-1]`
        masks_classes: Tensor = outputs.preds_logits.softmax(dim=-1)[..., :-1]
        # mask probs has shape [BATCH, QUERIES, HEIGHT, WIDTH]
        masks_probs: Tensor = outputs.preds_masks.sigmoid()
        # now we want to sum over the queries,
        # $ out_{c,h,w} = \sum_q p_{q,c} * m_{q,h,w} $
        # where $ softmax(p) \in R^{q, c} $ is the mask classes
        # and $ sigmoid(m) \in R^{q, h, w}$ is the mask probabilities
        # b(atch)q(uery)c(lasses), b(atch)q(uery)h(eight)w(idth)
        segmentation: Tensor = torch.einsum("bqc, bqhw -> bchw", masks_classes, masks_probs)
        return MaskFormerForSemanticSegmentationOutput(segmentation=segmentation, **outputs)

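Returning logits of shape (batch_size, config.num_labels, height, width) is therefore not possible here, since the segmentation is computed from probabilities (the softmax and sigmoid outputs) rather than from logits. I see two possible solutions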

@NielsRogge opened an issue asking the MaskFormer authors whether it is possible to return logits from their model.
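To make the point concrete, here is a self-contained sketch with random tensors standing in for `outputs.preds_logits` and `outputs.preds_masks` (the sizes are arbitrary assumptions). The combined map has the standardized (batch, num_labels, height, width) shape, but each entry is a sum of products of probabilities rather than an unnormalized logit, so it is never negative:

```python
import torch

batch, queries, num_labels, height, width = 2, 10, 5, 16, 16

# Stand-ins for the MaskFormer heads; the class logits include an extra "null" class
preds_logits = torch.randn(batch, queries, num_labels + 1)
preds_masks = torch.randn(batch, queries, height, width)

# Per-query class probabilities with the null class dropped
masks_classes = preds_logits.softmax(dim=-1)[..., :-1]
# Per-query, per-pixel mask probabilities
masks_probs = preds_masks.sigmoid()

# Sum over queries: out[b, c, h, w] = sum_q p[b, q, c] * m[b, q, h, w]
segmentation = torch.einsum("bqc,bqhw->bchw", masks_classes, masks_probs)

print(segmentation.shape)               # torch.Size([2, 5, 16, 16])
print(bool((segmentation >= 0).all()))  # True
```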

LysandreJik

I don't have sufficient experience with computer vision and these specific tasks to review the content of the PR, so my review is on the:

- coherence of this auto model with the rest of the auto models
- backwards compatibility

I think this PR is great, as it's a good opportunity to see whether our "experimental" approach works. It shows that the approach lacks quite a few aspects that would make it robust:

- We do not systematically mention on their model page that models are experimental, which is an issue. It should be very visible to users.
- This is an urgent refactor, but ideally it should go through a proper deprecation cycle. We need to be clear that this is an exceptional instance where we do not respect backwards compatibility, for experimental reasons.
In order to patch this, I think we'll need to:
- Tweet (can be from personal accounts) about the update
- Create a forum post
- Create a pinned github issue

Thanks for working on this, @sgugger!

LysandreJik · 2022-02-03T15:23:35Z

src/transformers/modeling_outputs.py

+        legacy_logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels, height/4, width/4)`):
+            For backward compatibility, old logits returned by `BeitForSemanticSegmentation` and
+            `SegformerForSemanticSegmentation`.


Regarding legacy_logits, I wonder if it wouldn't be better, API-wise, to return the "old" logits based on a flag such as legacy_output. I understand the need for the breaking change, but I think it would be simpler for users to update their model instantiation rather than the way they handle model outputs.

I think it's easier because it doesn't require digging into previously working code to see what needs to change; users would only need to update the config or model initialization.

The model would then return a SequenceClassifierOutput with the previous values when instantiated with legacy_output, and the new, better SemanticSegmentationModelOutput otherwise.

I would also emit a FutureWarning either when users instantiate the model or during their first forward pass. I think the latter is best: if they have not updated their code and it crashes, the warning will appear just above the crash.
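A minimal sketch of this proposal, assuming PyTorch. `ToySegmentationHead`, its constructor arguments, and the warning text are hypothetical, and the sketch returns bare tensors instead of SequenceClassifierOutput / SemanticSegmentationModelOutput for brevity. The flag lives on the instance, and the FutureWarning fires once, on the first forward pass in the new mode:

```python
import warnings

import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySegmentationHead(nn.Module):
    """Hypothetical head illustrating an opt-in `legacy_output` flag."""

    def __init__(self, in_channels: int, num_labels: int, legacy_output: bool = False):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_labels, kernel_size=1)
        self.legacy_output = legacy_output
        self._warned = False

    def forward(self, features: torch.Tensor, image_size: tuple) -> torch.Tensor:
        # `features` are assumed to be at 1/4 of the input resolution
        low_res_logits = self.classifier(features)
        if self.legacy_output:
            # Old behavior: quarter-resolution logits, no warning
            return low_res_logits
        if not self._warned:
            # Warn on the first forward pass only, so the message appears
            # right above any crash caused by the new output shape
            warnings.warn(
                "Logits now have the input resolution; pass legacy_output=True "
                "to get the previous quarter-resolution logits.",
                FutureWarning,
            )
            self._warned = True
        return F.interpolate(low_res_logits, size=image_size, mode="bilinear", align_corners=False)


features = torch.randn(1, 8, 16, 16)
new_head = ToySegmentationHead(8, 3)
legacy_head = ToySegmentationHead(8, 3, legacy_output=True)
print(new_head(features, (64, 64)).shape)     # torch.Size([1, 3, 64, 64])
print(legacy_head(features, (64, 64)).shape)  # torch.Size([1, 3, 16, 16])
```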

LysandreJik · 2022-02-03T15:26:36Z

src/transformers/models/beit/modeling_beit.py

 from ...activations import ACT2FN
 from ...file_utils import add_start_docstrings, add_start_docstrings_to_model_forward, replace_return_docstrings
-from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, MaskedLMOutput, SequenceClassifierOutput
+from ...modeling_outputs import (


Unfortunately, BEiT does not have any "experimental" mention on its model page, so users are not aware that its API is experimental.

I would advocate adding it to the model templates by default (I can take care of that), and setting a reminder two months after merging a model to consider removing the experimental mention.

LysandreJik · 2022-02-03T15:27:06Z

src/transformers/models/segformer/modeling_segformer.py

 from ...activations import ACT2FN
 from ...file_utils import add_start_docstrings, add_start_docstrings_to_model_forward, replace_return_docstrings
-from ...modeling_outputs import BaseModelOutput, SequenceClassifierOutput
+from ...modeling_outputs import BaseModelOutput, SemanticSegmentationModelOutput, SequenceClassifierOutput


Same for SegFormer: it lacks an experimental mention on its model page.

LysandreJik · 2022-02-03T15:30:34Z

src/transformers/models/auto/modeling_auto.py

+MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES = OrderedDict(
+    [
+        # Model for Semantic Segmentation mapping
+        ("beit", "BeitForSemanticSegmentation"),
+        ("segformer", "SegformerForSemanticSegmentation"),
+    ]
+)
+
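The mapping above keys model types to class names. A minimal, hypothetical sketch of how an auto class can use such a mapping to pick a concrete class from `config.model_type` follows; the `resolve_semantic_segmentation_class` helper is illustrative and not the library's implementation (users of the library would simply call `AutoModelForSemanticSegmentation.from_pretrained(...)`):

```python
from collections import OrderedDict

# Same content as the mapping added in the diff above
MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES = OrderedDict(
    [
        ("beit", "BeitForSemanticSegmentation"),
        ("segformer", "SegformerForSemanticSegmentation"),
    ]
)


def resolve_semantic_segmentation_class(model_type: str) -> str:
    """Hypothetical lookup: return the class name registered for `model_type`."""
    try:
        return MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES[model_type]
    except KeyError:
        supported = ", ".join(MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING_NAMES)
        raise ValueError(
            f"Unrecognized model type {model_type!r} for semantic segmentation; "
            f"supported types: {supported}"
        ) from None


print(resolve_semantic_segmentation_class("segformer"))  # SegformerForSemanticSegmentation
```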


LysandreJik

Thank you for iterating, now it looks good to me from a backwards compatibility perspective.

* Standardize instance segmentation models outputs
* Rename output
* Update src/transformers/modeling_outputs.py

Co-authored-by: NielsRogge <[email protected]>

* Add legacy argument to the config and model forward
* Update src/transformers/models/beit/modeling_beit.py

Co-authored-by: Lysandre Debut <[email protected]>

* Copy fix in Segformer

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>

Standardize instance segmentation models outputs

521ba79

sgugger requested review from FrancescoSaverioZuppichini, LysandreJik and NielsRogge February 1, 2022 21:23

Rename output

e0d5a2e

NielsRogge reviewed Feb 3, 2022

src/transformers/modeling_outputs.py (outdated)

NielsRogge reviewed Feb 3, 2022

FrancescoSaverioZuppichini reviewed Feb 3, 2022

sgugger changed the title ~~Standardize instance segmentation models outputs~~ Standardize semantic segmentation models outputs Feb 3, 2022

Update src/transformers/modeling_outputs.py

1bed631

Co-authored-by: NielsRogge <[email protected]>

LysandreJik reviewed Feb 3, 2022

sgugger added 2 commits February 3, 2022 12:20

Add legacy argument to the config and model forward

38f5d13

Merge remote-tracking branch 'origin/pixel_classification_models' into pixel_classification_models

181064e

bowenc0221 mentioned this pull request Feb 3, 2022

Get raw logits per pixel facebookresearch/MaskFormer#60

Closed

LysandreJik approved these changes Feb 4, 2022

src/transformers/models/beit/modeling_beit.py (outdated)

sgugger and others added 3 commits February 4, 2022 14:19

Update src/transformers/models/beit/modeling_beit.py

90d3ce3

Co-authored-by: Lysandre Debut <[email protected]>

Copy fix in Segformer

8b3f48b

Merge branch 'master' into pixel_classification_models

3926ee8

sgugger merged commit ac6aa10 into master Feb 4, 2022

sgugger deleted the pixel_classification_models branch February 4, 2022 19:52

sgugger mentioned this pull request Feb 18, 2022

Revert changes in logit size for semantic segmentation models #15722

Merged
