
Conversation

@Chillee (Collaborator) commented Jul 3, 2019

#3867

The implementation is taken from here (with some comments annotating the logic): https://github.com/mlperf/inference/blob/master/others/edge/object_detection/ssd_mobilenet/pytorch/utils.py#L40

In addition, I've stress-tested it for all combinations of input image size, kernel size, stride, and dilation from 1 to 6 (6^8 combinations). For all of them, the output dimensions follow the rules listed here for "same": https://www.tensorflow.org/api_docs/python/tf/nn/convolution

output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
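
(For reference, a minimal sketch of the per-dimension padding computation implied by that rule; the function and variable names here are mine, not the ones used in the PR.)

import math

def same_padding_1d(in_size, kernel_size, stride=1, dilation=1):
    # Output size required by the "same" rule: ceil(in_size / stride).
    out_size = math.ceil(in_size / stride)
    # Span of the dilated kernel.
    effective_kernel = (kernel_size - 1) * dilation + 1
    # Total padding needed so that out_size windows fit in the input.
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    # Split it, putting the extra element on the high side (TF convention).
    return total // 2, total - total // 2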

I still need to add tests and check JIT compatibility. Then I can tackle the other ones.

@Chillee requested review from fmassa and soumith, July 3, 2019 07:20
@pytorchbot added the module: nn (Related to torch.nn) label, Jul 3, 2019
@Chillee requested a review from jamesr66a, July 3, 2019 07:29
@Chillee force-pushed the paddingsame branch 2 times, most recently from 6764c31 to 5d695a4, July 3, 2019 07:33
@fmassa (Member) left a comment

In my view, we should have 1:1 parity between the functional and the module interfaces.

This means that we should probably push the "same" handling down to the functional interface, but I'll let the others chime in as well.

@Chillee (Collaborator, Author) commented Jul 6, 2019

@fmassa I initially did it this way because it seemed that Soumith implied it should be separate from the functional interface: #3867 (comment)

I continued doing it this way because there already seemed to be differences between the functional and module interfaces. For example, the module interface supports a padding_mode argument that can be circular or zeros. I assumed things were done that way because circular padding also requires a separate F.pad before the conv is called.
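
(Roughly what I mean, as a minimal sketch with made-up shapes:)

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)

# Wrap the input around the spatial dims first, then run the conv unpadded.
x_wrapped = F.pad(x, (1, 1, 1, 1), mode='circular')
y = F.conv2d(x_wrapped, w, padding=0)  # output is 1 x 16 x 8 x 8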

Putting it in the functional interface also requires delving substantially deeper into the darker corners of PyTorch, and it wasn't clear to me where the appropriate location was, nor whether it would break something to have one op actually become two ops.

@apaszke (Contributor) commented Jul 8, 2019

Uhhh are we sure we want this? I remember @Yangqing strongly arguing against adding this option as it's often quite surprising.

@Chillee (Collaborator, Author) commented Jul 8, 2019

@apaszke Do you remember any of the reasons he gave for strongly warning against this?

@soumith mentioned that dynamically calculating padding is bad for a "variety of efficiency/serialization reasons", but these concerns don't seem relevant for research users.

On the other hand, for research users this is an unnecessary pain point. I've had multiple people complain that this isn't in Pytorch. I'm not convinced that the performance/serialization issues are worth the cost here.

@soumith (Contributor) commented Jul 8, 2019

yes, we want this. YQ's concerns were around exported models not being static, and the ONNX export pass needs to error out if the padding info cannot be statically determined (@Chillee you might have to add a test for the ONNX export).

At this point, the user need and benefits outweigh not wanting it, and our JIT / production export now can comfortably handle dynamic models (it wasn't true when we last thought about this)

@fmassa (Member) commented Jul 8, 2019

@Chillee I wasn't aware that circular padding was added only in the nn.Module interface.

I'm less sure we want to diverge the functionality of the module and functional interfaces, because some things can only be done with the functional interface (for example, when we want the weights of a model to be the output of a network).

But I'll let @soumith and @apaszke chime in on their views

@soumith (Contributor) commented Jul 8, 2019

My main thinking around whether to match everything in the functional interface was about how much state we can carry to reuse previous calculations. We can carry state across forward calls in a Module, but not in the functional interface.

I didn't explore the line of thought around how much state (such as pre-calculated padding values) can be carried across iterations and how much benefit it brings.

@Chillee (Collaborator, Author) commented Jul 8, 2019

@fmassa One issue with putting it in the functional interface is that the semantics around auto-calculating "static padding" become fairly muddled. Currently, we can auto-calculate static padding and have it for free if certain rules are followed (stride=1, (filter-1)*dilation is divisible by 2).

It's not clear how we would do this in the functional interface.

I think the right place to add this in the functional interface would be in the F.pad function or something along those lines.

@Chillee changed the title from "[WIP] Added "same" padding for conv*d" to "Added "same" padding for conv*d", Jul 9, 2019
@Chillee (Collaborator, Author) commented Jul 9, 2019

Added tests and fixed JIT compatibility. Removed the WIP tag - I think this is ready for review.

@Chillee requested a review from fmassa, July 9, 2019 05:49
@fmassa (Member) commented Jul 9, 2019

@Chillee

One issue with putting it in the functional interface is that the semantics around auto-calculating "static padding" become fairly muddled. Currently, we can auto-calculate static padding and have it for free if certain rules are followed (stride=1, (filter-1)*dilation is divisible by 2).

I don't quite get this. The overhead of calculating whether the padding is static or not is negligible compared to the convolution time, so doing this in the functional codepath would not be a problem at all.

It's not clear how we would do this in the functional interface.

Here is an example

def _conv2d(...):
    # same as conv2d now

def conv2d(...):
    is_static, padding = _compute_if_padding_is_static(...)
    if is_static:
        input = pad(input, padding, etc)
        return _conv2d(input, ...)
    return _conv2d(input, ...)
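
(For what it's worth, a hypothetical sketch of what a _compute_if_padding_is_static helper could look like for a single spatial dim, following the stride=1 rule mentioned above; this is not code from the PR:)

def _compute_if_padding_is_static(kernel_size, stride, dilation):
    # With stride == 1 the total "same" padding is (kernel_size - 1) * dilation,
    # which does not depend on the input size, so it is "static".
    if stride != 1:
        return False, None
    total = (kernel_size - 1) * dilation
    # It can be expressed as a single symmetric padding value only if it
    # splits evenly between the two sides.
    if total % 2 != 0:
        return False, None
    return True, total // 2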

I think the right place to add this in the functional interface would be in the F.pad function or something along those lines.

I actually think we should instead add support for different paddings on left / right / top / bottom. This would be a generalization of what we currently have.
But this is a bit more involved to do.
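
(In the meantime, different left/right and top/bottom padding can already be emulated with an explicit F.pad before the conv; a minimal sketch with made-up shapes:)

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 10, 10)
w = torch.randn(8, 3, 4, 4)

# F.pad takes (left, right, top, bottom) for a 4D input, so the two sides
# of each spatial dim can be padded differently before the conv.
x_padded = F.pad(x, (1, 2, 1, 2))
y = F.conv2d(x_padded, w, padding=0)  # output is 1 x 8 x 10 x 10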

@Chillee (Collaborator, Author) commented Jul 9, 2019

@fmassa I guess it wasn't clear to me what "static padding" actually meant. If that's still considered static padding, then I'm fine with doing it in the functional interface. Would you be able to provide a pointer to what I should be changing?

Also, as a side note, the documentation for F.conv2d isn't quite correct: https://pytorch.org/docs/stable/nn.html#id22

It says it supports padding_mode but it doesn't.

@pytorchbot added the module: operators and module: internals (Related to internal abstractions in c10 and ATen) labels, Jul 13, 2019
@spandantiwari commented

yes, we want this. YQ's concerns were around exported models not being static, and the ONNX export pass needs to error out if the padding info cannot be statically determined (@Chillee you might have to add a test for the ONNX export).

At this point, the user need and benefits outweigh not wanting it, and our JIT / production export now can comfortably handle dynamic models (it wasn't true when we last thought about this)

I agree with @soumith's comment. The ONNX Conv op supports semantics similar to the "same" padding here, via the auto_pad option SAME_UPPER. This option was previously on a path to deprecation, but due to clear use cases it has since been retained in the design. Given that, I feel this can be handled for ONNX export.
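
(As a rough sketch of the kind of node the exporter could emit, assuming the standard onnx.helper API; the attribute values are just illustrative:)

from onnx import helper

# Conv node that delegates the padding computation to the runtime via
# auto_pad, instead of carrying explicit pads.
conv_node = helper.make_node(
    "Conv",
    inputs=["X", "W"],
    outputs=["Y"],
    kernel_shape=[3, 3],
    strides=[2, 2],
    auto_pad="SAME_UPPER",  # output_size = ceil(input_size / stride)
)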

@eellison (Contributor) commented

@Chillee do you want someone else to take on finishing this? It would be nice to get this in for 1.3.

@Chillee (Collaborator, Author) commented Sep 16, 2019

@eellison I've been busy (and will be busy) until the 23rd, but I should be able to finish it up in the week after that (which should be at least a week before 1.3?).

@eellison (Contributor) commented

I think the cut is the 24th actually... although potentially we could cherry-pick this in

@Chillee (Collaborator, Author) commented Sep 19, 2019

Ah alright I'll take a crack tonight at fixing it then.

@eellison (Contributor) commented

@ZolotukhinM @jamesr66a @raghuramank100 this PR breaks convolution tests. Do you think it's doable to land this PR before the release and have quant parity with it?

@Chillee (Collaborator, Author) commented Sep 23, 2019

To clarify a bit more, the tests that break are the ones that look like the following (the last line specifically):

        FileCheck().check('ClassType<Observer> = prim::GetAttr[name="observer_for_') \
                   .check_next('prim::CallMethod[name="forward"](%observer_for_') \
                   .check('ClassType<Observer> = prim::GetAttr[name="observer_for_') \
                   .check_next('prim::CallMethod[name="forward"](%observer_for_') \
                   .check('Tensor = aten::conv2d') \

The tests pass as long as I comment that out - it's not clear to me if that's the right thing to do.

@eellison (Contributor) commented

@Chillee quantization has a mirrored conv implementation, so to maintain parity they have to do more than fix the tests. Since the branch cut is today/tomorrow, the quant people (cc @jamesr66a) think it would be better to wait until after the cut to land this rather than rush in a quant fix.

I forgot that quantization needed to maintain parity when I commented above about getting it into the release, that's my bad.

@Yangqing (Contributor) commented Oct 9, 2019

Not that it matters, but I have a Python notebook detailing the design choices:

https://gist.github.com/Yangqing/47772de7eb3d5dbbff50ffb0d7a98964

As an architect, I still believe that SAME padding is not something we should introduce back into common usage patterns. That being said, I was the person who included SAME padding in ONNX for compatibility reasons, so I can understand why one might want it.

@ZolotukhinM commented

I somehow missed the notification 16 days ago, sorry.

To clarify a bit more, the tests that break are the ones that look like the following (the last line specifically):

        FileCheck().check('ClassType<Observer> = prim::GetAttr[name="observer_for_') \
                   .check_next('prim::CallMethod[name="forward"](%observer_for_') \
                   .check('ClassType<Observer> = prim::GetAttr[name="observer_for_') \
                   .check_next('prim::CallMethod[name="forward"](%observer_for_') \
                   .check('Tensor = aten::conv2d') \

The tests pass as long as I comment that out - it's not clear to me if that's the right thing to do.

Could you please paste the graph we actually have there? To print it, you could add

print(str(m._c._get_module("conv")._get_method('conv2d_forward').graph))

before the FileCheck invocation.

I'm pretty sure this test can be adjusted to the changes you're making - it basically just checks that after the transformation we have instrumentation calls to "observers" before the conv call that we're instrumenting. It essentially looks for specific strings, and my guess is that the conv now looks slightly different.

@eellison (Contributor) commented

@Chillee, @ZolotukhinM, @jamesr66a what can we do to land this?

@Chillee (Collaborator, Author) commented Nov 18, 2019

Sorry I've been busy recently - will get back to this within a week or two.

@Chillee (Collaborator, Author) commented Feb 10, 2022

Somebody else pushed it across the finish line when I couldn't :) #45667
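
(For anyone landing here later: the API that eventually shipped accepts a string padding argument, roughly like this; note that padding='same' is only supported for stride-1 convolutions:)

import torch
import torch.nn as nn

# 'same' keeps the spatial size; it works with dilation, but not with stride > 1.
conv = nn.Conv2d(3, 8, kernel_size=3, padding='same', dilation=2)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 8, 32, 32])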

@Chillee closed this, Feb 10, 2022
@github-actions bot deleted the paddingsame branch, February 6, 2024 02:09

Labels

cla signed
module: docs (Related to our documentation, both in docs/ and docblocks)
module: internals (Related to internal abstractions in c10 and ATen)
module: nn (Related to torch.nn)
module: pybind (Related to our Python bindings / interactions with other Python libraries)
oncall: jit (Add this issue/PR to JIT oncall triage queue)
open source
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
