
Conversation

@gchanan
Contributor

@gchanan gchanan commented May 5, 2017

By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired mainly for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works as written because the dimension to expand is missing.
This can be fixed by simply passing True as the "keepdim" argument
to the reduction operation, e.g.:
probs.sum(1, keepdim=True).expand_as(probs)
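
A minimal sketch of the shape difference described above, checked against current PyTorch (the tensor values are arbitrary; only the shapes matter):

import torch

probs = torch.rand(3, 4)

# keepdim=True keeps the reduced dimension as size 1, so the result
# can be expanded straight back to the original shape:
probs.sum(1, keepdim=True).shape                   # torch.Size([3, 1])
probs.sum(1, keepdim=True).expand_as(probs).shape  # torch.Size([3, 4])

# keepdim=False (the new default) drops the dimension, so the old
# expand_as pattern raises a size-mismatch error:
probs.sum(1).shape                                 # torch.Size([3])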

gchanan added 7 commits May 5, 2017 15:10
… fail).

We shouldn't be introducing changes in legacy modules if we can avoid it.
The keepdim change only seems to leak in one place:
when the grad_bias is returned in linear.py.
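
As a rough illustration only (this is not the actual legacy linear.py code), the kind of call site meant here is a backward pass that reduces the output gradient over the batch dimension, where keepdim decides the shape of the returned grad_bias:

import torch

grad_output = torch.rand(8, 5)                     # (batch, out_features); values arbitrary

grad_bias = grad_output.sum(0)                     # shape: (5,) -- matches the bias
grad_bias_kept = grad_output.sum(0, keepdim=True)  # shape: (1, 5) -- the old-style result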
@gchanan
Contributor Author

gchanan commented May 5, 2017

This addresses #289.

@apaszke
Contributor

apaszke commented May 6, 2017

I'm not 100% convinced about defaulting to what numpy does. It would break a lot of user code 😕

@soumith
Contributor

soumith commented May 6, 2017

I want to get this into 0.2, along with broadcasting.
To have correct broadcasting without introducing user bugs, we have to introduce squeezing dims on reduction. And considering it's a major release (and the user can grep for relevant code easily / will get a hard error), I think I prefer this.

@apaszke
Contributor

apaszke commented May 6, 2017

Why is it needed for correct broadcasting?

@apaszke
Contributor

apaszke commented May 6, 2017

@pytorchbot add to whitelist

@soumith
Contributor

soumith commented May 6, 2017

@apaszke see the comment here for why: #289 (comment)

@apaszke
Contributor

apaszke commented May 6, 2017

I'm not entirely convinced by that comment either 😕

@apaszke apaszke closed this May 7, 2017
@apaszke apaszke reopened this May 7, 2017
@gchanan
Contributor Author

gchanan commented May 8, 2017

Thanks for bringing up what the default should be @apaszke and @soumith.

I don't think the argument is really "To have correct broadcasting without introducing user bugs, we have to introduce squeezing dims on reduction", but rather "broadcasting is already going to break backwards compatibility, so let's pay the price once to get numpy-style semantics."

To wit:

  • Broadcasting already breaks backwards compatibility -- i.e. a (1,4) x (4,1) tensor op becomes a (4,4) with broadcasting, compared to a (1,4) as we have now (see the sketch after this list).
  • As mentioned above, automatically squeezing the dimensions is also backwards incompatible, but there are cases (squeeze dimension after mean / sum, #289 (comment)) where introducing it together with broadcasting obviates the need for user-level changes compared to introducing broadcasting alone. I don't have a good understanding of how prevalent this is, unfortunately.
  • Getting close to numpy semantics (broadcasting) without going all the way (or most of the way; I'm sure we are missing some things) seems more confusing, i.e. if PyTorch and numpy semantics are totally different I can probably keep them straight, but if they are really close, it's actually more difficult.
  • Removing the dimension by default is the consistent behavior: doing a reduction on a 1-dimensional tensor without specifying the dimension yields a scalar/0-dimensional tensor (torch.sum(x)), but doing a reduction on a 1-dimensional tensor while specifying the only dimension yields a 1-dimensional tensor (torch.sum(x, 0)). Note that this requires #1433 (introduce torch.Scalar to represent scalars in autograd) to get totally consistent semantics.
  • Making backwards incompatible changes is only going to get more difficult as usage of PyTorch increases and the code becomes more stable. This, combined with the fact that we are already breaking backwards compatibility for broadcasting, suggests this is the best time.
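
A minimal sketch of the two shape effects mentioned above, checked against current PyTorch (where broadcasting, the keepdim=False default, and 0-dimensional tensors all eventually shipped); the values are arbitrary, only the shapes matter:

import torch

# Broadcasting: a (1,4) op (4,1) expands both operands and yields (4,4);
# the pre-broadcasting behavior produced a (1,4) result.
a = torch.rand(1, 4)
b = torch.rand(4, 1)
(a + b).shape                        # torch.Size([4, 4])

# Consistency of dropping the reduced dimension: with keepdim=False,
# reducing a 1-dimensional tensor gives a 0-dimensional scalar whether
# or not the dimension is given; keepdim=True reproduces the old result.
x = torch.rand(4)
torch.sum(x).shape                   # torch.Size([])
torch.sum(x, 0).shape                # torch.Size([])
torch.sum(x, 0, keepdim=True).shape  # torch.Size([1])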

That being said, I understand the other side of the argument; as a user, dealing with backwards incompatibilities is a real pain.

One possibility is deferring the decision until broadcasting is ready. It's not difficult to change the default at this point (now that I've found the places that need keepdim=True), so I could change the default for now, we can get this in, and we can re-evaluate with broadcasting. That may give us more information on how prevalent the case mentioned in #289 (comment) is.

Thoughts?

gchanan added 2 commits May 8, 2017 11:40
If we change the default to False, reverting this commit is optional.
@gchanan
Contributor Author

gchanan commented May 8, 2017

The keepdim default is now True (NOTE: this was correct when the comment was written; the default is now False). The last two commits are split into:

  • Changing the default
  • Explicitly passing keepdim=False to the tests that require it, i.e. so we can avoid reverting this commit if we change the default.

Member

@colesbury colesbury left a comment


LGTM.

I think in general it makes the calling code clearer to use kwargs for keepdim=True, but if most of these call sites are going away, it may not be worth making that change.

@soumith
Contributor

soumith commented May 9, 2017

This is now merged into master.

@soumith soumith closed this May 9, 2017
@jekbradbury
Contributor

jekbradbury commented Jul 19, 2017

I'm a little confused -- I remembered the keepdim default as having changed to False, and that's the behavior on current master, but above it says "the keepdim default is now True". It's clear that keepdim=False is necessary for behavior like #289 (comment), but that wasn't possible in 0.1.12, so it's OK to require the kwarg. With keepdim=False as the default, the fairly common 0.1.12 patterns var2 = var1.sum(dim); var2.expand_as(var1) and var2 = var1.sum(dim); var2.squeeze(dim) both break, the latter without a Python traceback.

EDIT: the "True" comment is a typo, and there's a deprecation warning that can be enabled with torch.utils.backcompat.keepdim_warning.enabled = True, so that should cover the backwards compatibility concerns. Thanks Soumith for staying on top of these things, and sorry for the noise.
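
For reference, a minimal sketch of the two patterns above under the keepdim=False default, checked against current PyTorch (only the shapes matter):

import torch

var1 = torch.rand(3, 4)

# Pattern 1: reduce then expand back. With keepdim=False the intermediate
# is 1-dimensional and expand_as(var1) fails; keeping the dimension fixes it.
var2 = var1.sum(1, keepdim=True)             # shape: (3, 1)
var2.expand_as(var1)                         # shape: (3, 4)

# Pattern 2: reduce then squeeze the reduced dimension. With keepdim=False
# the dimension is already gone, so the trailing squeeze is redundant at
# best -- and if another size-1 dimension sits at that index, it silently
# removes the wrong one (the "no Python traceback" breakage):
var3 = torch.rand(3, 2, 1)
var3.sum(1, keepdim=True).squeeze(1).shape   # torch.Size([3, 1])  old-style result
var3.sum(1).squeeze(1).shape                 # torch.Size([3])     silently different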

@gchanan
Contributor Author

gchanan commented Jul 19, 2017

The comment "the keepdim default is now True" is not a typo; it was correct when it was written, i.e. it is explaining the context of the in-progress commits, which first added a keepdim parameter (defaulting to True) and then changed the default to False.

I edited the comment to explain that it is out of date to hopefully avoid future confusion.

jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Mar 2, 2022
Avoid reduction/normalization scheduler if there's trivial reductions outside the reduction op.
petrex pushed a commit to petrex/pytorch that referenced this pull request Aug 29, 2024
…orch#1492) (pytorch#1510)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Sep 5, 2024
…orch#1492)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Sep 5, 2024
…orch#1492) (pytorch#1509)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Jan 14, 2025
…orch#1492) (pytorch#1510)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
(cherry picked from commit 2827617)