
Conversation

@gchanan
Contributor

@gchanan gchanan commented May 5, 2017

By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desired mainly for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works as written because the dimension to expand is missing.
This can be fixed by simply passing True as the "keepdim" argument
to the reduction operation, e.g.:
probs.sum(1, keepdim=True).expand_as(probs)
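
A minimal sketch of the shape difference described above, checked against current PyTorch (the tensor values are arbitrary; only the shapes matter):

import torch

probs = torch.rand(3, 4)

# keepdim=True keeps the reduced dimension as size 1, so the result
# can be expanded straight back to the original shape:
probs.sum(1, keepdim=True).shape                   # torch.Size([3, 1])
probs.sum(1, keepdim=True).expand_as(probs).shape  # torch.Size([3, 4])

# keepdim=False (the new default) drops the dimension, so the old
# expand_as pattern raises a size-mismatch error:
probs.sum(1).shape                                 # torch.Size([3])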

gchanan added 7 commits May 5, 2017 15:10
… fail).

We shouldn't be introducing changes in legacy modules if we can avoid it.
The keepdim change only seems to leak in one place:
when the grad_bias is returned in linear.py.
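
As a rough illustration only (this is not the actual legacy linear.py code), the kind of call site meant here is a backward pass that reduces the output gradient over the batch dimension, where keepdim decides the shape of the returned grad_bias:

import torch

grad_output = torch.rand(8, 5)                     # (batch, out_features); values arbitrary

grad_bias = grad_output.sum(0)                     # shape: (5,) -- matches the bias
grad_bias_kept = grad_output.sum(0, keepdim=True)  # shape: (1, 5) -- the old-style result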
@gchanan
Contributor Author

gchanan commented May 5, 2017

This addresses #289.

@apaszke
Contributor

apaszke commented May 6, 2017

I'm not 100% convinced about defaulting to what numpy does. It would break a lot of user code 😕

@soumith
Contributor

soumith commented May 6, 2017

I want to get this into 0.2, along with broadcasting.
To have correct broadcasting without introducing user bugs, we have to introduce squeezing dims on reduction. And considering it's a major release (and the user can grep for relevant code easily / will get a hard error), I think I prefer this.

@apaszke
Contributor

apaszke commented May 6, 2017

Why is it needed for correct broadcasting?

@apaszke
Contributor

apaszke commented May 6, 2017

@pytorchbot add to whitelist

@soumith
Contributor

soumith commented May 6, 2017

@apaszke see the comment here for why: #289 (comment)

@apaszke
Contributor

apaszke commented May 6, 2017

I'm not entirely convinced by that comment either 😕

@apaszke apaszke closed this May 7, 2017
@apaszke apaszke reopened this May 7, 2017
@gchanan
Contributor Author

gchanan commented May 8, 2017

Thanks for bringing up what the default should be @apaszke and @soumith.

I don't think the argument is really "To have correct broadcasting without introducing user bugs, we have to introduce squeezing dims on reduction", but rather "broadcasting is already going to break backwards compatibility, so let's pay the price once to get numpy-style semantics."

To wit:

  • Broadcasting already breaks backwards compatibility -- i.e. a (1,4) x (4,1) tensor op becomes a (4,4) with broadcasting, compared to a (1,4) as we have now (see the sketch after this list).
  • As mentioned above, automatically squeezing the dimensions is also backwards incompatible, but there are cases (squeeze dimension after mean / sum, #289 (comment)) where introducing it together with broadcasting obviates the need for user-level changes compared to introducing broadcasting alone. I don't have a good understanding of how prevalent this is, unfortunately.
  • Getting close to numpy semantics (broadcasting) without going all the way (or most of the way; I'm sure we are missing some things) seems more confusing, i.e. if PyTorch and numpy semantics are totally different I can probably keep them straight, but if they are really close, it's actually more difficult.
  • Removing the dimension by default is the consistent behavior: doing a reduction on a 1-dimensional tensor without specifying the dimension yields a scalar/0-dimensional tensor (torch.sum(x)), but doing a reduction on a 1-dimensional tensor while specifying the only dimension yields a 1-dimensional tensor (torch.sum(x, 0)). Note that this requires #1433 (introduce torch.Scalar to represent scalars in autograd) to get totally consistent semantics.
  • Making backwards incompatible changes is only going to get more difficult as usage of PyTorch increases and the code becomes more stable. This, combined with the fact that we are already breaking backwards compatibility for broadcasting, suggests this is the best time.
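
A minimal sketch of the two shape effects mentioned above, checked against current PyTorch (where broadcasting, the keepdim=False default, and 0-dimensional tensors all eventually shipped); the values are arbitrary, only the shapes matter:

import torch

# Broadcasting: a (1,4) op (4,1) expands both operands and yields (4,4);
# the pre-broadcasting behavior produced a (1,4) result.
a = torch.rand(1, 4)
b = torch.rand(4, 1)
(a + b).shape                        # torch.Size([4, 4])

# Consistency of dropping the reduced dimension: with keepdim=False,
# reducing a 1-dimensional tensor gives a 0-dimensional scalar whether
# or not the dimension is given; keepdim=True reproduces the old result.
x = torch.rand(4)
torch.sum(x).shape                   # torch.Size([])
torch.sum(x, 0).shape                # torch.Size([])
torch.sum(x, 0, keepdim=True).shape  # torch.Size([1])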

That being said, I understand the other side of the argument; as a user, dealing with backwards incompatibilities is a real pain.

One possibility is deferring the decision until broadcasting is ready. It's not difficult to change the default at this point (now that I've found the places that need keepdim=True), so I could change the default for now, we can get this in, and we can re-evaluate with broadcasting. That may give us more information on how prevalent the case mentioned in #289 (comment) is.

Thoughts?

gchanan added 2 commits May 8, 2017 11:40
If we change the default to False, reverting this commit is optional.
@gchanan
Contributor Author

gchanan commented May 8, 2017

The keepdim default is now True (NOTE: this was correct when the comment was written; the default is now False). The last two commits are split into:

  • Changing the default
  • Explicitly passing keepdim=False to the tests that require it, i.e. so we can avoid reverting this commit if we change the default.

Member

@colesbury colesbury left a comment


LGTM.

I think in general it makes the calling code clearer to use kwargs for keepdim=True, but if most of these call sites are going away, it may not be worth making that change.

@soumith
Contributor

soumith commented May 9, 2017

This is now merged into master.

@soumith soumith closed this May 9, 2017
@jekbradbury
Contributor

jekbradbury commented Jul 19, 2017

I'm a little confused -- I remembered the keepdim default as having changed to False, and that's the behavior on current master, but above it says "the keepdim default is now True". It's clear that keepdim=False is necessary for behavior like #289 (comment), but that wasn't possible in 0.1.12, so it's OK to require the kwarg. With keepdim=False as the default, the fairly common 0.1.12 patterns var2 = var1.sum(dim); var2.expand_as(var1) and var2 = var1.sum(dim); var2.squeeze(dim) both break, the latter without a Python traceback.

EDIT: the "True" comment is a typo, and there's a deprecation warning that can be enabled with torch.utils.backcompat.keepdim_warning.enabled = True, so that should cover the backwards compatibility concerns. Thanks Soumith for staying on top of these things, and sorry for the noise.
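
For reference, a minimal sketch of the two patterns above under the keepdim=False default, checked against current PyTorch (only the shapes matter):

import torch

var1 = torch.rand(3, 4)

# Pattern 1: reduce then expand back. With keepdim=False the intermediate
# is 1-dimensional and expand_as(var1) fails; keeping the dimension fixes it.
var2 = var1.sum(1, keepdim=True)             # shape: (3, 1)
var2.expand_as(var1)                         # shape: (3, 4)

# Pattern 2: reduce then squeeze the reduced dimension. With keepdim=False
# the dimension is already gone, so the trailing squeeze is redundant at
# best -- and if another size-1 dimension sits at that index, it silently
# removes the wrong one (the "no Python traceback" breakage):
var3 = torch.rand(3, 2, 1)
var3.sum(1, keepdim=True).squeeze(1).shape   # torch.Size([3, 1])  old-style result
var3.sum(1).squeeze(1).shape                 # torch.Size([3])     silently different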

@gchanan
Contributor Author

gchanan commented Jul 19, 2017

The comment "the keepdim default is now True" is not a typo; it was correct when it was written, i.e. it is explaining the context of the in-progress commits, which first added a keepdim parameter (defaulting to True) and then changed the default to False.

I edited the comment to explain that it is out of date to hopefully avoid future confusion.

jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Mar 2, 2022
Avoid reduction/normalization scheduler if there's trivial reductions outside the reduction op.
petrex pushed a commit to petrex/pytorch that referenced this pull request Aug 29, 2024
…orch#1492) (pytorch#1510)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Sep 5, 2024
…orch#1492)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Sep 5, 2024
…orch#1492) (pytorch#1509)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this pull request Jan 14, 2025
…orch#1492) (pytorch#1510)

* cudagraph explicit sync only after capture_begin

* use 'capture_dev_=-1' as not initialized value

* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)
(cherry picked from commit 2827617)