Add a keepdim parameter for reduction functions over a single dimension. #1492
Conversation
By default, this parameter is False -- a backwards-incompatible change, but one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter "keepdims" since you can pass multiple dims to its reduction functions). The old behavior seems desirable mainly for normalization-type operations where the tensor will immediately be expanded out again, e.g. probs.sum(1).expand_as(probs), which no longer works as written because the dimension to expand is missing. This can be fixed by simply passing True as the "keepdim" argument to the reduction operation, e.g. probs.sum(1, keepdim=True).expand_as(probs).
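A quick sketch of the shape difference (probs here is just a made-up example tensor; the shapes in the comments are what each call returns):

```python
import torch
import numpy as np

probs = torch.rand(3, 5)

probs.sum(1).shape                # torch.Size([3])    -- keepdim=False, dim 1 squeezed away
probs.sum(1, keepdim=True).shape  # torch.Size([3, 1]) -- reduced dim kept as size 1

# The normalization pattern from the description: keepdim=True keeps the
# reduced dimension around so expand_as can broadcast it back out.
denom = probs.sum(1, keepdim=True).expand_as(probs)  # shape (3, 5)

# numpy's equivalent parameter is "keepdims" (plural):
np.ones((3, 5)).sum(axis=1, keepdims=True).shape     # (3, 1)
```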
… fail). We shouldn't be introducing changes in legacy modules if we can avoid it.
The keepdim change only seems to leak in one place: when the grad_bias is returned in linear.py.
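For concreteness, a minimal sketch of how that leaks (hypothetical shapes, not the actual linear.py code): the bias gradient is a reduction of grad_output over the batch dimension, so its shape depends on the keepdim default.

```python
import torch

grad_output = torch.rand(4, 8)  # hypothetical (batch, out_features) gradient

grad_output.sum(0, keepdim=True).shape  # torch.Size([1, 8]) -- old behavior (keepdim=True)
grad_output.sum(0).shape                # torch.Size([8])    -- new default (keepdim=False)
```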
This addresses #289.

I'm not 100% convinced about defaulting to what numpy does. It would break a lot of user code 😕

I want to get this into 0.2, along with broadcasting.

Why is it needed for correct broadcasting?

@pytorchbot add to whitelist

@apaszke see the comment here for why: #289 (comment)

I'm not entirely convinced by that comment either 😕
Thanks for bringing up what the default should be @apaszke and @soumith. I don't think the argument is really "To have correct broadcasting without introducing user bugs, we have to introduce squeezing dims on reduction", but rather "broadcasting is already going to break backwards compatibility, so let's pay the price once to get numpy-style semantics."

That being said, I understand the other side of the argument; as a user, dealing with backwards incompatibilities is a real pain. One possibility is deferring the decision until broadcasting is ready. It's not difficult to change the default at this point (now that I've found the places that need keepdim=True), so I could change the default for now, we can get this in, and we can re-evaluate with broadcasting. That may give us more information on how prevalent the case mentioned in #289 (comment) is. Thoughts?
If we change the default to False, reverting this commit is optional.
The keepdim default is now True (NOTE: this was correct when the comment was written; the default is now False); the last two commits are split into first adding the keepdim parameter (defaulting to True) and then changing the default to False.
colesbury left a comment:
LGTM.
I think in general it makes the calling code clearer to use kwargs for keepdim=True, but if most of these call sites are going away, it may not be worth making that change.
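For example (a hypothetical call site), the two spellings below are equivalent, but the keyword form documents itself:

```python
import torch

x = torch.rand(3, 5)
x.sum(1, True)          # positional keepdim: unclear what True means at a glance
x.sum(1, keepdim=True)  # keyword keepdim: self-documenting
```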
This is now merged into master.
I'm a little confused -- I remembered the … EDIT: the "True" comment is a typo, and there's a deprecation warning.
The comment "the keepdim warning is now True" is not a typo; it was correct when it was written, i.e. it is explaining the context of the in progress commits, which first added a keepdim parameter (default True), then made the default False. I edited the comment to explain that it is out of date to hopefully avoid future confusion. |