Move fill and fill_diagonal to Fill.cpp, Fill.h, and FillKernel.{cpp,cu} #22758
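For context (this illustration is not part of the PR itself): the two ops being moved are the in-place `Tensor.fill_` and `Tensor.fill_diagonal_`, which fill an entire tensor or just its main diagonal with a scalar value:

```python
import torch

a = torch.zeros(3, 3)
a.fill_(10)           # every element becomes 10

b = torch.zeros(3, 3)
b.fill_diagonal_(10)  # only the main diagonal becomes 10

print(a)
print(b)
```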
Conversation
Move fill and fill_diagonal to Fill.cpp, Fill.h, and FillKernel.{cpp,cu}
gh-metadata: pytorch pytorch 22758 gh/xuhdev/10/head
Can we get some before/after numbers?
I don't know how to trigger a benchmark run. @bddppq, do you know? Although a benchmark run would show how much impact this has at a macro scale, it's nicer (and faster) to just look at numbers for the op itself to make sure we haven't changed its behavior too much.
@xuhdev @zou3519 lol, that benchmark was just for ROCm Caffe2 training. I think for this change it's better to run benchmarks here. cc @mingzhe09088
OK. Hopefully I didn't change anything that significantly affects performance. I turned off CPU turbo and warmed up my GPU. Here's the benchmark:

Before:

After:
How large is
Oops, I meant

Here's the script:

```python
import timeit

for n, t in [(10, 100000),
             (1000, 10000)]:
    print('a.fill_(10) (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        print('device: ' + device)
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32',
                      'torch.int64', 'torch.half', 'torch.float', 'torch.double'):
            print('  dtype: ' + dtype, end='\t\t')
            print(timeit.timeit('a.fill_(10)',
                                setup='import torch; a = torch.zeros({}, device="{}", dtype={})'.format(n, device, dtype),
                                number=t))
```
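One caveat with the script above: CUDA kernel launches are asynchronous, so `timeit` on the `cuda` device largely measures launch overhead unless the device is synchronized. A minimal variant that synchronizes around the timed call (the `time_fill` helper below is hypothetical, not from the PR) could look like this:

```python
import timeit

import torch


def time_fill(n, device, dtype, number):
    # Hypothetical helper (not part of the PR): times a.fill_(10) and, on CUDA,
    # synchronizes so that queued kernels are included in the measurement.
    a = torch.zeros(n, device=device, dtype=dtype)

    def run():
        a.fill_(10)
        if device == 'cuda':
            torch.cuda.synchronize()

    run()  # warm-up iteration
    return timeit.timeit(run, number=number)


if __name__ == '__main__':
    for n, number in [(10, 100000), (1000, 10000)]:
        devices = ['cpu', 'cuda'] if torch.cuda.is_available() else ['cpu']
        for device in devices:
            for dtype in (torch.int8, torch.float, torch.double):
                print(n, device, dtype, time_fill(n, device, dtype, number))
```

Synchronizing on every call adds some overhead of its own, so the absolute numbers differ from the unsynchronized script, but before/after comparisons remain meaningful.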
There is an operator benchmark suite that reports the execution time of operators. Many operators are already covered, and the code is in the benchmarks/operator_benchmark/pt directory. Let me know if you are interested in using it for this case; I can add a test to that directory.
@mingzhe09088 That would be helpful. Thanks!
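For reference, a rough sketch of what a fill_ entry in that suite might look like. It assumes the op_bench API names from the benchmarks/operator_benchmark framework (cross_product_configs, TorchBenchmarkBase, generate_pt_test, benchmark_runner); the exact hooks differ between versions, so treat this as an illustration rather than the test that was actually added:

```python
import operator_benchmark as op_bench
import torch

# Sizes, devices, and dtypes to sweep over; the values here are illustrative.
fill_configs = op_bench.cross_product_configs(
    N=[10, 1000],
    device=['cpu', 'cuda'],
    dtype=[torch.int8, torch.float, torch.double],
    tags=['short'],
)


class FillBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, N, device, dtype):
        # Allocate the tensor once; the framework times forward() repeatedly.
        self.input = torch.zeros(N, device=device).to(dtype=dtype)
        self.set_module_name('fill_')

    def forward(self):
        return self.input.fill_(10)


op_bench.generate_pt_test(fill_configs, FillBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```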
Does this PR change anything other than the location of the functions?
No, but I think the benchmark run was a safeguard to ensure nothing was changed accidentally.
Sorry, I didn't realize that nothing else was changed. Thanks for catching that, @colesbury. In general, a dedicated benchmark for fill_ would be good, but we don't have to block this PR on that.
`using unary_fn = void(*)(TensorIterator&);`
`DECLARE_DISPATCH(void(*)(TensorIterator&, Scalar), fill_stub);`
Was this just unused?
No, the definition got moved to Fill.h
Summary: Pull Request resolved: pytorch/pytorch#22758

Test Plan: Imported from OSS

Differential Revision: D16257781

Pulled By: ezyang

fbshipit-source-id: 9e5ed06e95ef65b036eb388488faad981f1e8012
Stack from ghstack:
Differential Revision: D16257781