Regression on split operator benchmark after __torch_function__ merge #30831
Closed
Labels
high priority, module: cpu, module: performance, triaged
Description
(moved over from #30730 (comment), comment by @hl475)
Hi @ngoldbaum, our operator benchmarks show a performance regression introduced by your PR. The split benchmark reports:
================================================================================
Before the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 6.546
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 6.443
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 6.437
================================================================================
After the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 4.252
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 4.142
# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 4.210
================================================================================
Here "before" is the current codebase, which includes this PR, and "after" is the same codebase with this PR's change removed.
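To put a number on the regression, the reported timings can be compared directly. The snippet below is only an illustrative sketch: the values are copied from the benchmark output above, while the names `with_pr`, `without_pr`, and `slowdown` are made up for this example and are not part of the benchmark suite.

```python
# Forward execution times in microseconds, copied from the output above.
# "with_pr" is the "before" run (current codebase, PR included);
# "without_pr" is the "after" run (PR's change removed).
with_pr = {
    "split_M8_N8_parts2_cpu": 6.546,
    "split_M256_N512_parts2_cpu": 6.443,
    "split_M512_N512_parts2_cpu": 6.437,
}
without_pr = {
    "split_M8_N8_parts2_cpu": 4.252,
    "split_M256_N512_parts2_cpu": 4.142,
    "split_M512_N512_parts2_cpu": 4.210,
}


def slowdown(with_us, without_us):
    """Return (extra time per call in us, ratio of with/without)."""
    return with_us - without_us, with_us / without_us


# Hypothetical helper output: one (delta, ratio) pair per benchmark name.
report = {name: slowdown(with_pr[name], without_pr[name]) for name in with_pr}

for name, (delta, ratio) in report.items():
    print(f"{name}: +{delta:.3f} us per call ({ratio:.2f}x)")
```

The extra cost comes out at roughly 2.2 to 2.3 us per call, or about 1.5x, and it is nearly the same for all three input sizes. That pattern would be consistent with a fixed per-call overhead rather than a slowdown in the split kernel itself, though the numbers alone do not prove it.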
To reproduce the benchmark, please read https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark and run python -m pt.split_test.
cc @ezyang @gchanan @zou3519 @VitalyFedyunin @ngimel @mruberry @mingzhe09088