Turn on BUILD_NAMEDTENSOR permanently #25798
Conversation
This PR enables BUILD_NAMEDTENSOR by default. This is done by including a header, `c10/core/EnableNamedTensor`, that sets `BUILD_NAMEDTENSOR`. In the future, the plan is to get rid of the flag entirely: we can incrementally delete usages after this PR goes in.

This PR also maintains the namedtensor ci vs regular ci distinction. `test/test_namedtensor.py` only runs if TEST_NAMEDTENSOR=1 is specified; TEST_NAMEDTENSOR=1 is set on the namedtensor ci. I'll remove this distinction later and send out an announcement about it; devs will be responsible for named tensor failures after that.

The initial reason we had the BUILD_NAMEDTENSOR flag was so that we could quickly prototype named tensor features without worrying about adding overhead to the framework. The overheads can be categorized as memory overhead and performance overhead.

Memory overhead: named tensors add one additional word per Tensor, because TensorImpl stores a `unique_ptr<NamedTensorMetaInterface>` field. This is not a lot of overhead.

Performance overhead: at all entry points to name inference, we check whether the inputs to an op are named. If they are not, we short-circuit and skip name inference. These calls should therefore be about as cheap as error-checking code and not take up a lot of time. My plan is to benchmark a few functions and then post the results in a comment to this PR.

Test Plan:
- [namedtensor ci]

Pull Request resolved: #25798
Differential Revision: [D17235279](https://our.internmc.facebook.com/intern/diff/D17235279)
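As a user-visible illustration of that short-circuit, here is a minimal sketch using the public named tensor API (this shows the observable behavior, not the C++ name-inference entry points themselves):

```python
import torch

# Unnamed tensors report no names, so ops on them should take the fast
# path that skips name inference entirely.
x = torch.randn(3, 4)
y = torch.randn(3, 4)
print(x.has_names())   # False -> name inference is short-circuited
print((x + y).names)   # (None, None)

# Named tensors go through name inference and propagate their names.
a = torch.randn(3, 4, names=("N", "C"))
b = torch.randn(3, 4, names=("N", "C"))
print(a.has_names())   # True
print((a + b).names)   # ('N', 'C')
```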
Benchmark results. I benchmarked with this script (ipython): https://gist.github.com/zou3519/ecda01fde312edbef1d57d8cc1f7853f
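The gist is not reproduced here; as a rough sketch, a microbenchmark along these lines might look like the following (the op choices and sizes are placeholders, not the gist's contents):

```python
import timeit
import torch

x = torch.randn(3, 3)
y = torch.randn(3, 3)

# Time small ops many times so per-call overhead (such as the
# named-tensor short-circuit check) is visible above the noise floor.
n = 100_000
add_us = timeit.timeit(lambda: torch.add(x, y), number=n) / n * 1e6
mm_us = timeit.timeit(lambda: torch.mm(x, y), number=n) / n * 1e6
print(f"add: {add_us:.3f} us/call, mm: {mm_us:.3f} us/call")
```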
Do we have any broader performance tests that look at both memory and time on both cpu and cuda for a range of operations over a longer runtime?

I could run imagenet for a few hours, but there are no long performance tests built into PyTorch.
Writing down the blockers for this:
Non-blockers
Benchmarking Methodology
I benchmarked on one machine with 80 CPUs, 500G RAM, and 8 V100 cards, for 9 epochs for both runs, one after the other.

Benchmarking Results
(table of per-epoch times with this PR vs. without this PR not captured)

Analysis
CPU activity: (not captured)
GPU activity: (not captured)
Logs for each run available upon request.

Conclusion
The data supports that this PR does not introduce regressions in resnet18 epoch time.
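For context, the per-epoch timing behind a comparison like this can be gathered with a loop of roughly the following shape. The model, batch contents, and iteration counts below are placeholders, not the exact harness used for the numbers above:

```python
import time
import torch
import torchvision.models as models

model = models.resnet18().cuda()
criterion = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Placeholder synthetic batch; a real run would iterate an ImageNet DataLoader.
images = torch.randn(64, 3, 224, 224, device="cuda")
target = torch.randint(0, 1000, (64,), device="cuda")

epoch_times = []
for epoch in range(9):
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):  # placeholder number of iterations per "epoch"
        output = model(images)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()
    epoch_times.append(time.time() - start)

print(epoch_times)
```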
I did some more stable microbenchmarking. The TL;DR is that named tensors add very little overhead at microbenchmark scale.

Internal benchmark suite
Here are some internal benchmarks: https://our.internmc.facebook.com/intern/aibench/details/528043000
(chart: torch.add unnamed performance, linux)
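A hot-loop driver like the following (sizes and iteration count are placeholders) is the sort of thing one can run under a sampling profiler such as perf, as mentioned in the follow-up below, to attribute per-call time in the unnamed fast path:

```python
import torch

# Placeholder hot loop: run a small op enough times that a sampling
# profiler (e.g. `perf record -- python driver.py`) can attribute time
# to the unnamed fast path, including the name-inference check.
x = torch.randn(3, 3)
y = torch.randn(3, 3)
out = torch.empty(3, 3)

for _ in range(5_000_000):
    torch.add(x, y, out=out)
```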
More macrobenchmarking
I benchmarked on one machine with 80 CPUs, 500G RAM, and 8 V100 cards:

python main.py -a resnet50 --dist-url 'tcp://127.0.0.1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]

To avoid measuring data loading time, I created dummy input and target tensors before all the epochs and only sent those into the model.

Results
Group A: on master, with named tensors turned on:
[560.28216296 563.92683095 561.3685565 560.774923 561.66448978 ...]

Group B: on top of master, with named tensors turned off:
[560.74388459 562.69248295 563.78086042 564.81708944 565.05355176 ...]

Percent difference, (B - A) / A:
[ 0.00082409 -0.00218884 0.00429718 0.00720818 0.00603396 0.01081535 ...]

Analysis
From these results, it looks like turning named tensors off actually makes the code run slower by ~0.6%, which can't be right. This is probably just due to the machine being noisy; I was the only user on the machine. One way to identify the margin of error is to send two identical runs to the machine and see how much they vary. The numbers here support the hypothesis that there is no major regression to imagenet when turning named tensors on.
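The percent-difference row above is just an elementwise (B - A) / A over the per-epoch times. As a small sketch of that analysis (the arrays below are truncated placeholders, not the full runs):

```python
import numpy as np

# Truncated placeholder per-epoch times in seconds.
group_a = np.array([560.282, 563.927, 561.369])  # with named tensors (this PR)
group_b = np.array([560.744, 562.692, 563.781])  # without named tensors

percent_diff = (group_b - group_a) / group_a
print(percent_diff)
print("mean:", percent_diff.mean(), "std:", percent_diff.std())
```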
Pull Request resolved: #30894

This PR begins the process of removing the BUILD_NAMEDTENSOR macros. There will be followups.

Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros.

Reasons for not removing the macros:
- potential for feature flagging
- binary size reduction / pytorch modularity. I don't think this is a major consideration because macros like this are not the best way to achieve it.

Now, I argue against needing to feature flag. The main reason we might want to feature flag is to disable the feature: we'd need a fast switch to turn it off if someone discovers in the future that named tensors caused a regression in some existing workflows.

In #25798, I did a variety of macro- and micro-benchmarks to determine the performance impact of named tensors on regular tensors. [The microbenchmarks](#25798 (comment)) were not very stable, and running them for more iterations doesn't actually help because the noise is not distributed in a nice way. Instead of microbenchmarks, I ran a [profiler (perf)](#25798 (comment)) to estimate how much overhead named tensors add to unnamed code. I estimated the overhead to be less than 100ns for `add` and even smaller for `mm`; there are ways to optimize even further if we find this to be a problem.

[Initial macrobenchmarks](#25798 (comment)) were also not very stable. I ran imagenet for some number of epochs. To make them more stable, I got rid of the data loading (which seemed to vary between runs). [In some benchmarks without data loading](#25798 (comment)), we can see that the results are less noisy now. These results support no noticeable regressions in speed.

Test Plan:
- wait for CI

Differential Revision: [D18858543](https://our.internmc.facebook.com/intern/diff/D18858543)