
Conversation

zou3519 (Contributor) commented on Sep 6, 2019

Stack from ghstack:

This PR enables `BUILD_NAMEDTENSOR` by default. This is done by including
a header, `c10/core/EnableNamedTensor`, that sets `BUILD_NAMEDTENSOR`.
In the future, the plan is to get rid of the flag entirely: we can
incrementally delete usages after this PR goes in.

This PR also maintains the namedtensor ci vs regular ci distinction.
test/test_namedtensor.py only runs if TEST_NAMEDTENSOR=1 is specified.
TEST_NAMEDTENSOR=1 is set on the namedtensor ci. I'll remove this
distinction later and send out an announcement about it; devs will be
responsible for named tensor failures after that.

The initial reason why we had the BUILD_NAMEDTENSOR flag was so that we
could quickly prototype named tensor features without worrying about
adding overhead to the framework. The overheads can be categorized as
memory overhead and performance overhead.

Memory overhead: named tensors add one additional word per Tensor, because
TensorImpl stores a `unique_ptr<NamedTensorMetaInterface>` field. This is
not a lot of overhead.
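For scale, "one word" here is a pointer-sized field. A trivial sketch of the arithmetic (assuming a 64-bit build):

```python
import ctypes

# The extra field is one pointer per TensorImpl, whether or not the tensor
# actually carries names: 8 bytes on a typical 64-bit build.
print(ctypes.sizeof(ctypes.c_void_p))  # -> 8
```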

Performance overhead: At all entry points to name inference, we check
if inputs to an op are named. If inputs are not named, we short-circuit
and don't do name inference. These calls should therefore be as
efficient as error-checking code and not take up a lot of time.
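A minimal Python sketch of this short-circuit pattern (illustrative only; the real checks live in the C++ name-inference entry points, and `_no_names` below is a hypothetical helper, not PyTorch API):

```python
import torch

def _no_names(*tensors):
    # Cheap guard mirroring the short-circuit described above: True when no
    # input carries dimension names, in which case name inference is skipped.
    return not any(t.has_names() for t in tensors)

x = torch.randn(2, 3)
y = torch.randn(2, 3, names=('N', 'C'))
print(_no_names(x, x))  # True  -> unnamed fast path, no name inference work
print(_no_names(x, y))  # False -> name inference would run
```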

My plan is to benchmark a few functions and then post the results in a
comment to this PR.

Test Plan:

  • [namedtensor ci]

Pull Request resolved: #25798

Differential Revision: D17235279

pytorchbot added the module: ci (Related to continuous integration) and module: internals (Related to internal abstractions in c10 and ATen) labels on Sep 6, 2019
zou3519 added a commit that referenced this pull request Sep 6, 2019
zou3519 (Contributor, Author) commented on Sep 6, 2019

Benchmark results. I benchmarked copy_ (fundamental op), add (binary op), cos (unary op), matmul (more complicated name inference op) on CPU for very small workloads to measure overhead. The benchmark numbers are a little noisy, but the conclusion I made was that performance with and without the BUILD_NAMEDTENSOR flag is the same.

benchmark script (ipython): https://gist.github.com/zou3519/ecda01fde312edbef1d57d8cc1f7853f
before: https://gist.github.com/zou3519/e8ba592f5ba3a4040cd40befceb23a11
after: https://gist.github.com/zou3519/040795c7cc600190f877ba002c9b70ef
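For reference, a minimal sketch of this kind of microbenchmark (the actual script is the ipython gist above; the op set and tensor sizes below are illustrative assumptions):

```python
import timeit
import torch

# Tiny inputs so the measurement is dominated by per-call overhead rather
# than kernel time.
a, b = torch.randn(3, 3), torch.randn(3, 3)

cases = {
    "copy_":  lambda: a.clone().copy_(b),
    "add":    lambda: torch.add(a, b),
    "cos":    lambda: torch.cos(a),
    "matmul": lambda: torch.matmul(a, b),
}

for name, fn in cases.items():
    total = timeit.timeit(fn, number=100_000)
    print(f"{name:>6}: {total / 100_000 * 1e6:.2f} us/call")
```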

zou3519 requested a review from gchanan on September 6, 2019 at 21:09
zou3519 added a commit that referenced this pull request Sep 9, 2019
zou3519 requested a review from nairbv on September 9, 2019 at 12:41
nairbv (Collaborator) commented on Sep 9, 2019

Do we have any broader performance tests that look at both memory and time on both cpu and cuda for a range of operations over a longer runtime?

zou3519 (Contributor, Author) commented on Sep 9, 2019

> Do we have any broader performance tests that look at both memory and time on both cpu and cuda for a range of operations over a longer runtime?

I could run imagenet for a few hours, but there are no long performance tests built into PyTorch.

zou3519 added a commit that referenced this pull request Sep 10, 2019
zou3519 added a commit that referenced this pull request Sep 10, 2019
zou3519 added a commit that referenced this pull request Sep 10, 2019
zou3519 (Contributor, Author) commented on Sep 10, 2019

Writing down the blockers for this:

  • Run on a big model (resnet18, 8 GPUs) and check that there are no performance regressions.
  • Trigger a code size check (this is less of a concern because the feature is guarded on mobile builds).

Non-blockers:

  • Minor changes that can be made in parallel (throwing experimental warnings, removing dotted names, code refactoring). These should be done before the release but I think that PyTorch users should expect nightly builds to be unstable.

zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 (Contributor, Author) commented on Sep 11, 2019

Benchmarking Methodology

I benchmarked on one machine with 80 CPUs, 500G RAM, and 8 V100 cards.
I ran resnet18 from https://github.com/pytorch/examples/tree/master/imagenet
by using the following command:

python main.py -a resnet50 --dist-url 'tcp://127.0.0.1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]

for 9 epochs for both runs, one after the other.
I measured how long each epoch took, in seconds, and periodically watched
CPU activity with htop and GPU activity with nvidia-smi.

Benchmarking Results

with this PR
epoch 0: 1374.9880466461182s
epoch 1: 1286.2552213668823s
epoch 2: 1319.2323479652405s
epoch 3: 1348.9674923419952s
epoch 4: 1348.0851469039917s
epoch 5: 1345.5131402015686s
epoch 6: 1319.1677832603455s
epoch 7: 1341.3908925056458s
epoch 8: 1310.145521402359s Acc@1 46.328 Acc@5 72.430

without this PR
epoch 0: 1378.9966588020325s
epoch 1: 1362.2592267990112s
epoch 2: 1339.5863161087036s
epoch 3: 1324.9453747272491s
epoch 4: 1325.4606726169586s
epoch 5: 1364.0598435401917s
epoch 6: 1346.6253719329834s
epoch 7: 1338.4234280586243s
epoch 8: 1330.864031791687s Acc@1 46.520 Acc@5 72.758
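One quick way to summarize the two runs (a sketch; epoch times copied, rounded, from above):

```python
import statistics

with_pr    = [1374.99, 1286.26, 1319.23, 1348.97, 1348.09,
              1345.51, 1319.17, 1341.39, 1310.15]
without_pr = [1379.00, 1362.26, 1339.59, 1324.95, 1325.46,
              1364.06, 1346.63, 1338.42, 1330.86]

for label, xs in (("with PR", with_pr), ("without PR", without_pr)):
    print(f"{label:>10}: mean={statistics.mean(xs):.1f}s  stdev={statistics.stdev(xs):.1f}s")
```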

Analysis

  • The times are close together. They're pretty noisy, but
    the numbers imply that there is no major regression.
  • The accuracies are also close together. The difference is probably just due to
    random initialization and infrastructure non-determinism, but both models are training.

CPU activity:

  • I watched htop for part of both runs.
  • Hard to quantify CPU activity. There was a good amount of CPU activity
    for both, but it's not like the CPU was at 100% utilization.
  • Both runs caused the machine to use a total of 47.8 +/- 0.3 GB of RAM.

GPU Activity:

  • I watched nvidia-smi for part of both runs.
  • Both runs utilized all the GPUs at 100%.
  • Both runs used between 2100 and 2400 MB on each GPU.

Logs for each run available upon request.

Conclusion

The data supports that this PR does not introduce regressions in resnet18 epoch time.

zou3519 merged commit 6ae24f8 into gh/zou3519/155/base on Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 added a commit that referenced this pull request Sep 11, 2019
zou3519 (Contributor, Author) commented on Nov 19, 2019

I did some more stable microbenchmarking. The TL;DR is that named tensors add very little overhead at a microbenchmark scale.

Internal benchmark suite

Here are some internal benchmarks. https://our.internmc.facebook.com/intern/aibench/details/528043000 .
My experiments with them were a little noisy (~2% variance for a commit that does nothing). "After" is BUILD_NAMEDTENSOR=0; "before" is BUILD_NAMEDTENSOR=1.

torch.add unnamed performance using Linux perf

I generated a flame graph of torch.add on two unnamed tensors each of size [1]. The time spent in name inference is roughly 1.9% of the total time of at::add and 1% of the total runtime of python torch.add. python torch.add takes roughly 6.6us to run; the total time spent in name inference is roughly 66 ns, which is very little.

[flame graph: torch.add profile]

torch.mm unnamed performance using Linux perf

I generated a flame graph of torch.mm on two unnamed tensors each of size [1, 1].
Name inference is the boxed portion and is very roughly 0.2% (6ns) of the total runtime of python torch.mm (3us).
[flame graph: torch.mm profile]
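For context, a sketch of the kind of workload one might profile this way (assumed setup; the exact loop count and measurement harness here are illustrative, not the ones used above):

```python
# Run under a profiler, e.g. `perf record -g -- python add_loop.py` followed by
# `perf report` or flame-graph tooling. Two unnamed size-[1] tensors, as above.
import torch

a = torch.randn(1)
b = torch.randn(1)

for _ in range(5_000_000):
    torch.add(a, b)
```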

zou3519 (Contributor, Author) commented on Dec 5, 2019

More macrobenchmarking

I benchmarked on one machine with 80 CPUs, 500G RAM, and 8 V100 cards.
I ran resnet18 from https://github.com/pytorch/examples/tree/master/imagenet
by using the following command:

python main.py -a resnet50 --dist-url 'tcp://127.0.0.1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]
for 15 epochs for both runs, one after the other. I measured how long each epoch took, in seconds.

To avoid measuring data loading time, I created dummy input and target tensors before all the epochs and only sent those into the model.
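A minimal sketch of that change (hypothetical shapes, batch size, and CUDA device; the real script is the imagenet example patched in place):

```python
import torch

# Fixed random batch created once and reused every iteration, so the data
# loader never runs inside the timed region.
batch_size, num_classes = 256, 1000
dummy_images = torch.randn(batch_size, 3, 224, 224, device="cuda")
dummy_target = torch.randint(0, num_classes, (batch_size,), device="cuda")

def train_epoch(model, criterion, optimizer, iters_per_epoch):
    model.train()
    for _ in range(iters_per_epoch):
        output = model(dummy_images)
        loss = criterion(output, dummy_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```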

Results

Group A: On master, with named tensors turned on:

[560.28216296 563.92683095 561.3685565 560.774923 561.66448978
560.39293912 561.04192382 560.10590822 558.435179 558.54613051
557.84741974 558.80584198 556.46633288 549.08345211 556.39791152]

Group B: On top of master, with named tensors turned off:

[560.74388459 562.69248295 563.78086042 564.81708944 565.05355176
566.45378298 564.85195661 563.59880176 563.73104388 561.73447657
561.40261957 559.56541088 560.48721468 561.34041712 560.78032881]

Percent difference: (B - A) / A

[ 0.00082409 -0.00218884 0.00429718 0.00720818 0.00603396 0.01081535
0.00679099 0.00623613 0.0094834 0.00570829 0.00637307 0.00135927
0.00722574 0.02232259 0.00787641]
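Recomputed from the rounded numbers above (a sketch):

```python
import numpy as np

A = np.array([560.28, 563.93, 561.37, 560.77, 561.66, 560.39, 561.04, 560.11,
              558.44, 558.55, 557.85, 558.81, 556.47, 549.08, 556.40])  # named tensors on
B = np.array([560.74, 562.69, 563.78, 564.82, 565.05, 566.45, 564.85, 563.60,
              563.73, 561.73, 561.40, 559.57, 560.49, 561.34, 560.78])  # named tensors off

pct = (B - A) / A
print(pct)            # per-epoch relative difference
print(pct.mean())     # roughly 0.006, i.e. the named-tensors-off run measured ~0.6% slower here
```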

Analysis

From these results, it looks like turning named tensors off actually makes the code run slower by ~0.6%, which can't be right. This is probably just due to the machine being noisy. I was the only user on the machine. One way to identify the margin of error is to send two identical runs to the machine and see how much they vary.

The numbers here support the hypothesis that there is no major regression to imagenet when turning named tensors on.

zou3519 added a commit that referenced this pull request Dec 6, 2019
zou3519 added a commit that referenced this pull request Dec 6, 2019
zou3519 added a commit that referenced this pull request Dec 9, 2019
This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.

Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros

Reasons for not removing the macros:
- potential for feature flagging
- binary size reduction / pytorch modularity. I don't think this is a
major consideration because macros like this are not the best way to
achieve this.

Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.

In #25798, I did a variety of
macro- and micro-benchmarks to determine the performance impact of named
tensors on regular tensors.

[The
microbenchmarks](#25798 (comment))
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](#25798 (comment))
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even further if we find this to be a
problem.

[Initial
macrobenchmarks](#25798 (comment))
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarks without data
loading](#25798 (comment)),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.

Test Plan:
- wait for CI

Differential Revision: [D18858543](https://our.internmc.facebook.com/intern/diff/D18858543)

zou3519 added a commit that referenced this pull request Dec 9, 2019
facebook-github-bot pushed a commit that referenced this pull request Dec 10, 2019
Summary:
Pull Request resolved: #30894

Test Plan: - wait for CI

Differential Revision: D18858543

Pulled By: zou3519

fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
zou3519 added a commit that referenced this pull request Dec 10, 2019
This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.

Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros

Reasons for not removing the macros:
- potential for feature flagging
- binary size reduction / pytorch modularity. I don't think this is a
major consideration because macros like this are not the best way to
achieve this.

Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.

In #25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.

[The
microbenchmarks](#25798 (comment))
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](#25798 (comment))
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even futher if we find this to be a
problem.

[Initial
macrobenchmarks](#25798 (comment))
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarkers without data
loading](#25798 (comment)),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.

Test Plan:
- wait for CI

Differential Revision: [D18858543](https://our.internmc.facebook.com/intern/diff/D18858543)

[ghstack-poisoned]
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020