
Conversation

@madsbk (Contributor) commented Jul 1, 2019

Addresses the issue raised in #22377.

PR #22016 introduced a temporary tensor, grad_weight_per_segment, with the same dtype as the end result, which can be a problem when using float16.
With this PR, a float32 temporary tensor is used instead when the input is float16.
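
As an aside for readers, here is a minimal sketch of the dtype choice described above, with illustrative names and shapes (this is not the PR's actual kernel code): accumulate the per-segment partial gradients in float32 whenever the result dtype is float16, and cast back only once at the end.

```cpp
// Hedged sketch, not the PR's actual code: pick a float32 accumulator dtype
// for the temporary buffer when the gradient dtype is float16, so the many
// small partial sums are not repeatedly rounded in half precision.
#include <ATen/ATen.h>

at::Tensor make_grad_weight_per_segment(const at::Tensor& grad_output,
                                        int64_t num_segments,
                                        int64_t embedding_dim) {
  auto acc_dtype = (grad_output.scalar_type() == at::kHalf)
                       ? at::kFloat
                       : grad_output.scalar_type();
  // Temporary accumulator; the result would be cast back to grad_output's
  // dtype only after all partial sums have been combined.
  return at::zeros({num_segments, embedding_dim},
                   grad_output.options().dtype(acc_dtype));
}
```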

@ngimel, can I get you to review? I think I have fixed the issues you have pointed out.

@facebook-github-bot (Contributor) left a comment:

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel (Collaborator) left a comment:

There is still a partials_per_segment_offset device_vector, and direct access to its elements.

@madsbk (Contributor, Author) commented Jul 1, 2019

It was my impression that the policy in thrust::exclusive_scan would make it use the PyTorch allocator?

The direct access to the two elements is unfortunately necessary: the host code needs to know the size of partial_segment_offset and how many threads to spawn.
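
On the allocator question, here is a hedged sketch of the general mechanism (the CachingThrustAllocator type below is illustrative rather than PyTorch's actual helper, and routing through the caching allocator's raw_alloc/raw_delete entry points is an assumption): thrust only uses a custom allocator for its temporary storage when that allocator is attached to the execution policy passed to the algorithm, so a device_vector constructed with the default allocator elsewhere is unaffected.

```cpp
// Hedged sketch of the pattern under discussion, not PyTorch's actual helper.
#include <cstddef>
#include <thrust/execution_policy.h>
#include <thrust/scan.h>
#include <c10/cuda/CUDACachingAllocator.h>
#include <c10/cuda/CUDAStream.h>

// Minimal allocator that forwards thrust's temporary allocations to the
// PyTorch caching allocator instead of plain cudaMalloc/cudaFree.
struct CachingThrustAllocator {
  using value_type = char;
  char* allocate(std::ptrdiff_t num_bytes) {
    return static_cast<char*>(
        c10::cuda::CUDACachingAllocator::raw_alloc(num_bytes));
  }
  void deallocate(char* ptr, size_t /*num_bytes*/) {
    c10::cuda::CUDACachingAllocator::raw_delete(ptr);
  }
};

void exclusive_scan_offsets(const int64_t* in, int64_t* out, int64_t n) {
  CachingThrustAllocator alloc;
  auto stream = c10::cuda::getCurrentCUDAStream().stream();
  // The allocator only takes effect because it is part of the policy passed
  // here; containers such as thrust::device_vector ignore it entirely.
  thrust::exclusive_scan(thrust::cuda::par(alloc).on(stream),
                         in, in + n, out);
}
```

This only illustrates where the allocator has to be supplied for it to matter; whether the scan in the PR already passes such a policy is exactly what is being asked above.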

@ngimel (Collaborator) commented Jul 1, 2019

A device_vector with the default allocator template argument (https://github.com/pytorch/pytorch/pull/22401/files#diff-5e16509ef3a09789b80e5d16f8dc3062R248) won't use any specified policy, as there is no policy argument in the constructor.
The accesses to device memory are unfortunate; if they cause a slowdown for other workloads, the old path may need to be brought back (it will also be needed for CUDA graphs later on), but for now they are not a blocker. The device_vector is, though.
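
For comparison, a minimal sketch of the tensor-backed alternative being asked for (the names, dtype, and surrounding function are illustrative): own the scratch buffer with an ATen tensor, so its memory comes from the caching allocator and is returned to the cache when the tensor goes out of scope, instead of using a thrust::device_vector whose default allocator calls cudaMalloc directly.

```cpp
// Hedged sketch, assuming illustrative names: replace the device_vector
// scratch buffer with a Tensor allocated through ATen.
#include <ATen/ATen.h>

void compute_partial_segment_offsets(const at::Tensor& indices,
                                     int64_t num_of_segments) {
  // The buffer is owned by a Tensor, so the allocation goes through the
  // caching allocator and no synchronous cudaMalloc/cudaFree hits the
  // hot path.
  auto partials_per_segment_offset =
      at::empty({num_of_segments}, indices.options().dtype(at::kLong));
  int64_t* offsets_ptr = partials_per_segment_offset.data_ptr<int64_t>();
  // ... kernels or thrust algorithms would read/write `offsets_ptr` here ...
  (void)offsets_ptr;
}
```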

@madsbk (Contributor, Author) commented Jul 1, 2019

Makes sense, thanks.

@ngimel (Collaborator) left a comment:

Approving; I hope this will fix @chenyangyu1988's accuracy problems.

@facebook-github-bot (Contributor) left a comment:

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment:

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented: @mrshenli merged this pull request in c9a8413.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 2, 2019
Summary:
Address the issue raised in pytorch/pytorch#22377.

PR pytorch/pytorch#22016 introduced a temporary tensor, `grad_weight_per_segment`, with the same dtype as the end result, which can be a problem when using `float16`.
With this PR, a `float32` temporary tensor is used instead when the input is `float16`.

ngimel, can I get you to review? I think I have fixed the issues you have pointed out.
Pull Request resolved: pytorch/pytorch#22401

Differential Revision: D16077319

Pulled By: mrshenli

fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309
@madsbk deleted the embedding_bug branch on July 4, 2019 at 07:55
xzhu1900 pushed a commit to xzhu1900/pytorch that referenced this pull request Jul 5, 2019
Summary:
Address the issue raised in pytorch#22377.

PR pytorch#22016 introduced a temporary tensor, `grad_weight_per_segment`, with the same dtype as the end result, which can be a problem when using `float16`.
With this PR, a `float32` temporary tensor is used instead when the input is `float16`.

ngimel, can I get you to review? I think I have fixed the issues you have pointed out.
Pull Request resolved: pytorch#22401

Differential Revision: D16077319

Pulled By: mrshenli

fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309

Labels

Merged · module: cuda (Related to torch.cuda, and CUDA support in general) · open source


7 participants