Conversation

@t-vi (Collaborator) commented Jul 24, 2019

Fixes: #18215 at last!

Also sprinkle tests...

@pytorchbot added the module: autograd, module: cuda, module: nn, and module: operators labels on Jul 24, 2019
@ifedan requested a review from gchanan on Jul 24, 2019 20:47
@ifedan added the triaged label on Jul 24, 2019
template <typename target_t>
__device__ static inline int64_t get_target_prime(
    const target_t* __restrict__ target, int64_t offset, int64_t stride, int64_t idx, int64_t BLANK) {
  if (idx % 2 == 0) {
Contributor

This is me being ignorant about how CTCLoss works :) For my education, what exactly does get_target_prime do? E.g., what is it called in the paper?

Contributor

You've adjusted this function to handle target_length == 0, but many of the call sites already have a condition implying that if you get to this function, the target length is nonzero. Does this hold at all of the call sites? I guess I'll go check now.

Contributor

I guess in the backward pass there are a few cases where you will get here with target_length == 0.

@t-vi (Collaborator, Author) commented Jul 26, 2019

So the comment above the function (`// this ad-hoc converts from targets (l in [1]) to augmented targets (l' in [1]) note that no bound-checking is done`) isn't all that great. Maybe amending it with something like "when l is the targets, l' is BLANK l_0 BLANK l_1 ... l_{target_length-1} BLANK" would help?

I'll do a bit more analysis on whether we need the target_length condition here. It might well be that it is not called except with idx 0 in that case, which would work equally well...

Edit: Turns out that works well.
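
For readers following along, here is a minimal Python sketch of that mapping (illustrative only; the real CUDA device function quoted further up additionally takes an offset and stride into the targets tensor):

def get_target_prime(targets, idx, blank=0):
    """Index into the augmented target sequence
    l' = BLANK l_0 BLANK l_1 ... l_{target_length-1} BLANK:
    even positions are BLANK, odd position 2k + 1 is the k-th target symbol."""
    if idx % 2 == 0:
        return blank
    return targets[idx // 2]

# Example: targets l = [3, 5] with BLANK = 0 give the augmented l' = [0, 3, 0, 5, 0]
targets = [3, 5]
print([get_target_prime(targets, i) for i in range(2 * len(targets) + 1)])
# -> [0, 3, 0, 5, 0]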

@ezyang (Contributor) commented Jul 26, 2019

Something that would make me feel more confident about this is specifically running all of the tests under cuda-memcheck and showing there aren't any memory access problems. Is this something you can do easily?

@ezyang (Contributor) commented Jul 26, 2019

I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

target_length,
BLANK);
have_three =
((s < 2 * target_length - 1) &&
Contributor

Always false when target_length == 0

@t-vi (Collaborator, Author) commented Jul 26, 2019

(Edited) I actually changed the condition here to include target_length > 0 in the outer if; this removes the need to check the target length inside get_target_prime.
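
To make that concrete, a hedged Python sketch of the control flow (plain Python, not the actual kernel; the symbol comparison inside have_three and the surrounding recursion are omitted):

def touches_targets(s, target_length):
    """Return whether augmented-index s needs to look at the target symbols."""
    # The fix described above: target_length > 0 joins the outer condition, so
    # the target-dependent branch (and get_target_prime) is never reached for
    # an empty target. Even without the guard, s < 2 * target_length - 1 is
    # false for every s when target_length == 0, as noted in the review.
    if target_length > 0 and s < 2 * target_length + 1:
        have_three = s < 2 * target_length - 1  # symbol comparison omitted here
        return True, have_three
    return False, False

print([touches_targets(s, 0) for s in range(3)])  # empty target: branch never taken
print([touches_targets(s, 2) for s in range(5)])  # l' = BLANK t0 BLANK t1 BLANK has 5 states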

@ezyang (Contributor) left a comment

I'm not a CTCLoss algorithmic expert, but I did do a reasonable amount of auditing of target_length use sites and all of the adjustments look reasonable.

@ezyang (Contributor) commented Jul 26, 2019

Test failure is real

Jul 26 00:02:24 ======================================================================
Jul 26 00:02:24 FAIL: test_CTCLoss_empty_target_cuda (__main__.TestNN)
Jul 26 00:02:24 ----------------------------------------------------------------------
Jul 26 00:02:24 Traceback (most recent call last):
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 456, in wrapper
Jul 26 00:02:24     method(*args, **kwargs)
Jul 26 00:02:24   File "test_nn.py", line 5559, in test_CTCLoss_empty_target_cuda
Jul 26 00:02:24     self._test_CTCLoss_empty_target('cuda')
Jul 26 00:02:24   File "test_nn.py", line 5552, in _test_CTCLoss_empty_target
Jul 26 00:02:24     self.assertAlmostEqual(-log_probs.sum(0)[[0, 2], 0], loss[[0, 2]], delta=3e-5)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 643, in assertAlmostEqual
Jul 26 00:02:24     self.assertEqual(x, y, prec, msg, allow_inf)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 610, in assertEqual
Jul 26 00:02:24     assertTensorsEqual(x, y)
Jul 26 00:02:24   File "/var/lib/jenkins/workspace/test/common_utils.py", line 596, in assertTensorsEqual
Jul 26 00:02:24     self.assertLessEqual(max_err, prec, message)
Jul 26 00:02:24 AssertionError: tensor(3.0518e-05, device='cuda:0', dtype=torch.float32) not less than or equal to 3e-05
Jul 26 00:02:24 

@ezyang (Contributor) left a comment

tests need to pass

@t-vi (Collaborator, Author) commented Jul 26, 2019

Thank you for the thorough review!

@t-vi (Collaborator, Author) commented Jul 26, 2019

> I know this is super goofy and will never happen in practice, but what happens if max_target_length == 0?

Then you'll be glad to hear we do test that in test_autograd.py :) The grid setup changes were needed for these cases.

I'll amend the PR for the other comments, thank you!
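
For reference, a minimal sketch of the all-empty-target case (hypothetical shapes, not the actual test in test_autograd.py; blank index 0, matching the assertion in the failing-test traceback quoted earlier):

import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: T = 50 frames, N = 3 batch items,
# C = 15 classes, blank = 0, and every target empty, so max_target_length == 0.
log_probs = torch.randn(50, 3, 15, dtype=torch.double).log_softmax(2)
targets = torch.zeros(0, dtype=torch.long)            # no target symbols at all
input_lengths = torch.full((3,), 50, dtype=torch.long)
target_lengths = torch.zeros(3, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction='none')
# With an empty target, the only valid alignment emits blanks at every frame,
# so the loss reduces to minus the sum over time of log p(blank) -- this is
# what the test traceback above asserts for the empty-target batch entries.
print(torch.allclose(loss, -log_probs[:, :, 0].sum(0)))  # expected: True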

@t-vi (Collaborator, Author) commented Jul 26, 2019

So at least the most basic invocation of cuda-memcheck seems not to detect any failures in the tests:

$ PYTHONPATH=./build/lib.linux-x86_64-3.7/ cuda-memcheck python3  test/test_nn.py TestNN.test_CTCLoss_empty_target_{cuda,cpu}
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
..
----------------------------------------------------------------------
Ran 2 tests in 0.180s

OK
========= ERROR SUMMARY: 0 errors
$ PYTHONPATH=build/lib.linux-x86_64-3.7/ cuda-memcheck python3 test/test_autograd.py TestAutograd.test_ctc_loss
========= CUDA-MEMCHECK
/usr/lib/python3/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
.
----------------------------------------------------------------------
Ran 1 test in 6.869s

OK
========= ERROR SUMMARY: 0 errors

Regarding the tolerance in the failing tests: I previously had the tolerance (the delta in assertAlmostEqual) at 3e-5. Apparently that is not good enough, so I increased it to 1e-4. The loss is ~1.5e2, so this corresponds to a relative tolerance of ~6e-7. (I added a comment to the test.) The obvious alternative would be to run the check in double precision.
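
Spelling out the arithmetic behind that figure (illustrative numbers taken from the comment above, not measured values):

loss_magnitude = 1.5e2   # typical magnitude of the loss in this test, per the comment
absolute_delta = 1e-4    # the new delta passed to assertAlmostEqual
print(absolute_delta / loss_magnitude)  # ~6.7e-07, the implied relative tolerance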

@ezyang (Contributor) commented Jul 26, 2019 via email

@t-vi (Collaborator, Author) commented Jul 27, 2019

So I think the remaining failures are spurious.

@t-vi (Collaborator, Author) commented Jul 30, 2019

@ezyang: anything I can do to move this forward?

@facebook-github-bot (Contributor) left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 31, 2019
Summary:
Fixes: pytorch/pytorch#18215 at last!

Also sprinkle tests...
Pull Request resolved: pytorch/pytorch#23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
@facebook-github-bot (Contributor)

@soumith merged this pull request in 2e40857.

ssnl pushed a commit to ssnl/pytorch that referenced this pull request Aug 2, 2019
Summary:
Fixes: pytorch#18215 at last!

Also sprinkle tests...
Pull Request resolved: pytorch#23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
soumith pushed a commit that referenced this pull request Aug 2, 2019
Summary:
Fixes: #18215 at last!

Also sprinkle tests...
Pull Request resolved: #23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
Labels: Merged, module: autograd, module: cuda, module: nn, open source, triaged

Successfully merging this pull request may close these issues: CTCLoss with empty target doesn't work well

8 participants