
Conversation

@srossross (Collaborator) commented Sep 24, 2022

@pytorch-bot (bot) commented Sep 24, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85583

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Failures

As of commit 0b103b0:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

srossross added a commit that referenced this pull request Sep 24, 2022
@srossross srossross changed the title WPI: changing as_strided_scatter to deterministic inputs WIP: changing as_strided_scatter to deterministic inputs Sep 24, 2022
srossross added a commit that referenced this pull request Sep 26, 2022
srossross added a commit that referenced this pull request Sep 27, 2022
@srossross (Collaborator, Author) commented:

@lezcano @amjames

Usually, in OpInfos, we take care not to generate these non-deterministic inputs, because these functions are not differentiable on them and the gradtests would fail. So my question is: are all the functions whose skips come from non-determinism non-deterministic for all inputs, or are some of them non-deterministic only for certain inputs? If it's the latter, the function may have a deeper issue, since the gradtests would be failing for it.

When I update the inputs to be deterministic, I would expect the test_fn_grad test to pass; is that a reasonable assumption?

I still get this error with the input: torch.as_strided_scatter(make_tensor([3,3]), make_tensor([2, 2]), [2, 2], [2, 1])

__________________________________________________________ TestGradientsCPU.test_fn_grad_as_strided_scatter_cpu_float64 __________________________________________________________
Traceback (most recent call last):
  File "/home/srossross/pytorch/test/test_ops_gradients.py", line 156, in test_fn_grad
    self._grad_test_helper(device, dtype, op, op.get_op())
  File "/home/srossross/pytorch/test/test_ops_gradients.py", line 139, in _grad_test_helper
    return self._check_helper(device, dtype, op, variant, 'gradcheck', check_forward_ad=check_forward_ad,
  File "/home/srossross/pytorch/test/test_ops_gradients.py", line 108, in _check_helper
    self.assertTrue(gradcheck(fn, gradcheck_args,
  File "/home/srossross/pytorch/torch/testing/_internal/common_utils.py", line 3309, in gradcheck
    return torch.autograd.gradcheck(fn, inputs, **kwargs)
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 1418, in gradcheck
    return _gradcheck_helper(**args)
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 1432, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 1075, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 1309, in _fast_gradcheck
    analytical_vJu = _get_analytical_vJu_backward_mode(inputs, outputs, nondet_tol, check_grad_dtypes, all_v, all_u_dense)
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 560, in _get_analytical_vJu_backward_mode
    all_vJ = _check_analytical_jacobian_attributes(inputs, output, nondet_tol, check_grad_dtypes,
  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 549, in _check_analytical_jacobian_attributes
    raise GradcheckError('Backward is not reentrant, i.e., running backward with '
torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient. The tolerance for nondeterminism was 0.0.

NOTE: If your op relies on non-deterministic operations i.e., it is listed here:
https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
this failure might be expected.

If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
  with `nondet_tol=<tol>` as a keyword argument.
- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test
  to have `gradcheck_nondet_tol=<tol>`.
- is a Module test (e.g., in common_nn.py), then modify the corresponding
  module_test entry to have `gradcheck_nondet_tol=<tol>`

I'm going to continue investigating this.
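
For reference, a minimal sketch of the first workaround described in the error text above, using the failing sample I mentioned (the shapes, strides, and the nondet_tol value here are illustrative assumptions, not something this PR adds):

import torch
from torch.testing import make_tensor

# Hypothetical reproduction of the failing sample above, in double precision
# so that gradcheck can be run on it.
inp = make_tensor((3, 3), dtype=torch.float64, device="cpu", requires_grad=True)
src = make_tensor((2, 2), dtype=torch.float64, device="cpu", requires_grad=True)

fn = lambda inp, src: torch.as_strided_scatter(inp, src, (2, 2), (2, 1))

# Workaround for manual invocations: tolerate bounded nondeterminism.
torch.autograd.gradcheck(fn, (inp, src), nondet_tol=1e-12)

# For OpInfo-based tests, the equivalent is setting `gradcheck_nondet_tol=<tol>`
# on the OpInfo entry, as the error message says.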

@lezcano (Collaborator) commented Sep 27, 2022

Is the gradient of this function deterministic for all the inputs? (it should be if the function is differentiable)
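
One direct way to probe that question is to run backward twice with identical inputs and grad_outputs and compare the results, which is essentially what the "Backward is not reentrant" check does. A minimal sketch, reusing the sample shapes from above (not part of the PR):

import torch

inp = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)
src = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)
grad_out = torch.randn(3, 3, dtype=torch.float64)

def grads():
    out = torch.as_strided_scatter(inp, src, (2, 2), (2, 1))
    # Same input, same grad_output: a deterministic backward must return
    # bit-identical gradients every time.
    return torch.autograd.grad(out, (inp, src), grad_out)

g1, g2 = grads(), grads()
print(all(torch.equal(a, b) for a, b in zip(g1, g2)))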

srossross added a commit that referenced this pull request Sep 27, 2022
@facebook-github-bot (Contributor) commented:

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla (bot) commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

srossross added a commit that referenced this pull request Oct 4, 2022
srossross added a commit that referenced this pull request Oct 5, 2022
srossross added a commit that referenced this pull request Oct 5, 2022
@srossross srossross changed the title WIP: changing as_strided_scatter to deterministic inputs Changing as_strided_scatter to deterministic inputs Nov 1, 2022
srossross added a commit that referenced this pull request Nov 1, 2022
@srossross srossross marked this pull request as ready for review November 1, 2022 18:44
def error_inputs_as_strided_scatter(op_info, device, **kwargs):
    make_arg = partial(make_tensor, device=device, dtype=torch.float32, requires_grad=False)

    input_t = make_arg([4, 4])
Reviewer (Collaborator):

Add a comment describing what error condition is being tested

Author (Collaborator):

added

((16,), (2, 2, 2, 2), (1, 1, 1, 1), 0),
((16,), (2, 1, 1, 2), (1, 7, 7, 1), 0),
((3, 3), (2, 2), (2, 1), 0),
((16,), (2, 2, 2, 2), (8, 4, 2, 1), 0),
Reviewer (Collaborator):

What's the change here? Should the test cases have a comment describing the property they assume?

Author (Collaborator):

Added a comment above; the removed code was causing non-deterministic behaviour, so I modified the strides.
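
For context, a minimal sketch of the property at stake, assuming the tuples above are read as (input shape, view size, view stride, storage offset): scattering into an as_strided view is only deterministic if no two view elements alias the same storage location. The helper below is illustrative only, not part of the PR:

import itertools

def view_is_non_overlapping(size, stride, offset=0):
    # Linear storage index of every element of the as_strided view;
    # the scatter is deterministic only if all of them are distinct.
    idx = [offset + sum(i * s for i, s in zip(coords, stride))
           for coords in itertools.product(*(range(d) for d in size))]
    return len(idx) == len(set(idx))

print(view_is_non_overlapping((2, 2, 2, 2), (8, 4, 2, 1)))  # True: all 16 indices are unique
print(view_is_non_overlapping((2, 1, 1, 2), (1, 7, 7, 1)))  # False: two elements alias index 1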

DecorateInfo(unittest.skip('Only fails for LAZY, passes on everything else'), 'TestCompositeCompliance', 'test_backward'), # noqa: B950
DecorateInfo(unittest.skip('Passes on complex64 and float32 only'), 'TestJit', 'test_variant_consistency_jit'),
DecorateInfo(unittest.skip('Fails on cuda + rocm'), 'TestCommon', 'test_complex_half_reference_testing'),
# https://github.com/pytorch/pytorch/issues/88105
Reviewer (Collaborator):

Nice pointer to the issue

@mruberry (Collaborator) left a comment:

Overall looks good @srossross, but I left some suggestions for comments to help future developers understand what's happening.

srossross added a commit that referenced this pull request Nov 2, 2022
@srossross srossross requested a review from mruberry November 2, 2022 17:40
@srossross srossross added the ciflow/periodic label Nov 2, 2022
@mruberry (Collaborator) left a comment:

Cool; cc @kurtamohler

@ngimel (Collaborator) commented Nov 7, 2022

I believe with #88342 the backward errors should be fixed

@srossross (Collaborator, Author) commented:

I'm not sure; could something still be wrong? The test_fn_gradgrad tests now report an unexpected success, but test_fn_grad_as_strided_scatter_cpu_float64 still fails with:

  File "/home/srossross/pytorch/torch/autograd/gradcheck.py", line 1283, in _check_analytical_numerical_equal
    raise GradcheckError(_get_notallclose_msg(a, n, j, i, complex_indices, test_imag, is_forward_ad) + jacobians_str)
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor(0., device='cuda:0', dtype=torch.float64)
analytical:tensor(1., device='cuda:0', dtype=torch.float64)

The above quantities relating the numerical and analytical jacobians are computed 
in fast mode. See: https://github.com/pytorch/pytorch/issues/53876 for more background 
about fast mode. Below, we recompute numerical and analytical jacobians in slow mode:

Numerical:
 tensor([[0.]], device='cuda:0', dtype=torch.float64)
Analytical:
tensor([[1.]], device='cuda:0', dtype=torch.float64)

The max per-element difference (slow mode) is: 1.0.
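
A minimal sketch of how one might reproduce this outside the OpInfo machinery, rerunning gradcheck in slow mode as the message suggests (shapes and strides are assumed from the earlier sample; this is not part of the PR):

import torch

inp = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)
src = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)

# fast_mode=False recomputes the full numerical and analytical Jacobians,
# which makes it easier to see which entries disagree.
torch.autograd.gradcheck(
    lambda inp, src: torch.as_strided_scatter(inp, src, (2, 2), (2, 1)),
    (inp, src),
    fast_mode=False,
)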

@ngimel (Collaborator) commented Nov 8, 2022

cc @bdhirsh @soulitzer

srossross added a commit that referenced this pull request Nov 8, 2022
@lezcano (Collaborator) commented Nov 9, 2022

I think it's fine to merge this one as is; then we can follow up with a different PR fixing that issue.
@bdhirsh, @soulitzer, or I can have a look at it.

@lezcano (Collaborator) commented Nov 9, 2022

@pytorchbot merge -f "unrelated failures"

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).



Labels: ciflow/periodic, cla signed, Merged, open source, release notes: python_frontend
