Conversation

@Kaixhin (Contributor) commented Nov 9, 2018

Fixes #12259. Needs to make sure the tests (see #13766) don't break due to numerical precision issues; not sure what would need to be adjusted here...
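For context, the change under review (per the PR title and the linked issue) switches the BatchNorm affine scale ("multiplier") initialization from random uniform to ones. A minimal sketch of the resulting behavior, assuming current torch.nn semantics — the before/after comments paraphrase the change, they are not the actual diff:

```python
# Sketch (assumption: paraphrased, not the actual diff). Before this PR,
# _BatchNorm.reset_parameters drew the affine scale from U(0, 1); after it,
# the scale starts at 1, so a freshly constructed BatchNorm layer initially
# applies an identity affine transform on top of the normalization.
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=4)
print(bn.weight.data)  # all ones after the fix (previously random uniform)
print(bn.bias.data)    # all zeros; the bias init is unchanged by this PR
```

This is why the PR is BC-breaking: any model relying on the old random-uniform scale at init will now start from different weights.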

@yinghai yinghai requested a review from bddppq November 9, 2018 18:40
ezyang previously approved these changes Nov 15, 2018

@ezyang (Contributor) commented Nov 15, 2018

But it seems the tests are still failing:

Nov 09 17:50:06 
Nov 09 17:50:06 =================================== FAILURES ===================================
Nov 09 17:50:06 ___________________________ TestModels.test_densenet ___________________________
Nov 09 17:50:06 
Nov 09 17:50:06 self = <test_models.TestModels testMethod=test_densenet>
Nov 09 17:50:06 
Nov 09 17:50:06     def test_densenet(self):
Nov 09 17:50:06         # Densenet-121 model
Nov 09 17:50:06         x = Variable(torch.randn(BATCH_SIZE, 3, 224, 224).fill_(1.0))
Nov 09 17:50:06 >       self.exportTest(toC(densenet121()), toC(x))
Nov 09 17:50:06 
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/test_models.py:149: 
Nov 09 17:50:06 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/test_models.py:51: in exportTest
Nov 09 17:50:06     verify(model, inputs, backend, rtol=rtol, atol=atol)
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:445: in verify
Nov 09 17:50:06     run(randomize_args(args))
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:425: in run
Nov 09 17:50:06     run_helper(torch_out, args)
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:439: in run_helper
Nov 09 17:50:06     errs.checkAlmostEqual(x.data.cpu().numpy(), y, "In output {}".format(i))
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:60: in checkAlmostEqual
Nov 09 17:50:06     self.almostEqualAndThen(x, y, msg, self.addErr)
Nov 09 17:50:06 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
Nov 09 17:50:06 
Nov 09 17:50:06 self = <verify.Errors object at 0x7f2ae2da5c50>
Nov 09 17:50:06 x = array([[ 0.11339295,  0.52664566, -0.5282817 , ...,  0.60019535,
Nov 09 17:50:06         -0.70....5219744 , ...,  0.604505  ,
Nov 09 17:50:06         -0.7020387 ,  0.06566978]], dtype=float32)
Nov 09 17:50:06 y = array([[ 0.11339286,  0.5266453 , -0.5282816 , ...,  0.6001954 ,
Nov 09 17:50:06         -0.70....5219741 , ...,  0.60450494,
Nov 09 17:50:06         -0.70203865,  0.06566991]], dtype=float32)
Nov 09 17:50:06 msg = 'In output 0'
Nov 09 17:50:06 k = <bound method Errors.addErr of <verify.Errors object at 0x7f2ae2da5c50>>
Nov 09 17:50:06 
Nov 09 17:50:06     def almostEqualAndThen(self, x, y, msg, k):
Nov 09 17:50:06         """
Nov 09 17:50:06             Helper for implementing 'requireAlmostEqual' and 'checkAlmostEqual'.
Nov 09 17:50:06             Upon failure, invokes continuation 'k' with the error message.
Nov 09 17:50:06     
Nov 09 17:50:06             At the moment, only tests on 'numpy.ndarray' are supported.
Nov 09 17:50:06             """
Nov 09 17:50:06         if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
Nov 09 17:50:06             try:
Nov 09 17:50:06 >               np.testing.assert_allclose(x, y, rtol=self.rtol, atol=self.atol, equal_nan=False, verbose=True)
Nov 09 17:50:06 E               AssertionError: 
Nov 09 17:50:06 E               Not equal to tolerance rtol=0.01, atol=1e-07
Nov 09 17:50:06 E               
Nov 09 17:50:06 E               (mismatch 0.05%)
Nov 09 17:50:06 E                x: array([ 0.113393,  0.526646, -0.528282, ...,  0.604505, -0.702039,
Nov 09 17:50:06 E                       0.06567 ], dtype=float32)
Nov 09 17:50:06 E                y: array([ 0.113393,  0.526645, -0.528282, ...,  0.604505, -0.702039,
Nov 09 17:50:06 E                       0.06567 ], dtype=float32)
Nov 09 17:50:06 
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:71: AssertionError
Nov 09 17:50:06 ______________________ TestCaffe2BackendEmbed.test_resnet ______________________
Nov 09 17:50:06 
Nov 09 17:50:06 self = <test_pytorch_onnx_caffe2.TestCaffe2BackendEmbed testMethod=test_resnet>
Nov 09 17:50:06 
Nov 09 17:50:06     def test_resnet(self):
Nov 09 17:50:06         state_dict = model_zoo.load_url(model_urls['resnet50'], progress=False)
Nov 09 17:50:06         self.run_model_test(resnet50(), train=False, batch_size=BATCH_SIZE,
Nov 09 17:50:06 >                           state_dict=state_dict, atol=1e-6)
Nov 09 17:50:06 
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/test_pytorch_onnx_caffe2.py:405: 
Nov 09 17:50:06 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/test_pytorch_onnx_caffe2.py:186: in run_model_test
Nov 09 17:50:06     example_outputs=example_outputs)
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/test_pytorch_onnx_caffe2.py:177: in run_actual_test
Nov 09 17:50:06     verify.verify(model, input, c2, rtol=rtol, atol=atol)
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:445: in verify
Nov 09 17:50:06     run(randomize_args(args))
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:425: in run
Nov 09 17:50:06     run_helper(torch_out, args)
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:439: in run_helper
Nov 09 17:50:06     errs.checkAlmostEqual(x.data.cpu().numpy(), y, "In output {}".format(i))
Nov 09 17:50:06 /var/lib/jenkins/workspace/test/onnx/verify.py:60: in checkAlmostEqual
Nov 09 17:50:06     self.almostEqualAndThen(x, y, msg, self.addErr)
Nov 09 17:50:06 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
Nov 09 17:50:06 
Nov 09 17:50:06 self = <verify.Errors object at 0x7f2a8e023ed0>
Nov 09 17:50:06 x = array([[-2.312549  , -0.6950716 , -0.26810262, ..., -1.5320209 ,
Nov 09 17:50:06         -1.64....25414407, ..., -1.3733028 ,
Nov 09 17:50:06         -1.3013324 ,  2.2900457 ]], dtype=float32)
Nov 09 17:50:06 y = array([[-2.3125494 , -0.69507074, -0.26810348, ..., -1.5320204 ,
Nov 09 17:50:06         -1.64....254143  , ..., -1.3733065 ,
Nov 09 17:50:06         -1.301334  ,  2.290045  ]], dtype=float32)
Nov 09 17:50:06 msg = 'In output 0'
Nov 09 17:50:06 k = <bound method Errors.addErr of <verify.Errors object at 0x7f2a8e023ed0>>
Nov 09 17:50:06 
Nov 09 17:50:06     def almostEqualAndThen(self, x, y, msg, k):
Nov 09 17:50:06         """
Nov 09 17:50:06             Helper for implementing 'requireAlmostEqual' and 'checkAlmostEqual'.
Nov 09 17:50:06             Upon failure, invokes continuation 'k' with the error message.
Nov 09 17:50:06     
Nov 09 17:50:06             At the moment, only tests on 'numpy.ndarray' are supported.
Nov 09 17:50:06             """
Nov 09 17:50:06         if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
Nov 09 17:50:06             try:
Nov 09 17:50:06 >               np.testing.assert_allclose(x, y, rtol=self.rtol, atol=self.atol, equal_nan=False, verbose=True)
Nov 09 17:50:06 E               AssertionError: 
Nov 09 17:50:06 E               Not equal to tolerance rtol=0.001, atol=1e-06
Nov 09 17:50:06 E               
Nov 09 17:50:06 E               (mismatch 0.05%)
Nov 09 17:50:06 E                x: array([-2.312549, -0.695072, -0.268103, ..., -1.373303, -1.301332,
Nov 09 17:50:06 E                       2.290046], dtype=float32)
Nov 09 17:50:06 E                y: array([-2.312549, -0.695071, -0.268103, ..., -1.373307, -1.301334,
Nov 09 17:50:06 E                       2.290045], dtype=float32)
Nov 09 17:50:06 

@Kaixhin (Contributor, Author) commented Nov 15, 2018

Yep, it seems like numerical precision issues, but I'm not entirely sure where these are creeping in, or what the fix for the tests should be. Was the tolerance set too high previously, or is there a genuine problem somewhere in the backend?
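One way to gauge whether this is float32 noise rather than a backend bug is to compute the relative error of the sample values printed in the logs above. float32 carries roughly 7 significant decimal digits, so cross-backend relative errors on the order of 1e-7 to 1e-6 are expected rounding noise. A small check (values copied from the logs; this doesn't settle which elements tripped the 0.05% mismatch):

```python
# Relative error of (PyTorch, Caffe2) output pairs copied from the CI logs.
# Errors in the 1e-7..1e-6 range are consistent with float32 rounding
# differences between backends rather than an algorithmic bug.
pairs = [
    (0.11339295, 0.11339286),   # densenet output
    (-2.312549, -2.3125494),    # resnet output
    (2.2900457, 2.290045),      # resnet output
]
for x, y in pairs:
    rel = abs(x - y) / abs(y)
    print(f"{x:+.7f} vs {y:+.7f}: relative error {rel:.2e}")
```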

fmassa previously approved these changes Nov 20, 2018

@ezyang (Contributor) commented Dec 6, 2018

I don't know. Some investigation will be needed.

@zou3519 zou3519 added the "awaiting response" (deprecated) label Dec 11, 2018
@ezyang ezyang dismissed stale reviews from fmassa and themself June 6, 2019 15:21, with the reason: "numerical precision problems"

@ezyang ezyang removed the "awaiting response" (deprecated) label Jun 6, 2019
@ezyang (Contributor) commented Jun 6, 2019

Well, it looks like densenet was disabled on master, so we might be able to land this :>

@pytorchbot pytorchbot added the "module: nn" (Related to torch.nn) and "module: onnx" (Related to torch.onnx) labels Jun 6, 2019
Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang (Contributor) commented Jun 7, 2019

@pytorchbot retest this please

@ezyang ezyang changed the title Fix batch norm multiplier init [BC-BREAKING] Fix batch norm multiplier init Jun 7, 2019
@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@ezyang merged this pull request in c604658.

@gchanan gchanan added the "module: bc-breaking" (Related to a BC-breaking change) label Aug 2, 2019
Development

Successfully merging this pull request may close these issues.

Set Batchnorm weight scalar initialization to unit (not random uniform)

7 participants