Conversation

@kostmo (Member) commented Feb 14, 2019

closes #17101

@pjh5 (Contributor) commented Feb 21, 2019

EDIT: I don't think JOB_BASE_NAME is being populated anywhere, but right now the test still runs only because we predicate on -z $JOB_BASE_NAME.

I think this can be fixed by changing https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/pytorch.groovy#L1070 to "${buildEnvironment}-test${suffix}".

@kostmo (Member Author) commented Feb 22, 2019

I think that ${JOB_BASE_NAME} must currently still be nonempty. Take a look at the console output of the two "Test and Push" jobs from this PR's build.

In the console log of the pytorch-win-ws2016-cuda9-cudnn7-py3-test1 job, one can find:

22:50:00 + run_tests
22:50:00 + '[' -z pytorch-win-ws2016-cuda9-cudnn7-py3-test1 ']'
22:50:00 + [[ pytorch-win-ws2016-cuda9-cudnn7-py3-test1 == *-test ]]
22:50:00 + [[ pytorch-win-ws2016-cuda9-cudnn7-py3-test1 == *-test1 ]]

Similarly, in the console log of the pytorch-win-ws2016-cuda9-cudnn7-py3-test2 job, one can find:

22:56:45 + [[ pytorch-win-ws2016-cuda9-cudnn7-py3-test2 == *-test2 ]]

@pjh5 (Contributor) commented Feb 22, 2019

Can you tell where JOB_BASE_NAME is being populated from? It shouldn't exist anymore. All info should be encoded into BUILD_ENVIRONMENT.

@peterjc123 (Collaborator) commented Feb 23, 2019

@kostmo Would you please skip DataLoaderTest.ChunkDataSetGetBatch to see if it fixes the timeout issue? It seems that there is a deadlock under this test.
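
For reference, one low-touch way to skip a single GoogleTest case while it is under investigation is gtest's DISABLED_ name prefix (a sketch only, and a different mechanism than the #ifdef discussed later; the body below is a placeholder, not the real ChunkDataSetGetBatch test):

#include <gtest/gtest.h>

// Renaming a test with the DISABLED_ prefix makes gtest skip it by default;
// it can still be run explicitly with --gtest_also_run_disabled_tests.
TEST(DataLoaderTest, DISABLED_ChunkDataSetGetBatch) {
  // Placeholder: the real test exercises batch retrieval from a ChunkDataset.
}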

@peterjc123 (Collaborator)

Any updates on this PR? I believe we should fix this ASAP.

Skip DataLoaderTest.ChunkDataSetGetBatch due to possible deadlock

Closes #17101
ASSERT_TRUE(batch->target[0].allclose(torch::zeros(kBatchSize - 1)));
}

/*
Contributor (inline review comment):
There should be a comment saying why this is commented out, and an issue tracking when we can uncomment it again.

But IIUC, you only wanted to disable this on Windows, right? Then it should be macro'ed.

@kostmo (Member Author) replied:
Are there pre-existing macros for this, or did you just mean to surround with #ifdef?
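
For illustration, surrounding the test with a plain #ifdef could look like this (a sketch assuming the standard _WIN32 macro; a project-specific macro could be used instead, and the guard should reference the tracking issue):

#include <gtest/gtest.h>

// Compile the ChunkDataset test only on non-Windows platforms until the
// Windows deadlock is resolved; re-enable once the tracking issue is closed.
#ifndef _WIN32
TEST(DataLoaderTest, ChunkDataSetGetBatch) {
  // Original test body goes here unchanged.
}
#endif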

@ezyang (Contributor) left a review:

with comments

@kostmo (Member Author) commented Feb 27, 2019

Looks like disabling that test was not enough to fix it:

18:29:39 test_single_tensor (__main__.TestTensorDataset) ... ok
18:29:39 
18:29:39 ======================================================================
18:29:39 FAIL: test_proper_exit (__main__.TestDataLoader)
18:29:39 There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore
18:29:39 ----------------------------------------------------------------------
18:29:39 Traceback (most recent call last):
18:29:39   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\test\common_utils.py", line 120, in wrapper
18:29:39     fn(*args, **kwargs)
18:29:39   File "test_dataloader.py", line 757, in test_proper_exit
18:29:39     self.fail(fail_msg + ', and had exception {}'.format(loader_p.exception))
18:29:39 AssertionError: test_proper_exit with use_workers=True, pin_memory=True, hold_iter_reference=True, exit_method=None: loader process failed to setup within given time, and had exception Traceback (most recent call last):
18:29:39   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\test\test_dataloader.py", line 175, in run
18:29:39     super(ErrorTrackingProcess, self).run()
18:29:39   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 93, in run
18:29:39     self._target(*self._args, **self._kwargs)
18:29:39   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\test\test_dataloader.py", line 354, in _test_proper_exit
18:29:39     for i, _ in enumerate(it):
18:29:39   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 545, in __next__
18:29:39     idx, batch = self._get_batch()
18:29:39   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\utils\data\dataloader.py", line 517, in _get_batch
18:29:39     raise RuntimeError('Pin memory thread exited unexpectedly')
18:29:39 RuntimeError: Pin memory thread exited unexpectedly
18:29:39 
18:29:39 
18:29:39 ----------------------------------------------------------------------
18:29:39 Ran 51 tests in 73.087s
18:29:39 
18:29:39 FAILED (failures=1, skipped=2)
18:29:39 Traceback (most recent call last):
18:29:39   File "run_test.py", line 458, in <module>
18:29:39     main()
18:29:39   File "run_test.py", line 450, in main
18:29:39     raise RuntimeError(message)
18:29:39 RuntimeError: test_dataloader failed!

@peterjc123 (Collaborator)

@pytorchbot retest this please

@peterjc123 (Collaborator) commented Feb 28, 2019

@kostmo Looks like test_proper_exit is just a flaky test. It now gets stuck at DataLoaderTest.ChunkDataSetWithEmptyBatch instead. I think the ChunkDataSet implementation doesn't work well on Windows and causes a deadlock there. Could you please put #ifdef guards around all of the ChunkDataSet tests?

@peterjc123 (Collaborator)

Opened a new issue for the deadlock: #17609.

@peterjc123 (Collaborator)

@pytorchbot rebase this please

@peterjc123 (Collaborator)

@pytorchbot rebase this please

@peterjc123 (Collaborator)

@pytorchbot merge this please

@pytorchbot added the merge-this-please label on Mar 12, 2019
@facebook-github-bot (Contributor) left a comment:
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

petrex pushed a commit to petrex/pytorch that referenced this pull request Mar 14, 2019
* upstream/master: (87 commits)
  Make Variable::set_data non-const; cosmetic fixes.
  remove warning for upsample code (pytorch#17921)
  Optimize TileOp (pytorch#17290)
  Optimize channel_stats_op (pytorch#16243)
  enable shape inference for elementwise operators (pytorch#17885)
  Remove remaining test jit expects redux (pytorch#17924)
  Handle Scalars Better (pytorch#17875)
  Fixed a formatting issue in doc comments (pytorch#17505)
  Add nbytes, itemsize, element_size to at::Tensor. (pytorch#17810)
  Fix lint in test_distributions.py
  Fix lint in test_jit.py
  Fix lint errors in test_autograd
  Added a few extra python bindings to help with walking the IR graph from Python (pytorch#17822)
  kthvalue consistency with sort in the presence of NaN (pytorch#17824)
  Fix minor grammatical mistakes in torch/nn/modules/loss.py (pytorch#17892)
  Remove (almost all) TensorOptions from native_functions.yaml (pytorch#17385)
  Restore full Windows tests (pytorch#17102)
  Prevent VS2017 from emitting ambiguous symbol errors (second time)
  Fix windows test hang (pytorch#17778)
  torch.btrifact for tensors with greater than 3 dimensions (pytorch#14964)
  ...
@ezyang added the merged label on Jun 25, 2019