
@jeffdaily
Collaborator

@jeffdaily jeffdaily commented Aug 20, 2020

#22990 added a multiprocessing_context argument to DataLoader, but a typo in the test causes the wrong DataLoader class to be used.
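For context, `multiprocessing_context` can be given either as a start-method name or as a context object, which is why a test of this feature should exercise both forms. The sketch below uses only the standard library. `resolve_context` is a hypothetical helper, loosely modeled on the kind of validation DataLoader applies to this argument; it is not PyTorch's implementation.

```python
import multiprocessing
from multiprocessing.context import BaseContext


def resolve_context(ctx, num_workers):
    """Hypothetical helper: normalize a start-method name or context object.

    Loosely modeled on DataLoader's handling of multiprocessing_context;
    not PyTorch's actual code.
    """
    if ctx is None:
        return None  # no explicit context: use the default start method
    if num_workers == 0:
        # A context only matters when worker processes are actually started.
        raise ValueError("multiprocessing_context requires num_workers > 0")
    if isinstance(ctx, str):
        valid = multiprocessing.get_all_start_methods()
        if ctx not in valid:
            raise ValueError(f"unknown start method {ctx!r}; expected one of {valid}")
        ctx = multiprocessing.get_context(ctx)  # name -> context object
    if not isinstance(ctx, BaseContext):
        raise TypeError(f"expected str or BaseContext, got {type(ctx).__name__}")
    return ctx


if __name__ == "__main__":
    # Try every start method available on this platform, as a name and as an object.
    for name in multiprocessing.get_all_start_methods():
        from_str = resolve_context(name, num_workers=2)
        from_obj = resolve_context(multiprocessing.get_context(name), num_workers=2)
        print(name, from_str.get_start_method(), from_obj.get_start_method())
```

Accepting both forms and normalizing early means the rest of the loader only ever deals with a context object.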

@jeffdaily jeffdaily requested a review from ssnl August 20, 2020 17:42
@jeffdaily jeffdaily added the module: dataloader (Related to torch.utils.data.DataLoader and Sampler) and module: multiprocessing (Related to torch.multiprocessing) labels Aug 20, 2020
@jeffdaily
Collaborator Author

Coincidentally, this test occasionally causes ROCm CI to hang. Investigating the hang revealed the typo in the test. We need to watch the CI results on all platforms to verify that the corrected test works correctly.

@dr-ci

dr-ci bot commented Aug 20, 2020

💊 CI failures summary and remediations

As of commit ff35305 (more details on the Dr. CI page):



🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test (1/6)

Step: "Run tests"

Aug 24 19:15:05 bash: line 6: ./workspace/env: No such file or directory
Aug 24 19:15:05 + export BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test 
Aug 24 19:15:05 + BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test 
Aug 24 19:15:05 + export 'SCRIBE_GRAPHQL_ACCESS_TOKEN=********************************************' 
Aug 24 19:15:05 + SCRIBE_GRAPHQL_ACCESS_TOKEN='********************************************' 
Aug 24 19:15:05 + source ./workspace/env 
Aug 24 19:15:05 bash: line 6: ./workspace/env: No such file or directory 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_profiling_test (2/6)

Step: "Run tests"

Aug 24 19:14:14 bash: line 6: ./workspace/env: No such file or directory
Aug 24 19:14:14 + export BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_profiling-test 
Aug 24 19:14:14 + BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_profiling-test 
Aug 24 19:14:14 + export 'SCRIBE_GRAPHQL_ACCESS_TOKEN=********************************************' 
Aug 24 19:14:14 + SCRIBE_GRAPHQL_ACCESS_TOKEN='********************************************' 
Aug 24 19:14:14 + source ./workspace/env 
Aug 24 19:14:14 bash: line 6: ./workspace/env: No such file or directory 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (3/6)

Step: "Run tests"

Aug 24 19:14:21 bash: line 6: ./workspace/env: No such file or directory
Aug 24 19:14:21 + export BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-test 
Aug 24 19:14:21 + BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-test 
Aug 24 19:14:21 + export 'SCRIBE_GRAPHQL_ACCESS_TOKEN=********************************************' 
Aug 24 19:14:21 + SCRIBE_GRAPHQL_ACCESS_TOKEN='********************************************' 
Aug 24 19:14:21 + source ./workspace/env 
Aug 24 19:14:21 bash: line 6: ./workspace/env: No such file or directory 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (4/6)

Step: "Run tests"

Aug 24 19:13:43 bash: line 6: ./workspace/env: No such file or directory
Aug 24 19:13:43 + export BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test 
Aug 24 19:13:43 + BUILD_ENVIRONMENT=pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test 
Aug 24 19:13:43 + export 'SCRIBE_GRAPHQL_ACCESS_TOKEN=********************************************' 
Aug 24 19:13:43 + SCRIBE_GRAPHQL_ACCESS_TOKEN='********************************************' 
Aug 24 19:13:43 + source ./workspace/env 
Aug 24 19:13:43 bash: line 6: ./workspace/env: No such file or directory 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (5/6)

Step: "Build"

Aug 24 19:00:20 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n return ((int*)(&strtod_l))[argc];\n ^\n1 error generated.\n" }
Aug 24 19:00:19  
Aug 24 19:00:19 ++ which bazels3cache 
Aug 24 19:00:19 + BAZELS3CACHE=/usr/local/bin/bazels3cache 
Aug 24 19:00:19 + '[' -z /usr/local/bin/bazels3cache ']' 
Aug 24 19:00:19 + bazels3cache --bucket= --maxEntrySizeBytes=0 
Aug 24 19:00:20 bazels3cache: S3 bucket is required, e.g. 'bazels3cache --bucket=<bucketname>' 
Aug 24 19:00:20 + cleanup 
Aug 24 19:00:20 + retcode=1 
Aug 24 19:00:20 + set +x 
Aug 24 19:00:20 =================== sccache compilation log =================== 
Aug 24 19:00:20 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/workspace/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:8:19: error: use of undeclared identifier \'strtod_l\'\n  return ((int*)(&strtod_l))[argc];\n                  ^\n1 error generated.\n" } 
Aug 24 19:00:20  
Aug 24 19:00:20 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Aug 24 19:00:20 Compile requests              6097 
Aug 24 19:00:20 Compile requests executed     3607 
Aug 24 19:00:20 Cache hits                      53 
Aug 24 19:00:20 Cache misses                  3538 
Aug 24 19:00:20 Cache timeouts                   0 
Aug 24 19:00:20 Cache read errors                0 
Aug 24 19:00:20 Forced recaches                  0 
Aug 24 19:00:20 Cache write errors               0 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (6/6)

Step: "Run tests"

Aug 24 18:50:06 NameError: name 'TEST_WITH_ROCM' is not defined
Aug 24 18:50:06   test_len (__main__.TestTensorDataset) ... ok (0.001s) 
Aug 24 18:50:06   test_many_tensors (__main__.TestTensorDataset) ... ok (0.005s) 
Aug 24 18:50:06   test_single_tensor (__main__.TestTensorDataset) ... ok (0.002s) 
Aug 24 18:50:06  
Aug 24 18:50:06 ====================================================================== 
Aug 24 18:50:06 ERROR [0.163s]: test_multiprocessing_contexts (__main__.TestDataLoader) 
Aug 24 18:50:06 ---------------------------------------------------------------------- 
Aug 24 18:50:06 Traceback (most recent call last): 
Aug 24 18:50:06   File "test_dataloader.py", line 1212, in test_multiprocessing_contexts 
Aug 24 18:50:06     if ctx in ['spawn', 'forkserver'] and TEST_CUDA and not IS_WINDOWS and not TEST_WITH_ROCM: 
Aug 24 18:50:06 NameError: name 'TEST_WITH_ROCM' is not defined 
Aug 24 18:50:06  
Aug 24 18:50:06 ---------------------------------------------------------------------- 
Aug 24 18:50:06 Ran 76 tests in 164.769s 
Aug 24 18:50:06  
Aug 24 18:50:06 FAILED (errors=1, skipped=1) 
Aug 24 18:50:06  
Aug 24 18:50:06 Generating XML reports... 
Aug 24 18:50:06 Generated XML report: test-reports/python-unittest/TEST-TestConcatDataset-20200824184722.xml 
Aug 24 18:50:06 Generated XML report: test-reports/python-unittest/TEST-TestCustomPinFn-20200824184722.xml 
Aug 24 18:50:06 Generated XML report: test-reports/python-unittest/TEST-TestDataLoader-20200824184722.xml 

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


ci.pytorch.org: 1 failed



Collaborator

@ssnl ssnl left a comment

Typo fix looks good, but is the failing CI related?

@jeffdaily
Collaborator Author

We need to wait for ROCm CI to finish. The two tests currently failing are not related to this change.

@jeffdaily
Collaborator Author

@ssnl, now that the test typo is fixed, it looks like ROCm doesn't fully support IPC. We'll need to skip on ROCm for now.
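The ROCm skip comes down to a platform check that decides whether the test routes CUDA tensors through worker processes; the condition is visible in the traceback above (`ctx in ['spawn', 'forkserver'] and TEST_CUDA and not IS_WINDOWS and not TEST_WITH_ROCM`). Below is a minimal sketch of that check as a pure function, assuming the platform flags are passed in rather than read from module globals. `pick_dataset_kind` and its "cuda"/"cpu" return values are hypothetical stand-ins for the test's dataset classes. Passing the flags explicitly also avoids the kind of NameError seen in the CUDA 10.2 log above, where `TEST_WITH_ROCM` was referenced but not defined in test_dataloader.py.

```python
def pick_dataset_kind(ctx, test_cuda, is_windows, test_with_rocm):
    """Hypothetical mirror of the branch in test_multiprocessing_contexts.

    Returns "cuda" when the test would feed CUDA tensors through worker
    processes and "cpu" otherwise.
    """
    # CUDA tensors are only shared with workers started via 'spawn' or
    # 'forkserver': a forked child cannot re-initialize CUDA.
    shares_cuda = ctx in ("spawn", "forkserver") and test_cuda
    # Assumption: Windows cannot share CUDA tensors between processes; per
    # the discussion above, ROCm does not yet fully support the needed IPC.
    if shares_cuda and not is_windows and not test_with_rocm:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # On a ROCm host every start method falls back to the CPU dataset.
    for ctx in (None, "fork", "spawn", "forkserver"):
        print(ctx, pick_dataset_kind(ctx, test_cuda=True, is_windows=False, test_with_rocm=True))
```

Keeping the condition in one function makes each platform combination easy to assert on without CUDA or ROCm hardware.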

@jeffdaily
Collaborator Author

@pytorchbot retest this please

@jeffdaily
Collaborator Author

After the last commit, ROCm CI was assigned a broken host. Retesting.

@jeffdaily
Collaborator Author

Fixed a lint line-length error.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@malfet merged this pull request in 6a2d7a0.

facebook-github-bot pushed a commit that referenced this pull request Aug 26, 2020
…43588)

Summary:
2nd attempt to land #43343

Pull Request resolved: #43588

Reviewed By: seemethere

Differential Revision: D23332284

Pulled By: malfet

fbshipit-source-id: d78faf468c56af2f176dbdd2ce4bd51f0b5df6fd

Labels

Merged
module: dataloader (Related to torch.utils.data.DataLoader and Sampler)
module: multiprocessing (Related to torch.multiprocessing)
open source

6 participants