@ezyang
Contributor

@ezyang ezyang commented Dec 24, 2022

Stack from ghstack (oldest at bottom):

See also conda/conda#10431

Signed-off-by: Edward Z. Yang [email protected]

[ghstack-poisoned]
@ezyang ezyang requested a review from jeffdaily as a code owner December 24, 2022 07:09
@pytorch-bot pytorch-bot bot added the topic: not user facing label Dec 24, 2022
@pytorch-bot

pytorch-bot bot commented Dec 24, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91371

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Failures

As of commit 8a925e0:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Dec 24, 2022
See also conda/conda#10431

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 38cc3d1
Pull Request resolved: #91371
@ezyang ezyang added the ciflow/binaries and ciflow/trunk labels Dec 24, 2022
@ezyang ezyang requested a review from malfet December 24, 2022 07:14
@ezyang
Contributor Author

ezyang commented Dec 26, 2022

@pytorchbot merge -f "the rest of the problems look like preexisting conditions"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@kit1980
Contributor

kit1980 commented Dec 27, 2022

trunk / cuda11.6-py3.10-gcc7-sm86 / test (slow, 1, 2, linux.g5.4xlarge.nvidia.gpu) started to fail after this PR with:

Run documentation examples through mypy. ... FAIL (7.171s)
  test_doc_examples (__main__.TestTypeHints)
Run documentation examples through mypy. ...     test_doc_examples failed - num_retries_left: 3
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/test_type_hints.py", line 134, in test_doc_examples
    self.fail(f"mypy failed:\n{stderr}\n{stdout}")
AssertionError: mypy failed:
torch/distributed/fsdp/fully_sharded_data_parallel.py:870:5: error: Parenthesized context managers are only supported in Python 3.9 and greater  [syntax]
Found 1 error in 1 file (errors prevented further checking)

https://github.com/pytorch/pytorch/actions/runs/3778643918/jobs/6423437935

But I don't see how this PR could possibly be related...

@kit1980
Contributor

kit1980 commented Dec 27, 2022

@pytorchbot revert -m "trunk / cuda11.6-py3.10-gcc7-sm86 / test (slow, 1, 2, linux.g5.4xlarge.nvidia.gpu) started to fail after this PR with mypy error" -c ignoredsignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@ezyang your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Dec 27, 2022
…)"

This reverts commit 57dcd93.

Reverted #91371 on behalf of https://github.com/kit1980 due to trunk / cuda11.6-py3.10-gcc7-sm86 / test (slow, 1, 2, linux.g5.4xlarge.nvidia.gpu) started to fail after this PR with mypy error
@kit1980
Contributor

kit1980 commented Dec 27, 2022

I still don't understand what's going on, but after reverting this PR, "trunk / cuda11.6-py3.10-gcc7-sm86 / test (slow, 1, 2, linux.g5.4xlarge.nvidia.gpu)" passed.
Maybe this PR causes a different Conda environment to be used than the one expected?

@kit1980
Contributor

kit1980 commented Dec 28, 2022

Let's try again after #91410

@kit1980
Contributor

kit1980 commented Dec 28, 2022

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

See also conda/conda#10431

Signed-off-by: Edward Z. Yang <ezyangfb.com>

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/ezyang/1678/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/91371)

pytorchmergebot pushed a commit that referenced this pull request Dec 28, 2022
See also conda/conda#10431

Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 9a39ea1
Pull Request resolved: #91371
@kit1980
Contributor

kit1980 commented Dec 28, 2022

@pytorchbot merge -f "Fix docker builds"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

jataylo added a commit to ROCm/builder that referenced this pull request Jan 18, 2023
We require the same fix that was made in upstream PyTorch:
pytorch/pytorch#91371
ROCm/pytorch@b72ec7c

Without this change, the install_conda.sh stage fails:
```
#21 6.254 CondaFileIOError: '/opt/conda/pkgs/envs/*/env.txt'. [Errno 2] No such file or directory: '/opt/conda/pkgs/envs/*/env.txt'
#21 6.254 
#21 ERROR: executor failed running [/bin/sh -c bash ./install_conda.sh && rm install_conda.sh]: exit code: 1
------
 > [conda 2/3] RUN bash ./install_conda.sh && rm install_conda.sh:
------
executor failed running [/bin/sh -c bash ./install_conda.sh && rm install_conda.sh]: exit code: 1
```

Locally tested with `/builder/libtorch/build_docker.sh`.
jithunnair-amd pushed a commit to ROCm/builder that referenced this pull request Jan 24, 2023
pytorchmergebot pushed a commit that referenced this pull request Jan 24, 2023
The issue was first fixed for CI/CD in #91371, but the main Dockerfile in the repo root still has the same problem, which affects anyone building a custom image manually.
Without the fix, the build fails while installing Miniconda:
```
#14 3.802 Preparing transaction: ...working... done
#14 4.087 Executing transaction: ...working... done
#14 5.713 /root/miniconda.sh: 438: /root/miniconda.sh: [[: not found
#14 5.713
#14 5.713 Installing * environment...
#14 5.713
#14 5.714 /root/miniconda.sh: 444: /root/miniconda.sh: [[: not found
#14 6.050
#14 6.050 CondaFileIOError: '/opt/conda/pkgs/envs/*/env.txt'. [Errno 2] No such file or directory: '/opt/conda/pkgs/envs/*/env.txt'
#14 6.050
```

With the modification, the build succeeds locally with `make -f ./docker.Makefile`, as instructed in the README.

Pull Request resolved: #92702
Approved by: https://github.com/seemethere, https://github.com/malfet
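
The `[[: not found` lines in the log above are what a POSIX `sh` such as dash prints when it meets the bash-only `[[ ... ]]` test, which suggests the installer script was being interpreted by `sh` rather than bash. A minimal sketch of invoking the installer through `bash` explicitly, assuming that is the nature of the fix (the thread does not show the diff). `installer_command` and `run_installer` are hypothetical helpers; `-b` and `-p` are the Miniconda installer's batch-mode and prefix options.

```python
import subprocess
from typing import List


def installer_command(script: str, prefix: str) -> List[str]:
    """Build an argv that runs the installer under bash, not the default sh."""
    # Naming bash explicitly avoids depending on what /bin/sh points to.
    return ["bash", script, "-b", "-p", prefix]


def run_installer(script: str, prefix: str) -> None:
    """Run the installer and raise CalledProcessError on a non-zero exit."""
    subprocess.run(installer_command(script, prefix), check=True)
```

With the paths from the log, `installer_command("/root/miniconda.sh", "/opt/conda")` yields the argv for `bash /root/miniconda.sh -b -p /opt/conda`.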
@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1678/head branch June 8, 2023 16:41
Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
Reverted
topic: not user facing (topic category)

5 participants