graph_manipulation without numpy #7243
On a side note, I would really like to make all trivial dependencies of dask (meaning cloudpickle, fsspec, toolz, and partd) mandatory, so that dask.bag and dask.delayed are guaranteed to always work and don't need special-casing.

Correction - toolz is already an unlisted mandatory dependency - unless there's somebody out there that uses dask.multiprocessing or dask.threads with a custom collection, and that custom collection avoids using dask.base. Very, very unlikely IMHO.
jrbourbeau left a comment:
Thanks for the PR @crusaderky!
jobs:
  mindeps:
    runs-on: "ubuntu-latest"
    if: "contains(github.event.head_commit.message, 'test-mindeps') || github.event_name != 'pull_request'"
Could you add a TODO comment that we should figure out how to make this work?
I don't think there's any reason for the limitation. It will always run in parallel with the other workflows (I just tested: you can start up to 60 in parallel!) and it will always be faster.
Do you know if that's 60 per repository, or 60 per GitHub org?
Afraid it doesn't answer your specific question though. We could test it by kicking off a torture test in two different repos I guess.
@shoyer the above link also seems to answer your question from a few days back: we are currently capped at 5 macOS workflows at any given time; it looks like upgrading to the "enterprise" plan would raise that limit to 50.
Re-reading it, it says "in your account"
I interpret that as meaning 60 concurrent builds for repos in the Dask GitHub org. Perhaps for now we should continue to only run these builds on master to not hog shared CI resources. We can always reevaluate later.
Shouldn't we instead start disabling less-than-essential builds only after we start experiencing wait time? So far, it has never happened on Linux/Windows as far as I could see.
The thing is: if this workflow had been enabled at PR time, I would have never introduced the regression in the test suite to begin with. The build in master is rarely something developers will remember to check.
Also, if you're worried about running out of slots, I'd much rather remove the on: push builds (which nobody looks at anyway) and replace them with a nightly build. Such a change would help a lot with the Mac slots too.
Sure, we can give this a try and see what happens. Do you know if the on: push builds apply to the Dask org account or to the personal account of the user who is submitting a PR (i.e. from their fork of dask)?
For reference, I think at one point @jacobtomlinson mentioned finding the on: push builds useful (though I don't recall where that comment was)
Do you know if the on: push builds apply to the Dask org account or to the personal account of the user who is submitting a PR (i.e. from their fork of dask)?
No evidence, but I would be inclined to believe the former, as it's the option that makes far more sense commercially.
# test dependencies
- pytest
- pytest-xdist
- moto>=1.3.14
Why remove this minimum version pinning? I have the same question for other packages like fsspec.
Min version pinning is important in setup.py and in the conda recipe (see the sketch after this list). In a requirements .yml file, the only cases where it matters are:
- when another package in the environment suddenly starts being pinned against a max version - not likely, as these are all fairly old versions; if it ever happens, unit tests will fail anyway
- when a lower-priority channel defines the version you need (a lower version in a higher-priority channel wins) - we have only one channel, and anyway this is only a problem with very fresh releases
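To make the setup.py side of that distinction concrete, here is a minimal, hypothetical packaging sketch (the moto floor version comes from the diff above; the package name and layout are illustrative, not dask's actual setup.py):

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical name, for illustration only
    # Floor versions belong here: pip resolves end-user installs from
    # install_requires/extras_require, not from CI environment files,
    # so the CI requirements file can simply say "moto".
    extras_require={
        "test": ["pytest", "pytest-xdist", "moto>=1.3.14"],
    },
)
```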
        return iter(self._dict)

    def __len__(self):
        return int(np.prod(list(self._out_numblocks().values())))
Is this to drop NumPy as a dependency for this module, or for something else?
Yes. It makes blockwise usable for dask.bag, which (given enough user interest) would be very beneficial, e.g. to speed up db.Bag.map.
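For context, a hedged sketch of what the numpy-free replacement for the __len__ above could look like (using functools.reduce rather than math.prod, since the latter only arrived in Python 3.8 and the codebase still supports 3.7):

```python
from functools import reduce
from operator import mul

def __len__(self):
    # Product of the number of blocks along each output dimension,
    # computed without importing numpy just for np.prod.
    return reduce(mul, self._out_numblocks().values(), 1)
```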
except ImportError:
    # Python < 3.7
Thanks for adding this comment; I can see it helping us in the future when we drop support for Python 3.7.
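The diff above elides which import is being guarded, so the following is a purely hypothetical illustration of the pattern under discussion (dataclasses is used as a stand-in only because it entered the standard library in Python 3.7):

```python
try:
    # Standard library from Python 3.7 onwards.
    from dataclasses import is_dataclass  # hypothetical stand-in for the elided import
except ImportError:
    # Python < 3.7: provide a fallback so callers need no version checks.
    def is_dataclass(obj):
        return False
```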
@@ -1 +0,0 @@
-environment-3.8.yaml
\ No newline at end of file
Let's keep this file around as we use it in our user documentation (see the discussion in #6399)
reverted
Closes #7241