
Conversation

@crusaderky
Collaborator

Closes #7241

@crusaderky
Collaborator Author

On a side note, I would really like to make all trivial dependencies of dask (meaning cloudpickle, fsspec, toolz, and partd) mandatory, so that dask.bag and dask.delayed are guaranteed to always work and don't need special-casing.

@crusaderky
Collaborator Author

Correction - toolz is already an unlisted mandatory dependency - unless there's somebody out there who uses dask.multiprocessing or dask.threads with a custom collection, and that custom collection avoids using dask.base. Very, very unlikely IMHO.

@jrbourbeau (Member) left a comment
Thanks for the PR @crusaderky!

jobs:
  mindeps:
    runs-on: "ubuntu-latest"
    if: "contains(github.event.head_commit.message, 'test-mindeps') || github.event_name != 'pull_request'"
Member
Could you add a TODO comment that we should figure out how to make this work?

Collaborator Author
I don't think there's any reason for the limitation. It will always run in parallel with the other workflows (I just tested: you can start up to 60 in parallel!) and it will always be faster.

Member
Do you know if that's 60 per repository, or 60 per GitHub org?

Collaborator Author
https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits

I'm afraid it doesn't answer your specific question though. We could test it by kicking off a torture test in two different repos, I guess.

@shoyer the above link also seems to answer your question from a few days back: we are currently capped at 5 macOS workflows at any given time; it looks like upgrading to the "enterprise" plan would raise that limit to 50.

Collaborator Author
Re-reading it, it says "in your account"

Member
I interpret that as meaning 60 concurrent builds for repos in the Dask GitHub org. Perhaps for now we should continue to only run these builds on master to not hog shared CI resources. We can always reevaluate later.

Collaborator Author
Shouldn't we instead start disabling less-than-essential builds only after we start experiencing wait times? So far, that has never happened on Linux/Windows as far as I can see.
The thing is: if this workflow had been enabled at PR time, I would never have introduced the regression in the test suite to begin with. The build on master is rarely something developers will remember to check.

Collaborator Author
Also, if you're worried about running out of slots, I'd much rather remove the on: push builds (which nobody looks at anyway) and replace them with a nightly build. Such a change would help a lot with the Mac slots too.
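For reference, a nightly build in GitHub Actions would be a schedule trigger along these lines (a sketch; the cron time is arbitrary):

```yaml
on:
  schedule:
    # run once a day at 00:00 UTC, instead of on every push to master
    - cron: "0 0 * * *"
```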

Member
Sure, we can give this a try and see what happens. Do you know if the on: push builds count against the Dask org account or the personal account of the user who is submitting a PR (i.e. from their fork of dask)?

For reference, I think at one point @jacobtomlinson mentioned finding the on: push builds useful (though I don't recall where that comment was)

Collaborator Author
Do you know if the on: push builds count against the Dask org account or the personal account of the user who is submitting a PR (i.e. from their fork of dask)?

No evidence, but I would be inclined to believe the former, as it's the one that makes far more sense commercially.

# test dependencies
- pytest
- pytest-xdist
- moto>=1.3.14
Member
Why remove this minimum version pinning? I have the same question for other packages, like fsspec.

Collaborator Author
Min version pinning is important in setup.py and the conda recipe. In a requirements.yml file, the only cases where it matters are:

  1. when another package in the environment suddenly starts being pinned against a max version - not likely, as these are all fairly old versions; if it ever happens, unit tests will fail anyway
  2. when a lower-priority channel defines the version you need (a lower version from a higher-priority channel wins) - we have only one channel, and anyway this is only a problem with very fresh releases
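As a sketch of the distinction, here are the two places a version floor can live (the install_requires line is an illustration, not dask's actual metadata):

```yaml
# setup.py / conda recipe -- enforced for every user at install time:
#     install_requires=["moto>=1.3.14"]
#
# CI environment file -- a bare name already resolves to the newest
# compatible build from the channel, so repeating the floor adds nothing:
dependencies:
  - moto
```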

        return iter(self._dict)

    def __len__(self):
        return int(np.prod(list(self._out_numblocks().values())))
Member
Is this to drop NumPy as a dependency for this module, or for something else?

Collaborator Author
Yes. It makes blockwise usable for dask.bag, which (given enough user interest) would be very beneficial, e.g. to speed up db.Bag.map.
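For reference, a NumPy-free replacement for the np.prod call above can be very small. This is a sketch; functools.reduce is used instead of math.prod so it also runs on interpreters older than Python 3.8:

```python
from functools import reduce
from operator import mul

def prod(iterable):
    # Pure-Python stand-in for np.prod: multiply everything together,
    # starting from 1 so the empty product is 1 (same convention as np.prod).
    return reduce(mul, iterable, 1)

print(prod([2, 3, 4]))  # 24
print(prod([]))         # 1
```

Unlike np.prod, this returns a plain Python int for integer inputs, so no int() cast is needed afterwards.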


except ImportError:
    # Python < 3.7
Member
Thanks for adding this comment, I can see it helping us in the future when we drop support for Python 3.7
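The diff shown here is truncated, so the guarded import itself isn't visible; the general pattern the comment refers to looks like this (importlib.resources is a hypothetical example, chosen only because its PyPI backport is a common case of this idiom):

```python
try:
    # Python >= 3.7: available in the standard library
    import importlib.resources as resources
except ImportError:
    # Python < 3.7: fall back to the PyPI backport with the same API
    import importlib_resources as resources

print(resources.__name__)
```

Once support for the older interpreters is dropped, the whole except branch (and the comment flagging it) can be deleted in one pass.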

@@ -1 +0,0 @@
environment-3.8.yaml No newline at end of file
Member
Let's keep this file around as we use it in our user documentation (see the discussion in #6399)

Collaborator Author
reverted


Successfully merging this pull request may close these issues.

"CI Additional – mindeps-bag-delayed" step failing