Skip to content

Conversation

@fjetter
Copy link
Member

@fjetter fjetter commented Apr 9, 2025

Closes #11879

Relates to #11854 as well

This includes two changes

  1. It removes _blockwise_unpack_collections_task_spec which is a version of delayed.unpack_collections that was using the Task class already. This was introduced when migrating blockwise to keep the changes small since hybrid tasks can cause problems.
  2. It changes how collection finalization works when the collection is encountered as an input (e.g. to a delayed object but also to others).
  • It is converting it to a HLGExpr which automatically will perform the container specific optimization step upon materialization.
  • It also takes care that when multiple collections are encountered, they are optimized together.
  • We are ensuring that whatever result comes out of this optimization is unique and the results are not being reused. This is a common problem we've already encountered in dask-expr which can cause major memory blow up otherwise.

@fjetter fjetter requested a review from Copilot April 9, 2025 12:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

dask/delayed.py:109

  • The call to finalize_compute() may inadvertently trigger materialization of the collection, potentially impacting performance; consider deferring materialization until it is absolutely necessary.
finalized = expr.finalize_compute().optimize()

@github-actions
Copy link
Contributor

github-actions bot commented Apr 9, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      9 files  ± 0        9 suites  ±0   3h 18m 20s ⏱️ + 3m 26s
 18 005 tests  -  4   16 791 ✅  -  4   1 214 💤 ±0  0 ❌ ±0 
161 119 runs   - 36  149 014 ✅  - 35  12 105 💤  - 1  0 ❌ ±0 

Results for commit c268995. ± Comparison against base commit 0fa5e18.

This pull request removes 8 and adds 4 tests. Note that renamed tests count towards both.
dask.array.tests.test_routines ‑ test_histogram_delayed_range[False-1-False-False]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[False-1-False-True]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[False-1-True-False]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[False-1-True-True]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[True-1-False-False]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[True-1-False-True]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[True-1-True-False]
dask.array.tests.test_routines ‑ test_histogram_delayed_range[True-1-True-True]
dask.array.tests.test_xarray ‑ test_xarray_blockwise_fusion_store[False]
dask.array.tests.test_xarray ‑ test_xarray_blockwise_fusion_store[True]
dask.tests.test_delayed ‑ test_array_delayed_complex_optimization
dask.tests.test_delayed ‑ test_array_delayed_complex_optimization_kwargs

♻️ This comment has been updated with latest results.

@fjetter fjetter requested a review from Copilot April 11, 2025 15:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

dask/delayed.py:640

  • [nitpick] The _expr slot is added in the slots list of Delayed but is never initialized in init. If this slot is intended for future use, consider adding a comment to explain its purpose or initialize it accordingly.
__slots__ = ("_key", "_dask", "_length", "_layer", "_expr")

@fjetter
Copy link
Member Author

fjetter commented Apr 11, 2025

Looks like this is finally converging. I pulled a thread and everything unravelled...

One thing that mildly concerns me is that I can now trigger a key collision of this type #9888 consistently with the test_svd. I do want to investigate this before we release since that could cause corrupt data, deadlocks, etc. Most notably, I don't understand where this is coming from and why my ProhibitReuse isn't fighting that off.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

dask/delayed.py:85

  • Consider adding documentation in the function docstring to explain the internal '_return_collections' parameter and its effect on the behavior of this API.
def unpack_collections(expr, _return_collections=True):

dask/_task_spec.py:676

  • Ensure that _execute_subgraph is imported or defined within this module; otherwise, the comparison in has_subgraph may raise a NameError at runtime.
def has_subgraph(self) -> bool:

@fjetter fjetter force-pushed the ensure_args_to_delayed_are_optimized branch from 31dc6df to 487571a Compare April 15, 2025 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

xarray store disables blockwise fusion if compute=False

1 participant