
Conversation

@phofl (Collaborator) commented Jan 8, 2025

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

cc @dcherian is this suitable for xarray?

@phofl changed the title to "Add cached version for normalize_chunks" Jan 8, 2025
@github-actions bot (Contributor) commented Jan 8, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   4h 29m 11s ⏱️ -2s
 17 163 tests + 1   15 968 ✅ + 1   1 195 💤 ±0  0 ❌ ±0 
211 476 runs  +13  194 296 ✅ +10  17 180 💤 +3  0 ❌ ±0 

Results for commit f4c0011. ± Comparison against base commit 7393a77.

This pull request removes 1 and adds 2 tests. Note that renamed tests count towards both.
dask.array.tests.test_array_core ‑ test_normalize_chunks
dask.array.tests.test_array_core ‑ test_normalize_chunks[normalize_chunks]
dask.array.tests.test_array_core ‑ test_normalize_chunks[normalize_chunks_cached]

@dcherian (Collaborator) commented Jan 9, 2025

Yes, decent impact.

# BEFORE: dask  : 2024.12.1, xarray: 2025.1.0
# Wall time: 1min 12s
#
# AFTER: xarray: 2024.11.1.dev50+gd9365109.d20250108, dask  : 2024.12.1+966.gf4c001150
# Wall time: 49.1 s
%time xr.open_zarr("gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3")
from dask.array.core import normalize_chunks_cached

normalize_chunks_cached.cache_info()
# CacheInfo(hits=271, misses=2, maxsize=128, currsize=2)
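The CacheInfo numbers above come straight from functools.lru_cache. A minimal stand-in (not the PR's actual code; `normalize` here is a hypothetical toy) shows how repeated calls with identical, hashable arguments turn into cache hits:

```python
import functools

# Toy stand-in for normalize_chunks; the PR wraps the real function the same
# way. Arguments must be hashable, so chunks are passed as tuples.
@functools.lru_cache(maxsize=128)
def normalize(chunks, shape):
    return tuple((c,) * (s // c) for c, s in zip(chunks, shape))

for _ in range(5):
    normalize((1,), (3,))

info = normalize.cache_info()
# first call misses, the next four hit: hits=4, misses=1, currsize=1
```

The hits/misses ratio above (271 hits for 2 misses) is exactly this effect at dataset scale: many variables share the same chunk specification, so only the first normalization per distinct input pays full cost.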

Now the big issue is tokenize.

Review comment (Collaborator) on the `@functools.lru_cache` decorator in the diff:
I bet you could use this function internally if you wrote a caching decorator that cached both by id and hash. That way you check id first, and then with the hash if that fails.
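One possible shape for such a decorator (a hedged sketch, not anything in dask; `cache_by_id_then_hash` is a made-up name): probe an id()-keyed dict first, and only fall back to hashing the argument when the identity check misses:

```python
import functools

def cache_by_id_then_hash(func):
    # Fast path: id()-keyed dict, no hashing of the (possibly huge) argument.
    # Slow path: an ordinary hash-based lru_cache.
    by_hash = functools.lru_cache(maxsize=128)(func)
    by_id = {}

    @functools.wraps(func)
    def wrapper(arg):
        entry = by_id.get(id(arg))
        if entry is not None and entry[0] is arg:  # identity guard: ids can be reused
            return entry[1]
        result = by_hash(arg)           # hashes arg only when the id probe misses
        by_id[id(arg)] = (arg, result)  # strong ref keeps the id from being recycled
        return result

    return wrapper
```

The strong reference in `by_id` keeps entries alive (so their ids stay valid), but it also means that cache never shrinks; a real implementation would need to bound it.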

@phofl (Collaborator, Author) replied:

Oh, this is interesting. I'll take a look at it, but will merge this one for now.

@dcherian (Collaborator) commented Jan 9, 2025

This is the issue:

import dask.base
from dask.layers import ArraySliceDep

chunks = ((1,) * 1_000_000, (721,), (1441,))
%timeit dask.base.tokenize(ArraySliceDep(chunks))  # 55 ms

For 300 variables, that alone takes 16 s. Somehow using a cache keyed on id(chunks) in that tokenization would fix it.

EDIT: The cached values are the tuples within chunks, not chunks itself.
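A sketch of what an id()-keyed token cache could look like (hypothetical, not dask code; `cached_token` and `_token_cache` are made-up names). Because the inner per-variable tuples are shared objects, keying on their identity skips re-hashing a million-element tuple on every tokenize call:

```python
# Hypothetical id-keyed memo for tokens. Holding a strong reference to the
# object keeps its id from being recycled while the entry is cached.
_token_cache = {}

def cached_token(obj, tokenize):
    entry = _token_cache.get(id(obj))
    if entry is not None and entry[0] is obj:  # identity check guards id reuse
        return entry[1]
    token = tokenize(obj)                      # expensive path, taken once per object
    _token_cache[id(obj)] = (obj, token)
    return token
```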

@phofl phofl merged commit ca15d49 into dask:main Jan 9, 2025
27 checks passed
@phofl (Collaborator, Author) commented Jan 9, 2025

> for 300 variables, that alone takes 16s. somehow using a cache with id(chunks) in that tokenization would fix it.
>
> EDIT: The tuples within chunks are the cached values, not chunks itself

Yeah, this is tricky; I'll see if we can do something here.

@phofl phofl deleted the normalize-chunks-cached branch January 9, 2025 09:49
@phofl (Collaborator, Author) commented Jan 9, 2025

@dcherian can you import from dask.array.api when you add this to xarray?

dcherian added a commit to dcherian/xarray that referenced this pull request Oct 15, 2025
