Setting explicit chunk sizes in dask.array.from_array() fails for 2D array: "Chunks do not add up to shape" #7310

@jameslamb

What happened:

When I ran dask.array.from_array() on a 2D numpy.ndarray and passed a specific list of chunk sizes to chunks, construction of the Dask Array failed with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/dask/array/core.py", line 3054, in from_array
    chunks = normalize_chunks(
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/dask/array/core.py", line 2726, in normalize_chunks
    raise ValueError(
ValueError: Chunks do not add up to shape. Got chunks=((67, 6), (33, 6)), shape=(100, 6)

What you expected to happen:

I expected that running da.from_array(X, chunks=((67, 6), (33, 6))) on a numpy array with shape (100, 6) would create a Dask Array with two chunks, where the first chunk had size (67, 6) and the second chunk had size (33, 6).

I expected to be able to do this based on the documentation at https://docs.dask.org/en/latest/array-api.html#dask.array.from_array, which includes the following in the list of valid values for the chunks argument to dask.array.from_array():

Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
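
One possible reading of that documented example is per dimension: the first inner tuple lists block lengths along axis 0 and the second lists block lengths along axis 1. The sketch below is plain Python, not Dask's implementation, and the helper names `implied_shape` and `block_shapes` are hypothetical; it only spells out what that reading would imply for the example in the docs.

```python
from itertools import product


def implied_shape(chunks):
    """Total array shape implied by per-dimension block sizes (hypothetical helper)."""
    return tuple(sum(dim) for dim in chunks)


def block_shapes(chunks):
    """Shape of every block, assuming one inner tuple per dimension (hypothetical helper)."""
    return list(product(*chunks))


doc_chunks = ((1000, 1000, 500), (400, 400))

print(implied_shape(doc_chunks))      # (2500, 800)
print(len(block_shapes(doc_chunks)))  # 6 blocks: 3 along axis 0 times 2 along axis 1
print(block_shapes(doc_chunks)[0])    # (1000, 400)
```

If that interpretation is right, the inner tuples are not block shapes; each one describes a single axis.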

Minimal Complete Verifiable Example:

import dask.array as da
import numpy as np

X = np.random.random((100, 6))

da.from_array(
    X,
    chunks=((67, 6), (33, 6))
)
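
The wording of the error ("Chunks do not add up to shape") suggests that the sizes are summed per dimension and compared against the array shape. A minimal sketch of such a check, assuming that interpretation (this is plain Python, not Dask's normalize_chunks, and `chunk_sums` is a hypothetical helper):

```python
def chunk_sums(chunks):
    """Sum the block sizes listed for each dimension (hypothetical helper)."""
    return tuple(sum(dim) for dim in chunks)


shape = (100, 6)

# What the example above passes: each inner tuple written as a block shape.
per_block = ((67, 6), (33, 6))

# The same layout written per dimension: row blocks first, then column blocks.
per_dim = ((67, 33), (6,))

print(chunk_sums(per_block))  # (73, 39), which does not match (100, 6)
print(chunk_sums(per_dim))    # (100, 6), which matches the shape
```

Under that reading, chunks=((67, 33), (6,)) would be the per-dimension spelling of two row blocks of 67 and 33 rows, each 6 columns wide.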

Anything else we need to know?:

I did search for other issues with this error message, and didn't find any that seemed like the same question. I don't know for sure if the error I hit is the root cause of #6709 or not, but it seems possible. I don't even know for sure if it's an error or if I just misinterpreted the documentation 😬 .

I tried to find tests on the pattern "pass a list of specific chunk sizes to .from_array()", to see if those tests used that feature differently and maybe I was misunderstanding the docs. I ran git grep from_array dask/tests and saw that there don't appear to be any tests currently on that pattern.

Environment:

  • Dask version:
    • dask: 2021.2.0
    • numpy: 1.20.1
  • Python version: 3.8.3
  • Operating System: macOS Mojave (10.14.6)
  • Install method (conda, pip, source): pip
conda info
     active environment : None
       user config file : /Users/jlamb/.condarc
 populated config files : /Users/jlamb/.condarc
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.8.3.final.0
       virtual packages : __osx=10.14.6=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/jlamb/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/jlamb/miniconda3/pkgs
                          /Users/jlamb/.conda/pkgs
       envs directories : /Users/jlamb/miniconda3/envs
                          /Users/jlamb/.conda/envs
               platform : osx-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.8.3 Darwin/18.7.0 OSX/10.14.6
                UID:GID : 501:20
             netrc file : None
           offline mode : False

Thanks for your time and help with this.
