NEP18 trouble when pint is being wrapped #878

@crusaderky

FYI @shoyer @hameerabbasi @keewis
numpy 1.17, xarray/dask/sparse/pint git tip

NEP18 doesn't seem to work correctly in several cases.
I'm still in the process of investigating what causes the issue(s).

Works:

  • pint wraps around sparse
  • pint wraps around dask.array
  • xarray wraps around pint
  • xarray wraps around sparse
  • dask.array wraps around sparse
  • xarray wraps around dask.array which wraps around sparse

Broken:

  • [1] dask.array wraps around pint, and there are 2+ chunks
  • [2] xarray wraps around pint which wraps around dask
  • [3] xarray wraps around pint which wraps around sparse

[1] dask.array wraps around pint, and there are 2+ chunks
At first sight, the legitimacy of this use case is arguable, as it feels much cleaner to always have pint wrapping around dask.array (and it saves a few headaches when dask.distributed and custom UnitRegistries get involved, too, as you never need to pickle your Quantities).

However, the problems of pint->dask and the benefits of dask->pint become clear when one wraps a pint+dask object in xarray.
There, with pint around dask, one would need to write special-case handling for pretty much every piece of xarray logic that today has special-case handling for dask, which is a lot. With dask around pint, I would expect everything to work out of the box, as long as all libraries respect NEP18.
@shoyer I'd like to hear your opinion on this...

>>> import dask.array as da
>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity([1, 2], "kg")
>>> da.from_array(q).compute() # single chunk works
<Quantity([1 2], 'kilogram')>
>>> da.from_array(q, chunks=1).compute()  # With 2+ chunks, something's calling Quantity.__array__
array([1, 2])
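
For context, here is a minimal numpy-only sketch of the two code paths in play. It does not reproduce the dask failure; it only shows that a NEP18-dispatched function such as np.concatenate keeps the wrapper type, while np.asarray goes through __array__ and strips it. The class name Wrapped and its array_calls counter are hypothetical, made up for illustration, and stand in for a Quantity-like wrapper.

```python
import numpy as np


class Wrapped:
    """Hypothetical minimal duck array standing in for a Quantity-like wrapper."""

    array_calls = 0  # counts coercions through __array__

    def __init__(self, data):
        self.data = data

    def __array_function__(self, func, types, args, kwargs):
        # NEP18 hook: only np.concatenate is handled in this sketch.
        if func is np.concatenate:
            arrays = [a.data if isinstance(a, Wrapped) else a for a in args[0]]
            return Wrapped(np.concatenate(arrays, *args[1:], **kwargs))
        return NotImplemented

    def __array__(self, dtype=None, copy=None):
        # Coercion path: the wrapper (and any metadata it carries) is lost here.
        Wrapped.array_calls += 1
        return np.asarray(self.data, dtype=dtype)


a = Wrapped(np.array([1, 2]))
b = Wrapped(np.array([3]))

joined = np.concatenate([a, b])  # dispatched through __array_function__
dense = np.asarray(a)            # coerced through __array__
```

If a library concatenates chunk results with a dispatched function, the wrapper should survive; if it calls np.asarray (or anything equivalent) on a chunk first, a plain ndarray comes out, which is what the 2+ chunks output above looks like.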

[2] xarray wraps around pint which wraps around dask
Following the reasoning of [1], this should happen only when a user manually builds the data, as opposed to calling xarray.Dataset.chunk() - which should be rare-ish. I'm tempted to write a single piece of logic in xarray.Variable.data.setter that detects the special pint->dask case and turns it around to dask->pint.

>>> import dask.array as da
>>> import pint
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity(da.from_array([1, 2]), "kg")
>>> q
<Quantity(dask.array<array, shape=(2,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>, 'kilogram')>
>>> xarray.DataArray(q) # Something is calling da.Array.__array__, which computes it
<xarray.DataArray 'array-de932becc43e72c010bc91ffefe42af1' (dim_0: 2)>
<Quantity([1 2], 'kilogram')>
Dimensions without coordinates: dim_0
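
One possible way to find out what is calling __array__ is to temporarily wrap the method so that every call records the Python call stack. The sketch below uses only the standard library and a hypothetical stand-in class, FakeArray, instead of da.Array; the helper name trace_calls and the function suspicious_caller are also made up for illustration. The same wrapping could presumably be applied to da.Array or pint's Quantity, though that has not been tried here.

```python
import functools
import traceback


def trace_calls(cls, method_name, log):
    """Wrap cls.method_name so each call appends the caller's stack to log."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Record the formatted stack, dropping the frame of this wrapper.
        log.append("".join(traceback.format_stack()[:-1]))
        return original(*args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original  # returned so the patch can be undone


class FakeArray:
    """Hypothetical stand-in for a class such as da.Array."""

    def __array__(self, dtype=None):
        return [1, 2]


def suspicious_caller(x):
    # Plays the role of the library code that triggers the coercion.
    return x.__array__()


stacks = []
original = trace_calls(FakeArray, "__array__", stacks)
result = suspicious_caller(FakeArray())
FakeArray.__array__ = original  # restore the unpatched method
```

Each entry in stacks is a formatted traceback ending at the Python frame that triggered the call, so printing stacks[0] points at the caller.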

[3] xarray wraps around pint which wraps around sparse
This looks to be the same as [2].

>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity(sparse.COO(np.array([1, 2])), "kg")
>>> q
<Quantity(<COO: shape=(2,), dtype=int64, nnz=2, fill_value=0>, 'kilogram')>
>>> xarray.DataArray(q)
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
