FYI @shoyer @hameerabbasi @keewis
numpy 1.17, xarray/dask/sparse/pint git tip
NEP18 doesn't seem to work correctly in several cases.
I'm still in the process of investigating what causes the issue(s).
Works:
- pint wraps around sparse
- pint wraps around dask.array
- xarray wraps around pint
- xarray wraps around sparse
- dask.array wraps around sparse
- xarray wraps around dask.array which wraps around sparse
Broken:
- [1] dask.array wraps around pint, and there are 2+ chunks
- [2] xarray wraps around pint which wraps around dask
- [3] xarray wraps around pint which wraps around sparse
[1] dask.array wraps around pint, and there are 2+ chunks
At first sight, the legitimacy of this use case is arguable, as it feels much cleaner to always have pint wrapping around dask.array (and it saves a few headaches when dask.distributed and custom UnitRegistries get involved, too, as you never need to pickle your Quantities).
However, the problems of pint->dask and the benefits of dask->pint become clear when one wraps a pint+dask object in xarray.
There, with pint around dask, one would need to write special-case handling for pretty much every piece of xarray logic that today special-cases dask (which is a lot of logic), whereas with dask around pint I would expect everything to work out of the box, as long as all libraries respect NEP18.
@shoyer I'd like to hear your opinion on this...
```python
>>> import dask.array as da
>>> import pint
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity([1, 2], "kg")
>>> da.from_array(q).compute()  # single chunk works
<Quantity([1 2], 'kilogram')>
>>> da.from_array(q, chunks=1).compute()  # with 2+ chunks, something calls Quantity.__array__
array([1, 2])
```
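To make the NEP18 expectation concrete, here is a toy duck array. `UnitArray`, `HANDLED_FUNCTIONS` and `implements` are hypothetical names, not pint's API. A step that reassembles chunks through a dispatched function such as `np.concatenate` keeps the wrapper and its units, while a step that goes through `np.asarray` (i.e. `__array__`) cannot. That is consistent with the 2+ chunk output above (the single-chunk case needs no reassembly), but it is only a plausible explanation, not a verified diagnosis of dask's internals.

```python
import numpy as np

# Registry of NumPy functions this toy type overrides via NEP18.
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register a NEP18 override for `numpy_function`."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class UnitArray:
    """Hypothetical stand-in for pint.Quantity: an ndarray magnitude plus a unit string."""

    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def __array__(self, dtype=None, copy=None):
        # Refuse implicit conversion, the way sparse does, so any caller that
        # bypasses NEP18 fails loudly instead of silently dropping the units.
        raise RuntimeError(f"implicit conversion would drop units {self.units!r}")

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Only handle calls where every overriding argument is a UnitArray.
        if not all(issubclass(t, UnitArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def __repr__(self):
        return f"UnitArray({self.magnitude.tolist()}, {self.units!r})"


@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    units = {a.units for a in arrays}
    if len(units) != 1:
        raise ValueError(f"cannot concatenate mixed units: {sorted(units)}")
    magnitudes = [a.magnitude for a in arrays]
    return UnitArray(np.concatenate(magnitudes, axis=axis), units.pop())


blocks = [UnitArray([1], "kg"), UnitArray([2], "kg")]
print(np.concatenate(blocks))  # dispatched through __array_function__: units survive
try:
    np.asarray(blocks[0])  # goes through __array__ instead of NEP18
except RuntimeError as exc:
    print("RuntimeError:", exc)
```

Making `__array__` raise is a design choice for debugging: a real unit library may prefer to allow the conversion, but then the unit loss is silent, exactly as in the 2+ chunk example.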
[2] xarray wraps around pint which wraps around dask
Following the reasoning of [1], this should happen only when a user manually builds the data, as opposed to calling `xarray.Dataset.chunk()`, which should be rare-ish. I'm tempted to write a single piece of logic in `xarray.Variable.data.setter` that detects the special pint->dask case and turns it around to dask->pint.
```python
>>> import dask.array as da
>>> import pint
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity(da.from_array([1, 2]), "kg")
>>> q
<Quantity(dask.array<array, shape=(2,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>, 'kilogram')>
>>> xarray.DataArray(q)  # something calls da.Array.__array__, which computes it
<xarray.DataArray 'array-de932becc43e72c010bc91ffefe42af1' (dim_0: 2)>
<Quantity([1 2], 'kilogram')>
Dimensions without coordinates: dim_0
```
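The turn-around proposed above for `xarray.Variable.data.setter` could be sketched roughly as follows. `pint_dask_to_dask_pint` is a hypothetical name, and the checks are duck-typed (`.magnitude`/`.units` for a pint Quantity, `.chunks`/`.map_blocks` for a dask array) rather than whatever dispatch xarray actually uses. It assumes `type(q)(block, units)` rebuilds a Quantity from the same registry; real dask may also need an explicit `meta=` for correct metadata, which is not verified here.

```python
def pint_dask_to_dask_pint(data):
    """Hypothetical helper: Quantity(dask_array, units) -> dask array of Quantity chunks.

    Anything else is returned unchanged. Duck-typed assumptions: a "quantity"
    has ``.magnitude`` and ``.units``; a "dask array" has ``.chunks``,
    ``.dtype`` and ``.map_blocks``.
    """
    magnitude = getattr(data, "magnitude", None)
    units = getattr(data, "units", None)
    if magnitude is None or units is None:
        return data  # not a quantity
    if not (hasattr(magnitude, "chunks") and hasattr(magnitude, "map_blocks")):
        return data  # a quantity, but not wrapping a dask array
    quantity_type = type(data)
    # Re-wrap every chunk in the same Quantity class (and therefore registry).
    return magnitude.map_blocks(
        lambda block: quantity_type(block, units), dtype=magnitude.dtype
    )
```

Doing this once in the setter would keep every other code path seeing only the dask->pint layering, at the cost of silently changing the type the user passed in.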
[3] xarray wraps around pint which wraps around sparse
This looks like the same issue as [2].
```python
>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> q = ureg.Quantity(sparse.COO(np.array([1, 2])), "kg")
>>> q
<Quantity(<COO: shape=(2,), dtype=int64, nnz=2, fill_value=0>, 'kilogram')>
>>> xarray.DataArray(q)
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```
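Unlike sparse, pint and dask convert silently in [1] and [2], so there is no traceback pointing at the caller. One way to get one is to make `__array__` raise temporarily. `forbid_array_conversion` is a hypothetical debugging helper, not part of any of these libraries; it patches the class attribute and restores it afterwards (it will not work on C-extension types such as `np.ndarray`, whose attributes cannot be set).

```python
import contextlib


@contextlib.contextmanager
def forbid_array_conversion(cls):
    """Temporarily make ``cls.__array__`` raise, so the traceback names the caller.

    Hypothetical debugging helper: restores the original method afterwards, or
    removes the patch if ``cls`` only inherited ``__array__`` from a base class.
    """
    had_own = "__array__" in cls.__dict__
    original = cls.__dict__.get("__array__")

    def __array__(self, *args, **kwargs):
        raise RuntimeError(f"{cls.__name__}.__array__ was called")

    cls.__array__ = __array__
    try:
        yield
    finally:
        if had_own:
            cls.__array__ = original
        else:
            del cls.__array__
```

Untested against the actual libraries: for [2], wrapping `xarray.DataArray(q)` in `with forbid_array_conversion(type(q.magnitude)):` should turn the silent compute into a `RuntimeError` whose traceback shows who called `__array__`; for [1], `type(q)` would be the class to patch.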