Masked arrays #2301
Conversation
@shoyer @pelson @niallrobinson @rabernat @pwolfram this may interest you or others within your respective organizations.
dask/array/reductions.py
Outdated
```python
if dtype is not None:
    x = x.astype(dtype)
return x

return divide(x1, x2, dtype=dtype)
```
This function seems unfortunate. Is there a way to not bake in `ma` here? What happens if another `ma` implementation arises? (This was discussed in the `__array_ufunc__` PR.)
This function and the one above it (`empty_type_of`) could be easily switched out for a dispatch system, but the changes for the arg and cumulative operations less so. My personal preference is to wait until there's a need for another in-memory container, and then figure out what needs to be generalized.
I've generalized the easy things to generalize. I'd still rather wait until we have another in-memory container before figuring out what needs to be generalized for the arg and cumulative operations.
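For illustration, a minimal sketch of the kind of type-based dispatch registry being discussed. The `Dispatch` class and `empty_like_lookup` names below are illustrative, not dask's actual internals:

```python
import numpy as np

class Dispatch:
    """Minimal single-dispatch registry keyed on the argument's type
    (illustrative sketch; dask has its own Dispatch utility)."""
    def __init__(self):
        self._lookup = {}

    def register(self, typ, func=None):
        if func is None:
            return lambda f: self.register(typ, f)
        self._lookup[typ] = func
        return func

    def dispatch(self, typ):
        # Walk the MRO so subclasses fall back to parent implementations
        for cls in typ.__mro__:
            if cls in self._lookup:
                return self._lookup[cls]
        raise TypeError("No dispatch for %s" % typ)

    def __call__(self, arg, *args, **kwargs):
        return self.dispatch(type(arg))(arg, *args, **kwargs)

empty_like_lookup = Dispatch()

@empty_like_lookup.register(np.ndarray)
def _(x):
    return np.empty_like(x)

@empty_like_lookup.register(np.ma.masked_array)
def _(x):
    # Preserve the mask while leaving the data uninitialized
    return np.ma.masked_array(np.empty_like(np.ma.getdata(x)),
                              mask=np.ma.getmaskarray(x).copy())
```

Registering the masked-array case separately keeps the `ma` special-casing out of the core function, at the cost of one extra indirection per call.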
The special-casing of

Looks very cool! However, it will be of limited use to xarray, since xarray does not use masked arrays internally. (It uses NaN to represent masked elements.)
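As a small numpy-only illustration of the difference between the two representations (assuming simple float data):

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# xarray-style: represent missing values as NaN in a plain float array
as_nan = masked.filled(np.nan)

# NaN-aware reductions recover the same answer as the masked reduction
assert np.nanmean(as_nan) == masked.mean()

# The trade-off: NaN requires a float dtype, so integer data must be cast
ints = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
as_nan_ints = ints.astype(float).filled(np.nan)
```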
This PR is falling out of sync with master. Before I take the time to fix the merge conflicts, is this of actual use to anyone? It would be good to hear from @shoyer, @pelson, @niallrobinson, @pwolfram, or @bjlittle on whether this would be useful for any of the work y'all do. FWIW I think the maintenance costs here are fairly minimal, but it'd be good to have a real world use case before actually merging.
How does Iris handle masked arrays today? Also @njsmith, do you have a sense for the planned longevity of the `numpy.ma` module?
@mrocklin At the moment, we're in the process of replacing
How would life change if dask.array supported masked arrays? Would this have near-term positive impact on Iris and the Met Office?
IMO we're committed to cutting the next release of

Naturally, from our next release, users of

So, for me, the near-term positive impact for
Wanted to add something here. From past experience there are many functions operating on NumPy's masked arrays that have bugs, strange behavior, or are incomplete somehow. Feel free to take a look at NumPy's issue tracker for examples; a relevant search is linked below. So taking on masked array support means handling these sorts of cases somehow, or at a bare minimum directing feedback upstream. This isn't an argument against it, just trying to make sure you are aware of these problems. Admittedly, people that are already using `numpy.ma` are probably aware of these shortcomings.

ref: https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20label%3A%22component%3A%20numpy.ma%22%20
While I can't speak for Nathaniel, my understanding is Matplotlib makes use of Masked Arrays throughout. So NumPy couldn't drop Masked Arrays without breaking Matplotlib in a pretty big way, which seems like incentive enough to leave it alone. This in spite of occasional rumblings on the NumPy issue tracker to the contrary.
NumPy tries *very* hard to avoid breaking backwards compatibility. You can
count on masked arrays being around for the indefinite future, despite all
of their issues. We haven't even gotten rid of np.matrix, even though all
the NumPy developers agree that it shouldn't be used for new code.
Perhaps a more optimistic question is "is there likely to be a masked array alternative in the near future?"
I would very much like to see proper NA support in numpy. That definitely doesn't replace all use cases for
I'm in agreement with @shoyer and @njsmith. Our whole

I'm pretty much in awe of @jcrist's efforts to get this PR up 👍. Totally awesome 🍻

That said, this PR has forced the discussion on whether
So maybe we should do the following:
Thoughts?
The changes here were mostly:
Of these, I'm only really in favor of adding the first separately from masked-array support. The rest would clutter up the code a bit without a specific reason behind them. I'm fine with letting this idle (even closed) until masked arrays are requested - it shouldn't take long to bring back up to date.
Hello @mrocklin et al,

I'd like to share my perspective as an Iris developer. I think that the adoption of Dask within Iris is a really positive and exciting step for us. The work to integrate dask into Iris has been fairly intensive, and working around the lack of masked arrays has formed a significant quantity of this work.

I think that in the short term, we will be working with dask and patching around the lack of masked array support. However, I feel that this represents a degree of technical debt for our implementation. So, in the near future, I'm interested in dask supporting numpy's masked array implementation, as it feels like the widely used implementation, and many libraries using numpy rely on it for their core functionality.

I think that driving a conversation about how a future numpy and dask could handle the concept of missing data and processing in different ways is a really valuable conversation. This is a long-term activity, which would be informed by dask adopting numpy's masked array as is.

I'd like to offer my encouragement for dask to adopt

many thanks
Ok, I've brought this PR back in sync with master. I don't think this adds much complexity to

@marqh, @bjlittle do you have examples of what kind of operations you'd like to do with masked arrays? Do you have time/motivation to try this PR out?
pelson left a comment:
By no means a comprehensive review, but a few noteworthy things:
>>> da.from_array(np.ma.masked_array([1, 2, 3], mask=[1, 0, 0]), chunks=(2,))
Traceback (most recent call last):
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3142, in view
if issubclass(dtype, ndarray):
TypeError: issubclass() arg 1 must be a class
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/pelson/dev/dask/dask/base.py", line 397, in normalize_array
data = hash_buffer_hex(x.ravel(order='K').view('i1'))
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3148, in view
output = ndarray.view(self, dtype)
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3425, in __setattr__
self._mask.shape = self.shape
ValueError: cannot reshape array of size 3 into shape (24,)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3142, in view
if issubclass(dtype, ndarray):
TypeError: issubclass() arg 1 must be a class
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "example.py", line 10, in <module>
ad = da.from_array(np.ma.masked_array([1, 2, 3], mask=[1, 0, 0]), chunks=(2,))
File "/Users/pelson/dev/dask/dask/array/core.py", line 1893, in from_array
token = tokenize(x, chunks)
File "/Users/pelson/dev/dask/dask/base.py", line 426, in tokenize
return md5(str(tuple(map(normalize_token, args))).encode()).hexdigest()
File "/Users/pelson/dev/dask/dask/utils.py", line 415, in __call__
return meth(arg)
File "/Users/pelson/dev/dask/dask/base.py", line 399, in normalize_array
data = hash_buffer_hex(x.copy().ravel(order='K').view('i1'))
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3148, in view
output = ndarray.view(self, dtype)
File "/Users/pelson/miniconda/envs/dev-dask/lib/python3.6/site-packages/numpy/ma/core.py", line 3425, in __setattr__
self._mask.shape = self.shape
ValueError: cannot reshape array of size 3 into shape (24,)
>>> da.ma.masked_outside(np.array([1, 2, 3]), 2, 2.5)
Traceback (most recent call last):
File "/Users/pelson/dev/dask/dask/array/ma.py", line 123, in masked_outside
return map_blocks(np.ma.masked_outside, x, v1, v2)
File "/Users/pelson/dev/dask/dask/array/core.py", line 662, in map_blocks
out_ind = tuple(range(max(a.ndim for a in arrs)))[::-1]
ValueError: max() arg is an empty sequence
>>> da.ma.masked_where([1, 2, 3], da.arange(3, chunks=(2, )))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-4270623ca151> in <module>()
4 import dask.array as da
5
----> 6 da.ma.masked_where([1, 2, 3], da.arange(3, chunks=(2, )))
/Users/pelson/dev/dask/dask/array/ma.py in masked_where(condition, a)
132 raise IndexError("Inconsistant shape between the condition and the "
133 "input (got %s and %s)" % (cshape, a.shape))
--> 134 return map_blocks(np.ma.masked_where, condition, a)
135
136
/Users/pelson/dev/dask/dask/array/core.py in map_blocks(func, *args, **kwargs)
690 else:
691 kwargs2 = kwargs
--> 692 dtype = apply_infer_dtype(func, args, kwargs2, 'map_blocks')
693
694 if len(arrs) == 1:
/Users/pelson/dev/dask/dask/array/core.py in apply_infer_dtype(func, args, kwargs, funcname, suggest_dtype)
525 msg = None
526 if msg is not None:
--> 527 raise ValueError(msg)
528 return o.dtype
529
ValueError: `dtype` inference failed in `map_blocks`.
Please specify the dtype explicitly using the `dtype` kwarg.
Original error is below:
------------------------
IndexError('Inconsistant shape between the condition and the input (got (3,) and (1,))',)
Traceback:
---------
File "/Users/pelson/dev/dask/dask/array/core.py", line 510, in apply_infer_dtype
o = func(*args, **kwargs)
File "/Users/pelson/miniconda/lib/python3.5/site-packages/numpy/ma/core.py", line 1910, in masked_where
" (got %s and %s)" % (cshape, ashape))
(I guess this is because the mask should be chunked in the same way as the array)
I was somewhat concerned about the numerical accuracy of things like standard deviation - it would be easy to lose precision with the repeated addition of result chunks. Biggus goes to some length to implement a streaming single-pass standard deviation for both masked and un-masked data, but I don't have an example that justifies that implementation vs what has been done here.
I'm really pleased to say that a cursory experimentation suggests that the accuracy is very good. My code:
np.random.seed(0)
rand = np.random.randn(100000) ** 20
# ~4% masked numbers
data = np.ma.masked_outside(rand, -2, 2)
chunks = (rand.size // 500,)
da_data = da.ma.masked_where(da.from_array(data.mask, chunks=chunks),
da.from_array(rand, chunks=chunks))
print(biggus.std(data, ddof=2, axis=0).masked_array())
print(np.std(da_data, ddof=2).compute())
print(np.std(da_data.compute(), ddof=2))
which gives:
0.28873287828340444
0.288732878283
0.288732878283
I also visualised the graph with 5 equal chunks:

(task graph image omitted)

This, just like the biggus implementation, has a (necessary) bottleneck at the sqrt, but the preparation before that point parallelises well (just as it does with a plain dask.array).
In summary
This is a really great implementation - I've pointed out a few usability issues, but in principle I believe the implementation is a viable option for entirely bridging the remaining functionality gap between biggus and dask. With some refinement of this implementation I'm 100% supportive of moving towards formally deprecating biggus in favour of dask (note: I'm a core biggus dev).
```python
def _cumsum_merge(a, b):
    if isinstance(a, np.ma.masked_array) or isinstance(b, np.ma.masked_array):
        values = np.ma.getdata(a) + np.ma.getdata(b)
        return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))
```
(Note: I'm reading this from a high level, so haven't fully understood its purpose)
Shouldn't this be a combination of a and b's masks?
No, for cumulative operations the mask stays fixed throughout and the operation only occurs on the data. In this case b is whatever chunk already existed at that chunk-location, and a is the result of cumsum/cumprod over all previous chunks along the axis.
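A runnable numpy-only sketch of the merge rule described above (the helper name mirrors the diff; the chunking here is illustrative):

```python
import numpy as np

def cumsum_merge(a, b):
    # Merge rule from the diff above: add the raw data,
    # keep the mask of b (the chunk at this location)
    if isinstance(a, np.ma.MaskedArray) or isinstance(b, np.ma.MaskedArray):
        values = np.ma.getdata(a) + np.ma.getdata(b)
        return np.ma.masked_array(values, mask=np.ma.getmaskarray(b))
    return a + b

# Two chunks of a 1-d cumsum; masked entries contribute 0 to the data
x = np.ma.masked_array([1, 2, 3, 4], mask=[0, 1, 0, 0])
chunk1, chunk2 = x[:2], x[2:]
carry = chunk1.filled(0).sum()            # running total of earlier chunks
merged = cumsum_merge(carry, chunk2.cumsum())
expected = x.cumsum()
assert (merged == expected[2:]).all()     # matches the single-pass cumsum
```

Note how the result's mask comes entirely from `chunk2`: the carried-in total `a` only adjusts the data, exactly as described.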
```python
def divide(a, b, dtype=None):
    key = lambda x: getattr(x, '__array_priority__', float('-inf'))
    f = divide_lookup.dispatch(type(builtins.max(a, b, key=key)))
```
I'm surprised there isn't use of the __array_priority__ dispatching elsewhere (maybe there is). It seems like a fairly common requirement of Dispatch...
Why is divide different from other ufuncs?
We need to use np.ma.divide for masked arrays, and our patched version of np.divide for non-masked arrays. If we use np.divide for masked arrays, then the function applies to all elements in .data, which may result in RuntimeWarnings due to divide-by-zero. Note that this is only true for the functional operators; using the operators instead dispatches properly (e.g. divide vs /).
This is related to my question about implementing all the masked ufuncs. Normal ufuncs work and return instances of np.ma.masked_array, but apply to all elements in .data instead of only the non-masked ones. Note that even if we did implement masked ufuncs, we'd still need to do this dispatch here in the reductions code.
@jcrist - great stuff! 👍
Did you try with
Thanks for the review @pelson. Responding to a few comments:

The errors you're seeing in

One question is what do you expect to happen if you call a

The
Thoughts?
We also work hard to implement numerically stable parallel algorithms. The one used here is from http://prod.sandia.gov/techlib/access-control.cgi/2008/086212.pdf and has been working well for us. It's used for computing all our moments (
Just to clarify, depending on your chunking and reduction axis there may be multiple parallel calls to
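For reference, a sketch of the pairwise update formulas used by this family of algorithms (Chan et al. / Pébay); this is an illustration of the technique, not dask's actual implementation:

```python
import numpy as np

def chunk_stats(x):
    # Per-chunk partials: count, mean, and M2 (sum of squared deviations)
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.mean()
    m2 = ((x - mu) ** 2).sum()
    return n, mu, m2

def merge_stats(s1, s2):
    # Pairwise combination: numerically stable because it never forms
    # a raw sum of squares, only deviations from running means
    n1, mu1, m21 = s1
    n2, mu2, m22 = s2
    n = n1 + n2
    delta = mu2 - mu1
    mu = mu1 + delta * n2 / n
    m2 = m21 + m22 + delta ** 2 * n1 * n2 / n
    return n, mu, m2

rng = np.random.RandomState(0)
x = rng.randn(10000)
n, mu, m2 = chunk_stats(np.split(x, 10)[0])
for c in np.split(x, 10)[1:]:
    n, mu, m2 = merge_stats((n, mu, m2), chunk_stats(c))
std = np.sqrt(m2 / (n - 2))        # ddof=2, matching the example above
assert np.allclose(std, np.std(x, ddof=2))
```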
Personally, I'd expect to always have a lazy dask thing if I've called dask functions/methods. If I wanted immediate, I'd call numpy directly...
At this moment, I'm afraid I don't have enough background to possibly give an educated answer.
Great. Biggus used a method described in
@jcrist many thanks for the updates.

Key operations I know about with masks include:
Are you interested in general problem statements like this, or are you more keen for actual examples of data processing which could form test cases? Our current implementation converts masked arrays to NaN-filled float arrays, which gives partial support, but there are edge cases which are causing concern, and a lack of capability in some cases.
Motivation, for sure; time is more of a challenge just now. I'll let you know if I make progress.
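The NaN-fill workaround mentioned above has a couple of concrete edge cases that a small numpy sketch makes visible (illustrative):

```python
import numpy as np

a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Edge case 1: NaN requires a float dtype, so integer data is promoted
filled = a.astype(float).filled(np.nan)
assert filled.dtype == np.float64

# Edge case 2: the value under the mask is lost in the conversion;
# np.ma retains it (a.data[1] is still 2), the NaN version does not
assert a.data[1] == 2
assert np.isnan(filled[1])
```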
I'm running into some hard-to-solve issues with masked arrays if the I'd like to not handle this case and only work with scalar
If given no direction, I plan to error explicitly where easy/possible, and forbid operations that could result in this case (e.g. forbid
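A hypothetical sketch of the "error explicitly" approach for array-valued fill_values; the helper name is invented for illustration:

```python
import numpy as np

def check_scalar_fill_value(x):
    # Hypothetical guard matching the plan above: only scalar
    # fill_values are supported, array-valued ones are rejected
    fv = getattr(x, 'fill_value', None)
    if fv is not None and np.ndim(fv) != 0:
        raise ValueError("Array-valued fill_value is not supported: %r" % (fv,))
    return x

a = np.ma.masked_array([1.0, 2.0], mask=[False, True], fill_value=-1.0)
check_scalar_fill_value(a)   # scalar fill_value: passes through
```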
Bugs in implementation of masked arrays before then make this difficult.
Move some masked array specific functions to general dispatches
- Add support for non-dask objects in masked operations - Properly handle non-equal chunking by using elemwise where required
Failures after merging new elemwise dtype inference code.
- Add api docs - Update sparse docs on arbitrary chunk types
Fine by me. I've fixed the merge conflicts and updated some docs. I think this is good to go now.
+1 from me
Alright, this is in. Thanks everyone for reviewing/testing this PR.
- Add support for masked arrays as chunks - Add api docs for mask arrays - Update sparse docs on arbitrary chunk types
Adds support for masked arrays, similar to how we support sparse arrays.
Supports:
- `da.ma.masked_greater`
- `da.ma.getmaskarray`
- `da.ma.set_fill_value`

The last one is a little weird because it matches the numpy api of mutating the array instead of returning a new one. I figured it was better to match numpy here than to match dask, but could go either way.
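For reference, the numpy behaviour this mirrors: `np.ma.set_fill_value` mutates its argument in place rather than returning a new array.

```python
import numpy as np

a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
np.ma.set_fill_value(a, 999)   # mutates a; returns None
assert a.fill_value == 999
assert a.filled()[1] == 999    # masked slot now fills with 999
```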
Fixes #1928.
Note that this adds a type-level registry for package-level functions like `concatenate` and `tensordot` instead of using the `package_of` function from before. This is necessary because:

- `package_of` returns `np` for masked arrays
- `np.ma.concatenate` doesn't persist the `fill_value` of its inputs

I also think this is cleaner than relying on inspecting the object to find its module.