
Conversation

@aidanheerdegen (Contributor) commented Aug 15, 2019

Explicit cast to numpy array to avoid np.ravel calling out to dask

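In formatting.last_item this amounts to casting the indexed slice to a numpy array before ravelling, so that .tolist() is always called on an ndarray rather than on a dask Array. A minimal sketch of the function with that cast applied (the exact one-line diff is in this PR):

import numpy as np

def last_item(array):
    """Return the last item of an array in a list, or an empty list."""
    if array.size == 0:
        return []
    indexer = (slice(-1, None),) * array.ndim
    # np.asarray() realises dask-backed data as a numpy array, so the result
    # of np.ravel() always has a .tolist() method
    return np.ravel(np.asarray(array[indexer])).tolist()
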
@max-sixty (Collaborator)

Hi @aidanheerdegen - apologies no one followed up on this for a few days. Thank you for the contribution!

Could you add a test?

@aidanheerdegen (Contributor Author)

Hi @max-sixty. I am working on a test, but when I serialise my test file so that it is suitable for inclusion in a test, it doesn't throw the error!

>>> ds = xarray.open_mfdataset('temp_049.nc', decode_cf=False)
>>> xarray.decode_cf(ds)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
    decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
    stack_char_dim=stack_char_dim, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
    var = coder.decode(var, name=name)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
    self.use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
    last_item(values) or [0]])
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
    return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'
>>> xarray.decode_cf(xarray.Dataset.from_dict(ds.to_dict()))
<xarray.Dataset>
Dimensions:     (time: 5)
Coordinates:
  * time        (time) object 2198-07-02 12:00:00 ... 2202-07-02 12:00:00
Data variables:
    average_T1  (time) datetime64[ns] ...
>>> ds.identical(xarray.Dataset.from_dict(ds.to_dict()))
True
>>>

It seems Dataset.identical fails to detect a difference that decode_cf evidently trips over. Odd.

@aidanheerdegen (Contributor Author)

If I save the decoded version to a file and read it back in, it throws the error. I suppose this traverses a different code path:

>>> xarray.decode_cf(xarray.Dataset.from_dict(ds.to_dict()))
<xarray.Dataset>
Dimensions:     (time: 5)
Coordinates:
  * time        (time) object 2198-07-02 12:00:00 ... 2202-07-02 12:00:00
Data variables:
    average_T1  (time) datetime64[ns] ...
>>> xarray.decode_cf(xarray.Dataset.from_dict(ds.to_dict())).to_netcdf('tmp.nc')
>>> xarray.decode_cf(xarray.open_mfdataset('tmp.nc',decode_cf=False))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
    decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
    stack_char_dim=stack_char_dim, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
    var = coder.decode(var, name=name)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
    self.use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
    last_item(values) or [0]])
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
    return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'
>>> 

@shoyer (Member) commented Aug 28, 2019

When you write a dataset to a dict, the dask arrays are converted into NumPy arrays. Try something like xarray.decode_cf(xarray.Dataset.from_dict(ds.to_dict()).chunk())
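
A minimal sketch of that round trip, reusing the file from the session above:

import xarray

# to_dict() materialises the variables as numpy arrays, so .chunk() is needed
# to get dask-backed data again before decoding
ds = xarray.open_mfdataset("temp_049.nc", decode_cf=False)
roundtripped = xarray.Dataset.from_dict(ds.to_dict()).chunk()
xarray.decode_cf(roundtripped)  # exercises the dask code path again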

@pep8speaks commented Aug 28, 2019

Hello @aidanheerdegen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-28 06:29:45 UTC

@aidanheerdegen (Contributor Author)

Thanks @shoyer. I have added a test. It contains no assertion, but does fail with

AttributeError: 'Array' object has no attribute 'tolist'

without the code update. Is that sufficient?

)
assert_identical(decoded, conventions.decode_cf(original).compute())

def test_experimental_array(self):
Member

Maybe call this test_decode_dask_times?

Member

You will also need to use the @requires_dask decorator here, so this test only gets run if dask is installed.
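
For example (a sketch with an illustrative dataset; requires_dask is the helper from xarray's test suite that skips the test when dask is not installed):

import numpy as np
from xarray import Dataset, conventions
from xarray.tests import requires_dask

@requires_dask
def test_decode_dask_times():
    # illustrative dataset; the real test builds one from a dict (see the diff)
    original = Dataset(
        {"average_T1": ("time", np.arange(5.0), {"units": "days since 2000-01-01"})}
    )
    # .chunk() forces the dask code path that used to raise AttributeError
    conventions.decode_cf(original.chunk())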

"average_T1": {
"dims": ("time",),
"attrs": {
"long_name": "Start time for average period",
Member

Do you need this attribute? If not, please remove it to keep the test example as simple as possible.

)
# Throws an AttributeError: 'Array' object has no attribute 'tolist'
# without updated code
conventions.decode_cf(original.chunk())
Member

It is definitely preferable to construct a positive example of what the dataset should be (and use assert_identical to verify it), but this is certainly sufficient to show that we don't get an error anymore.

Contributor Author

It doesn't work to assert_equal against a non-dask version (I tried). pytest supports context managers for when an exception is expected, but not for when one isn't. As I said, I could wrap it in a try/except, but I'm not sure what that would achieve.

I've added an assert test that serialises the dask and non-dask datasets. Hopefully that suffices.

Member

What happens if you write:

expected = conventions.decode_cf(original)
actual = conventions.decode_cf(original.chunk())
assert_identical(expected, actual) 

If that doesn't work, you could also try:

expected = conventions.decode_cf(original)
actual = conventions.decode_cf(original.chunk()).compute()
assert_identical(expected, actual) 

Contributor Author

Added a valid assert_identical test which I think should do the trick.

Contributor Author

Sorry, crossed comments. Your first suggestion throws this:

>       elif hasattr(a, "data") and type(a).__module__.startswith("xarray."):
E       ValueError: cannot include dtype 'M' in a buffer

I think the test I just added may be quite close to your second suggestion, but flipped, so the "truth" is coerced to a dask array before comparison.
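
Roughly, as a sketch reusing the names from your snippets (not the exact test code):

expected = conventions.decode_cf(original).chunk()   # "truth" coerced to dask
actual = conventions.decode_cf(original.chunk())
assert_identical(actual, expected)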

Member

Interesting, that is definitely a dask bug. Should be fixed by dask/dask#5334.

This version looks fine.

shoyer added a commit to shoyer/dask that referenced this pull request Aug 28, 2019
Fixup da.asarray and da.asanyarray for datetime64 dtype and xarray objects.

`da.asanyarray()` currently crashes if passed a datetime64 dtype array, as noted
in pydata/xarray#3220:

    >       elif hasattr(a, "data") and type(a).__module__.startswith("xarray."):
    E       ValueError: cannot include dtype 'M' in a buffer

Reversing the order of these checks fixes the issue.

I also adjusted `da.asarray()` to coerce xarray objects in the same way as
`da.asanyarray()`.
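
The reordering boils down to which check runs first. A standalone sketch of the two orderings (not the actual dask source; the ValueError comes from hasattr() evaluating ndarray.data, which the numpy of the time could not expose as a buffer for datetime64 dtypes, and hasattr() on Python 3 only swallows AttributeError):

import numpy as np

a = np.array(["2000-01-01"], dtype="datetime64[ns]")

def looks_like_xarray_old(obj):
    # evaluates obj.data first -> ValueError for plain datetime64 ndarrays
    return hasattr(obj, "data") and type(obj).__module__.startswith("xarray.")

def looks_like_xarray_new(obj):
    # the cheap module-name check short-circuits for plain numpy arrays,
    # so ndarray.data is never touched
    return type(obj).__module__.startswith("xarray.") and hasattr(obj, "data")

looks_like_xarray_new(a)   # False, no error
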
@shoyer merged commit aaeea62 into pydata:master Aug 28, 2019
@shoyer (Member) commented Aug 28, 2019

Thanks @aidanheerdegen!

shoyer added a commit to dask/dask that referenced this pull request Aug 29, 2019
Fixup da.asarray and da.asanyarray for datetime64 dtype and xarray objects (#5334)

* Fixup da.asarray and da.asanyarray for datetime64 dtype and xarray objects.

`da.asanyarray()` currently crashes if passed a datetime64 dtype array, as noted
in pydata/xarray#3220:

    >       elif hasattr(a, "data") and type(a).__module__.startswith("xarray."):
    E       ValueError: cannot include dtype 'M' in a buffer

Reversing the order of these checks fixes the issue.

I also adjusted `da.asarray()` to coerce xarray objects in the same way as
`da.asanyarray()`.

* Update dask/array/tests/test_xarray.py

Co-Authored-By: Matthew Rocklin <[email protected]>


Development

Successfully merging this pull request may close these issues.

decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist'