-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Handle mask_and_scale ourselves instead of using netCDF4 #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This lets us NaNs instead of masked arrays to indicate missing values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to conflict with my change https://github.com/akleeman/xray/pull/21 but I'm definitely in favor of moving it to conventions.
|
Notes:
Failing tests: I added an "encoded_dtype" attribute to keep of the original dtype of variables loaded from a netCDF file. Unfortunately, this means most of the round-trip tests currently fail, because no variables have "encoded_dtype" attributes until they are loaded from netCDF files. I think we will need to make some ugly trade-off to get round tripping working both directions, but I'm not yet sure what the best option is. We could:
FWIW, I think it's OK if we don't preserve data types in the round-trip process, as long as the data itself is equivalent. I'm not entirely opposed to trying, but in general it is very hard to guarantee that serialized/unserialized data is exactly equivalent. There is somewhat of a conflict between preserving the original data (netCDF-like) and representing the data in a sane format in-memory (which should not be exactly like a netCDF). IMHO, we should focus on the later. |
src/xray/conventions.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the variable has been scaled and offset should we remove those attributes (or add underscores)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the for serializing to netCDF logic, so we need these attributes around still if we want to round trip things properly.
On the other hand, I'm not so certain now that exactly faithful round-trip behavior netCDF->xray->netCDF is behavior we want (see above).
|
OK, the last commit implements the suggested approach in #26 and all tests pass. There is not yet a test for toggling Please take a look when you get the chance :). |
src/xray/conventions.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_offset -> scale_factor
Miscelaneous bugs in mask/scale serialization also fixed.
|
See the new commits for more comprehensive tests of encoding/decoding. |
Datasets now have an optional constructor argument which determines whether CF-variables are converted or stored raw.
src/xray/dataset.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_decode_cf should not add additional state to Dataset. Instead it should be passed on to _set_variables and _as_variable. If we want the option to use CF decoding later, we should expose set_variables as a public method.
Handle mask_and_scale ourselves instead of using netCDF4
* Define _get_backends_cls function inside apiv2.py to read engines from plugins.py * Read open_backends_dataset_* from entrypoints. * Add backend entrypoints in setup.cfg * Pass apiv2.py isort and black formatting tests. * add dependencies * add backend entrypoints and check on conflicts * black * removed global variable EMGINES add class for entrypointys * black isort * add detect_engines in __all__ init.py * removed entrypoints in py36-bare-minimum.yml and py36-min-all-deps.yml * add entrypoints in IGNORE_DEPS * Plugins test (#20) - replace entrypoints with pkg_resources - add tests * fix typo Co-authored-by: keewis <[email protected]> * style Co-authored-by: keewis <[email protected]> * style * Code style * Code style * fix: updated plugins.ENGINES with plugins.list_engines() * fix * One more correctness fix of the latest merge from master Co-authored-by: TheRed86 <[email protected]> Co-authored-by: keewis <[email protected]> Co-authored-by: Alessandro Amici <[email protected]>
Add basic CI setup
This lets us use NaNs instead of masked arrays to indicate missing values.