Description
What is your issue?
Summary
The netcdf4-python API docs say the following:
If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib, szip, zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.
If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.
Although compression is considered a valid encoding option by Xarray
xarray/xarray/backends/netCDF4_.py
Lines 232 to 242 in bbe63ab
    valid_encodings = {
        "zlib",
        "complevel",
        "fletcher32",
        "contiguous",
        "chunksizes",
        "shuffle",
        "_FillValue",
        "dtype",
        "compression",
    }
...it appears that we silently ignore the compression option when creating new netCDF4 variables:
xarray/xarray/backends/netCDF4_.py
Lines 488 to 501 in bbe63ab
    nc4_var = self.ds.createVariable(
        varname=name,
        datatype=datatype,
        dimensions=variable.dims,
        zlib=encoding.get("zlib", False),
        complevel=encoding.get("complevel", 4),
        shuffle=encoding.get("shuffle", True),
        fletcher32=encoding.get("fletcher32", False),
        contiguous=encoding.get("contiguous", False),
        chunksizes=encoding.get("chunksizes"),
        endian="native",
        least_significant_digit=encoding.get("least_significant_digit"),
        fill_value=fill_value,
    )
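One way the call above could pick up the option is to assemble the compression-related keyword arguments from `encoding` before they reach `createVariable`. The following is only a sketch: `build_compression_kwargs` is a hypothetical helper, not existing xarray code, and it assumes the installed netcdf4-python version accepts the `compression` keyword described in the docs quoted earlier. How a conflicting `zlib` and `compression` pair should be resolved is also an assumption here (an explicit `compression` wins).

```python
def build_compression_kwargs(encoding):
    """Hypothetical helper: collect compression-related createVariable kwargs.

    Forwards ``compression`` when the user set it; otherwise falls back to
    the legacy ``zlib`` flag, mirroring the defaults used in the call above.
    """
    kwargs = {
        "complevel": encoding.get("complevel", 4),
        "shuffle": encoding.get("shuffle", True),
    }
    compression = encoding.get("compression")
    if compression is not None:
        # Assumption: an explicit compression name takes precedence over zlib.
        kwargs["compression"] = compression
    else:
        kwargs["zlib"] = encoding.get("zlib", False)
    return kwargs


# The result would be splatted into the call, e.g.
# self.ds.createVariable(..., **build_compression_kwargs(encoding))
print(build_compression_kwargs({"compression": "zlib", "complevel": 8}))
```

With this shape, an encoding that sets neither key produces the same arguments as today, so existing behavior would be unchanged.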
Code example
import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)
encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes,
}
da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"},
)
da.encoding = encoding
ds = da.to_dataset()
fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")
# Read the file back and inspect the encoding that was actually written
with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)  # display() is available in Jupyter/IPython

{'zlib': False,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 0,
'fletcher32': False,
'contiguous': False,
'chunksizes': (1, 10),
'source': 'test.nc',
'original_shape': (10, 20),
'dtype': dtype('float64'),
'_FillValue': nan}
In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
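A regression test for this could assert on the round-tripped encoding rather than eyeballing it. The sketch below is an assumption about how such a check might look: `active_compressors` is a hypothetical helper, and the key names are taken from the output shown above. It relies on truthiness only, on the assumption that an enabled filter is reported as `True` or some other non-empty value.

```python
# Filter flags that appear in the read-back encoding shown above.
COMPRESSION_KEYS = ("zlib", "szip", "zstd", "bzip2", "blosc")


def active_compressors(encoding):
    """Hypothetical helper: names of the compression filters that are on."""
    return [key for key in COMPRESSION_KEYS if encoding.get(key)]


# Subset of the encoding read back in the example above.
readback = {
    "zlib": False,
    "szip": False,
    "zstd": False,
    "bzip2": False,
    "blosc": False,
    "shuffle": False,
    "complevel": 0,
}
print(active_compressors(readback))  # prints []
```

Once `compression='zlib'` is honored, the same check on the example file would be expected to return `['zlib']`.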
Proposal
We should align with the recommendation in the netcdf4-python docs and support compression=-style encoding when writing netCDF files with the netcdf4 engine. We should also deprecate the zlib=True syntax.
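The deprecation path might look roughly like the following. This is a minimal sketch under stated assumptions, not a proposed patch: `normalize_compression` is a hypothetical function name, the warning text is made up, and warning only when `zlib` is truthy is a design choice made here so that files read back with `'zlib': False` in their encoding do not trigger noise.

```python
import warnings


def normalize_compression(encoding):
    """Hypothetical shim: translate legacy zlib=True to compression='zlib'."""
    encoding = dict(encoding)  # work on a copy; leave the caller's dict alone
    zlib = encoding.pop("zlib", False)
    if zlib:
        warnings.warn(
            "the 'zlib' encoding option is deprecated; "
            "use compression='zlib' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicit compression setting is never overridden.
        if encoding.get("compression") is None:
            encoding["compression"] = "zlib"
    return encoding


print(normalize_compression({"compression": "zstd", "complevel": 3}))
```

Dropping the `zlib` key after translation keeps a single source of truth for whatever code later builds the `createVariable` arguments.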