pass **kwargs through from save_mfdataset to to_netcdf #6684

@taobrienlbl

Is your feature request related to a problem?

Based on the documentation of xarray.save_mfdataset, I would expect that arguments that can be passed to xarray.Dataset.to_netcdf() can also be passed to xarray.save_mfdataset:

When not using dask, it is no different than calling to_netcdf repeatedly.

But it appears that the unlimited_dims and encoding arguments available in to_netcdf are not also available in save_mfdataset:

test_save_mfdataset_encoding_opt.py:

import xarray as xr

# create a timeseries to store in a netCDF file
times = list(range(0,3652))
time = xr.DataArray(times, dims = ("time",))

# create a simple dataset to write using save_mfdataset
test_ds = xr.Dataset()
test_ds['time'] = time

# tell netCDF to write the times as doubles
encoding = dict(time = dict(dtype = "double"))

# set the output file name
output_path = "test.nc"

# the test fails when encoding is added as an argument to save_mfdataset
# but it works if instead the dataset is saved using
# test_ds.to_netcdf(output_path, encoding = encoding)
xr.save_mfdataset([test_ds], [output_path], encoding = encoding)

$ python3 test_save_mfdataset_encoding_opt.py
Traceback (most recent call last):
  File "test_save_mfdataset_encoding_opt.py", line 21, in <module>
    xr.save_mfdataset([test_ds], [output_path], encoding = encoding)
TypeError: save_mfdataset() got an unexpected keyword argument 'encoding'

This appears to be because save_mfdataset does not accept the encoding argument, nor does it accept and pass along **kwargs.

This means that datasets written with save_mfdataset are less flexible than those written with to_netcdf.
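
Until something like this is supported, one possible stopgap is to loop over `to_netcdf` directly, since the documentation quoted above says the two are equivalent when dask is not involved. The sketch below assumes that equivalence holds for the non-dask case; `save_each` is a hypothetical helper name, not part of xarray, and it writes the files serially rather than reproducing any dask-based behaviour of `save_mfdataset`.

```python
import os
import tempfile

import xarray as xr


def save_each(datasets, paths, **to_netcdf_kwargs):
    """Hypothetical helper: write each dataset to its path, one at a time.

    Any extra keyword arguments (e.g. encoding, unlimited_dims) are
    forwarded unchanged to Dataset.to_netcdf.
    """
    if len(datasets) != len(paths):
        raise ValueError("datasets and paths must have the same length")
    for ds, path in zip(datasets, paths):
        ds.to_netcdf(path, **to_netcdf_kwargs)


# same dataset as in the test script above
test_ds = xr.Dataset()
test_ds["time"] = xr.DataArray(list(range(0, 3652)), dims=("time",))

# tell netCDF to write the times as doubles
encoding = dict(time=dict(dtype="double"))

# write into a temporary directory to avoid clobbering existing files
output_path = os.path.join(tempfile.mkdtemp(), "test.nc")
save_each([test_ds], [output_path], encoding=encoding)
```

This only covers the eager case; it is not a replacement for the requested change.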

Describe the solution you'd like

A simple fix, which I have verified, is to modify save_mfdataset to accept and pass along **kwargs:

diff --git a/xarray/backends/api.py b/xarray/backends/api.py
index d1166624..8baca58c 100644
--- a/xarray/backends/api.py
+++ b/xarray/backends/api.py
@@ -1258,7 +1258,7 @@ def dump_to_store(


 def save_mfdataset(
-    datasets, paths, mode="w", format=None, groups=None, engine=None, compute=True
+    datasets, paths, mode="w", format=None, groups=None, engine=None, compute=True, **kwargs
 ):
     """Write multiple datasets to disk as netCDF files simultaneously.

@@ -1280,6 +1280,7 @@ def save_mfdataset(
         these locations will be overwritten.
     format : {"NETCDF4", "NETCDF4_CLASSIC", "NETCDF3_64BIT", \
               "NETCDF3_CLASSIC"}, optional
+    **kwargs : additional arguments are passed along to ``to_netcdf``

         File format for the resulting netCDF file:

@@ -1358,7 +1359,7 @@ def save_mfdataset(
     writers, stores = zip(
         *[
             to_netcdf(
-                ds, path, mode, format, group, engine, compute=compute, multifile=True
+                ds, path, mode, format, group, engine, compute=compute, multifile=True, **kwargs
             )
             for ds, path, group in zip(datasets, paths, groups)
         ]
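
The forwarding pattern in the patch can be illustrated without xarray at all. The following is a minimal sketch using stand-in functions: `to_netcdf_stub` and `save_many` are hypothetical names invented for this example, and the stub only records what it receives instead of writing a file.

```python
def to_netcdf_stub(ds, path, mode="w", encoding=None, unlimited_dims=None):
    """Hypothetical stand-in for the per-file writer; records its arguments."""
    return {
        "ds": ds,
        "path": path,
        "mode": mode,
        "encoding": encoding,
        "unlimited_dims": unlimited_dims,
    }


def save_many(datasets, paths, mode="w", **kwargs):
    """Hypothetical multi-file writer that passes **kwargs through unchanged."""
    if len(datasets) != len(paths):
        raise ValueError("datasets and paths must have the same length")
    return [
        to_netcdf_stub(ds, path, mode, **kwargs)
        for ds, path in zip(datasets, paths)
    ]


# every per-file call sees the same encoding argument
calls = save_many(
    ["ds0", "ds1"],
    ["a.nc", "b.nc"],
    encoding={"time": {"dtype": "double"}},
)
```

One consequence of this design is that a misspelled or unsupported keyword is no longer rejected by the outer function; it is rejected by the inner one, so the `TypeError` names the per-file writer instead.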

With a version of xarray in which xarray/backends/api.py is patched as above, the test file above runs as expected, with the encoding passed along:

$ python3 test_save_mfdataset_encoding_opt.py
$ ncdump -h test.nc
netcdf test {
dimensions:
	time = 3652 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
}

Describe alternatives you've considered

I attempted to set the encoding dictionary directly on the dataset prior to calling save_mfdataset, but that didn't seem to have an effect.

Additional context

Here is version information, in case it is relevant:

$ python3 -c 'import xarray; print(xarray.show_versions())'

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.15.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.6.3
netCDF4: 1.4.2
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.1.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.3
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.8.3
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
None
