Warnings when writing valid_[min|max|range] properties#29
Warnings when writing valid_[min|max|range] properties#29sadielbartholomew merged 21 commits intoNCAS-CMS:masterfrom
Conversation
…ply_masking to mask metadata constructs
|
Hi @sadielbartholomew It'd be great if you could review this PR. We discussed have |
|
Fixes #30 |
Sure. I've assigned myself so I can get notified automatically when it is ready for review.
Fair enough, that does sound sensible. A warning is indeed sufficient & if users don't read it & heed it then that is their problem! |
|
(Holding on reviewing for the moment as it seems from external discussion we might want to adjust the approach.) |
There was a problem hiding this comment.
Initial review feedback: the read behaviour seems to work as agreed when setting warn_valid as True & False for datasets with & without valid_* properties on fields. Good stuff.
But there are two major issues in the write I have observed, at least unless I missing something which is certainly possible:
1) The write not raising the warning when specified: as seen by reading in a field with demonstrated valid_* properties & writing it straight out to a new file (& reading back in once more just to double check the valid_* properties are still there as they should be):
In [57]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_al
...: l.nc', warn_valid=True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
In [58]: for field in e:
...: print("min", field.get_property('valid_min'))
...: print("max", field.get_property('valid_max'))
...: try:
...: print("range", field.get_property('valid_range'))
...: except:
...: pass
min 10.867776
max 11.5522375
min 10.870843
max 11.552274
min 10.879355
max 11.552163
min 183.56966
max 313.26398
min -52.872753
max 145.06691
min -52.22291
max 59.72735
In [59]: cfdm.write(e, 'testing-write.nc', warn_valid=True)
In [60]: e = cfdm.read('/home/vagrant/cf-test-datasets/testing-write.nc', warn_v
...: alid=True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
...2) Two separate errors for setting each Boolean (UnboundLocalError, KeyError) (see tracebacks in report below) when a write is executed to overwrite an existing field. I can look at the code to narrow this down but having written it you may be able to spot what is happening? Either way, further investigation is needed into the conditions this gets raised in & why:
In [9]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
...: .nc', warn_valid=True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
In [10]: cfdm.write(e, 'check-write-no-kwarg.nc', warn_valid=False)
In [11]: f = cfdm.read('~/cf-test-datasets/check-write-no-kwarg.nc', warn_valid=
...: True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
In [12]: cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=False)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-95b0eaf1d364> in <module>
----> 1 cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=False)
~/cfdm/cfdm/read_write/write.py in write(fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, string, verbose, warn_valid, _implementation)
406 shuffle=shuffle, fletcher32=fletcher32,
407 string=string, verbose=verbose,
--> 408 warn_valid=warn_valid, extra_write_vars=None)
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in write(self, fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, scalar, string, extra_write_vars, verbose, warn_valid)
3841 # ------------------------------------------------------------
3842 for f in fields:
-> 3843 self._write_field(f)
3844
3845 # ------------------------------------------------------------
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_field(self, f, add_to_seen, allow_data_insert_dimension)
2618 # the dimension coordinate to the file as a
2619 # coordinate variable.
-> 2620 ncvar = self._write_dimension_coordinate(f, key, dim_coord)
2621 else:
2622 # The data array does not span this axis (and
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_dimension_coordinate(self, f, key, coord)
621 # Create a new dimension coordinate variable
622 self._write_netcdf_variable(ncvar, ncdimensions, coord,
--> 623 extra=extra)
624 else:
625 ncvar = seen[id(coord)]['ncvar']
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_netcdf_variable(self, ncvar, ncdimensions, cfvar, omit, extra, fill, data_variable)
2209 unset_values=unset_values,
2210 compressed=compressed,
-> 2211 attributes=attributes)
2212
2213 # Update the 'seen' dictionary
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_data(self, data, cfvar, ncvar, ncdimensions, unset_values, compressed, attributes)
2302 else:
2303 # Get the data as an uncompressed numpy array
-> 2304 array = self.implementation.get_array(data)
2305
2306 # Convert data type
~/cfdm/cfdm/cfdmimplementation.py in get_array(self, data)
286
287 '''
--> 288 return data.array
289
290 def get_auxiliary_coordinates(self, field, axes=[], exact=False):
~/cfdm/cfdm/core/data/data.py in array(self)
138
139 '''
--> 140 array = self._get_Array().array
141
142 # Set the numpy array fill value
~/cfdm/cfdm/data/netcdfarray.py in array(self)
425
426 '''
--> 427 return self[...]
428
429 def open(self):
~/cfdm/cfdm/data/netcdfarray.py in __getitem__(self, indices)
120 if ncvar is not None:
121 # Get the variable by netCDF name
--> 122 variable = netcdf.variables[ncvar]
123 # print (mask, variable.mask)
124 variable.set_auto_mask(mask)
KeyError: 't'[breaking this report up else the syntax highlighting breaks for some reason, but continuing form the above... ]
In [13]: cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=True)
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-14-ba942d56478f> in <module>
----> 1 cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=True)
~/cfdm/cfdm/read_write/write.py in write(fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, string, verbose, warn_valid, _implementation)
406 shuffle=shuffle, fletcher32=fletcher32,
407 string=string, verbose=verbose,
--> 408 warn_valid=warn_valid, extra_write_vars=None)
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in write(self, fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, scalar, string, extra_write_vars, verbose, warn_valid)
3841 # ------------------------------------------------------------
3842 for f in fields:
-> 3843 self._write_field(f)
3844
3845 # ------------------------------------------------------------
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_field(self, f, add_to_seen, allow_data_insert_dimension)
2618 # the dimension coordinate to the file as a
2619 # coordinate variable.
-> 2620 ncvar = self._write_dimension_coordinate(f, key, dim_coord)
2621 else:
2622 # The data array does not span this axis (and
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_dimension_coordinate(self, f, key, coord)
621 # Create a new dimension coordinate variable
622 self._write_netcdf_variable(ncvar, ncdimensions, coord,
--> 623 extra=extra)
624 else:
625 ncvar = seen[id(coord)]['ncvar']
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_netcdf_variable(self, ncvar, ncdimensions, cfvar, omit, extra, fill, data_variable)
2209 unset_values=unset_values,
2210 compressed=compressed,
-> 2211 attributes=attributes)
2212
2213 # Update the 'seen' dictionary
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_data(self, data, cfvar, ncvar, ncdimensions, unset_values, compressed, attributes)
2302 else:
2303 # Get the data as an uncompressed numpy array
-> 2304 array = self.implementation.get_array(data)
2305
2306 # Convert data type
~/cfdm/cfdm/cfdmimplementation.py in get_array(self, data)
286
287 '''
--> 288 return data.array
289
290 def get_auxiliary_coordinates(self, field, axes=[], exact=False):
~/cfdm/cfdm/core/data/data.py in array(self)
138
139 '''
--> 140 array = self._get_Array().array
141
142 # Set the numpy array fill value
~/cfdm/cfdm/data/netcdfarray.py in array(self)
425
426 '''
--> 427 return self[...]
428
429 def open(self):
~/cfdm/cfdm/data/netcdfarray.py in __getitem__(self, indices)
111
112 '''
--> 113 netcdf = self.open()
114
115 # indices = tuple(self.parse_indices(indices))
~/cfdm/cfdm/data/netcdfarray.py in open(self)
450 self._set_component('netcdf', netcdf, copy=False)
451
--> 452 return netcdf
453
454 def to_memory(self):
UnboundLocalError: local variable 'netcdf' referenced before assignment|
Further to offline discussions: 1) I think is OK, as the data being written is within the valid range. Assigning a large (+ve or -ve) number to the data array prior to writing triggers the warning for me. 2) has been hopefully been fixed by c112ed2 and 7643d4f @sadielbartholomew Can you confirm? Thanks |
There was a problem hiding this comment.
Excellent thanks for investigating, @davidhassell. I'm glad the first case is doing as it should, apologies for not spotting that & thanks for the hint on getting it to trigger the warning as that is what I now want to achieve as a test.
I'll start reviewing again now.
I see there are some merge conflicts to resolve at some point, but I'll save my lines used in testing so I can just repeat them afterwards if any significant changes need to be made towards that.
There was a problem hiding this comment.
RE 1) from my previous comment: I can confirm also that it works as it should, with the warning emerging when values sit outside the valid min, max or range, e.g (where the code also runs as in my previous comment):
In [27]: e[0].data[0,0,0,0]
Out[27]: <Data(1, 1, 1, 1): [[[[11.127561569213867]]]] >
In [28]: e[0].data[0,0,0,0] = cfdm.Data([[[[100000000000000]]]])
In [29]: e[0].data[0,0,0,0]
Out[29]: <Data(1, 1, 1, 1): [[[[100000000376832.0]]]] >
In [30]: cfdm.write(e, 'testing-write.nc', warn_valid=True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has data values written to testing-write.nc that are strictly greater than the valid maximum defined by the valid_max property: 11.552237510681152. Set warn_valid=False to remove warning.
In [31]: cfdm.write(e, 'testing-write.nc', warn_valid=False) (There was no text output for the final line.)
[Edit: also to anyone reading this note the assignment to 100000... value has produced a different number on the field due to the precision & data type of that data. That is the correct behaviour etc. Note (as demonstrated by David):
In [23]: e.dtype=floatIn [24]: e[0,0,0,0]=100000000000000In [25]: e.data[0,0,0,0]
Out[25]: <CF Data(1, 1, 1, 1): [[[[100000000000000.0]]]]>
New
5:20
In [27]: e.dtype='float32'
...:
...: In [28]: e.data[0,0,0,0]
...:
Out[28]: <CF Data(1, 1, 1, 1): [[[[100000000376832.0]]]]>]]) The warning message (in the Input [30] case) is very clear & accurate 👍
But RE 2), I am now seeing that the file doesn't actually get written out as I believe it should be (& if it should not be, there is no message as such to warn the user but I don't get that):
In [2]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
...: .nc', warn_valid=False)
In [3]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
...: .nc', warn_valid=True)
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96)) > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96)) > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
In [4]: cfdm.write(e, 'testing-write-2.nc', warn_valid=False)
In [5]: e = cfdm.read('~/cf-test-datasets/testing-write-2.nc', warn_valid=True)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-5-e602085a028e> in <module>
----> 1 e = cfdm.read('~/cf-test-datasets/testing-write-2.nc', warn_valid=True)
~/cfdm/cfdm/read_write/read.py in read(filename, external, extra, verbose, warnings, warn_valid, mask, _implementation)
245
246 if not os.path.isfile(filename):
--> 247 raise IOError("Can't read non-existent file {}".format(filename))
248
249 # ----------------------------------------------------------------
OSError: Can't read non-existent file /home/vagrant/cf-test-datasets/testing-write-2.nc
In [6]: exit()where checking my filesystem such a file has not been created:
$ ls ~/cf-test-datasets/testing*
/home/vagrant/cf-test-datasets/testing-write.nc(The one listed is from previous work, note the -2 in the name.) So I think the logic for the removal of the file the overwrite case, & messaging about it to the user, needs adjusting.
Otherwise looks like the functionality is there. I will check the new documentation & tests next.
|
Actually since you may be working on other things, I'll go in & see if I can work out what needs to be tweaked... I will share a |
|
Hmm. Works for me. Perhaps I hadn't pushed up the relevant changes (sorry)? Try 6f229e5 ... |
Strange, I have checked out your updated branch & it the written file has still gone AWOL. If I run it with |
|
Aha! It is writing them, but to the current directory (where I am running the iPython shell), rather than into |
sadielbartholomew
left a comment
There was a problem hiding this comment.
Some further functionality testing this evening confirms the appropriate warning message is provided in (only) the applicable cases as agreed upon. The new parameters are well documented in both read & write so the behaviour should be clear enough for users. And the tests each look rational & taken all together should be sufficient.
Great work. This is good to merge now (there is one potential minor formatting error from a possible typo but I can fix that in a subsequent commit so I can do the release shortly).
|
Oops, I've just observed there are some conflicts to resolve before a merge is possible. I shall try to resolve them now... |
|
Conflicts resolved. All resolutions were obvious & most conflicts were caused by my relatively recent addition of the in-place operations decorators. |
Fixes #30