Skip to content

Warnings when writing valid_[min|max|range] properties#29

Merged
sadielbartholomew merged 21 commits intoNCAS-CMS:masterfrom
davidhassell:valid
Apr 29, 2020
Merged

Warnings when writing valid_[min|max|range] properties#29
sadielbartholomew merged 21 commits intoNCAS-CMS:masterfrom
davidhassell:valid

Conversation

@davidhassell
Copy link
Copy Markdown
Contributor

@davidhassell davidhassell commented Apr 27, 2020

Fixes #30

@davidhassell
Copy link
Copy Markdown
Contributor Author

Hi @sadielbartholomew It'd be great if you could review this PR. We discussed have cfdm.write raise an exception, rather than printing a warning, but I've had second thoughts. I think that it goes against the grain to fail on something that is, after all, CF-compliant.

@davidhassell davidhassell changed the title Warnings when writing valid_[min|max|range] properties Warnings when writing valid_[min|max|range] properties. Fixes #30 Apr 27, 2020
@davidhassell
Copy link
Copy Markdown
Contributor Author

Fixes #30

@sadielbartholomew sadielbartholomew self-requested a review April 27, 2020 15:09
@davidhassell davidhassell marked this pull request as ready for review April 27, 2020 15:11
@sadielbartholomew
Copy link
Copy Markdown
Member

It'd be great if you could review this PR.

Sure. I've assigned myself so I can get notified automatically when it is ready for review.

We discussed have cfdm.write raise an exception, rather than printing a warning, but I've had second thoughts. I think that it goes against the grain to fail on something that is, after all, CF-compliant.

Fair enough, that does sound sensible. A warning is indeed sufficient & if users don't read it & heed it then that is their problem!

@sadielbartholomew
Copy link
Copy Markdown
Member

(Holding on reviewing for the moment as it seems from external discussion we might want to adjust the approach.)

Copy link
Copy Markdown
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review feedback: the read behaviour seems to work as agreed when setting warn_valid as True & False for datasets with & without valid_* properties on fields. Good stuff.

But there are two major issues in the write I have observed, at least unless I missing something which is certainly possible:

1) The write not raising the warning when specified: as seen by reading in a field with demonstrated valid_* properties & writing it straight out to a new file (& reading back in once more just to double check the valid_* properties are still there as they should be):

In [57]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_al
    ...: l.nc', warn_valid=True)                                                
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.

In [58]: for field in e: 
    ...:     print("min", field.get_property('valid_min')) 
    ...:     print("max", field.get_property('valid_max')) 
    ...:     try: 
    ...:         print("range", field.get_property('valid_range')) 
    ...:     except: 
    ...:         pass                                                                 
min 10.867776
max 11.5522375
min 10.870843
max 11.552274
min 10.879355
max 11.552163
min 183.56966
max 313.26398
min -52.872753
max 145.06691
min -52.22291
max 59.72735

In [59]: cfdm.write(e, 'testing-write.nc', warn_valid=True)                     

In [60]: e = cfdm.read('/home/vagrant/cf-test-datasets/testing-write.nc', warn_v
    ...: alid=True)                                                             
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
...

2) Two separate errors for setting each Boolean (UnboundLocalError, KeyError) (see tracebacks in report below) when a write is executed to overwrite an existing field. I can look at the code to narrow this down but having written it you may be able to spot what is happening? Either way, further investigation is needed into the conditions this gets raised in & why:

In [9]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
   ...: .nc', warn_valid=True)                                                  
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.          

In [10]: cfdm.write(e, 'check-write-no-kwarg.nc', warn_valid=False)             

In [11]: f = cfdm.read('~/cf-test-datasets/check-write-no-kwarg.nc', warn_valid=
    ...: True)                                                                  
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to remove warning.

In [12]: cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=False)             
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-95b0eaf1d364> in <module>
----> 1 cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=False)

~/cfdm/cfdm/read_write/write.py in write(fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, string, verbose, warn_valid, _implementation)
    406                      shuffle=shuffle, fletcher32=fletcher32,
    407                      string=string, verbose=verbose,
--> 408                      warn_valid=warn_valid, extra_write_vars=None)

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in write(self, fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, scalar, string, extra_write_vars, verbose, warn_valid)
   3841         # ------------------------------------------------------------
   3842         for f in fields:
-> 3843             self._write_field(f)
   3844 
   3845         # ------------------------------------------------------------

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_field(self, f, add_to_seen, allow_data_insert_dimension)
   2618                     # the dimension coordinate to the file as a
   2619                     # coordinate variable.
-> 2620                     ncvar = self._write_dimension_coordinate(f, key, dim_coord)
   2621                 else:
   2622                     # The data array does not span this axis (and

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_dimension_coordinate(self, f, key, coord)
    621             # Create a new dimension coordinate variable
    622             self._write_netcdf_variable(ncvar, ncdimensions, coord,
--> 623                                         extra=extra)
    624         else:
    625             ncvar        = seen[id(coord)]['ncvar']

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_netcdf_variable(self, ncvar, ncdimensions, cfvar, omit, extra, fill, data_variable)
   2209                              unset_values=unset_values,
   2210                              compressed=compressed,
-> 2211                              attributes=attributes)
   2212 
   2213         # Update the 'seen' dictionary

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_data(self, data, cfvar, ncvar, ncdimensions, unset_values, compressed, attributes)
   2302         else:
   2303             # Get the data as an uncompressed numpy array
-> 2304             array = self.implementation.get_array(data)
   2305 
   2306         # Convert data type

~/cfdm/cfdm/cfdmimplementation.py in get_array(self, data)
    286 
    287         '''
--> 288         return data.array
    289 
    290     def get_auxiliary_coordinates(self, field, axes=[], exact=False):

~/cfdm/cfdm/core/data/data.py in array(self)
    138 
    139         '''
--> 140         array = self._get_Array().array
    141 
    142         # Set the numpy array fill value

~/cfdm/cfdm/data/netcdfarray.py in array(self)
    425 
    426         '''
--> 427         return self[...]
    428 
    429     def open(self):

~/cfdm/cfdm/data/netcdfarray.py in __getitem__(self, indices)
    120         if ncvar is not None:
    121             # Get the variable by netCDF name
--> 122             variable = netcdf.variables[ncvar]
    123 #            print (mask, variable.mask)
    124             variable.set_auto_mask(mask)

KeyError: 't'

[breaking this report up else the syntax highlighting breaks for some reason, but continuing form the above... ]

In [13]: cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=True)              
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-14-ba942d56478f> in <module>
----> 1 cfdm.write(f, 'check-write-no-kwarg.nc', warn_valid=True)

~/cfdm/cfdm/read_write/write.py in write(fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, string, verbose, warn_valid, _implementation)
    406                      shuffle=shuffle, fletcher32=fletcher32,
    407                      string=string, verbose=verbose,
--> 408                      warn_valid=warn_valid, extra_write_vars=None)

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in write(self, fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, scalar, string, extra_write_vars, verbose, warn_valid)
   3841         # ------------------------------------------------------------
   3842         for f in fields:
-> 3843             self._write_field(f)
   3844 
   3845         # ------------------------------------------------------------

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_field(self, f, add_to_seen, allow_data_insert_dimension)
   2618                     # the dimension coordinate to the file as a
   2619                     # coordinate variable.
-> 2620                     ncvar = self._write_dimension_coordinate(f, key, dim_coord)
   2621                 else:
   2622                     # The data array does not span this axis (and

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_dimension_coordinate(self, f, key, coord)
    621             # Create a new dimension coordinate variable
    622             self._write_netcdf_variable(ncvar, ncdimensions, coord,
--> 623                                         extra=extra)
    624         else:
    625             ncvar        = seen[id(coord)]['ncvar']

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_netcdf_variable(self, ncvar, ncdimensions, cfvar, omit, extra, fill, data_variable)
   2209                              unset_values=unset_values,
   2210                              compressed=compressed,
-> 2211                              attributes=attributes)
   2212 
   2213         # Update the 'seen' dictionary

~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_data(self, data, cfvar, ncvar, ncdimensions, unset_values, compressed, attributes)
   2302         else:
   2303             # Get the data as an uncompressed numpy array
-> 2304             array = self.implementation.get_array(data)
   2305 
   2306         # Convert data type

~/cfdm/cfdm/cfdmimplementation.py in get_array(self, data)
    286 
    287         '''
--> 288         return data.array
    289 
    290     def get_auxiliary_coordinates(self, field, axes=[], exact=False):

~/cfdm/cfdm/core/data/data.py in array(self)
    138 
    139         '''
--> 140         array = self._get_Array().array
    141 
    142         # Set the numpy array fill value

~/cfdm/cfdm/data/netcdfarray.py in array(self)
    425 
    426         '''
--> 427         return self[...]
    428 
    429     def open(self):

~/cfdm/cfdm/data/netcdfarray.py in __getitem__(self, indices)
    111 
    112         '''
--> 113         netcdf = self.open()
    114 
    115 #        indices = tuple(self.parse_indices(indices))

~/cfdm/cfdm/data/netcdfarray.py in open(self)
    450             self._set_component('netcdf', netcdf, copy=False)
    451 
--> 452         return netcdf
    453 
    454     def to_memory(self):

UnboundLocalError: local variable 'netcdf' referenced before assignment

@davidhassell
Copy link
Copy Markdown
Contributor Author

Further to offline discussions:

1) I think is OK, as the data being written is within the valid range. Assigning a large (+ve or -ve) number to the data array prior to writing triggers the warning for me.

2) has been hopefully been fixed by c112ed2 and 7643d4f

@sadielbartholomew Can you confirm? Thanks

Copy link
Copy Markdown
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent thanks for investigating, @davidhassell. I'm glad the first case is doing as it should, apologies for not spotting that & thanks for the hint on getting it to trigger the warning as that is what I now want to achieve as a test.

I'll start reviewing again now.

I see there are some merge conflicts to resolve at some point, but I'll save my lines used in testing so I can just repeat them afterwards if any significant changes need to be made towards that.

Copy link
Copy Markdown
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE 1) from my previous comment: I can confirm also that it works as it should, with the warning emerging when values sit outside the valid min, max or range, e.g (where the code also runs as in my previous comment):

In [27]: e[0].data[0,0,0,0]                                                     
Out[27]: <Data(1, 1, 1, 1): [[[[11.127561569213867]]]]  >

In [28]: e[0].data[0,0,0,0] = cfdm.Data([[[[100000000000000]]]])                

In [29]: e[0].data[0,0,0,0]                                                     
Out[29]: <Data(1, 1, 1, 1): [[[[100000000376832.0]]]]  >

In [30]: cfdm.write(e, 'testing-write.nc', warn_valid=True)                     
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has data values written to testing-write.nc that are strictly greater than the valid maximum defined by the valid_max property: 11.552237510681152. Set warn_valid=False to remove warning.

In [31]: cfdm.write(e, 'testing-write.nc', warn_valid=False)                    

(There was no text output for the final line.)

[Edit: also to anyone reading this note the assignment to 100000... value has produced a different number on the field due to the precision & data type of that data. That is the correct behaviour etc. Note (as demonstrated by David):

In [23]: e.dtype=floatIn [24]: e[0,0,0,0]=100000000000000In [25]: e.data[0,0,0,0]
Out[25]: <CF Data(1, 1, 1, 1): [[[[100000000000000.0]]]]>

New
5:20

In [27]: e.dtype='float32'
    ...: 
    ...: In [28]: e.data[0,0,0,0]
    ...: 
Out[28]: <CF Data(1, 1, 1, 1): [[[[100000000376832.0]]]]>

]]) The warning message (in the Input [30] case) is very clear & accurate 👍

But RE 2), I am now seeing that the file doesn't actually get written out as I believe it should be (& if it should not be, there is no message as such to warn the user but I don't get that):

In [2]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
   ...: .nc', warn_valid=False)                                                 

In [3]: e = cfdm.read('~/cf-test-datasets/ecm-e40_1deg-model-levs_2008090706_all
   ...: .nc', warn_valid=True)                                                  
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude(73), ncdim%longitude_1(96))  > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Logarithm of surface pressure(ncdim%t(1), ncdim%hybrid_1(1), ncdim%latitude_1(72), ncdim%longitude(96))  > has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=Temperature(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude(96)) K> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=U velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude(73), ncdim%longitude_1(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.
WARNING: <Field: long_name=V velocity(ncdim%t(1), ncdim%hybrid(60), ncdim%latitude_1(72), ncdim%longitude(96)) m s**-1> has valid_max, valid_min properties. Set warn_valid=False to suppress warning.

In [4]: cfdm.write(e, 'testing-write-2.nc', warn_valid=False)                   

In [5]: e = cfdm.read('~/cf-test-datasets/testing-write-2.nc', warn_valid=True) 
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-5-e602085a028e> in <module>
----> 1 e = cfdm.read('~/cf-test-datasets/testing-write-2.nc', warn_valid=True)

~/cfdm/cfdm/read_write/read.py in read(filename, external, extra, verbose, warnings, warn_valid, mask, _implementation)
    245 
    246     if not os.path.isfile(filename):
--> 247         raise IOError("Can't read non-existent file {}".format(filename))
    248 
    249     # ----------------------------------------------------------------

OSError: Can't read non-existent file /home/vagrant/cf-test-datasets/testing-write-2.nc

In [6]: exit()

where checking my filesystem such a file has not been created:

$ ls ~/cf-test-datasets/testing*
/home/vagrant/cf-test-datasets/testing-write.nc

(The one listed is from previous work, note the -2 in the name.) So I think the logic for the removal of the file the overwrite case, & messaging about it to the user, needs adjusting.

Otherwise looks like the functionality is there. I will check the new documentation & tests next.

@sadielbartholomew
Copy link
Copy Markdown
Member

Actually since you may be working on other things, I'll go in & see if I can work out what needs to be tweaked... I will share a diff if so.

@davidhassell
Copy link
Copy Markdown
Contributor Author

Hmm. Works for me. Perhaps I hadn't pushed up the relevant changes (sorry)? Try 6f229e5 ...

@sadielbartholomew
Copy link
Copy Markdown
Member

Hmm. Works for me. Perhaps I hadn't pushed up the relevant changes (sorry)? Try 6f229e5 ...

Strange, I have checked out your updated branch & it the written file has still gone AWOL. If I run it with verbose=True it shows it is being written fine. Maybe I'm doing something silly, let me take a closer look at the lines I am using...

@sadielbartholomew
Copy link
Copy Markdown
Member

sadielbartholomew commented Apr 29, 2020

Aha! It is writing them, but to the current directory (where I am running the iPython shell), rather than into ~/cf-test-datasets/, which is where it was writing them before. Actually, the current write location seems the correct one. I wonder why it was writing to the latter previously in this PR? Oh of course, it is because I was running my iPython shell from there last time! So it is me being silly after all. Works fine, sorry about that!

Copy link
Copy Markdown
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some further functionality testing this evening confirms the appropriate warning message is provided in (only) the applicable cases as agreed upon. The new parameters are well documented in both read & write so the behaviour should be clear enough for users. And the tests each look rational & taken all together should be sufficient.

Great work. This is good to merge now (there is one potential minor formatting error from a possible typo but I can fix that in a subsequent commit so I can do the release shortly).

@sadielbartholomew
Copy link
Copy Markdown
Member

sadielbartholomew commented Apr 29, 2020

Oops, I've just observed there are some conflicts to resolve before a merge is possible. I shall try to resolve them now...

@sadielbartholomew
Copy link
Copy Markdown
Member

Conflicts resolved. All resolutions were obvious & most conflicts were caused by my relatively recent addition of the in-place operations decorators.

@sadielbartholomew sadielbartholomew merged commit f63f08b into NCAS-CMS:master Apr 29, 2020
@davidhassell davidhassell deleted the valid branch November 14, 2022 09:11
@davidhassell davidhassell added dataset write Relating to writing datasets dataset read Relating to reading datasets labels Nov 15, 2023
@davidhassell davidhassell changed the title Warnings when writing valid_[min|max|range] properties. Fixes #30 Warnings when writing valid_[min|max|range] properties Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataset read Relating to reading datasets dataset write Relating to writing datasets

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue a warning if reading and writing datasets that have valid_[min|max|range] attributes

2 participants