Description
Problem description
The current implementation of SolutionArray allows extra columns containing str (see discussion in #838), which may or may not be an intended use case: built-in attributes are almost exclusively numeric, with the notable exception of state_of_matter. Other parts of the implementation, e.g. CSV import and HDF export/import, did not anticipate non-numeric data, so str support needs to be either deprecated/disabled or implemented consistently.
One interpretation is that str input was never intended but also never explicitly checked for, i.e. it just happens to be supported by numpy (similar to sequences not being checked, see #895). In this case, the most appropriate resolution may be to catch and deprecate str columns, warning that they are not fully supported for CSV/HDF.
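As a rough illustration of the "catch and deprecate" option, the check could inspect the numpy dtype of each extra column. This is only a sketch: `check_extra_column` is a hypothetical helper name, not part of Cantera, and the choice of `DeprecationWarning` is an assumption.

```python
import warnings

import numpy as np


def check_extra_column(name, values):
    """Hypothetical helper: convert an extra column to an array, warning on str data."""
    arr = np.asarray(values)
    # dtype.kind is 'U' for unicode strings and 'S' for byte strings
    if arr.dtype.kind in ("U", "S"):
        warnings.warn(
            f"Extra column '{name}' contains str data, which is not fully "
            "supported for CSV/HDF import and export.",
            DeprecationWarning,
            stacklevel=2,
        )
    return arr
```

Numeric columns pass through silently, while a column such as `['eggs', 'eggs', 'eggs']` would trigger the warning but still be returned as an array.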
Steps to reproduce
In [1]: import cantera as ct
...: gas = ct.Solution('h2o2.yaml')
...: arr = ct.SolutionArray(gas, 3, extra={'spam': 'eggs'})
...:
In [2]: arr._extra
Out[2]: OrderedDict([('spam', ['eggs', 'eggs', 'eggs'])])
In [3]: arr.write_csv('test.csv')
In [3]: !cat test.csv
spam,T,density,Y_H2,Y_H,Y_O,Y_O2,Y_OH,Y_H2O,Y_HO2,Y_H2O2,Y_AR
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
In [4]: arr2 = ct.SolutionArray(gas, extra={'spam'})
In [4]: arr2.read_csv('test.csv')
In [5]: arr2._extra
Out[5]: {'spam': array([ nan, nan, nan])}
In [6]: arr.write_hdf('test.h5')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-26-d708cd336db5> in <module>()
----> 1 arr.write_hdf('test.h5')
/usr/local/lib/python3.6/dist-packages/cantera/composite.py in write_hdf(self, filename, cols, group, subgroup, attrs, mode, append, compression, compression_opts, *args, **kwargs)
1117 dgroup.attrs[key] = val
1118 for header, col in data.items():
-> 1119 dgroup.create_dataset(header, data=col, **hdf_kwargs)
1120
1121 return group
/home/docker/.local/lib/python3.6/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
134
135 with phil:
--> 136 dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
137 dset = dataset.Dataset(dsid)
138 if name is not None:
/home/docker/.local/lib/python3.6/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl)
116 else:
117 dtype = numpy.dtype(dtype)
--> 118 tid = h5t.py_create(dtype, logical=1)
119
120 # Legacy
h5py/h5t.pyx in h5py.h5t.py_create()
h5py/h5t.pyx in h5py.h5t.py_create()
h5py/h5t.pyx in h5py.h5t.py_create()
TypeError: No conversion path for dtype: dtype('<U4')
In [7]: arr.spam
Out[7]:
array(['eggs', 'eggs', 'eggs'],
dtype='<U4')
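The nan values in Out[5] match what numpy's `genfromtxt` produces for non-numeric fields when every column is parsed as float. Whether `read_csv` actually parses the file this way is an assumption here. The sketch below only shows the numpy behavior on a shortened, made-up CSV string, and how `dtype=None` would keep the str column.

```python
import io

import numpy as np

# Shortened stand-in for test.csv (assumption: only two columns shown)
csv_text = "spam,T\neggs,300.0\neggs,300.0\n"

# Default dtype is float: fields that cannot be converted become nan
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# dtype=None infers a type per column, so 'spam' is kept as a string
inferred = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(as_float["spam"])   # all nan
print(inferred["spam"])   # string values preserved
```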
Behavior
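The pre-existing write_csv writes str columns without complaint, the newly introduced read_csv fails to import them (the values come back as nan), and write_hdf raises a TypeError because h5py has no conversion path for the '<U4' dtype.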
I am filing this as a bug report as this presumably needs to be fixed prior to the release of 2.5.
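If str columns were to be supported consistently rather than deprecated, one possible route for the HDF case would be to convert unicode columns to fixed-length byte strings before writing, since h5py documents support for numpy 'S' dtypes but, as the traceback shows, not for '<U4'. The sketch below shows only the numpy round trip, not an actual change to write_hdf, and the variable names are illustrative.

```python
import numpy as np

# Unicode column with dtype '<U4', as in Out[7]
col = np.array(["eggs", "eggs", "eggs"])

# Encode to fixed-length byte strings (dtype 'S4') before export
encoded = np.char.encode(col, "utf-8")

# Decode back to unicode after import
decoded = np.char.decode(encoded, "utf-8")
```

The encode step would apply on export and the decode step on import, so that `arr.spam` round-trips unchanged.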
System information
- Cantera version: 2.5.a4
- OS: Ubuntu 18.04
- Python 3.6
Additional context