What to do about str columns in SolutionArray #896

@ischoegl

Problem description

The current implementation of SolutionArray allows extra columns containing str (see discussion in #838), which may or may not be an intended use case: built-in attributes are almost exclusively numeric, with the notable exception of state_of_matter. Other parts of the implementation, e.g. CSV import and HDF export/import, did not anticipate non-numeric data, so str support needs to be either deprecated/disabled or implemented consistently.

One interpretation is that str input was never intended, but was also never explicitly checked for, i.e. it just happens to be supported by numpy (similar to sequences not being checked, see #895). In this case, the most appropriate resolution may be to catch and deprecate str columns, while warning that they are not fully supported for CSV/HDF.
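A minimal sketch of what "catch and deprecate" could look like, assuming the check is done on the NumPy dtype of each extra column. The helper name `check_extra_column` is hypothetical and is not part of Cantera; where such a check would live inside SolutionArray is left open.

```python
import warnings

import numpy as np


def check_extra_column(name, values):
    """Hypothetical helper (not Cantera API): warn if an extra column holds str data."""
    arr = np.asarray(values)
    # dtype kinds 'U' (unicode) and 'S' (bytes) are NumPy's string types
    if arr.dtype.kind in ("U", "S"):
        warnings.warn(
            f"Extra column '{name}' contains str data, which is deprecated "
            "and not fully supported for CSV/HDF import and export.",
            DeprecationWarning,
            stacklevel=2,
        )
    return arr
```

With this sketch, `check_extra_column('spam', ['eggs'] * 3)` would emit a DeprecationWarning, while a numeric column passes through silently.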

Steps to reproduce

In [1]: import cantera as ct
    ...: gas = ct.Solution('h2o2.yaml')
    ...: arr = ct.SolutionArray(gas, 3, extra={'spam': 'eggs'})
    ...: 

In [2]: arr._extra
Out[2]: OrderedDict([('spam', ['eggs', 'eggs', 'eggs'])])

In [3]: arr.write_csv('test.csv')

In [3]: !cat test.csv
spam,T,density,Y_H2,Y_H,Y_O,Y_O2,Y_OH,Y_H2O,Y_HO2,Y_H2O2,Y_AR
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
eggs,300.0,0.08189392763801234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

In [4]: arr2 = ct.SolutionArray(gas, extra={'spam'})

In [4]: arr2.read_csv('test.csv')

In [5]: arr2._extra
Out[5]: {'spam': array([ nan,  nan,  nan])}

In [6]: arr.write_hdf('test.h5')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-d708cd336db5> in <module>()
----> 1 arr.write_hdf('test.h5')

/usr/local/lib/python3.6/dist-packages/cantera/composite.py in write_hdf(self, filename, cols, group, subgroup, attrs, mode, append, compression, compression_opts, *args, **kwargs)
   1117                 dgroup.attrs[key] = val
   1118             for header, col in data.items():
-> 1119                 dgroup.create_dataset(header, data=col, **hdf_kwargs)
   1120 
   1121         return group

/home/docker/.local/lib/python3.6/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    134 
    135         with phil:
--> 136             dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
    137             dset = dataset.Dataset(dsid)
    138             if name is not None:

/home/docker/.local/lib/python3.6/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl)
    116         else:
    117             dtype = numpy.dtype(dtype)
--> 118         tid = h5t.py_create(dtype, logical=1)
    119 
    120     # Legacy

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: No conversion path for dtype: dtype('<U4')

In [7]: arr.spam
Out[7]: 
array(['eggs', 'eggs', 'eggs'],
      dtype='<U4')
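
The TypeError above is raised because h5py finds no conversion path for NumPy's fixed-width Unicode dtype ('<U4'). If str columns were to be supported rather than deprecated, one possible approach is to encode the column to a bytes dtype before writing and decode it after reading. The sketch below shows only the NumPy side of that round trip; whether it fits write_hdf/read_hdf, and how the h5py version in use handles the result, are assumptions that would need to be verified.

```python
import numpy as np

# Same data as arr.spam in the session above (a fixed-width unicode array)
spam = np.array(["eggs", "eggs", "eggs"])

# Encode each element to UTF-8 bytes; the result has a fixed-width bytes dtype,
# which is the kind of array one would hand to the HDF writer instead of `spam`
encoded = np.char.encode(spam, "utf-8")

# After reading back, decode to recover the original str values
decoded = np.char.decode(encoded, "utf-8")
```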

Behavior

The pre-existing write_csv supports str, the newly introduced read_csv fails to import it (the column is read back as nan), and write_hdf raises a TypeError because h5py has no conversion path for dtype('<U4').

I am filing this as a bug report, as this presumably needs to be fixed prior to the release of 2.5.
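
For the CSV side, a consistent implementation would need to read a str column back as str instead of nan. The following is a standard-library sketch of that idea, not a description of how read_csv works: the function `read_columns` is hypothetical, and it simply tries a float conversion per column and falls back to the raw strings.

```python
import csv
import io


def read_columns(stream):
    """Hypothetical reader: return {header: list} with float or str columns."""
    reader = csv.reader(stream)
    header = next(reader)
    rows = list(reader)
    columns = {}
    for i, name in enumerate(header):
        raw = [row[i] for row in rows]
        try:
            columns[name] = [float(v) for v in raw]
        except ValueError:
            # Not every entry parses as a number: keep the column as str
            columns[name] = raw
    return columns


# A shortened version of the test.csv content shown in the session
text = (
    "spam,T,density\n"
    "eggs,300.0,0.08189392763801234\n"
    "eggs,300.0,0.08189392763801234\n"
)
cols = read_columns(io.StringIO(text))
```

Here `cols['spam']` keeps the 'eggs' entries as str, while `cols['T']` and `cols['density']` are converted to float.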

System information

  • Cantera version: 2.5.a4
  • OS: Ubuntu 18.04
  • Python 3.6

Additional context

#838, #895
