Enable serialization of Cantera objects#984
Conversation
Codecov Report
@@ Coverage Diff @@
## main #984 +/- ##
==========================================
+ Coverage 70.66% 72.39% +1.73%
==========================================
Files 361 364 +3
Lines 44522 46433 +1911
==========================================
+ Hits 31460 33616 +2156
+ Misses 13062 12817 -245
Continue to review full report at Codecov.
|
599e499 to
86435ab
Compare
004decb to
ad1f1a2
Compare
There was a problem hiding this comment.
@speth ... as indicated earlier, it is great that this is finally close to becoming reality (:tada:). I haven't looked at a very detailed level (I assume that the unit test suite is able to ensure that the emitted YAML code can be re-imported without issues), but do have some minor comments.
interfaces/cython/cantera/utils.pyx
Outdated
|
|
||
| cdef public PyObject* pyCanteraError = <PyObject*>CanteraError | ||
|
|
||
| cdef anyvalueToPython(string name, CxxAnyValue& v): |
There was a problem hiding this comment.
names in utils.pyx are not very 'pythonic' (although it's not exposed to Python) - anyvalue_to_python would be (same for anymap_to_dict below - would suggest dict as this is what it returns)
interfaces/cython/cantera/utils.pyx
Outdated
| return {key: value for (_, key, value) in py_items} | ||
|
|
||
|
|
||
| cdef mergeAnyMap(CxxAnyMap& primary, CxxAnyMap& extra): |
There was a problem hiding this comment.
Would it make sense to handle that within the C++ layer? Also, as a comment, the anymapToPython (or anymap_to_dict) applies units whereas mergeAnyMap does not.
Regarding my other comment about getParameters, a C++ implementation of merge would go in hand with having unique_ptr<AnyMap> parameters() const returned throughout, which then would mean that you can AnyMap::merge(const AnyMap& other) (or similar).
I assume that there may be a small performance penalty for smart pointers (and potentially a larger one if merging requires copying), but it would be easier to track where objects are initially created (while this may be subjective, it is often difficult to track how things are assembled when the receiving function creates the object which is subsequently added to as in your getParameters approach).
There was a problem hiding this comment.
while my initial comment is mostly moot, I believe this function is mainly used to merge output from getParameters and input? I believe that it would be nice to merge this by passing an optional flag to getParameters where it applies (and handle the merge in C++). This may eliminate the need for this specific Cython function ...
There was a problem hiding this comment.
The initial reason for this came from the desire to control the order of the output keys. I want the user-defined keys in input to generally come last, and the ordering mechanism mostly relies on preserving insertion order. If the base Reaction class were responsible for adding in the user-defined keys, then they would come before any keys added by child classes.
I actually ended up finding a hybrid solution that both gets rid of the need to merge the fields defining object with the extra input data at the point of use, and avoids the requirement for the user-accessible function to take an AnyMap as it's input, by making the virtual getParameters method protected, and introducing a non-virtual method with the signature AnyMap parameters(bool withInput=true) on the base classes only. This way, the parameters method can initialize the AnyMap, use the virtual getParameters method to populate all of the relevant fields, and then add the extra keys last before returning it.
There was a problem hiding this comment.
@speth ... I rescinded some of my initial comments as I believe your implementation is more efficient (despite being harder to track).
Beyond, I think adding an optional flag to getParameters to add input where it applies (and handling a merge within C++) may be more consistent and easier long-term. I'm also not sure about getSpeciesParameters, as it seems somewhat superfluous. Spot-wise benchmark tests that illustrate that a re-imported solution preserves results may add another level of security.
PS: I believe it would make sense to finalize this before #995 (and rebase the latter, as this would allow me to port getParameters immediately, rather than requiring another go-around).
PPS: Adding this capability to the extract_submechanism.py example would be neat.
| self.assertEqual(len(generated['Pt_surf-reactions']), surf.n_reactions) | ||
| self.assertEqual(len(generated['species']), surf.n_total_species) | ||
|
|
||
|
|
There was a problem hiding this comment.
It may make sense to expand tests some, e.g. generate yaml and benchmark test problems for both original and generated yaml input (along the lines of test_convert for CTI/XML->YAML). Including larger mechanisms (Reitz?) may be interesting also.
Individual objects are well covered by C++ tests, so this is mainly testing the YamlWriter's front end, correct?
There was a problem hiding this comment.
Yes, these tests were mainly just focused on the YamlWriter Python interface, but I did add a couple of additional checks here to show that whole Solution is being restored in a consistent state. I think the consistency of all the different phase and reaction types is reasonably-well covered by the ThermoYamlRoundTrip test suite (in test/thermo/thermoToYaml.cpp), and the checks done in the ReactionToYaml test suite (in test/kinetics/kineticsFromYaml.cpp) by the compareReactions function.
test/general/test_serialization.cpp
Outdated
| {"quantity", "mol"}, | ||
| {"length", "cm"} | ||
| }); | ||
| // Should fail because pre-exponential factors from XML can't be converted |
There was a problem hiding this comment.
Not sure that I am following based on what this comment states - while XML may have limitations for import, shouldn't this become moot once data are read and available to the C++ objects? (at which point everything should be standardized, i.e. everything is known implicitly)
There was a problem hiding this comment.
This is definitely a thorny issue. Because reactions are instantiated from XML without having a Kinetics object available, there is no way to determine the dimensions of the rate constant. All you know is that the value itself has already been converted to Cantera's mks+kmol system. I guess you could set the value of Reaction.rate_units when calling Kinetics.addReaction, which would resolve this for what will presumably be the most common use case, serializing a complete phase definition. But a bare Reaction object created from XML does not contain the information needed to serialize it to a non-default unit system.
There was a problem hiding this comment.
From what I understand, the reaction order provides most of the missing information, so the main issues are non-standard units as well as volumetric vs surface geometry. Am I seeing this correctly? Regardless, I believe export should work if associated kinetics objects are defined (as you mention, this can be done implicitly and is likely the most common use case); this would leave an exception if the reaction isn’t fully set up.
There was a problem hiding this comment.
Correct, the units are determined by the dimensionality of the various phases, plus the units of the standard concentration for each phase. And while kmol/m^3 or kmol/m^2 are probably the most common, there are several phases where the standard concentration is dimensionless (see the various implementations of standardConcentrationUnits).
There was a problem hiding this comment.
Thanks for the clarification; I believe I’m on the same page now - units do pop up in #995.
Regarding the XML conversion, I still believe that implicitly setting units would make sense.
There was a problem hiding this comment.
I realized while thinking about this that CTI/XML isn't the only case where you can manage to create a reaction without needing to set it's rate units -- this is also true if you just create a Reaction object from scratch. Setting Reaction.rate_units now occurs as part of Kinetics.addReaction, if it doesn't get set before then. I also ended up refactoring up the rateCoeffUnits function and making it a member function of the Reaction class (as calculateRateCoeffUnits).
There was a problem hiding this comment.
That’s what had been my suspicion all along. If you look at #995, I have new constructors for ReactionRate from AnyMap where this will become relevant. Obviously things need to be consolidated as I didn’t have rate_units at my disposal before.
| "Multiple species with different definitions are not " | ||
| "supported:\n>>>>>>\n{}\n======\n{}\n<<<<<<\n", | ||
| speciesDef.toYamlString(), | ||
| speciesDefs[speciesDefIndex[name]].toYamlString()); |
There was a problem hiding this comment.
As YamlWriter is central to serialization, it may make sense to add a unit test for special cases (here and elsewhere as marked by CodeCov).
| out << valueStr; | ||
| width += name.size() + valueStr.size() + 4; | ||
| } else { | ||
| // Put items of an unknown (compound) type on a line alone |
There was a problem hiding this comment.
As AnyMap is central to serialization, it may make sense to add a unit test for special cases (here and elsewhere as marked by CodeCov).
interfaces/cython/cantera/utils.pyx
Outdated
| return {key: value for (_, key, value) in py_items} | ||
|
|
||
|
|
||
| cdef mergeAnyMap(CxxAnyMap& primary, CxxAnyMap& extra): |
There was a problem hiding this comment.
while my initial comment is mostly moot, I believe this function is mainly used to merge output from getParameters and input? I believe that it would be nice to merge this by passing an optional flag to getParameters where it applies (and handle the merge in C++). This may eliminate the need for this specific Cython function ...
Currently only handles thermo, not kinetics or transport
The default order is the order in which items are added. Items originating from an input file come after programmatically added items, and retain their existing ordering.
Eliminates the 'mergeAnyMap' function, and introduces a 'parameters' method for classes to return an AnyMap which can optionally contain the user-provided input data, rather than needing to create the AnyMap in advance and add the user-created fields separately.
2f5ecea to
c8aea00
Compare
|
I think I'm happy with the code coverage provided by the tests at this point. |
|
@speth ... thanks for going into more detail with test coverage. As mentioned above, this looks good to go. |
|
@ischoegl Feel free to hit the merge button 😄 |
|
🎉 |
Changes proposed in this pull request
getParameters(AnyMap&)methods to ThermoPhase, Kinetics, Reaction, etc. classes to collect data needed for serializationinput_dataproperty to the corresponding Python objects, which returns this information converted to native Python types, e.g.dicts andlistsYamlWriterclass for creating input files defining phases, species, and reactions. Generated output files can contain multiple phase definitions and reaction sections, and have use user-specified unit systems.Examples
Python
input_dataproperty:YamlWriterclass in C++:From a Python
SolutionorInterface:Remaining issues
_SolutionBase.input_dataproperty and_SolutionBase.write_yamlmethod with theSolutionclass. I tried adding a:members: input_data, write_yamldirective inimporting.rst, but it didn't seem to do anything.Potential future improvements
--extraoption ofck2yaml. This probably requires adding a converter for Python data structures toAnyMap.precisionoption.Addresses Cantera/enhancements#11.
Checklist
scons build&scons test) and unit tests address code coverage