Overload to Phase::speciesIndex() to take an atomic composition map of the species#635
Overload to Phase::speciesIndex() to take an atomic composition map of the species#635ThanasisMattas wants to merge 1 commit intoCantera:mainfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #635 +/- ##
==========================================
+ Coverage 71.40% 71.58% +0.18%
==========================================
Files 372 372
Lines 43605 44520 +915
==========================================
+ Hits 31134 31870 +736
- Misses 12471 12650 +179
Continue to review full report at Codecov.
|
|
Thanks for this contribution! Two questions:
Thanks again. |
|
Hello! Thank you for the quick response.
|
|
I think the issue of isomers is quite important, and a case where multiple species match a composition should not pass quietly. Intuitively a warning should be logged, so a user is aware of what is going on. Based on that, additional arguments can resolve this. |
|
Ok, this is quite reasonable. So, what about: a search whether there are other species with the input atomic composition map and, if true:
|
|
I did write a function looking for isomers some time ago in Python, and decided to returned a list. The C++ core is very different, and having alternative return types for overloaded functions may not be as straight-forward. |
|
The intented return value of the method is the index of the species at: With the current implementation, I think that the only way to be sure that one is accessing the right isomer is by the correct name. A different characteristic attribute to use to access is case specific, I guess. The approach that I can think of is a simple log message that prints the isomers (if the specific species has an isomer at the phase) and the exact names that one can access them by name and a prompt to do so, if the first encountered is not the intented. Any ideas are wellcome! I haven't use cantera that much, so is isomers in a single phase a common situation? |
|
From an application point of view, isomers are not uncommon, especially with large mechanisms. I agree that accessing species via composition is not uniquely defined (as in the one-to-one mapping of species names and indices). I am not as familiar with the cantera core, but believe that there may be an argument for introducing a separate function rather than overloading (I'd be interested in @speth's opinion). On which on Python is handled via |
|
While isomers are not uncommon, using the current code base as a guide, the most common use case is to find the index of a single species that probably does not have any isomers (at least in the mechanism), so a function that just returns a single index would probably be the most useful. To provide some flexibility, I would suggest adding a second argument to the function which determines whether to just return the first match found, or to ensure uniqueness and raise an exception if multiple matches are found. I agree that issuing a warning is the least useful option, since it will likely go unnoticed in the case where the code is being run non-interactively. There is then a second question of whether adding a second function to return a list of all isomers is useful enough to include. This can be done already in the Python interface with a fairly straightforward list comprehension, e.g. [S for S in gas.species() if S.composition == {'C': n, 'H', m}] |
|
Thanks for the feadback. I have 2 questions:
|
|
Regarding #2, I think the best option is to rewrite the method with two
arguments, the second being a default argument (
https://en.cppreference.com/w/cpp/language/default_arguments) that uses the
current behavior (return first match) if left unchanged by the user. That
way you maintain backwards compatibly.
…On Sat, May 11, 2019, 5:00 PM Thanasis Mattas ***@***.***> wrote:
Thanks for the feadback. I have 2 questions:
1.
What would the 2nd argument be? Is something like "bool
checkForIsomer" reasonable? And at the method description stated that if
the user wants the 1st encountered, they may pass false, or if they want an
exception thrown in case of isomers, they may pass true. (And what could
they do then? Pass the name?)
2.
If a user is searching for a non-isomer in the phase, is it wise to
still have to use the two-argument method, or should there be a
single-argument like that in the 1st commit, too. I think that only the
two-argument should exist, in order to avoid a possible bug, but I would
like to hear you opinions.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FCantera%2Fcantera%2Fpull%2F635%23issuecomment-491547030&data=02%7C01%7Cnicholas.curtis%40uconn.edu%7C3ac18ac489ae4e9135e408d6d65c1cc8%7C17f1a87e2a254eaab9df9d439034b080%7C0%7C0%7C636932088450013203&sdata=mJ%2BFVzt2XvH0KjPqeRzDkLrrRatzVQrOUr%2FUbqKhlgU%3D&reserved=0>,
or mute the thread
<https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABRKDCJ3PLNE75AAXGAWWYDPU46YTANCNFSM4HKTQYUA&data=02%7C01%7Cnicholas.curtis%40uconn.edu%7C3ac18ac489ae4e9135e408d6d65c1cc8%7C17f1a87e2a254eaab9df9d439034b080%7C0%7C0%7C636932088450013203&sdata=%2FCViWH2VkOJj0MCFTxN3RycnQoD%2BuVo5SjidbzB7o9s%3D&reserved=0>
.
|
|
I routinely download large mechanisms (hundreds of species) where the first problem is to identify names used for specific fuel species (various research groups have non-uniform conventions for large hydrocarbons): the simplest way to do that is by their formula, which is where the issue of isomers (or missing species) comes up very frequently. Following up on @speth's convenient comprehension suggestion, my go-to python solution would be which however does lead to more fundamental questions:
|
|
@arghdos I will try that. @ischoegl Sorry if there is a misunderstanding. If indeed isomers in a single phase is a common problem, then a user has to search the correct name in the mechanism and then query the name. Consequently, a more strict approach with names should be evaluated. As a probably wrong idea, another approach may be to pass, along with the composition map, the name of a reaction that uniquely identifies a species (?). I agree that it is best for maintaining and performance to keep the core in C++, but this is not me to say! Again, I will try the default second argument, rasing an exception, if the user chooses so, to see how it works. |
|
My own two cents, here. Barring any unintended consequences, what I would appreciate as a user is:
If at least >1 isomer is found, throw an error and print out a list, one line per isomer, with:
<species name (as given in cti file)>, <species index>
Then the user can edit their code and just search by name for the actual intended species.
Steven
… On May 13, 2019, at 10:54 AM, Thanasis Mattas ***@***.***> wrote:
@arghdos <https://github.com/arghdos> I will try that.
@ischoegl <https://github.com/ischoegl>
By my understunding, the particular function searches through the species in a user defined phase, where a user includes the species from a eg .cti file. Is a user defined phase composed of a whole mechanism (hundreds of species)? If yes, then there is a problem. This far I can go with my experience. Namely, I think that the particular function does not address the problem of non-uniform conventions in a mechanism, because it does not search the .cti / .xml / YAML / whatever input file. It searches through these vector and map <#635 (comment)>, which refer to a particular phase. In the examples that I saw, phase are composed of not that much species draw upon a .cti file. Then again, I don't have much hands on experience to tell.
Sorry if there is a misunderstanding. If indeed isomers in a single phase is a common problem, then a user has to search the correct name in the mechanism and then query the name. Consequently, a more strict approach with names should be evaluated.
As a probably wrong idea, an other approach may be to pass, along with the composition map, the name of a reaction that uniquely identifies a species (?).
I agree that it is best for maintaining and performance to keep the core in C++, but this is not me to say!
Again, I will try the default second argument, rasing an exception, if the user chooses so, to see how it works.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#635?email_source=notifications&email_token=ABEC7PBCNRP6IEYKOJJQFATPVGMLLA5CNFSM4HKTQYUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVI5LFY#issuecomment-491902359>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABEC7PEHXHSUXAMCHVEEAR3PVGMLLANCNFSM4HKTQYUA>.
|
|
So, by default, Phase::speciesIndex() searches for isomers in the phase object. If at least one isomer to the query exists, an error is thrown, printing a list of the isomers. There is at least 50% chance that the first encountered is the wrong one, thus an error seems a reasonable responce. Then, the user can use the existing oveload and pass the correct name. Providing some flexibility, as suggested, a user can pass as a second argument (which defaults to true) a false expression, in order to get the first encountered, ignoring the case of isomers. |
|
This seems like the correct treatment, to me. Two questions:
1. Does/could it also print the species index, so that a user could access the species that way, if preferred?
2. Have you added anything to the test suite to verify correct operation?
Many thanks,
Steven
… On Jun 6, 2019, at 6:09 PM, Thanasis Mattas ***@***.***> wrote:
So, by default, Phase::speciesIndex() searches for isomers in the phase object. If at least one isomer to the query exists, an error is thrown, printing a list of the isomers. There is at least 50% chance that the first encountered is the wrong one, thus an error seems a reasonable responce. Then, the user can use the existing oveload and pass the correct name.
Providing some flexibility, as suggested, a user can pass as a second argument (which defaults to true) a false expression, in order to get the first encountered, ignoring the case of isomers.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#635?email_source=notifications&email_token=ABEC7PGBEUMI6FSGHTQGYYLPZGRNBA5CNFSM4HKTQYUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXEQLCA#issuecomment-499713416>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABEC7PH3JWT22OHNCNNV3EDPZGRNBANCNFSM4HKTQYUA>.
|
Since the iteration is done over the |
114a536 to
3bc16d7
Compare
|
@speth I edited the description, stating that if "bool checkForIsomers == false", the first "lexicographically" encountered isomer is returned.
Locally, I ran the following test: #include "cantera/thermo.h"
#include "cantera/IdealGasMix.h"
#include <iostream>
using namespace Cantera;
void checkSpeciesIndex()
{
IdealGasPhase gas("gri30_ion.cti","gas");
compositionMap HCNOcomp = gas.species("HCNO")->composition;
std::cout << gas.speciesIndex(HCNOcomp) << std::endl;
}
int main()
{
try {
checkSpeciesIndex();
} catch(std::exception& err) {
std::cout << err.what() << std::endl;
}
}Output: CanteraError thrown by Phase::speciesIndex(): |
3bc16d7 to
b38c044
Compare
b38c044 to
2496941
Compare
|
@ThanasisMattas ... thanks for your contribution so far. I have two suggestions that would imho improve this PR:
PS: @speth ... yes, it is possible to filter out isomers using {i: spc for i, spc in enumerate(gas.species()) if spc.composition == {'C': n, 'H', m}}but using gas.find_isomers({'C': n, 'H', m})to get the same dictionary would be quite a bit more convenient. As it is already necessary to track down isomers in the C++ core, why shouldn't it be broken out? |
|
@ischoegl Thanks for the suggestions. They seem reasonable. The 1st one is quite straightforward. For the 2nd one, I 'll get on it when I find a little time. |
|
@ThanasisMattas ... I ended up implementing the |
277291c to
7eb4cc2
Compare
|
@speth @ischoegl @decaluwe I've updated the implementation of cpdef int species_index(self, species, raise_if_isomers=True) except *:
"""
The index of species *species*, which may be specified as a string or
an integer. In the latter case, the index is checked for validity and
returned. If no such species is present, an exception is thrown.
"""
if isinstance(species, (str, bytes)):
index = self.thermo.speciesIndex(stringify(species))
elif isinstance(species, dict):
index = self.thermo.speciesIndex(
comp_map(species), bool(raise_if_isomers)
)
elif .... # There's more code here, but not relevantDoesn't seem good because that optional argument is used in only one of the branches. On top of that, I couldn't actually get it to work, the |
|
@bryanwweber ... intuitively, I would suggest to eliminate the Any other option would potentially lead to unexpected outcomes where root causes may be extremely hard to track down (e.g. set the override when using mechanism A as this has the desired outcome; when using the same code for mechanism B things get dicey). |
|
@ischoegl I disagree that the option should be removed, assuming we can come up with a decent interface. I think there are many use cases where returning a value would be appropriate. Since the default is to raise, anyone not aware would get the error and would have to opt into the other behavior. Moreover, a user would have the same problem with any kinds of operations involving species indices when moving between mechanisms, so I don't see how this is all that different. |
|
@bryanwweber ... no problem agreeing to disagree here. I believe the cleanest solution in case of an error raised by non-unique isomers would be to specify a species by its unique name. Picking a species based on a predefined position in a list appears somewhat arbitrary. I’ll rest my case at this point 😉 ... |
|
I'm trying to think of the use case (or I guess the user's ultimate intent) where getting just the first index found is acceptable. It essentially says "the species properties don't matter, I just want something with this composition," no? What would be the value to the user, here? Something related to species balance? Not to stir the pot, but if that were their intent, I would think they'd want all indices returned, in an array. I guess that would be a next step and doesn't impact this change, necessarily. @bryanwweber in terms of only one branch using the optional argument... is this sort of a "code style" issue, rather than anything related to the functionality? Just trying to be clear on the problem with it. I agree though, that the only way around that would be a completely separate user function for finding id by map vs. via string. If so, I'd agree--sub-optimal, to be kind. Of the two problems, I'd prefer the former, but I feel pretty sympathetic to @ischoegl's take, here: I'm having a hard time IDing the case where the user doesn't care which isomer gets returned. |
|
@decaluwe @ischoegl I had written a long post justifying my position, and then I had a light bulb and I think I get what you're both saying now. In #714, @ischoegl added the Let me see whether it makes sense to continue modifying the code here or open a new PR. |
|
@bryanwweber I think this works. If they think they do want just a single species, returning an array will probably error out in some way that they will catch. I need to think on that, a little more. Definitely don't want a case where someone thinks they're getting a single index, but unknowingly gets an array, undetected. But in general, I think what you suggest above is the best approach. |
Phase::speciesIndex() can take an atomic composition map to find the index of species with that composition. Due to the possibility of isomers, the overloaded function always returns a vector<size_t>, even if no isomers are found. Similarly, in the Python interface, passing a dictionary to species_index returns a list even if only a single species is found. Resolves Cantera#603
|
@ischoegl @speth @decaluwe I made another round of updates here. It feels a little inconsistent that a single Python function can give you either an integer or a list of integers, but I couldn't see a better way to provide this interface in Python. Maybe it makes sense just to drop this from Python, but I think it would be generally useful so 🤷♂️ |
|
@bryanwweber ... the only way to avoid a list would be to raise errors for |
|
@ischoegl I considered that, but |
|
Per @bryanwweber's concluding suggestion on #603, I'm going to close this. Thanks to all for the discussion -- at least we avoided adding potentially confusing behavior to a fairly basic function 😕. |
|
Seems like this never got closed, so I'm doing that now. |
Fixes #603
Changes proposed in this pull request:
As noticed at the issue #603, in some cases, querying a species by name is not a well-defined process, because there is not a requirement that species be named in this way. Thus, an overload of Phase::speciesIndex() is provided, in order to find the index of the species, by passing its atomic composition map (compositionMap Species::composiotion).