Chemical identity comparison for Molecules and Species#1329
Chemical identity comparison for Molecules and Species#1329mjohnson541 merged 20 commits intomasterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1329 +/- ##
==========================================
- Coverage 41.67% 41.54% -0.13%
==========================================
Files 176 176
Lines 29102 29103 +1
Branches 5988 5975 -13
==========================================
- Hits 12127 12090 -37
- Misses 16138 16181 +43
+ Partials 837 832 -5
Continue to review full report at Codecov.
|
0e1c3c4 to
c3e65d9
Compare
c46bd06 to
f9d2f17
Compare
mjohnson541
left a comment
There was a problem hiding this comment.
Ok, this look pretty good except for the naming issues we already talked about, I just have a few questions and comments.
rmgpy/molecule/molecule.pxd
Outdated
| cdef str _inchi | ||
| cdef str _smiles | ||
|
|
||
| cpdef bint is_equal(self, Molecule other) |
There was a problem hiding this comment.
Where does this function come from?
rmgpy/rmg/model.py
Outdated
| if spec is not None and spec.is_same(molecule): | ||
| self.speciesCache.pop(i) | ||
| self.speciesCache.insert(0, spec) | ||
| # Check whether the resonance form is already included in the species |
There was a problem hiding this comment.
Shouldn't all of the resonance forms we care about for the species already be generated? Can we avoid the isomorphism check at the end?
rmgpy/reaction.py
Outdated
|
|
||
| def isIsomorphic(self, other, eitherDirection=True, checkIdentical = False, | ||
| checkOnlyLabel = False, checkTemplateRxnProducts=False): | ||
| def isIsomorphic(self, other, eitherDirection=True, level=1, checkTemplateRxnProducts=False): |
There was a problem hiding this comment.
We really should rename this to something more general...like is_same, but I'm not sure how much of a pain that would be.
rmgpy/reactionTest.py
Outdated
| self.assertTrue(r1.isIsomorphic(self.makeReaction('a=B'))) | ||
| self.assertTrue(r1.isIsomorphic(self.makeReaction('b=A'))) | ||
| self.assertFalse(r1.isIsomorphic(self.makeReaction('B=a'),eitherDirection=False)) | ||
| self.assertFalse(r1.isIsomorphic(self.makeReaction('B=a'), eitherDirection=False)) |
There was a problem hiding this comment.
These whitespace changes should be in a separate commit
rmgpy/reaction.py
Outdated
| Compares the provided reactants and products against the reactants | ||
| and products of this reaction. Both directions are checked. | ||
|
|
||
| Uses level 2 of `isomorphis_species_lists`, which compares InChI. |
rmgpy/molecule/translator.py
Outdated
| del mol_elementcount[key] | ||
| # If we didn't find anything, that would mean the result is wrong | ||
| conditions.append(found) | ||
|
|
There was a problem hiding this comment.
So the check below (inchi_elementcount == mol_elementcount) works because util.retrieveElementCount(identifier) doesn't account for isotopes and the above loop moves isotopes to there corresponding element?
| products0 = reaction.products[:] if forward else reaction.reactants[:] | ||
|
|
||
| # For aromatics, generate aromatic resonance structures to accurately identify isomorphic species | ||
| if prod_resonance: |
There was a problem hiding this comment.
Why can we get rid of this stuff?
|
This PR is out of date now. The commits related to InChI generation for isotopes has been merged via #1453. The species comparison changes will be refactored and a new PR will be opened once that's ready. |
Major changes have been made to this PR
|
Please see the updated initial comment for summary and explanation of the new changes. Performance comparison with minimal example:
This is a 14% reduction in runtime. I expect more significant differences for models with larger species and/or more resonance structures. The model core is identical on these two branches. The edge on this branch has 7 fewer species, most likely due to species being identified as duplicates using the non-strict comparison. I think this is a positive change though, since this is a more robust method of identifying whether species are truly the same (as compared to generating resonance structures). RMG-tests results:
|
01aaf8f to
e0ce75d
Compare
e0ce75d to
719f66b
Compare
More pythonic and prevents extra getter/setter functions from cluttering the namespace.
Allow instantiation using either SMILES or InChI Change SMILES and InChI to all lowercase for easier typing Make smiles and inchi read only properties that generate and save the respective identifiers to private attributes
For strict=False, ignores electrons (i.e. bond order, radicals, lone pairs, etc.) Only implemented for Molecule isomorphism, exceptions raised in other cases For atoms, strict argument is added to the `equivalent` method For bonds, strict argument is handled within vf2
Because its functionality is replaced by the strict argument
Also add strict argument which is passed to isIsomorphic
Since this option allows us to compare resonance structures directly, we don't need to worry about nonreactive resonance structures or aromatic resonance structures. Also, the return value of checkForExistingSpecies is reduced to a single value, since returning whether or not the species was found is redundant with returning the species itself.
No need to ensure species - the reactions will always contain Molecules at this point, so either Molecule or Species objects are acceptable for the product list to compare with Aromatic resonance structure generation can be omitted by using the strict=False option, which neglects electrons during the isomorphism comparison
Remove checks that the product list is converted in place to Species objects, since that step was removed
Similar to strict option for isomorphism, setting strict=False ignores electrons and bond orders for the comparison, which is useful for comparing different resonance structures
Of Molecule, Species, and Reaction, which is passed to Graph.isMappingValid
Instead, use the strict=False option to ignore resonance issues Change ensure_species to not generate resonance structures by default Add call to ensure_independent_atom_ids in addReverseAttribute because the changes to resonance structure generation means that products may not necessarily have proper atom IDs
Add SMILES property and unit tests
In certain situations, we would call ensure_species with keep_isomorphic=False, which would ultimately result in wrong degeneracy values. This fixes that oversight and makes sure that keep_isomorphic=True when generating reactions. A unit test is also added.
mjohnson541
left a comment
There was a problem hiding this comment.
Looks pretty good, I just have a few noted questions.
| """InChI string representation of this species. Read-only.""" | ||
| if self._inchi is None: | ||
| if self.molecule: | ||
| self._inchi = self.molecule[0].InChI |
There was a problem hiding this comment.
is self.molecule[0].InChI always not None?
There was a problem hiding this comment.
Molecule.InChI is a property which will set the InChI if it's not already there.
|
|
||
| def isomorphic_species_lists(list1, list2, check_identical=False, only_check_label=False, generateInitialMap=False): | ||
|
|
||
| def same_species_lists(list1, list2, check_identical=False, only_check_label=False, generate_initial_map=False, strict=True): |
There was a problem hiding this comment.
If we're renaming this already can we either generalize it so it works for N species or rename it in a less general way?
There was a problem hiding this comment.
You mean since it can only compare list of up to 3 species? Do you any name suggestions?
There was a problem hiding this comment.
isomorphic_species_trios?
There was a problem hiding this comment.
isomorphic_species_lists_le3, isomorphic_species_lists_lt4?
There was a problem hiding this comment.
Do you think it's not worth worrying about the confusion that much?
Improvements:
is_samemethod checks "chemical identity" via InChI and multiplicity (better name suggestions welcome)is_sameincheckForExistingSpecies()instead of isomorphism. Testing has shown that this is much faster and also more accurate.April 2019 Revision
I have finally refactored and updated this branch. There are a number of changes from the prior version, but I decided not to make a new PR.
I decided to abandon the idea of creating a new
is_samemethod, since it seemed to add unnecessary complexity, especially since people are already familiar with usingisIsomorphicandisIdentical.Major changes
strictargument to almost all isomorphism related methods. The default isstrict=Truewhich is the same as current behavior. Specifyingstrict=Falseignores electrons when comparing atoms and bonds, meaning bond orders, radicals, and lone pairs are not considered at all. The result is a resonance structure independent comparison.strict=Falsecomparisons for:strict=Falseis slower than strict isomorphism (due to more potential matches), it is more than compensated for by eliminating resonance structure generation.checkForExistingSpecies, though non-reactive Molecules can still be created.Quality of life changes
fromInChIorfromSMILESBugfixes
ensure_independent_atom_ids, make sure we select a reactive molecule when generating resonance structuresensure_independent_atom_ids, passkeep_isomorphic=Truetoensure_speciesfor accurate degeneracy determination