Fix CC smiles#2545
Conversation
|
Checking the error |
Codecov Report
@@ Coverage Diff @@
## main #2545 +/- ##
=======================================
Coverage 55.32% 55.32%
=======================================
Files 125 125
Lines 37140 37143 +3
=======================================
+ Hits 20548 20550 +2
- Misses 16592 16593 +1
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:35 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Failed Edge Comparison ❌Original model has 106 species. Non-identical thermo! ❌
Identical thermo comments: Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(1,4-Cyclohexadiene) Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: Aromatics Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:03:11 liquid_oxidation Passed Core Comparison ✅Original model has 37 species. liquid_oxidation Failed Edge Comparison ❌Original model has 202 species. DetailsObservables Test Case: liquid_oxidation Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:58 nitrogen Passed Core Comparison ✅Original model has 41 species. nitrogen Failed Edge Comparison ❌Original model has 132 species. Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(O2s-CdN3d) + group(N3d-OCd) + group(Cd-HN3dO) + ring(Cyclopropene) + radical(CdJ-NdO) Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: NC Comparison✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:03:14 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species. DetailsObservables Test Case: Oxidation Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:01:18 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species. DetailsObservables Test Case: SO2 Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:48 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:03:45 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species. DetailsObservables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:09:23 RMS_CSTR_liquid_oxidation Passed Core Comparison ✅Original model has 37 species. RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌Original model has 206 species. Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: RMS_CSTR_liquid_oxidation Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_CSTR_liquid_oxidation Passed Observable Testing ✅beep boop this comment was written by a bot 🤖 |
JacksonBurns
left a comment
There was a problem hiding this comment.
Thanks for the quick patch @xiaoruiDong! This looks thorough and quite reasonable to me, and the regression tests indicate that this isn't impacting anything. I think we should merge, but let's get additional comments from @hwpang as well.
rwest
left a comment
There was a problem hiding this comment.
Made a couple comments. In general looks like a good fix. Thanks
I'm not sure how to interpret the Codecov annotations - I assume it's being tested?
rmgpy/molecule/converter.py
Outdated
| for label, ind_list in label_dict.items(): | ||
| for ind in ind_list: | ||
| Chem.SetSupplementalSmilesLabel(rdkitmol.GetAtomWithIdx(ind), label) | ||
| [atom.SetNoImplicit(True) for atom in rdkitmol.GetAtoms() |
There was a problem hiding this comment.
If style guides (line length) make this 2 lines anway, I'd have preferred the more explicit::
for atom in rdkitmol.GetAtoms():
if atom.GetAtomicNum() > 1: atom.SetNoImplicit(True)
rather than using the side-effect of a list comprehension, but maybe that's just me.
Is this list-comprehension style "more Pythonic"?
There was a problem hiding this comment.
Thanks for the suggestion! With some Google searches, the use of list comprehension here is less Pythonic, as my purpose is not to create a list. I've changed this as suggested.
|
|
||
| rdkitmol = to_rdkit_mol(molcopy, remove_h=True) | ||
| _, auxinfo = Chem.MolToInchiAndAuxInfo(rdkitmol, options='-SNon') # suppress stereo warnings | ||
|
|
There was a problem hiding this comment.
Is Codecov correct that lines #L605 - L606 were not covered by tests ? Does that mean this whole method is not tested? (or is codecov not right?)
There was a problem hiding this comment.
This is something weird. The failures in the unit tests testing this function inspired my change here. I looked at the codecov coverage profile and found it reported 0% coverage of the whole inchi.py, but RMG actually has unit tests for it. @JacksonBurns any thoughts?
There was a problem hiding this comment.
So this test is definitely running (you can search for it in the CI logs) and the new code (and this whole file) are being run, so CodeCov is just wrong. From brief googling I can't find a solution, so I will debug this separately, but this PR should be fine.
|
Thanks, this seems to work properly. I tested it for the different CC structures and also some other species with unpaired electrons at the carbon atom such as CCH3. RMG returned the correct SMILES string. |
3ff2a04
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:23 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Failed Edge Comparison ❌Original model has 106 species. Non-identical thermo! ❌
Identical thermo comments: Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(124cyclohexatriene) Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: Aromatics Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:46 liquid_oxidation Failed Core Comparison ❌Original model has 37 species. liquid_oxidation Failed Edge Comparison ❌Original model has 202 species. DetailsObservables Test Case: liquid_oxidation Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:44 nitrogen Passed Core Comparison ✅Original model has 41 species. nitrogen Failed Edge Comparison ❌Original model has 132 species. Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(O2s-CdN3d) + group(N3d-OCd) + group(Cd-HN3dO) + ring(Cyclopropene) + radical(CdJ-NdO) Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: NC Comparison✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:49 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species. DetailsObservables Test Case: Oxidation Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:01:09 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species. DetailsObservables Test Case: SO2 Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:43 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:03:22 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species. DetailsObservables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:07:40 RMS_CSTR_liquid_oxidation Failed Core Comparison ❌Original model has 35 species. RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌Original model has 177 species. Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: RMS_CSTR_liquid_oxidation ComparisonThe following observables did not match: ❌ Observable species CCCCC varied by more than 0.100 on average between old model pentane(2) and new model pentane(2) in condition 1.
RMS_CSTR_liquid_oxidation Failed Observable Testing ❌beep boop this comment was written by a bot 🤖 |
JacksonBurns
left a comment
There was a problem hiding this comment.
The new unit tests cover this edge case well, and the solution is solid. CodeCov is incorrectly reporting the result, which is a problem for another time. This PR LGTM
Since RMG includes hydrogens explicitly, it should be relatively safe to set all heavy atoms to have no implicit hydrogens during the creation of RDKit Mol in `to_rdkit_mol`. The addition shouldn't conflict with the RemoveHs step.
These tests act as checks related to solving issue ReactionMechanismGenerator#2536
Previously, rdkitmol is created after removing hydrogens from the rmg molecule. This is inconsistent with the assumption of the updated version of to_rdkit_mol, where heavy atoms are assumed to have no implicit hydrogens. This commit moves conversion to rdkitmol and generating aux info prior to the hydrogen removal. It fixes the inconsistency and is possibly more robust, as the complete molecule's definition is used to generate the aux info.
This is to improve the readability of commit da3d4b and make it follow a more Pythonic style.
3ff2a04 to
e5998d7
Compare
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:24 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Failed Edge Comparison ❌Original model has 106 species. Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(1,4-Cyclohexadiene) Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: DetailsObservables Test Case: Aromatics Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:51 liquid_oxidation Passed Core Comparison ✅Original model has 37 species. liquid_oxidation Passed Edge Comparison ✅Original model has 202 species. DetailsObservables Test Case: liquid_oxidation Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:46 nitrogen Passed Core Comparison ✅Original model has 41 species. nitrogen Passed Edge Comparison ✅Original model has 132 species. DetailsObservables Test Case: NC Comparison✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:53 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species. DetailsObservables Test Case: Oxidation Comparison✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:01:11 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species. DetailsObservables Test Case: SO2 Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:44 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:03:24 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species. DetailsObservables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:08:09 RMS_CSTR_liquid_oxidation Failed Core Comparison ❌Original model has 37 species. RMS_CSTR_liquid_oxidation Failed Edge Comparison ❌Original model has 199 species. DetailsObservables Test Case: RMS_CSTR_liquid_oxidation Comparison✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_CSTR_liquid_oxidation Passed Observable Testing ✅beep boop this comment was written by a bot 🤖 |
Motivation or Problem
This PR tries to fix issue #2536, where the SMILES of bicarbon (CC) is not correctly generated.
Description of Changes
This PR forces the setting of no implicit Hs when converting the RMG Molecule to an RDKit Mol object. This is viable since all hydrogens in the RMG Molecules are defined explicitly.
Testing
Add a few unit tests to test the molecule of CC with various multiplicities. I comment out the case with a quadruple bond, as it is very likely the installed RDKit version doesn't have support for output smiles of quadruple bond (which is introduced in sometime 2022)
There is an InChI error raised due to this change, and I am investigating it.https://github.com/xiaoruiDong/RMG-Py/actions/runs/6238513699/job/16934497099