Conversation
Codecov Report
@@ Coverage Diff @@
## master #1402 +/- ##
==========================================
- Coverage 42.07% 42.02% -0.05%
==========================================
Files 165 165
Lines 27824 27867 +43
Branches 5668 5680 +12
==========================================
+ Hits 11706 11711 +5
- Misses 15332 15371 +39
+ Partials 786 785 -1
Continue to review full report at Codecov.
|
39614f3 to
7ac6569
Compare
|
If this is going to be a brand new file type, I strongly suggest using an established format such as yaml, which you can parse using pyyaml to give a python dict. Using exec on user provided input files is a security risk since we could be executing arbitrary code. It would be best to convert all of our input files and database files to not require exec, but it'll take time. Regarding the idea of having a database of statmech info, it would again be best if it's not a plain-text database like RMG-database is currently. Kehang's thermo central database uses mongoDB, which is an option, especially since it seems like it would overlap or be a part of this future statmech database. There are also tons of other options, but we should definitely put some thought into the database design. |
|
I changed the format into |
mliu49
left a comment
There was a problem hiding this comment.
I really like where this is going! I made a few comments, feel free to push back on any if you have other thoughts.
rmgpy/cantherm/common.py
Outdated
| from rmgpy.molecule.translator import toInChI, toInChIKey | ||
| from rmgpy.thermo import NASA, NASAPolynomial, Wilhoit, ThermoData | ||
| from rmgpy.statmech.conformer import Conformer | ||
| from rmgpy.statmech import IdealGasTranslation, LinearRotor, NonlinearRotor, HarmonicOscillator |
There was a problem hiding this comment.
I think some of these imports are unused, so you could probably remove them.
Also, imports should be grouped by standard imports, third-party libraries, then local imports (PEP 8). It's not that important, but something to consider. For example:
import logging
import os.path
import re
import time
import numpy
import yaml
from yaml import ...
import rmgpy.constants as constants
from rmgpy import ...
There was a problem hiding this comment.
Thanks for informing me of the structure, I'll adhere to it.
The imports here are being used implicitly by the eval() function which creates objects from these classes if they are present in the .yml file, so I think we should leave them (otherwise it errors). Since advanced text editors mark them as unused, I think it's a good idea to add a comment next to them explaining why they are kept. I'll happily reorganize imports across Cantherm.
rmgpy/cantherm/common.py
Outdated
| ' kJ/mol. This can cause significant errors in your computed rate constants. ') | ||
|
|
||
|
|
||
| def determine_molecular_weight(species): |
There was a problem hiding this comment.
Instead of creating this method, you should modify the getter for Species.molecularWeight.
First, it would be more convenient for me if you could update the syntax to match my PR (this commit). Then, you can change the get method:
@property
def molecularWeight(self):
if self._molecularWeight is None:
self._molecularWeight = quantity.Mass(self.molecule[0].getMolecularWeight(), 'kg/mol')
return self._molecularWeight
Once you do that, you can access the Species.molecularWeight property regardless of whether or not it's been set.
There was a problem hiding this comment.
Thanks for the comment!
I'll implement it in the Species __init__():
def __init__(self, ..., molecularWeight=None, ...):
if self.molecularWeight is None and self.molecule is not None and len(self.molecule) > 0:
self._molecularWeight = quantity.Mass(self.molecule[0].getMolecularWeight(), 'kg/mol')
else:
self.molecularWeight = molecularWeightShall I still change the syntax of molecularWeight? I can leave it untouched so #1329 will have less conflicts
There was a problem hiding this comment.
On second though, your recommendation seems better if for some reason an empty Species object is first created, then a structure is assigned.
rmgpy/cantherm/common.py
Outdated
| d['thermo'] = species.thermo | ||
| d['chemkin_thermo_string'] = chemkin_thermo_string | ||
|
|
||
| self.species_dictionary = d |
There was a problem hiding this comment.
I don't think this should be called species_dictionary. I also think that you could save all of these parameters as normal attributes, then create a method as_dictionary if you need them as a dictionary.
If you do that, I think you also wouldn't need the update method.
There was a problem hiding this comment.
I agree.
I think I still need something like the update() method, though, to only append the species attributes to this object (the object is initiated with non-species-specific attributes such as author and level_of_theory which are kept)
rmgpy/cantherm/data/HSOO.yml
Outdated
| SMILES: '[O]OS' | ||
| adjacencyList: 'multiplicity 2 | ||
|
|
||
| 1 S u0 p2 c0 {2,S} {4,S} |
There was a problem hiding this comment.
Do you know why there are extra blank lines in the adjlist?
There was a problem hiding this comment.
I don't know why, it bugs me as well. But I verified that the adjList is being read correctly.
If I simply print it during the run (either using print or logging.info) it looks correct with no extra line brakes.
rmgpy/cantherm/data/HSOO.yml
Outdated
| transport_data: TransportData(epsilon=(3625.11,'J/mol'), sigma=(5.7,'angstrom')) | ||
| use_atom_corrections: true | ||
| use_bond_corrections: false | ||
| use_hindered_rotors: true |
There was a problem hiding this comment.
Some of these parameters don't seem like they would be important to store, e.g. use_atom_corrections, use_bond_corrections, use_hindered_rotors.
There was a problem hiding this comment.
At least use_bond_corrections is crucial, the rest are just "nice to have".
use_bond_corrections: This might impact the decision of the user of whether or not to use this file. If the user wants to use this species in a PES with other species calculated at different levels of theory, it is important/recommended to use bond corrections. On the other hand, if all species were calculated at the same level, it is better practice not to use bond corrections (which don't capture all bond types, and are often ambiguous for TSs).
use_hindered_rotors: This could also be directly inferred from the Conformer object. However, for a quick assessment, it's nice to have it explicitly written.
use_atom_corrections: Can be inferred from the atom_energies attribute, I'm OK with removing it since using this feature is less common.
rmgpy/cantherm/input.py
Outdated
| spec.molecule = [structure] | ||
| spec.conformer = Conformer(E0=E0, modes=modes, spinMultiplicity=spinMultiplicity, opticalIsomers=opticalIsomers) | ||
| if molecularWeight is None: | ||
| if is_pdep(): |
There was a problem hiding this comment.
Does this work if the species block comes before the pdep block?
Also, if you refactor the molecular weight implementation based on my other comment, would that eliminate the need for this if block?
rmgpy/cantherm/input.py
Outdated
| """ | ||
| Load the species with all statMech data from the .yml file | ||
| """ | ||
| global jobList |
There was a problem hiding this comment.
It doesn't seem like you use jobList in this function.
rmgpy/cantherm/input.py
Outdated
| coordinates = np.ndarray(shape=(len(number), 3)) | ||
| for i, coorlist in enumerate(job.species.conformer.coordinates.value_si): | ||
| coordinates[i] = [coor for coor in coorlist] | ||
| mol = Molecule().fromXYZ(atomicNums=number, coordinates=coordinates) |
There was a problem hiding this comment.
This section seems oddly coded. Was it done intentionally? It seems like the following would be sufficient:
number = job.species.conformer.number.value_si.tolist()
coordinates = job.species.conformer.coordinates.value_si.tolist()
There was a problem hiding this comment.
Yes, the issue here is that we must pass a np.ndarray type to Molecule().fromXYZ()
The data generated from .tolist() throws an error.
I'll be happy to consider a different alternative to the non straight forward code I used, though.
There was a problem hiding this comment.
Oh, I misinterpreted it, I thought initially that the conformer attributes were np.ndarray objects, and you were converting them into lists.
If the conformer attributes are np.ndarray, it seems you could use them directly. If they're lists, then couldn't you use np.array(list_input) to convert it to an np.ndarray?
rmgpy/cantherm/statmech.py
Outdated
| filename, file_extension = os.path.splitext(self.path) | ||
| if 'yml' in file_extension or 'yaml' in file_extension: | ||
| self.load_species_from_database = True | ||
| return |
There was a problem hiding this comment.
It seems like you could just call load_species_from_database_file here instead of main.py, and it would also eliminate the need for the load_species_from_database attribute.
There was a problem hiding this comment.
I agree, will change accordingly
rmgpy/cantherm/thermo.py
Outdated
|
|
||
| return chem_string | ||
|
|
||
| def save_species_to_database_file(self, path): |
There was a problem hiding this comment.
I think this may be more appropriate as a method of CanthermSpecies. Let me know what you think.
|
@mliu49, Thanks for all the valuable comments! |
|
@mliu49, can I squash this branch? |
|
Yeah, you can go ahead and squash. I guess this might get replaced by the new method though, if that turns out ok. |
rmgpy/rmg_obj.py
Outdated
| ############################################################################### | ||
| import numpy as np | ||
|
|
||
| class RMG_Obj(object): |
There was a problem hiding this comment.
A couple comments on naming:
- Class names should be CamelCase, though I think we can make an exception for RMG. I would probably call this
RMGObject. - Module names are ideally one word. Perhaps something like
object,base, orcommon? - Variable names should be lowercase. I think
Dshould be renamed to something else, likeoutputor evenoutput_dict.
Also, could you end the file with a newline?
|
I don't think we discussed how to handle Quantity objects with the new dict approach - we would need some way to save and load the units. |
|
They're our own objects right? So can't we just make them inherit from RMGObject? |
rmgpy/rmgobject.py
Outdated
| def __init__(self): | ||
| pass | ||
| def as_dict(self): | ||
| D = self.__dict__.copy() |
There was a problem hiding this comment.
Thanks for this addition, @mjohnson541 !
I think that self.__dict__ contains methods as well, and we should first check if not callable(value) before adding entries to the output dictionary.
322feb1 to
4bfbbb8
Compare
|
I rewrote this branch using the I'll note that since cythonized classes cannot inherit from non-cythonized classes, and vice versa, I had to make two parent classes: |
b132315 to
ad4e8c9
Compare
|
Are you sure that Python classes can't subclass Cython classes? It should be possible according to the Cython documentation (section 3.5). |
|
rebased. @mliu49, let me know when I can squash. |
|
@mliu49, let me know when to squash the fixups to finalize the PR |
mliu49
left a comment
There was a problem hiding this comment.
I made a few more comments. You can go ahead and squash everything else.
arkane/input.py
Outdated
| " cannot be reconstructed.".format(spec.label)) | ||
| if spec.molecularWeight is None and is_pdep(jobList): | ||
| # If one of the jobs is pdep and no molecular weight is given or calculated, raise an error | ||
| raise ValueError("No molecularWeight was entered for species {0}. Since a structure wasn't given" |
There was a problem hiding this comment.
Hmm, my suggestion might not have been correct, since this now misses the case where a structure is not provided. One possibility:
if molecularWeight is not None:
spec.molecularWeight = molecularWeight
elif spec.molecularWeight is None and is_pdep(jobList):
raise ValueError()
There shouldn't be any need to check whether structure exists, because the the Species.molecularWeight property checks if it has a molecule attribute, which will be empty if structure was empty.
rmgpy/rmgobject.pyx
Outdated
| @@ -1,10 +1,11 @@ | |||
| # cython: embedsignature=True, cdivision=True | |||
| #!/usr/bin/env python | |||
| # -*- coding: utf-8 -*- | |||
There was a problem hiding this comment.
This actually isn't necessary either, since this is technically not a python file.
There was a problem hiding this comment.
Did you mean the .pyx or the .pxd file?
arkane/qchem.py
Outdated
| with open(self.path, 'r') as f: | ||
| for line in f: | ||
| lines = f.readlines() | ||
| for line in lines: |
There was a problem hiding this comment.
The original implementation is preferable because it treats the file as an iterator, so only one line is loaded at a time. If you use readlines(), then the entire file has to be loaded into memory.
There was a problem hiding this comment.
OK, I think this isn't related to this PR (sorry) so I removed it, we could discuss it at another time (I think it might be necessary for running Arkane through ARC). Thanks for bringing this up, I removed the commit.
For ScalarQuantity and ArrayQuantity
Classes modified: Conformer, SingleExponentialDown, Mode, TransportData
simply by providing the structure
Along with its .yml file
- Added YAML description for the species() block - Added levelOfTheory and author descriptions
|
OK, done and squashed |
We'd like to be able to conveniently use previously calculated statmech data in Arkane (prev. Cantherm). The purpose is to have an online repository of such files, from which users can pick the desired species at the appropriate level of theory (if available). This PR sets the ground by allowing Arkane to save and parse such files.