Allow RMG to load thermo or kinetic libraries from a file#2340
Allow RMG to load thermo or kinetic libraries from a file#2340xiaoruiDong merged 8 commits intomainfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2340 +/- ##
==========================================
+ Coverage 47.72% 47.77% +0.04%
==========================================
Files 98 98
Lines 27678 27688 +10
Branches 7317 7317
==========================================
+ Hits 13210 13227 +17
+ Misses 13073 13066 -7
Partials 1395 1395
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
xiaoruiDong
left a comment
There was a problem hiding this comment.
@alongd Thank you for adding this feature, and I agree that this feature is handy! I just had one comment for you to address. Otherwise, it looks great.
| library = KineticsLibrary(label=short_library_name) | ||
| library.load(library_file, self.local_context, self.global_context) | ||
| self.libraries[library.label] = library | ||
| elif os.path.exists(library_file): |
There was a problem hiding this comment.
Can you check if there is a need to update library.library_order?
At rmgpy/data/rmg.py line 165, items in the kinetics_libraries will be assigned as (library, 'Reaction Library')). Therefore, if a file path is provided, this will be (PATH, 'Reaction Library').
At rmgpy/data/kinetics/database.py line 438, the script iterates through library_order and uses the first elements (the PATH) as the label to query the library (self.libraries[label]), but the change here redefines label as short_library_name. So, it is possible to cause a KeyError.
There was a problem hiding this comment.
Thanks for bringing up these important points, I did not think of them originally.
I think we can either take an approach of converting an external library name (which is a path at first) to the actual label when loading the object, and only working with labels and not paths downstream, or we can retain the full path to an external library as its name within RMG.
The first approach is cleaner or at least has shorter names, but the second might be more robust if users make an external library that has the same name as an internal one (e.g., testing an updated existing library).
If we take the second approach, which I believe is currently implemented in this PR, then I think that the current implementation is OK as is. Here's the train of calls, as I see it, I list it in detail just to help me sort it out :)
1. We load the database, e.g., as in rmgpy/rmg/main.py line 374
2. We call load_kinetics() in rmgpy/data/rmg.py in line 99, passing the database path for the kinetics folder, and a list of library names as reaction_libraries.
3. We append (library, 'Reaction Library') tuples in rmgpy/data/rmg.py line 165
4. Using this correct library order, we then turn to load the libraries on rmgpt/data/kinetics/database.py line 230, which is the block this PR modifies.
I think that the modifications to process an external library we did here are downstream to setting the library order, hence it is OK and the order set by the user will still be preserved.
I see the concern that we carry both the database PATH as well as the specific path of an external library as the library name. The load_libraries() function modified here is in charge of loading the specific library correctly if it represents a complete external path, and in this case the db PATH is ignored.
This branch runs as expected: I copied an existing RMG library (e.g., FFCM1) into my run folder, here's part of the RMG.log file:
Loading kinetics library GRI-Mech3.0-N from /Data/Code/RMG-database/input/kinetics/libraries/GRI-Mech3.0-N/reactions.py...
Loading kinetics library FFCM1(-) from /Data/Code/runs/debug/RMG_ext_lib/FFCM1(-)...
I tried loading an external seed mechanism, and it also works as expected.
I'll add another commit of an example that shows potential users how to use this feature.
There was a problem hiding this comment.
Thanks for the detailed explanation and the addition of the example. I double check with the workflow, and it makes sense with the second approach.
xiaoruiDong
left a comment
There was a problem hiding this comment.
Looks good. Please rebase the branch and I will merge it!
This message was meant for a time when these were very well-known and wide use RMG thermo and kinetic libraries. They were superseded by the BurkeH2O2 libraries, and now also by PrimaryH2O2 for thermo. No need to keep these messages, users are not aware of this old library name anymore.
Motivation or Problem
Users who execute RMG on a server where it is installed in a read-only partition cannot add new libraries for their runs.
Description of Changes
Added functionality to load thermo and kinetic libraries from a user specified location
Tests added