Reaction generation using multiprocessing by ajocher · Pull Request #1459 · ReactionMechanismGenerator/RMG-Py

ajocher · 2018-08-30T16:13:14Z

Motivation or Problem

SCOOP was found to increase memory consumption until the operating system's out-of-memory killer terminates processes and reaction generation fails. Consequently, Python's multiprocessing module is implemented for macOS and Linux, and a deprecation warning is added to SCOOP references.

Description of Changes

Multiprocessing is implemented for reaction generation in rmgpy/rmg/react.py. The processes are spawned and closed on a single node within the function 'react()'. The number of processes is based on the ratio of currently available RAM to the RAM currently used by the base process. The user can set the maximum number of allowed processes from the command line. For each reaction generation step, the number of processes is the smaller of the user-defined maximum and the value obtained from the RAM ratio. The RAM limit is applied because multiprocessing forks the base process, so the memory limit (swap + RAM) might be exceeded when too many processes are forked from a base process with a large memory footprint.
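
The process-count rule described above can be sketched as follows. This is a minimal illustration and not the code from this PR: the function name `choose_process_count` and its arguments are hypothetical, and the memory figures are passed in as plain numbers rather than read from the operating system.

```python
def choose_process_count(max_proc, available_ram, used_ram):
    """Return the number of worker processes to spawn.

    max_proc      -- user-supplied upper limit (the -n value)
    available_ram -- RAM currently available, in any unit
    used_ram      -- RAM used by the base process, in the same unit
    """
    if used_ram <= 0:
        # No meaningful usage figure: fall back to a single process.
        return 1
    # How many copies of the base process would fit into the available RAM.
    ram_limit = int(available_ram // used_ram)
    # Never exceed the user limit, and always use at least one process.
    return max(1, min(max_proc, ram_limit))


print(choose_process_count(max_proc=4, available_ram=16.0, used_ram=2.0))   # 4
print(choose_process_count(max_proc=24, available_ram=16.0, used_ram=2.0))  # 8
print(choose_process_count(max_proc=4, available_ram=1.0, used_ram=2.0))    # 1
```

The third call shows why the lower bound of 1 matters: when the base process is already larger than the available RAM, the ratio is zero, but at least one process is still needed to do the work.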

Multiprocessing is enabled from the command line using the -n flag followed by the maximum number of processes to use, here 4. The default is 1 process. Only an integer no larger than the number of available CPUs is allowed.
python rmg.py -n 4 input.py

Testing

The implementation has been tested for the superminimal, minimal and PDD cases on both macOS and the RMG server, using 1 to 24 processes on a single node. An example submission script for the RMG server is attached here:
submit.txt

Reviewer Tips

Try your own cases and document the changes in run time and memory consumption.

codecov · 2018-08-30T20:07:08Z

Codecov Report

Merging #1459 into master will increase coverage by 0.07%.
The diff coverage is 57.48%.

@@            Coverage Diff            @@
##           master   #1459      +/-   ##
=========================================
+ Coverage   41.53%   41.6%   +0.07%     
=========================================
  Files         176     176              
  Lines       29111   29142      +31     
  Branches     5975    5990      +15     
=========================================
+ Hits        12090   12125      +35     
+ Misses      16189   16173      -16     
- Partials      832     844      +12

Impacted Files	Coverage Δ
rmgpy/species.py	`0% <0%> (ø)`	⬆️
rmgpy/thermo/thermoengine.py	`82.89% <100%> (-0.44%)`	⬇️
rmgpy/data/kinetics/database.py	`49.37% <100%> (+2.72%)`	⬆️
rmgpy/data/kinetics/family.py	`52.95% <28.57%> (+0.29%)`	⬆️
rmgpy/scoop_framework/util.py	`65.51% <33.33%> (-0.64%)`	⬇️
rmgpy/qm/main.py	`65.32% <41.37%> (-7.31%)`	⬇️
rmgpy/rmg/model.py	`38.46% <48%> (-0.43%)`	⬇️
rmgpy/rmg/pdep.py	`16.12% <50%> (ø)`	⬆️
rmgpy/rmg/main.py	`22.72% <56%> (+0.56%)`	⬆️
rmgpy/rmg/react.py	`87.03% <86.36%> (+8.6%)`	⬆️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8ce334d...f749f49.

mliu49

Great work! I made some comments and suggestions. It would be nice to see some profiling results before this is merged.

Regarding trailing whitespace and other Codacy issues, you generally only need to address issues in lines that you changed. Removing all trailing whitespace in a file is not necessary.

mliu49 · 2018-09-10T16:19:53Z

rmg.py


+    # Add option to select max number of processes for reaction generation
+    parser.add_argument('-n', '--maxProc', type=int, nargs=1, default=1,
+                        help='used by multiprocessing in react.py')


This may not be the most helpful info, since users may not know what react.py is. Perhaps 'max number of processors to use during reaction generation' or something similar.
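
A minimal sketch of the option with the suggested help text is given below. The parser itself is a hypothetical stand-in, not the real rmg.py parser. One assumption is made on top of the review comment: `nargs=1` is dropped, because with `nargs=1` argparse stores a one-element list such as `[4]` when the flag is given, while the default stays the plain integer `1`.

```python
import argparse


def build_parser():
    """Build an illustrative parser with only the options discussed here."""
    parser = argparse.ArgumentParser(description='Illustrative RMG-style command line')
    parser.add_argument('file', metavar='FILE', help='input file')
    # Without nargs, argparse stores a plain int, matching the int default.
    parser.add_argument('-n', '--maxProc', type=int, default=1,
                        help='max number of processors to use during reaction generation')
    return parser


args = build_parser().parse_args(['-n', '4', 'input.py'])
print(args.maxProc)  # 4

default_args = build_parser().parse_args(['input.py'])
print(default_args.maxProc)  # 1
```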

mliu49 · 2018-09-10T16:21:01Z

rmgpy/rmg/main.py

+            pass
+
+        if maxProc > psutil.cpu_count():
+            raise ValueError('Invalid format for user defined maximum number of procesors {0}; should be an integer and smaller or equal to your available number of cpu {1}'.format(maxProc, psutil.cpu_count()))


It doesn't seem like you're checking the format here?

Changed to 'Invalid input ...'
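
For illustration, a check that covers both the type and the range could look like the sketch below. The function name `validate_max_proc` is hypothetical, and `os.cpu_count()` from the standard library is used here in place of the `psutil.cpu_count()` call in the diff so that the example has no third-party dependency.

```python
import os


def validate_max_proc(max_proc, cpu_count=None):
    """Return max_proc if it is a usable process limit, else raise ValueError."""
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(max_proc, bool) or not isinstance(max_proc, int):
        raise ValueError('Invalid input for maximum number of processors {0!r}; '
                         'should be an integer'.format(max_proc))
    if max_proc < 1 or max_proc > cpu_count:
        raise ValueError('Invalid input for maximum number of processors {0}; '
                         'should be between 1 and the number of available '
                         'CPUs ({1})'.format(max_proc, cpu_count))
    return max_proc


print(validate_max_proc(4, cpu_count=8))  # 4
```

The optional `cpu_count` argument exists only to make the function easy to test on machines with different core counts.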

mliu49 · 2018-09-10T16:21:53Z

rmgpy/rmg/model.py

+                spcs.extend(rxn.reactants)
+                spcs.extend(rxn.products)
+
+            ensure_independent_atom_ids(spcs, resonance=True) 


Why do you need to ensure_independent_atom_ids here?

Otherwise Species.getResonanceHybrid() fails because the atom IDs are not valid. We call ensure_independent_atom_ids in generate_reactions_from_families, but this does not work with multiprocessing. Calling ensure_independent_atom_ids here fixes the issue.

mliu49 · 2018-09-10T16:24:19Z

rmgpy/rmg/react.py

+    tmp = divmod(user_mem, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
+    tmp2 = min(maxProc, tmp[0])
+    procNum = max(1, tmp2)
+    print 'For reaction generation {0} processes are used.'.format(procNum)


We like to avoid print statements in favor of using the logging module. So you would just replace print with logging.info() or maybe logging.debug depending on whether you want this to be printed normally.

Something you should consider is how many times this will be printed (seems like it will be a lot) and whether it's useful info for a typical user.

Done. It is printed each time a pool of workers is spawned. It may be of interest because the number of processes (workers) can differ from the user input, depending on the available RAM and the memory consumption of the RMG base process before the workers are spawned.
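
One way to balance the two points above (avoiding print, and not flooding the log) is sketched below. This is an assumption about a possible design, not what the PR does: the hypothetical helper `report_process_count` logs at INFO level only when the worker count differs from the user's request, and at DEBUG level otherwise.

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')


def report_process_count(proc_num, max_proc):
    """Log the worker count and return the logging level that was used."""
    if proc_num < max_proc:
        # The RAM limit reduced the count: worth showing to a typical user.
        level = logging.INFO
    else:
        # Nothing unexpected: keep it out of the normal log output.
        level = logging.DEBUG
    logging.log(level, 'For reaction generation %d processes are used (user limit %d).',
                proc_num, max_proc)
    return level


report_process_count(2, 4)  # emitted at INFO
report_process_count(4, 4)  # emitted at DEBUG, hidden with the INFO threshold
```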

mliu49 · 2018-09-10T16:28:56Z

rmgpy/scoop_framework/util.py

    when SCOOP is loaded, the future object.
    """
+    warnings.warn("The option scoop is no longer supported"\
+     " and may be removed in Version: 2.3 ", DeprecationWarning)


For all of these warnings, could you indent the second line to line up with the " in the first line? Also, the backslashes shouldn't be necessary.

Done. Got the template from here:
https://github.com/ReactionMechanismGenerator/RMG-Py/wiki/RMG-Contributor-Guidelines
I guess it should be changed accordingly?
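
The requested layout can be sketched as follows. The wrapper function `warn_scoop_deprecated` is hypothetical and exists only to make the example self-contained; the message text is taken from the diff above.

```python
import warnings


def warn_scoop_deprecated():
    # Adjacent string literals inside the parentheses are joined by the
    # parser, so no backslash is needed to continue the line.
    warnings.warn("The option scoop is no longer supported "
                  "and may be removed in Version: 2.3",
                  DeprecationWarning)


# Capture the warning so that the example can show what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_scoop_deprecated()

print(caught[0].category.__name__)  # DeprecationWarning
print(caught[0].message)
```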

mliu49 · 2018-09-10T16:33:53Z

rmgpy/rmg/react.py

-    tmp = divmod(usermem, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
+
+    # Get available RAM (GB)and procnum dependent on OS
+    if platform == "linux" or platform == "linux2":


You could use platform.startswith('linux') here to check for both.

Thanks! Done.

mliu49 · 2018-09-10T16:36:37Z

rmgpy/rmg/react.py

+        # OS X
+        memoryavailable = psutil.virtual_memory().available/(1000.0 ** 3)
+        memoryuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/(1000.0 ** 3) 
+


I think you should add handling for Windows as well. It doesn't have to be equivalent handling, but you could simply add an else for any other platform and perhaps only use one processor in that case.

Done, but couldn't test it. Is someone with a Windows OS volunteering?
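
The platform handling discussed in the last two threads can be sketched as below. The helper names `maxrss_to_gb` and `process_limit` are hypothetical. The unit handling relies on the fact that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS; decimal units (powers of 1000) are used to match the diff.

```python
import sys


def maxrss_to_gb(ru_maxrss, platform_name=None):
    """Convert a ru_maxrss reading to GB, or return None if unsupported."""
    if platform_name is None:
        platform_name = sys.platform
    if platform_name.startswith('linux'):  # matches both 'linux' and 'linux2'
        return ru_maxrss / 1000.0 ** 2     # kilobytes -> GB
    if platform_name == 'darwin':          # macOS
        return ru_maxrss / 1000.0 ** 3     # bytes -> GB
    return None                            # e.g. 'win32'


def process_limit(max_proc, platform_name=None):
    """Use a single process on platforms without RAM-based handling."""
    if platform_name is None:
        platform_name = sys.platform
    if platform_name.startswith('linux') or platform_name == 'darwin':
        return max_proc
    return 1


print(process_limit(4, 'linux2'))  # 4
print(process_limit(4, 'win32'))   # 1
```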

ajocher · 2018-09-11T22:04:12Z

presentation1209.pdf

mliu49 · 2019-03-25T22:37:18Z

Hmm, something seems to have gone wrong with your rebase. This branch now includes a ton of commits from master.

mliu49 · 2019-03-29T20:28:27Z

I added a commit refactoring the code for family splitting. Let me know what you think, and whether it matches your original design.

mliu49 · 2019-03-29T21:00:32Z

rmgpy/rmg/model.py

+                    # This method chops the iterable into a number of chunks which it
+                    # submits to the process pool as separate tasks.
+                    p = Pool(processes=procnum)
+                    p.map(calculate_thermo_parallel,spcs)


I have a couple concerns about the overall implementation for parallel QM. Normally, thermo is calculated for one species at a time in self.processNewReactions() below. Here, you determine all species which would be calculated using QM and pre-calculate their thermo in parallel, then in processNewReactions, it sees the thermo and does not re-calculate it.

The calculate_thermo_parallel function seems to skip a lot of processing done in ThermoDatabase.getThermoData and thermoEngine.processThermoData, which can have a significant effect on the final values.

It appears that you're using SMILES to determine whether species are unique? This is not safe because different resonance structures will have different SMILES. Also, generating SMILES for every new molecule will be relatively time consuming.

Minor suggestions, which may or may not be relevant after resolving the above issues:

Since the procnum determination based on RAM is done both here and in react.py, it might be useful to turn it into a function somewhere and import it here.

I would also split this section off as a new method, since enlarge is already a really big method.

The idea behind the implemented procedure is that the QM files are generated in parallel here and then looked up one species at a time in self.processNewReactions() below. Testing showed that calculating QM thermo for one species at a time is more time consuming than calculating it in bulk here.

I use SMILES for comparison and then generate resonance structures within calculate_thermo_parallel() to generate QM files for each resonance structure. What would be a better way to ensure that the same QM file is not generated twice/overlapping during the parallel generation?

The SMILES comparison is now replaced with an isIsomorphic comparison.

The procnum determination is now a function and is imported for both parallel QMTP and parallel reaction generation.

The section that writes QM files in parallel is now split off from enlarge.
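
The pattern discussed in this thread (remove duplicates with a pairwise equivalence test, then map the remaining work over a process pool) can be sketched generically as below. None of these names come from RMG: `unique_items`, `square` and `run_in_pool` are hypothetical, integers stand in for species, and the `is_same` callback stands in for an isomorphism check.

```python
from multiprocessing import Pool


def unique_items(items, is_same):
    """Keep the first item of each group of equivalent items.

    is_same(a, b) is a pairwise equivalence test. The loop needs up to
    O(n^2) comparisons, which is the price of not having a hashable key.
    """
    unique = []
    for item in items:
        if not any(is_same(item, kept) for kept in unique):
            unique.append(item)
    return unique


def square(x):
    """Placeholder for the per-item work. It is a top-level function so
    that it can be pickled and sent to the worker processes."""
    return x * x


def run_in_pool(func, items, procnum):
    """Map func over items using procnum worker processes."""
    with Pool(processes=procnum) as pool:
        # map() splits the iterable into chunks and submits them as tasks.
        return pool.map(func, items)


work = unique_items([3, -3, 2, 3, -2], is_same=lambda a, b: abs(a) == abs(b))
print(work)  # [3, 2]
```

Calling `run_in_pool(square, work, procnum=2)` would then return `[9, 4]`. In a script, that call is best placed under an `if __name__ == '__main__':` guard so that worker processes do not re-run it on platforms that start workers by re-importing the main module.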

mliu49 · 2019-05-30T22:58:43Z

I changed the species re-labeling to only be done for new species created from reaction generation. Turns out that thermo generation for input species was happening earlier than we thought.

I also removed the whitespace changes to clean up the PR a bit.

mliu49 · 2019-06-03T16:24:30Z

I think this is ready code-wise. Are there any additional tests that you think should be run?

ajocher · 2019-06-03T19:33:11Z

> I think this is ready code-wise. Are there any additional tests that you think should be run?

Just re-tested QMTP, pdep, and thermo filtering in serial and parallel, and everything seems to work fine. Ready to go from my side.

mliu49

Hooray!

alongd · 2019-07-05T14:01:31Z

Should we update the documentation re parallelization?
http://reactionmechanismgenerator.github.io/RMG-Py/users/rmg/running.html?highlight=parallel

ajocher · 2019-07-05T14:04:52Z

It is updated. Where do you think it needs more update?

Motivation:
- processNewReactions requires a newSpecies argument which was not being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which was used to generate the new reactions.
- The overall model generation algorithm has since changed, so identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species, missing the original purpose of identifying the species from which the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values from react and react_all
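
The idea of carrying the source species along with each generated reaction can be sketched as below. This is only an illustration of the bookkeeping: the functions here are hypothetical stand-ins that share names with react and react_all but not their real signatures, and strings stand in for species and reactions.

```python
def react(species_tuple):
    """Hypothetical stand-in for reaction generation on one species tuple."""
    return ['{0} reaction'.format(' + '.join(species_tuple))]


def react_all(species_tuples):
    """Return (reaction, source_species_tuple) pairs.

    Keeping the tuple next to each reaction means the caller never has to
    guess afterwards which species a reaction was generated from.
    """
    results = []
    for spc_tuple in species_tuples:
        for rxn in react(spc_tuple):
            results.append((rxn, spc_tuple))
    return results


pairs = react_all([('A',), ('A', 'B')])
for rxn, source in pairs:
    print(rxn, '<-', source)
```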

ajocher requested review from mjohnson541 and mliu49 August 30, 2018 16:13

ajocher added Topic: Performance Type: Feature Topic: Parallel Complexity: Medium labels Aug 30, 2018

ajocher force-pushed the parallelRMG_RXNgen branch from cd3d25c to 97ce8a1 Compare August 30, 2018 18:15

mjohnson541 added Topic: SpeedUp and removed Topic: SpeedUp labels Sep 1, 2018

mjohnson541 assigned ajocher Sep 1, 2018

mjohnson541 added the Epic label Sep 1, 2018

mliu49 reviewed Sep 10, 2018

ajocher force-pushed the parallelRMG_RXNgen branch from 1604410 to 1ad85ac Compare September 11, 2018 21:53

ajocher force-pushed the parallelRMG_RXNgen branch 4 times, most recently from ee97295 to 915910b Compare March 29, 2019 18:52

mliu49 reviewed Mar 29, 2019

ajocher force-pushed the parallelRMG_RXNgen branch from 8e3e27b to 55e1fc6 Compare March 29, 2019 22:06

ajocher force-pushed the parallelRMG_RXNgen branch 3 times, most recently from e82fe61 to d7a7f84 Compare April 10, 2019 20:57

mliu49 force-pushed the parallelRMG_RXNgen branch from d7a7f84 to 63e2976 Compare April 11, 2019 19:00

ajocher force-pushed the parallelRMG_RXNgen branch 3 times, most recently from 6ed0fc6 to d27647b Compare April 21, 2019 03:24

mliu49 and others added 12 commits May 30, 2019 18:44

For Species, regenerate resonance structures if atom IDs are invalid

4e5ce61

Restore retrieve species to keep pdep functional.

3f3ada2

Moved thermo pruning to happen directly after thermo data calculation.

490ad14

Allows thermo pruning with parallel QMTP.

Fix pickle error for QMTP parallel.

318e074

Update arkane/explorerTest.py.

d59da68

Auto-format spacing in qm.mainTest

a5085dd

Make generate_QMfiles a method of QMCalculator

0585607

Move generate_QMfiles unit test to qm.mainTest

fcda6b6

Moving determine_procnum_from_RAM() to rmgpy.rmg.main.

80b288a

Update tests for moving determine_procnum_from_RAM() to rmgpy.rmg.main.

cd22300

Changed available number of processes to available number of processors.

189d5c9

Refactor species labeling based on thermo label

7318b37

Add explicit `rename` argument to generateThermo which is only true when called from enlarge. Thus, only new species without thermo are renamed, which prevents initial species and bath gases from getting accidentally renamed.

mliu49 force-pushed the parallelRMG_RXNgen branch from e69745b to 7318b37 Compare May 30, 2019 22:46

mliu49 force-pushed the parallelRMG_RXNgen branch from ab4afeb to f362c9d Compare May 31, 2019 20:56

Improvements to reactTest

f749f49

Reduce number of families being tested. Make separate tests for serial and parallel processing.

mliu49 force-pushed the parallelRMG_RXNgen branch from f362c9d to f749f49 Compare May 31, 2019 22:07

mliu49 approved these changes Jun 3, 2019

mliu49 merged commit f90526f into master Jun 3, 2019

mliu49 deleted the parallelRMG_RXNgen branch June 3, 2019 19:37

mliu49 mentioned this pull request Jul 17, 2019

Fix pressure dependent network generation in RMG jobs #1658

Merged
