Skip to content

[WIP] MPI update#328

Merged
quaquel merged 47 commits intomasterfrom
mpi_update
Dec 20, 2023
Merged

[WIP] MPI update#328
quaquel merged 47 commits intomasterfrom
mpi_update

Conversation

@quaquel
Copy link
Copy Markdown
Owner

@quaquel quaquel commented Dec 8, 2023

This pull request makes the MPIEvaluator feature complete. It adds

  • centralized logging analogous to the MultiprocessingEvaluator and the IpyparallelEvaluator
  • handling of WorkingDirectoryModels
  • fixing the use of initializer functions for worker processes
  • make it possible to use chunksize as a kwarg on perform_experiment.
  • Various performance enhancements

Some more thoughts

  • The code has been tested on DelftBlue already using the new example_mpi_lake_model and associated slurm file.
  • DelftBlue sometimes seems to fail to spawn all workers. This results in a deadlock for the logging because creating an MPI intercommunicator is a blocking operation. The MPI intercommunicator that is created is only used to centralize logging. There are two possible fixes: have a timeout and fail the process or make centralized logging a user-specifiable choice.
  • The handling of WorkingDirectoryModels is implemented using the same structure as the MultiprocessingEvaluator but still needs to be tested

fixes #261

columns with the same value for all entries don't matter for feature scoring, so they can be ignored.
bunch of things happening at the same time
* code reorganization, moving mpi-specific code into a separate module
* Return of initializer
* update to logging (hacky) to include rank information in all log messages
in preparation for future updates to the mpi code, all evaluator code has been reorganized. All evaluators go into their respective module, and code sthared among 2 or more goes into a new util package.

All relevant modules also have been renamed to future_{style of parallelization}
attempted import error fix
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@quaquel quaquel added this to the 2.5.0 milestone Dec 8, 2023
@coveralls
Copy link
Copy Markdown

coveralls commented Dec 8, 2023

Coverage Status

Changes unknown
when pulling 7482825 on mpi_update
into ** on master**.

@EwoutH
Copy link
Copy Markdown
Collaborator

EwoutH commented Dec 8, 2023

Looks like an impressive effort. If you want me to review at some point, let me know!

@quaquel
Copy link
Copy Markdown
Owner Author

quaquel commented Dec 8, 2023

No rush. This is just in preparation for our meeting next week. I hope to have tested WorkingDirectory models by then as well.

Most of the commits were just stupid tests on DelftBlue. I learned a lot about MPICH and OpenMPI as well as about how slurm interacts with MPI.

@EwoutH
Copy link
Copy Markdown
Collaborator

EwoutH commented Dec 8, 2023

Most of the commits were just stupid tests on DelftBlue. I learned a lot about MPICH and OpenMPI as well as about how slurm interacts with MPI.

That really describes my experience in Q1! There are so many hidden assumptions everywhere. So much trial and error.

Even for me, and I really like trial and error.

@EwoutH EwoutH marked this pull request as draft December 13, 2023 20:56
@quaquel quaquel marked this pull request as ready for review December 20, 2023 12:29
Copy link
Copy Markdown
Collaborator

@EwoutH EwoutH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Testen working directories

@quaquel quaquel merged commit a968cc8 into master Dec 20, 2023
@quaquel quaquel deleted the mpi_update branch December 20, 2023 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants