replace multi processing with joblib #477
5100a5a to ef7fe8a
1b7080f to 4a62e02
6dcbf51 to 4ffb05a
qlib/data/updateparallel.py
Outdated
| require=None, | ||
| maxtasksperchild=None, | ||
| **kwargs) | ||
| self._backend_args["maxtasksperchild"] = ["maxtasksperchild"] |
self._backend_args["maxtasksperchild"] = maxtasksperchild
if isinstance(self._backend, MultiprocessingBackend):
    self._backend_args["maxtasksperchild"] = maxtasksperchild
qlib/data/updateparallel.py
Outdated
| from joblib import Parallel | ||
| class UpdateParallel(Parallel): |
UpdateParallel moves to qlib/utils/__init__.py
UpdateParallel renamed to ParallelExt
https://github.com/microsoft/qlib/blob/main/qlib/utils/paral.py will be a better place
qlib/data/updateparallel.py
Outdated
| maxtasksperchild=None, | ||
| **kwargs | ||
| ): | ||
| super(UpdateParallel, self).__init__(n_jobs=n_jobs, |
super(UpdateParallel, self).__init__(
    n_jobs=n_jobs,
    backend=backend,
    verbose=verbose,
    timeout=timeout,
    pre_dispatch=pre_dispatch,
    batch_size=batch_size,
    temp_folder=temp_folder,
    max_nbytes=max_nbytes,
    mmap_mode=mmap_mode,
    prefer=prefer,
    require=require,
)
qlib/data/updateparallel.py
Outdated
| backend=None, | ||
| verbose=0, | ||
| timeout=None, | ||
| pre_dispatch="2 * n_jobs", |
Why not use *args, **kwargs instead of explicitly listing all the arguments?
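A minimal sketch of the forwarding pattern this comment suggests. `BaseParallel` is a hypothetical stand-in for `joblib.Parallel`, used only so the snippet runs without joblib installed. The attribute names on it are assumptions for illustration, not joblib's API. The point is the `kwargs.pop` followed by `super().__init__(*args, **kwargs)`.

```python
class BaseParallel:
    """Hypothetical stand-in for joblib.Parallel; it only records its arguments."""

    def __init__(self, n_jobs=None, backend=None, verbose=0, **options):
        self.n_jobs = n_jobs
        self.backend = backend
        self.verbose = verbose
        self.options = options  # any remaining keyword arguments


class ParallelExt(BaseParallel):
    """Adds one extra keyword and forwards everything else unchanged."""

    def __init__(self, *args, **kwargs):
        # Remove the extension-only keyword so the parent never sees it.
        self.maxtasksperchild = kwargs.pop("maxtasksperchild", None)
        # Forward all remaining positional and keyword arguments as-is.
        super().__init__(*args, **kwargs)


executor = ParallelExt(n_jobs=4, backend="multiprocessing", maxtasksperchild=1)
print(executor.n_jobs, executor.backend, executor.maxtasksperchild)
```

With this shape the subclass does not need to be updated when the parent class gains or renames a parameter, which is the maintenance cost of listing every argument explicitly.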
qlib/config.py
Outdated
| "kernels": NUM_USABLE_CPU, | ||
| # How many tasks belong to one process. Recommend 1 for high-frequency data and None for daily data. | ||
| "maxtasksperchild": None, | ||
| "joblib_backend" : None, |
Can we set the default backend to multiprocessing if loky is very likely to OOM?
qlib/data/updateparallel.py
Outdated
| @@ -0,0 +1,41 @@ | |||
| from joblib import Parallel | |||
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
tests/misc/test_get_multi_proc.py
Outdated
| """ | ||
| For testing if it will raise error | ||
| """ | ||
| qlib.init(provider_uri=TestAutoData.provider_uri, expression_cache=None, dataset_cache=None) |
You have to use loky to pass the test
* replace multi processing with joblib
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* Fix Parallel support for maxtasksperchild
Co-authored-by: wangw <[email protected]>
Co-authored-by: zhupr <[email protected]>
…flow (microsoft#477)
* several improvement on kaggle loop
* small refinement on prompt
* fix bugs
* add the score of each model in every experiment
* fix ci error
* fix error in ventilator tpl
* fix CI
---------
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: Bowen Xian <[email protected]>
Co-authored-by: WinstonLiye <[email protected]>
Co-authored-by: TPLin22 <[email protected]>
Description
Multiprocessing has the following weaknesses.
Joblib does not have these problems, so this PR replaces multiprocessing with joblib.
How Has This Been Tested?
pytest qlib/tests/test_all_pipeline.py under the parent directory of qlib.
Types of changes