"spack mirror create -a" (mirror all) in parallel#48411
"spack mirror create -a" (mirror all) in parallel#48411aquan9 wants to merge 35 commits intospack:developfrom
Conversation
There was a problem hiding this comment.
Thanks for your PR.
I wouldn't use ThreadPoolExecutor here cause Spack isn't thread safe. In particular fetching creates stage objects, which create lock objects, of which the bookkeeping is not thread safe iirc. See spack.util.parallel.make_concurrent_executor in the code base elsewhere.
Secondly it's better to update mirror_stats from the return value of create_mirror_for_one_spec in the main thread/process, instead of updating it in parallel.
Finally I would implement -j using spack.cmd.common.arguments.add_common_arguments(..., ["jobs"]) plus spack.config.determine_number_of_jobs(parallel=True) since it takes into account config files and the number of available CPUs.
|
@haampie Thank you for the code review. I'll work on implementing these changes. |
@haampie Mirror_stats seems to be updated as a part of that process and the only value returned from "create_mirror_from_package_object" is a true or false for success or failure. |
Actually I think I figured out what you mean... let me push some changes... |
|
@spackbot request review |
lib/spack/spack/cmd/mirror.py
Outdated
| for candidate in mirror_specs | ||
| ] | ||
| for mirror_future in futures: | ||
| pkg_obj = mirror_future.result() |
There was a problem hiding this comment.
this should be in the context manager, __exit__ destroys the thread pool immediately.
lib/spack/spack/cmd/mirror.py
Outdated
| def create_mirror_for_one_spec(candidate, mirror_cache, mirror_stats): | ||
| pkg_cls = spack.repo.PATH.get_pkg_class(candidate.name) | ||
| pkg_obj = pkg_cls(spack.spec.Spec(candidate)) | ||
| spack.mirrors.utils.create_mirror_from_package_object(pkg_obj, mirror_cache, mirror_stats) |
There was a problem hiding this comment.
You're still mutating mirror_stats in a subprocess, so it's not reflected in the main process. The function should return what it added / what errored. Do the mirror_stats reduction in the main process.
… deep in directory
This PR improves the "spack mirror create -a" process by adding threading and an additional parameter "-j" to choose the number of threads used to download a full mirror.