vine: deserialize argument infile before forking #3902
Merged
dthain merged 1 commit into cooperative-computing-lab:master from JinZhou5042:vine_deserialize_args_before_forking on Aug 7, 2024
dthain approved these changes on Aug 7, 2024
Proposed Changes
To fix #3892
In serverless mode, we use `cloudpickle` to serialize and deserialize arguments to and from a file. However, if some of the arguments are Python packages or objects, `cloudpickle` serializes them by reference by default, and unpickling them triggers the import of their dependencies at load time. (See the difference between pickle and cloudpickle for reference.)

For example, the two code snippets dump exactly the same `Data` object, except that one serializes by reference and the other serializes by value. The first produces a smaller file (<100 KB) and takes 0.8 s to unpickle; the second produces a much larger file (>1200 KB) and takes 0.0001 s to unpickle. Pickling by reference creates a smaller file but takes longer to load, because the imports happen at unpickling time.

In `fork` mode, imports performed inside a child process are not reused by the library process or by the other children, so every function call pays the import latency again. The imports happen at load time, that is, when `cloudpickle.load(...)` is invoked. Loading the arguments before forking lets the library import the packages once, in advance, so the forked children inherit them and avoid this slowdown.

As discussed here, there are other possible approaches. One disadvantage of this one is that if unpickling the argument file is unavoidably expensive, the change slightly reduces concurrency, because the child processes could otherwise have done the deserialization in parallel.
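The effect described above can be reproduced with a minimal sketch. It is not the TaskVine library code, and it rests on several assumptions: it uses the standard `pickle` module rather than `cloudpickle` (both store objects from importable modules by reference), it uses the small stdlib module `colorsys` as a hypothetical stand-in for a heavy dependency, and it requires a POSIX system for `os.fork`. The helper names are made up for illustration.

```python
import importlib
import os
import pickle
import sys

# Hypothetical stand-in for a heavy dependency; any importable module works.
MODULE = "colorsys"


def make_payload() -> bytes:
    """Pickle a function by reference: only its module and name are stored."""
    func = importlib.import_module(MODULE).rgb_to_hsv
    return pickle.dumps(func)


def forget_module() -> None:
    """Drop MODULE from the import cache to mimic a fresh library process."""
    sys.modules.pop(MODULE, None)


def child_has_module(load_before_fork: bool, payload: bytes) -> bool:
    """Fork once; report whether the child started with MODULE imported."""
    forget_module()
    if load_before_fork:
        # Unpickling by reference imports MODULE here, once, in the parent.
        pickle.loads(payload)
    pid = os.fork()
    if pid == 0:
        # Child: inspect the import cache before touching the payload.
        already_imported = MODULE in sys.modules
        # If MODULE was not inherited, this load imports it in the child.
        pickle.loads(payload)
        os._exit(0 if already_imported else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    payload = make_payload()
    print("payload mentions module name:", MODULE.encode() in payload)
    print("fork, then load -> child inherited import:",
          child_has_module(False, payload))
    print("load, then fork -> child inherited import:",
          child_has_module(True, payload))
```

When the load happens only in the child, each forked process has to perform the import itself. When the parent loads first, the module is already in `sys.modules` at fork time and the child inherits it. With a cheap module like `colorsys` the time difference is negligible; the sketch only shows where the import takes place, not the latency reported in this PR.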
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
- `make test` Run local tests prior to pushing.
- `make format` Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
- `make lint` Run lint on source code prior to pushing.