The default buffering was set to 0 to disable buffering when opening files, so that file writes would appear immediately on the file system. However, writes larger than 2GB were silently truncated by the OS, as noted in Issue #704. To preserve the ability to disable buffering for unit tests, a buffering parameter is added to the connectors.
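As a rough illustration of the change, here is a minimal sketch of a file-backed connector that accepts a buffering policy on construction. The class name `SketchFileConnector` and its `put`/`get` methods are hypothetical and are not the project's actual connector API; the only real call relied on is Python's built-in `open()`, where `buffering=-1` selects the system default and `buffering=0` disables buffering in binary mode.

```python
import os


class SketchFileConnector:
    """Hypothetical stand-in for a file-backed connector (not the real API)."""

    def __init__(self, store_dir: str, buffering: int = -1) -> None:
        # -1 asks open() to use the system default buffering policy.
        # 0 disables buffering (binary mode only), e.g. for unit tests
        # that need writes to reach the file system immediately.
        self.store_dir = store_dir
        self.buffering = buffering
        os.makedirs(store_dir, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        # Write the payload using the configured buffering policy.
        path = os.path.join(self.store_dir, key)
        with open(path, 'wb', buffering=self.buffering) as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        # Read the whole payload back using the same policy.
        path = os.path.join(self.store_dir, key)
        with open(path, 'rb', buffering=self.buffering) as f:
            return f.read()
```

Under these assumptions, production code would construct the connector with the default, while a unit test could pass `buffering=0` to keep the old unbuffered behavior.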
Description
The file and Globus connectors disabled file buffering, which could silently truncate writes of very large files (>=2.14GB). Both connectors now leave the buffering policy unset, which uses the system default, with an option to override the policy on construction. This issue was identified in #704.
That issue also benefited from using a custom serializer/deserializer to serialize large torch models more efficiently. I changed the serialization protocols to be more flexible, operating on BytesLike objects rather than just bytes, so it is easier for people to implement new serializers. I also updated the example in the docs.
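To make the bytes-like idea concrete, here is a minimal sketch of a custom serializer/deserializer pair. The `BytesLike` alias as defined here, the function names, and the one-byte identifier are all assumptions for illustration, not the project's actual protocol. The point is that a serializer can return a `bytearray` or `memoryview` without first copying into immutable `bytes`, and a deserializer can accept any of the three.

```python
from __future__ import annotations

from typing import Union

# Assumed definition for illustration; the project's alias may differ.
BytesLike = Union[bytes, bytearray, memoryview]

# Hypothetical one-byte tag marking payloads written by this serializer.
IDENTIFIER = b'\x01'


def serialize_small_ints(values: list[int]) -> BytesLike:
    """Pack integers in range(256) into a bytearray prefixed by the tag."""
    buf = bytearray(IDENTIFIER)
    buf.extend(bytes(values))
    # Returned as a bytearray: no extra copy into immutable bytes.
    return buf


def deserialize_small_ints(data: BytesLike) -> list[int]:
    """Accept bytes, bytearray, or memoryview and recover the integers."""
    view = memoryview(data)  # zero-copy view over any bytes-like input
    if bytes(view[:1]) != IDENTIFIER:
        raise ValueError('payload was not produced by serialize_small_ints')
    return list(view[1:])
```

Avoiding the intermediate copy matters most for large payloads such as model weights, where converting to `bytes` would duplicate the data in memory.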
Fixes
Type of Change
Testing
To replicate the issue I used this script:
And I tested the custom serializers with this:
Pull Request Checklist
Please confirm the PR meets the following requirements.
pre-commit (e.g., mypy, ruff, etc.).