Description
(edited based on later comments)
Attempting to put a non-ASCII string as a user metadata value on an S3 object fails due to an explicit prohibition in the boto3 source code:
Line 543 in 04d1fae:
def validate_ascii_metadata(params, **kwargs):
Example:
>obj.put(Body="hello",Metadata={"meta":"™"})
Traceback (most recent call last): ...
UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in position 0: ordinal not in range(128)
The documentation cited in the validation function linked above has changed since the code was written and now states "Amazon S3 allows arbitrary Unicode characters in your metadata values" (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html).
The mismatch between what boto does and what S3 claims to support was surprising and wasted some time.
If I monkey-patch this check out, everything works fine and the PUT succeeds, as the documentation suggests it should. When fetching the object, the metadata is UTF-8 encoded and then base64 encoded for ASCII transmission via HTTP headers, again as documented.
>import boto3
>import botocore.handlers
># Drop the ASCII metadata validator from the S3 handlers before any session is created.
>botocore.handlers.BUILTIN_HANDLERS = [elem for elem in botocore.handlers.BUILTIN_HANDLERS if not (elem[0].startswith('before-parameter-build.s3.') and elem[1] == botocore.handlers.validate_ascii_metadata)]
>session = boto3.Session()
>s3 = session.resource('s3')
>obj = s3.Object('[bucket omitted]','testupload.txt')
>obj.put(Body="hello",Metadata={"meta":"™"})
{'ResponseMetadata': ...}
>obj.get()
{ ... 'x-amz-meta-meta': '=?UTF-8?B?w6LChMKi?=' ... }
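The header value returned above is an RFC 2047 encoded word. Here is a minimal sketch, using only the Python standard library, of decoding it to see what actually came back. The variable names are illustrative, and the Latin-1 round trip in the last step is an assumption about how the double encoding arose, not something taken from the S3 documentation.

```python
from email.header import decode_header

# Header value copied from the obj.get() response above.
header_value = "=?UTF-8?B?w6LChMKi?="

# For an encoded word, decode_header returns (bytes, charset) pairs.
raw_bytes, charset = decode_header(header_value)[0]
once_decoded = raw_bytes.decode(charset)

# The result is three characters, not the single trademark sign that was stored.
print([hex(ord(ch)) for ch in once_decoded])

# Assumption: the UTF-8 bytes of the original value were read as Latin-1
# and then UTF-8 encoded a second time. Undoing that step recovers the value.
recovered = once_decoded.encode("latin-1").decode("utf-8")
print(recovered == "\u2122")
```

If that assumption holds, a client reading the metadata back has to undo two layers of encoding to get the original string, which is the round-trip problem this issue is about.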
In order to have boto support the documented behavior (which is, admittedly, not great behavior - double-encoding all of the strings) we could revert #861 or else make it check keys only (instead of values).
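As a rough sketch of the keys-only option, something like the following could work. The function name `validate_ascii_metadata_keys` is hypothetical, this is not the botocore implementation, and `ValueError` stands in for whatever exception type the library would actually raise.

```python
def validate_ascii_metadata_keys(params, **kwargs):
    """Hypothetical keys-only check: reject non-ASCII keys, allow any value."""
    metadata = params.get("Metadata")
    if not metadata or not isinstance(metadata, dict):
        return
    for key in metadata:
        try:
            key.encode("ascii")
        except UnicodeEncodeError:
            # ValueError is a stand-in for the library's own error type.
            raise ValueError(f"Non-ASCII metadata key: {key!r}") from None


# An ASCII key with a non-ASCII value is accepted.
validate_ascii_metadata_keys({"Metadata": {"meta": "\u2122"}})

# A non-ASCII key is still rejected.
try:
    validate_ascii_metadata_keys({"Metadata": {"m\u00e9ta": "x"}})
except ValueError as exc:
    print(exc)
```

This keeps the guard on keys, which end up in header names, while letting values through to S3 as the current documentation describes.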
Alternatively, the Unicode error message could be updated to say that S3's support for Unicode values over the REST protocol is incomplete, and that boto therefore intentionally does not support setting non-ASCII values because they are not fully round-trip safe.