
MultipartWriter quotes field name wrong #4012

@kohtala


Long story short

My client needs to send multipart/form-data to an API that expects field names with [] in the name. The server does not accept the submission with default set_content_disposition parameters due to wrong quoting.

Expected behaviour

Content-Disposition: form-data; name="files[]"; filename="filename"

Actual behaviour

Content-Disposition: form-data; name="files%5B%5D"; filename="filename"; filename*=utf-8''filename

Steps to reproduce

The client code looks like this:

    with aiohttp.MultipartWriter('form-data') as mpw:
        f = mpw.append(file)
        f.set_content_disposition("form-data", name="files[]", filename="filename")

    res = await self.session.post(url, data=mpw)

Your environment

aiohttp==3.5.4 async client, Ubuntu 18.04, python 3.6.8.

Analysis

Returning Values from Forms: multipart/form-data (RFC 7578) says:

In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.

NOTE: The encoding method described in [RFC5987], which would add a
"filename*" parameter to the Content-Disposition header field, MUST
NOT be used.

It would seem the current implementation misinterpreted this to mean that all field values are to be percent-encoded. But RFC 7578 is clear that the encoding is only to be used on file names. Furthermore, the extended parameter form (as in filename*=) from MIME Parameter Value and Encoded Word Extensions should be used only for the other fields; since the file name is already reduced to US-ASCII by percent-encoding, filename*= is not to be used on the filename.

To convert a Unicode string to bytes for the percent-encoding, the user will need to specify the charset in some cases, as the RFC notes:
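The observed header value is consistent with every parameter value being percent-encoded with no characters treated as safe. A minimal standard-library sketch of that effect, assuming the value goes through something equivalent to `urllib.parse.quote` with an empty `safe` set (this does not reproduce aiohttp's internals):

```python
from urllib.parse import quote

field_name = "files[]"

# With an empty "safe" set, the reserved characters "[" and "]"
# are percent-encoded along with everything else outside the
# unreserved set.
encoded = quote(field_name, safe="")

header = 'Content-Disposition: form-data; name="%s"' % encoded
print(header)
```

The brackets are legal inside a quoted-string as they stand, so this encoding step is what changes the field name the server sees.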

The encoding used for the file names is typically UTF-8, although
HTML forms will use the charset associated with the form.

Thus, in some cases, an additional charset parameter is needed in set_content_disposition. Is it needed in other functions?
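To illustrate why the charset matters, here is a small sketch using only the standard library. The helper name `encode_filename` and its `charset` parameter are hypothetical and are not part of aiohttp; they only show how the same file name percent-encodes differently depending on the charset.

```python
from urllib.parse import quote


def encode_filename(filename: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: percent-encode a file name using `charset`."""
    # The string is first encoded to bytes with the given charset,
    # then every byte outside the unreserved set is written as %XX.
    return quote(filename, safe="", encoding=charset)


utf8_name = encode_filename("naïve.txt")
latin1_name = encode_filename("naïve.txt", charset="iso-8859-1")
print(utf8_name)
print(latin1_name)
```

A server that decodes with the form's charset would only recover the original name if the client used the same charset when encoding.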

The RFCs refer to RFC 822 for the quoted-string definition, which has since been obsoleted by Internet Message Format, RFC 5322:

   qtext           =   %d33 /             ; Printable US-ASCII
                       %d35-91 /          ;  characters not including
                       %d93-126 /         ;  "\" or the quote character
                       obs-qtext

   qcontent        =   qtext / quoted-pair

   quoted-string   =   [CFWS]
                       DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                       [CFWS]

   quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp

And from Augmented BNF for Syntax Specifications: ABNF

   VCHAR          =  %x21-7E
                          ; visible (printing) characters

   WSP            =  SP / HTAB
                          ; white space

The quoted-pair quoting of quoted-string is missing in the current implementation.
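As a rough sketch of what quoted-pair quoting could look like, the hypothetical helper below (`quote_string` is not an aiohttp function) escapes only the two characters that `qtext` excludes, the backslash and the double quote. It assumes the input is printable US-ASCII and does not handle control characters, non-ASCII text, or folding.

```python
def quote_string(value: str) -> str:
    """Hypothetical helper: wrap `value` as an RFC 5322 quoted-string."""
    # Escape the backslash first so that the backslashes introduced
    # for the double quotes are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


name_param = "name=" + quote_string("files[]")
print(name_param)
print(quote_string('say "hi"'))
```

With this approach `files[]` passes through unchanged, which matches the expected header at the top of this report.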

There is also a rather far-fetched case where extremely long values cause the line length limit of 998 characters to be exceeded (https://tools.ietf.org/html/rfc5322#section-2.1.1), which would require using Folding White Space (FWS).

I cannot tell whether simply replacing the percent-encoding with the correct quoted-pair quoting would have any compatibility impact. Should the quote_fields parameter control the percent-encoding of the filename or the quoted-pair quoting of all fields?

The current behavior seems to be the result of the discussion in #916 to fix #903.
