From 4f32bbdfef383c57557fd01a21fbebf91597f8e9 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Tue, 14 Apr 2026 16:19:44 +0200
Subject: [PATCH 1/2] Define multipart/form-data

Based on Andreu's work at https://github.com/andreubotella/multipart-form-data
with mostly editorial changes and some corrections.

WPT coverage at fetch/api/response/response-form-data.html
---
 fetch.bs | 590 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 551 insertions(+), 39 deletions(-)

diff --git a/fetch.bs b/fetch.bs
index 1668dda27..1a68fcce0 100755
--- a/fetch.bs
+++ b/fetch.bs
@@ -8079,18 +8079,20 @@ steps:
 {{FormData}}

-Set action to this step: run the
-multipart/form-data encoding algorithm, with object's
+Let (boundary, chunks) be the result of running the
+multipart/form-data chunk serializer with object's
 entry list and UTF-8.

+Set stream to the result of creating a multipart/form-data
+readable stream from chunks.
+
 Set source to object.

-Set length to unclear, see
-html/6424 for improving this.
+Set length to the length of a
+multipart/form-data payload given chunks.

-Set type to `multipart/form-data; boundary=`, followed by the
-multipart/form-data boundary string generated by the
-multipart/form-data encoding algorithm.
+Set type to `multipart/form-data; boundary=`, followed by
+boundary.

 {{URLSearchParams}}
@@ -8307,39 +8309,14 @@ running consume body with this and the following steps gi
"multipart/form-data"

-1.
-
-   Parse bytes, using the value of the `boundary` parameter from
-   mimeType, per the rules set forth in
-   Returning Values from Forms: multipart/form-data. [[!RFC7578]]
-
-   Each part whose `Content-Disposition` header contains a
-   `filename` parameter must be parsed into an entry whose
-   value is a {{File}} object whose contents are the contents of the part. The {{File/name}}
-   attribute of the {{File}} object must have the value of the `filename` parameter
-   of the part. The {{Blob/type}} attribute of the {{File}} object must have the value of the
-   `Content-Type` header of the part if the part has such header, and
-   `text/plain` (the default defined by [[!RFC7578]] section 4.4) otherwise.
-
-   Each part whose `Content-Disposition` header does not contain a
-   `filename` parameter must be parsed into an entry whose
-   value is the UTF-8 decoded without BOM content of the
-   part. This is done regardless of the presence or the value of a
-   `Content-Type` header and regardless of the presence or the value of a
-   `charset` parameter.
-
-   A part whose `Content-Disposition` header contains a
-   `name` parameter whose value is `_charset_` is parsed like any other
-   part. It does not change the encoding.
-
-2. If that fails for some reason, then throw a {{TypeError}}.
-
-3. Return a new {{FormData}} object, appending each entry,
-   resulting from the parsing operation, to its entry list.
-
+
+ • Let entryList be the result of running the
+   multipart/form-data parser given bytes and mimeType.

-The above is a rough approximation of what is needed for
-`multipart/form-data`, a more detailed parsing specification is to be written.
-Volunteers welcome.
+ • If entryList is failure, then throw a {{TypeError}}.
+
+ • Return a new {{FormData}} object whose entry list is
+   entryList.
+

 "application/x-www-form-urlencoded"
@@ -9802,6 +9779,540 @@ that RFC's normative processing requirements to be compatible with deployed cont
+
+Formats
+
+multipart/form-data
+
+Serializing
+
+A multipart/form-data
+boundary is a byte sequence such that:
+
+ • its length is greater than 26 and less than 71, and
+
+ • it is composed by bytes in the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A,
+   inclusive (ASCII alphanumeric), or which are 0x27 ('), 0x2D (-) or 0x5F (_).
+
+To generate a
+multipart/form-data boundary, return an
+implementation-defined byte sequence which fulfills the conditions for
+boundaries, such that part of it is randomly generated, with a minimum entropy of 95 bits.
+
+Previous definitions of multipart/form-data required that the
+multipart/form-data/boundary associated with a multipart/form-data payload not
+be present anywhere in the payload other than as a delimiter, although they allow for generating the
+multipart/form-data/boundary probabilistically. Since this generation algorithm is separate
+from a payload, however, it has to specify a minimum entropy instead. [[RFC7578]] [[RFC2046]]
+
+If a user agent generates multipart/form-data boundaries with a length
+of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions with that
+user agent's boundaries, the expected length of the payload before a collision is found is well over
+a yottabyte.
+
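The boundary conditions and the generation step above can be sketched in Python as follows. This is an illustration only: the function names, the fixed prefix and the 96-bit random suffix are assumptions of this sketch, since the patch leaves the exact byte sequence implementation-defined.

```python
import secrets
import string

# Bytes permitted in a boundary: ASCII alphanumerics plus ' - and _
ALLOWED = (string.ascii_letters + string.digits + "'-_").encode("ascii")


def is_valid_boundary(boundary: bytes) -> bool:
    """Check the two conditions on a multipart/form-data boundary."""
    return 26 < len(boundary) < 71 and all(b in ALLOWED for b in boundary)


def generate_boundary() -> bytes:
    """Return a boundary made of a fixed prefix and a random suffix.

    The prefix is hypothetical. The suffix encodes 12 random bytes as
    24 hex digits, that is 96 bits, which is above the 95-bit minimum.
    """
    prefix = b"----FormBoundary"  # 16 bytes, all from the allowed set
    suffix = secrets.token_hex(12).encode("ascii")
    return prefix + suffix


def yottabytes_for_bits(bits: int) -> int:
    """Rough scale check: 2**bits bytes expressed in whole yottabytes (10**24)."""
    return 2 ** bits // 10 ** 24
```

As a rough order-of-magnitude check of the note above, and assuming a collision needs on the order of 2^95 candidate positions, `yottabytes_for_bits(95)` gives 39614, which is consistent with "well over a yottabyte".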
+To escape a multipart/form-data name with a string
+name, an optional encoding encoding (default UTF-8) and an
+optional boolean isFilename (default false):
+
+ 1. If isFilename is true, then set name to the result of
+    converting name.
+
+ 2. Otherwise:
+
+    1. Assert: name is a scalar value string.
+
+    2. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence
+       of U+000A (LF) not preceded by U+000D (CR), in name, by a string consisting of
+       U+000D (CR) and U+000A (LF).
+
+ 3. Let encoded be the result of encoding name with
+    encoding.
+
+ 4. Replace every 0x0A (LF) byte in encoded with the byte sequence
+    `%0A`, 0x0D (CR) with `%0D` and 0x22 (") with `%22`.
+
+ 5. Return encoded.
+
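A minimal Python sketch of the escaping steps above. It rests on two assumptions: "converting" is modelled as replacing lone surrogates with U+FFFD, and Python's `xmlcharrefreplace` error handler stands in for the Encoding Standard's behaviour for unencodable code points. The name `escape_name` is hypothetical.

```python
import re


def escape_name(name: str, encoding: str = "utf-8", is_filename: bool = False) -> bytes:
    if is_filename:
        # Convert to a scalar value string: any surrogate left in a Python
        # str is a lone surrogate, so replace each one with U+FFFD.
        name = re.sub("[\ud800-\udfff]", "\ufffd", name)
    else:
        # Normalize lone CR and lone LF to CR LF; existing CR LF pairs
        # match the first alternative and are left unchanged.
        name = re.sub(r"\r\n|\r|\n", "\r\n", name)
    encoded = name.encode(encoding, errors="xmlcharrefreplace")
    # Percent-escape the three bytes that would break the quoted parameter.
    return (
        encoded.replace(b"\n", b"%0A")
        .replace(b"\r", b"%0D")
        .replace(b'"', b"%22")
    )
```

Only LF, CR and the double quote are escaped. A literal `%` is passed through unchanged, so the escaping is not reversible for names that already contain `%0A`, `%0D` or `%22`.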
+The multipart/form-data chunk serializer takes an
+entry list entries and an optional encoding
+encoding (default UTF-8), and returns a tuple of a
+multipart/form-data boundary and a list
+of chunks, each of which can be either a byte sequence or a {{File}}:
+
+ 1. Set encoding to the result of getting an output encoding from
+    encoding.
+
+ 2. Let boundary be the result of
+    generating a
+    multipart/form-data boundary.
+
+ 3. Let outputChunks be an empty list.
+
+ 4. For each entry of entries:
+
+    1. Let chunk be a byte sequence containing `--`,
+       followed by boundary, followed by 0x0D 0x0A (CR LF).
+
+    2. Append `Content-Disposition: form-data; name="`, followed by the result of
+       escaping a multipart/form-data name given entry's
+       name and encoding, followed by 0x22 ("), to
+       chunk.
+
+    3. Let value be entry's value.
+
+    4. If value is a string:
+
+       1. Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
+
+       2. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every
+          occurrence of U+000A (LF) not preceded by U+000D (CR), in value, by a string
+          consisting of U+000D (CR) and U+000A (LF).
+
+       3. Append the result of encoding value with
+          encoding to chunk.
+
+       4. Append 0x0D 0x0A (CR LF) to chunk.
+
+       5. Append chunk to outputChunks.
+
+    5. Otherwise:
+
+       1. Assert: value is a {{File}}.
+
+       2. Append `; filename="`, followed by the result of
+          escaping a multipart/form-data name given value's {{File/name}}
+          with encoding and isFilename set to true,
+          followed by 0x22 0x0D 0x0A (" CR LF), to chunk.
+
+       3. Let type be value's {{Blob/type}}, if it is not the empty string,
+          or "application/octet-stream" otherwise.
+
+       4. Append `Content-Type: `, followed by the result of
+          isomorphic encoding type, to chunk.
+
+       5. Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
+
+       6. Append chunk, followed by value, followed by the
+          byte sequence 0x0D 0x0A (CR LF), to outputChunks.
+
+ 5. Append the byte sequence containing `--`,
+    followed by boundary, followed by `--`, followed by 0x0D 0x0A (CR LF), to
+    outputChunks.
+
+ 6. Return the tuple (boundary, outputChunks).
+
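The chunk serializer above might look roughly like this in Python. Several simplifications are assumptions of this sketch rather than part of the patch: the boundary is passed in instead of being generated (so the output is deterministic), the encoding is fixed to UTF-8, entries are `(name, value)` tuples, and `FileEntry` is a hypothetical stand-in for a {{File}}.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FileEntry:
    """Hypothetical stand-in for a File object."""
    name: str
    type: str
    data: bytes


Chunk = Union[bytes, FileEntry]


def _crlf(text: str) -> str:
    """Normalize lone CR and lone LF to CR LF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def _escape(name: str, is_filename: bool = False) -> bytes:
    """Compact form of the name-escaping steps, UTF-8 only."""
    if not is_filename:
        name = _crlf(name)
    raw = name.encode("utf-8")
    return raw.replace(b"\n", b"%0A").replace(b"\r", b"%0D").replace(b'"', b"%22")


def serialize_chunks(entries: List[Tuple[str, Union[str, FileEntry]]],
                     boundary: bytes) -> List[Chunk]:
    out: List[Chunk] = []
    for name, value in entries:
        chunk = b"--" + boundary + b"\r\n"
        chunk += b'Content-Disposition: form-data; name="' + _escape(name) + b'"'
        if isinstance(value, str):
            chunk += b"\r\n\r\n" + _crlf(value).encode("utf-8") + b"\r\n"
            out.append(chunk)
        else:
            chunk += b'; filename="' + _escape(value.name, True) + b'"\r\n'
            ctype = value.type or "application/octet-stream"
            # latin-1 maps code points 0..255 to single bytes (isomorphic encode)
            chunk += b"Content-Type: " + ctype.encode("latin-1") + b"\r\n\r\n"
            # The file itself stays a separate chunk, so it is never copied here.
            out.extend([chunk, value, b"\r\n"])
    out.append(b"--" + boundary + b"--\r\n")
    return out
```

Keeping each file as its own chunk is what lets the payload length be computed and the body be streamed without reading file contents up front.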
+The length of a multipart/form-data
+payload, given a list of chunks chunks which can be either byte sequences or
+{{File}} objects, is the result of running the following steps:
+
+ 1. Let length be 0.
+
+ 2. For each chunk of chunks:
+
+    1. If chunk is a byte sequence:
+
+       1. Increase length by chunk's length.
+
+    2. Otherwise:
+
+       1. Assert: chunk is a {{File}}.
+
+       2. Increase length by chunk's {{Blob/size}}.
+
+ 3. Return length.
+
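The length computation is a plain sum. A short sketch follows, in which `SizedFile` is a hypothetical stand-in for a {{File}} that exposes only its size.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass
class SizedFile:
    """Hypothetical stand-in for a File; only its size matters here."""
    size: int


def payload_length(chunks: Iterable[Union[bytes, SizedFile]]) -> int:
    length = 0
    for chunk in chunks:
        if isinstance(chunk, (bytes, bytearray)):
            length += len(chunk)
        else:
            # File contents are never read; the recorded size is enough.
            length += chunk.size
    return length
```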
+To create a multipart/form-data readable stream from a list of
+chunks chunks which can be either byte sequences or {{File}} objects:
+
+ 1. Let fileStream be null.
+
+ 2. Let stream be a new {{ReadableStream}}.
+
+ 3. Let pullAlgorithm be an algorithm that runs the following steps:
+
+    If fileStream is null and chunks is not empty
+
+       1. If chunks[0] is a byte sequence, then
+          enqueue the result of
+          [=ArrayBufferView/create|creating=] a {{Uint8Array}} from chunks[0] into
+          stream.
+
+       2. Otherwise:
+
+          1. Assert: chunks[0] is a {{File}} object.
+
+          2. Set fileStream to the result of running chunks[0]'s
+             {{Blob/stream}} method.
+
+          3. Run pullAlgorithm.
+
+       3. Remove the first item from chunks.
+
+    If fileStream is null and chunks is empty
+
+       1. Close stream.
+
+    If fileStream is not null
+
+       1. Let readRequest be a new read request with the following
+          items:
+
+          chunk steps, given chunk
+
+             1. If chunk is not a {{Uint8Array}} object, then error
+                stream with a {{TypeError}} and abort these steps.
+
+             2. Enqueue chunk into stream.
+
+          close steps
+
+             1. Set fileStream to null.
+
+             2. Run pullAlgorithm.
+
+          error steps, given e
+
+             1. Error stream with e.
+
+       2. Let reader be the result of getting a reader for
+          fileStream.
+
+       3. Read a chunk from reader with
+          readRequest.
+
+ 4. Let cancelAlgorithm be an algorithm that runs the following steps, given
+    reason:
+
+    1. If fileStream is not null, then cancel
+       fileStream with reason.
+
+ 5. Set up stream with
+    pullAlgorithm set to pullAlgorithm and
+    cancelAlgorithm set to cancelAlgorithm.
+
+ 6. Return stream.
+
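Python has no {{ReadableStream}}, but a generator is a loose analogue of the pull-driven design above, because nothing is produced until the consumer asks for the next block. The sketch below only illustrates the order in which bytes come out. `FileLike` and its `stream()` method are hypothetical stand-ins, and cancellation is not modelled.

```python
from typing import Iterable, Iterator, List, Union


class FileLike:
    """Hypothetical stand-in for a File whose stream() yields blocks of bytes."""

    def __init__(self, blocks: List[bytes]) -> None:
        self._blocks = blocks

    def stream(self) -> Iterator[bytes]:
        return iter(self._blocks)


def iter_payload(chunks: Iterable[Union[bytes, FileLike]]) -> Iterator[bytes]:
    for chunk in chunks:
        if isinstance(chunk, bytes):
            # Byte-sequence chunks are emitted as they are.
            yield chunk
            continue
        # File chunks are drained block by block before moving on.
        for block in chunk.stream():
            if not isinstance(block, bytes):
                raise TypeError("file stream produced a non-bytes block")
            yield block
```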

+Parsing
+
+The multipart/form-data parser takes a byte sequence
+input and a MIME type mimeType, and returns either an
+entry list or failure:
+
+ 1. Assert: mimeType's essence is
+    "multipart/form-data".
+
+ 2. If mimeType's parameters["boundary"] does not
+    exist, then return failure.
+
+ 3. Let boundary be the result of UTF-8 encoding mimeType's
+    parameters["boundary"].
+
+    The definition of MIME type in Mime Sniffing has the
+    parameter values being ASCII strings, but the
+    parse a MIME type algorithm can create
+    MIME type records containing non-ASCII parameter values. See
+    whatwg/mimesniff#141.
+
+ 4. Let entryList be an empty entry list.
+
+ 5. Let position be a pointer to a byte in input, initially pointing at
+    the first byte.
+
+ 6. While true:
+
+    1. If position does not point to a sequence of bytes starting with 0x2D 0x2D
+       (`--`) followed by boundary, then return failure.
+
+    2. Advance position by 2 + the length of boundary.
+
+    3. Collect a sequence of bytes that are HTTP tab or space bytes given
+       position. (Do nothing with those bytes.)
+
+    4. If position points to a sequence of bytes starting with 0x2D 0x2D
+       (`--`):
+
+       1. If position + 2 points to the end of input, or position
+          + 2 points to a sequence of bytes starting with 0x0D 0x0A (CR LF), then return
+          entryList.
+
+    5. If position does not point to a sequence of bytes starting with 0x0D 0x0A
+       (CR LF), then return failure.
+
+    6. Advance position by 2. (This skips past the newline.)
+
+    7. Let result be the result of parsing multipart/form-data
+       headers on input and position.
+
+    8. If result is failure, then return failure.
+
+    9. Let (name, filename, contentType) be
+       result.
+
+    10. Advance position by 2. (This skips past the empty line that marks the end of
+        the headers.)
+
+    11. Let body be the empty byte sequence.
+
+    12. Body loop: While position is not past the end of input:
+
+        1. Append the byte at position to body.
+
+        2. Advance position by 1.
+
+        3. If body ends with boundary:
+
+           1. Remove the last 4 + (length of boundary) bytes from body.
+
+           2. Decrease position by 4 + (length of boundary).
+
+           3. Break out of body loop.
+
+    13. If position does not point to a sequence of bytes starting with 0x0D 0x0A
+        (CR LF), then return failure.
+
+    14. Advance position by 2.
+
+    15. If filename is not null:
+
+        1. If contentType is null, then set contentType to
+           "text/plain".
+
+        2. If contentType is not an ASCII string, then set contentType to
+           the empty string.
+
+        3. Let value be a new {{File}} object with name filename, type
+           contentType, and body body.
+
+    16. Otherwise:
+
+        1. Let value be the UTF-8 decode without BOM of body.
+
+    17. Assert: name is a scalar value string and
+        value is either a scalar value string or a {{File}} object.
+
+    18. Create an entry with name and value, and
+        append it to entryList.
+
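The main parser loop can be sketched in Python as below. Several choices are assumptions of this sketch: the boundary is passed in as bytes rather than read from a MIME type record, failure is modelled as `None`, a file value is modelled as a `(filename, type, body)` tuple, and the header-parsing step is supplied by the caller as the hypothetical `parse_headers` callable. The byte-by-byte body loop is replaced by `bytes.find`, and a boundary found fewer than 4 bytes into the body is treated as failure, a case the steps above do not spell out.

```python
from typing import Callable, List, Optional, Tuple, Union

# (name, filename, content type), as produced by the header-parsing step
Headers = Tuple[str, Optional[str], Optional[str]]
HeaderParser = Callable[[bytes, int], Optional[Tuple[Headers, int]]]
# A file value is modelled as (filename, type, body)
Value = Union[str, Tuple[str, str, bytes]]


def parse_form_data(data: bytes, boundary: bytes,
                    parse_headers: HeaderParser) -> Optional[List[Tuple[str, Value]]]:
    entries: List[Tuple[str, Value]] = []
    delimiter = b"--" + boundary
    pos = 0
    while True:
        if not data.startswith(delimiter, pos):
            return None
        pos += len(delimiter)
        while pos < len(data) and data[pos] in b"\t ":
            pos += 1  # skip HTTP tab or space bytes
        if data.startswith(b"--", pos) and (
            pos + 2 == len(data) or data.startswith(b"\r\n", pos + 2)
        ):
            return entries  # closing delimiter
        if not data.startswith(b"\r\n", pos):
            return None
        pos += 2
        parsed = parse_headers(data, pos)
        if parsed is None:
            return None
        (name, filename, content_type), pos = parsed
        pos += 2  # skip the empty line that ends the headers
        # The body ends 4 bytes before the first occurrence of the boundary;
        # those 4 bytes are expected to be CR LF followed by "--".
        found = data.find(boundary, pos)
        if found == -1 or found - 4 < pos:
            return None
        body = data[pos:found - 4]
        pos = found - 4
        if not data.startswith(b"\r\n", pos):
            return None
        pos += 2
        value: Value
        if filename is not None:
            if content_type is None:
                content_type = "text/plain"
            if not content_type.isascii():
                content_type = ""
            value = (filename, content_type, body)
        else:
            # Python's "utf-8" codec keeps a leading BOM, unlike "utf-8-sig".
            value = body.decode("utf-8", errors="replace")
        entries.append((name, value))
```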
+To parse multipart/form-data headers, given a
+byte sequence input and a pointer into it position:
+
+ 1. Let name, filename and contentType be null.
+
+ 2. While true:
+
+    1. If position points to a sequence of bytes starting with 0x0D 0x0A (CR LF):
+
+       1. If name is null, then return failure.
+
+       2. Return (name, filename, contentType).
+
+    2. Let headerName be the result of collecting a sequence of bytes that are not
+       0x0A (LF), 0x0D (CR) or 0x3A (:), given position.
+
+    3. Remove any HTTP tab or space bytes from the start or end of
+       headerName.
+
+    4. If headerName does not match the field-name token
+       production, then return failure.
+
+    5. If the byte at position is not 0x3A (:), then return failure.
+
+    6. Advance position by 1.
+
+    7. Collect a sequence of bytes that are HTTP tab or space bytes given
+       position. (Do nothing with those bytes.)
+
+    8. Byte-lowercase headerName and switch on the result:
+
+       `content-disposition`
+
+          1. Set name and filename to null.
+
+          2. If position does not point to a sequence of bytes starting with
+             `form-data; name="`, then return failure.
+
+          3. Advance position so it points at the byte after the next 0x22 (") byte (the
+             one in the sequence of bytes matched above).
+
+          4. Set name to the result of parsing a multipart/form-data
+             name given input and position.
+
+          5. If name is failure, then return failure.
+
+          6. If position points to a sequence of bytes starting with
+             `; filename="`:
+
+             1. Advance position so it points at the byte after the next 0x22 (") byte
+                (the one in the sequence of bytes matched above).
+
+             2. Set filename to the result of
+                parsing a multipart/form-data name given input and
+                position.
+
+             3. If filename is failure, then return failure.
+
+       `content-type`
+
+          1. Let headerValue be the result of collecting a sequence of bytes that are
+             not 0x0A (LF) or 0x0D (CR), given position.
+
+          2. Remove any HTTP tab or space bytes from the end of headerValue.
+
+          3. Set contentType to the isomorphic decoding of
+             headerValue.
+
+       Otherwise
+
+          Collect a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given
+          position. (Do nothing with those bytes.)
+
+    9. If position does not point to a sequence of bytes starting with 0x0D 0x0A
+       (CR LF), then return failure.
+
+    10. Advance position by 2. (This skips past the newline.)
+
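A sketch of the header-parsing steps above in Python. Failure is modelled as `None`, the updated position is returned alongside the result (Python has no pointer to mutate), and the name-parsing step is passed in as the hypothetical `parse_name` callable. The token regular expression is this sketch's reading of the HTTP field-name `token` production.

```python
import re
from typing import Callable, Optional, Tuple

# One or more tchar bytes, per the HTTP token production
TOKEN = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
NameParser = Callable[[bytes, int], Optional[Tuple[str, int]]]
Headers = Tuple[str, Optional[str], Optional[str]]


def _collect(data: bytes, pos: int, stop: bytes) -> Tuple[bytes, int]:
    """Collect bytes from pos until one of the stop bytes or the end."""
    start = pos
    while pos < len(data) and data[pos] not in stop:
        pos += 1
    return data[start:pos], pos


def parse_headers(data: bytes, pos: int,
                  parse_name: NameParser) -> Optional[Tuple[Headers, int]]:
    name = filename = content_type = None
    while True:
        if data.startswith(b"\r\n", pos):
            if name is None:
                return None
            return (name, filename, content_type), pos
        header_name, pos = _collect(data, pos, b"\n\r:")
        header_name = header_name.strip(b"\t ")
        if not TOKEN.fullmatch(header_name):
            return None
        if data[pos:pos + 1] != b":":
            return None
        pos += 1
        while pos < len(data) and data[pos] in b"\t ":
            pos += 1
        lowered = header_name.lower()
        if lowered == b"content-disposition":
            name = filename = None
            prefix = b'form-data; name="'
            if not data.startswith(prefix, pos):
                return None
            parsed = parse_name(data, pos + len(prefix))
            if parsed is None:
                return None
            name, pos = parsed
            prefix = b'; filename="'
            if data.startswith(prefix, pos):
                parsed = parse_name(data, pos + len(prefix))
                if parsed is None:
                    return None
                filename, pos = parsed
        elif lowered == b"content-type":
            value, pos = _collect(data, pos, b"\n\r")
            # latin-1 decoding maps each byte to one code point (isomorphic decode)
            content_type = value.rstrip(b"\t ").decode("latin-1")
        else:
            _, pos = _collect(data, pos, b"\n\r")
        if not data.startswith(b"\r\n", pos):
            return None
        pos += 2
```

On success the returned position points at the CR LF of the empty line, which matches the main parser advancing by 2 afterwards.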
+To parse a multipart/form-data name, given a
+byte sequence input and a pointer into it position:
+
+ 1. Assert: The byte at (position − 1) is 0x22 (").
+
+ 2. Let name be the result of collecting a sequence of bytes that are not 0x0A (LF),
+    0x0D (CR) or 0x22 ("), given position.
+
+ 3. If the byte at position is not 0x22 ("), then return failure.
+
+ 4. Advance position by 1.
+
+ 5. Replace any occurrence of the following subsequences in name with the given byte:
+
+    `%0A`
+       0x0A (LF)
+
+    `%0D`
+       0x0D (CR)
+
+    `%22`
+       0x22 (")
+
+ 6. Return the UTF-8 decode without BOM of name.
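The name-parsing steps above are the inverse of the escaping done by the serializer. A minimal Python sketch follows, again modelling failure as `None` and returning the advanced position together with the decoded name (both are conventions of this sketch).

```python
from typing import Optional, Tuple


def parse_name(data: bytes, pos: int) -> Optional[Tuple[str, int]]:
    # The caller has just consumed the opening quote.
    assert pos > 0 and data[pos - 1:pos] == b'"'
    start = pos
    while pos < len(data) and data[pos] not in b'\n\r"':
        pos += 1
    if data[pos:pos + 1] != b'"':
        return None  # hit CR, LF or the end of input before the closing quote
    name = data[start:pos]
    pos += 1
    # Undo the three escapes; only the exact uppercase forms are recognized.
    name = name.replace(b"%0A", b"\n").replace(b"%0D", b"\r").replace(b"%22", b'"')
    # Python's "utf-8" codec keeps a leading BOM; invalid bytes become U+FFFD.
    return name.decode("utf-8", errors="replace"), pos
```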

    +
    +

 Background reading

@@ -10166,6 +10677,7 @@ Alan Jeffrey,
 Alexey Proskuryakov,
 Andreas Kling,
 Andrés Gutiérrez,
+Andreu Botella,
 Andrew Sutherland,
 Andrew Williams,
 Ángel González,

From 4af534c22feaddb7f8ccbff1adec5f219cc7218b Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Tue, 14 Apr 2026 17:39:58 +0200
Subject: [PATCH 2/2] nits

---
 fetch.bs | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/fetch.bs b/fetch.bs
index 1a68fcce0..c4a4f1339 100755
--- a/fetch.bs
+++ b/fetch.bs
@@ -9802,11 +9802,13 @@ boundary is a byte sequence such that:
 boundaries, such that part of it is randomly generated, with a minimum entropy of 95 bits.

-Previous definitions of multipart/form-data required that the
-multipart/form-data/boundary associated with a multipart/form-data payload not
-be present anywhere in the payload other than as a delimiter, although they allow for generating the
-multipart/form-data/boundary probabilistically. Since this generation algorithm is separate
-from a payload, however, it has to specify a minimum entropy instead. [[RFC7578]] [[RFC2046]]
+Previous definitions of multipart/form-data specified that the
+multipart/form-data boundary associated
+with a multipart/form-data payload not be present anywhere in the payload other than as
+a delimiter, although they allow for generating the
+multipart/form-data boundary
+probabilistically. Since this generation algorithm is separate from a payload, however, it has to
+specify a minimum entropy instead. [[RFC7578]] [[RFC2046]]

 If a user agent generates multipart/form-data boundaries with a length
 of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions with that
@@ -9902,7 +9904,7 @@ of chunks, each of which can be either a byte sequence or a {{File}

        2. Append `; filename="`, followed by the result of
           escaping a multipart/form-data name given value's {{File/name}}
-          with encoding and isFilename set to true,
+          with encoding and isFilename set to true,
           followed by 0x22 0x0D 0x0A (" CR LF), to chunk.

        3. Let type be value's {{Blob/type}}, if it is not the empty string,
@@ -10065,7 +10067,7 @@ chunks chunks which can be either byte sequences or {{Fi

 The multipart/form-data parser takes a byte sequence
-input and a MIME type mimeType, and returns either an
+input and a MIME type mimeType, and returns either an
 entry list or failure:

@@ -10079,7 +10081,7 @@ chunks chunks which can be either byte sequences or {{Fi

    Let boundary be the result of UTF-8 encoding mimeType's
    parameters["boundary"].

-   The definition of MIME type in Mime Sniffing has the
+   The definition of MIME type in Mime Sniffing has the
    parameter values being ASCII strings, but the
    parse a MIME type algorithm can create
    MIME type records containing non-ASCII parameter values. See
@@ -10099,7 +10101,7 @@ chunks chunks which can be either byte sequences or {{Fi

    2. Advance position by 2 + the length of boundary.

-   3. Collect a sequence of bytes that are HTTP tab or space bytes given
+   3. Collect a sequence of bytes that are HTTP tab or space bytes given
       position. (Do nothing with those bytes.)

    4.
@@ -10107,9 +10109,9 @@ chunks chunks which can be either byte sequences or {{Fi
       (`--`):

-      1. If position + 2 points to the end of input, or position
-         + 2 points to a sequence of bytes starting with 0x0D 0x0A (CR LF), then return
-         entryList.
+      1. If position + 2 points to the end of input, or
+         position + 2 points to a sequence of bytes starting with 0x0D 0x0A (CR LF), then
+         return entryList.

    5. If position does not point to a sequence of bytes starting with 0x0D 0x0A
@@ -10217,7 +10219,7 @@ chunks chunks which can be either byte sequences or {{Fi

    6. Advance position by 1.

-   7. Collect a sequence of bytes that are HTTP tab or space bytes given
+   7. Collect a sequence of bytes that are HTTP tab or space bytes given
       position. (Do nothing with those bytes.)

    8.
@@ -10283,7 +10285,7 @@ chunks chunks which can be either byte sequences or {{Fi

 To parse a multipart/form-data name, given a
-byte sequence input and a pointer into it position:
+byte sequence input and a pointer into it position:

 1. Assert: The byte at (position − 1) is 0x22 (").