Percent encoding | in paths #896

@Kludex

Originally opened by @nathaniel-daniel on 2025-05-07 23:19:21 in encode/httpx

  • Initially raised as discussion #3479

I got no response, so I'm opening this issue for more visibility.

OS: Windows 11
python --version: Python 3.12.8
httpx version: 0.28.1

I believe | should be percent-encoded in paths, which is currently not the case. If I understand RFC 3986 correctly, path characters are pchar, which is one of unreserved, pct-encoded, sub-delims, ":", or "@". unreserved is ALPHA, DIGIT, "-", ".", "_", or "~". pct-encoded is a percent-encoded sequence. sub-delims is one of "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", or "=". The | character appears nowhere in this set, so it has to be percent-encoded.
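The grammar above can be checked mechanically. This is a minimal sketch using only the standard library; the names UNRESERVED, SUB_DELIMS, PCHAR_LITERALS, and is_literal_pchar are made up for illustration and are not part of httpx.

```python
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# RFC 3986 section 2.2: sub-delims
SUB_DELIMS = set("!$&'()*+,;=")

# RFC 3986 section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# pct-encoded is a three-character sequence ("%" HEXDIG HEXDIG), so it is not
# represented in this set of single literal characters.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")


def is_literal_pchar(ch: str) -> bool:
    """Return True if a single character may appear unencoded in a path segment."""
    return ch in PCHAR_LITERALS


print(is_literal_pchar("~"))  # True
print(is_literal_pchar("|"))  # False
```

Under this reading, | can only appear in a path as the escape %7C.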

To simplify the problem: httpx seems to call its internal urlparse function to process URLs, so here's an example using that function directly. It normally percent-encodes characters as needed, such as spaces:

httpx._urlparse.urlparse('http://example.com/ ')

will return

ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/%20', query=None, fragment=None)

However, this does not happen for |:

httpx._urlparse.urlparse('http://example.com/|')

will return

ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/|', query=None, fragment=None)

In Firefox and Google Chrome, | is percent-encoded:

encodeURI('http://example.com/|') 

will return

"http://example.com/%7C"

In the requests library, | is also percent-encoded:

requests.utils.requote_uri('http://example.com/|')

will return

'http://example.com/%7C'

The rfc3986 library also percent encodes |:

rfc3986.urlparse('http://example.com/|')

will return

ParseResult(scheme='http', userinfo=None, host='example.com', port=None, path='/%7C', query=None, fragment=None)

Using urllib itself, | also seems to be percent-encoded for path components:

urllib.parse.quote('/|')

will return

'/%7C'
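For comparison, urllib.parse.quote can be given an explicit safe set that matches the RFC 3986 path grammar. This is only a sketch: encode_path and RFC3986_PATH_SAFE are hypothetical names, and I have not verified how httpx treats a path that was pre-encoded this way.

```python
from urllib.parse import quote

# Characters left unencoded: "/" (segment separator), "%" (so existing
# escapes such as %20 are not double-encoded), sub-delims, ":" and "@".
# quote() already treats ASCII letters, digits and "_.-~" as safe.
RFC3986_PATH_SAFE = "/%" + "!$&'()*+,;=" + ":@"


def encode_path(path: str) -> str:
    """Percent-encode every character that is not a literal pchar or "/".

    Caveat: because "%" is kept as-is, a stray "%" that is not part of a
    valid escape sequence also passes through unchanged.
    """
    return quote(path, safe=RFC3986_PATH_SAFE)


print(encode_path("/|"))      # /%7C
print(encode_path("/ "))      # /%20
print(encode_path("/a%20b"))  # /a%20b
```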

I'm fairly certain that I've interpreted the RFC correctly, and I think | should be excluded from the PATH_SAFE set. Its current value is: "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~".
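The quoted PATH_SAFE value can be compared against the RFC 3986 path grammar with a set difference. This sketch rebuilds the string from the value quoted in this issue (assuming uppercase A-Z sits between "@" and "[") rather than importing it from httpx. RFC3986_PATH_ALLOWED is a name invented here.

```python
import string

# PATH_SAFE as quoted in this issue; uppercase A-Z is assumed in the
# range between "@" and "[". Not imported from httpx.
PATH_SAFE = (
    "!$%&'()*+,-./0123456789:;=@"
    + string.ascii_uppercase
    + "[\\]^_"
    + string.ascii_lowercase
    + "|~"
)

# Literal pchar characters from RFC 3986, plus "/" (segment separator)
# and "%" (the start of a pct-encoded sequence).
RFC3986_PATH_ALLOWED = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + "!$&'()*+,;="                                # sub-delims
    + ":@"
    + "/%"
)

# Characters PATH_SAFE leaves unencoded that the RFC grammar does not allow.
extra = sorted(set(PATH_SAFE) - RFC3986_PATH_ALLOWED)
print(extra)  # ['[', '\\', ']', '^', '|']
```

By this comparison | is not the only such character: [, \, ], and ^ are also left unencoded even though pchar does not include them.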

Potential Fix: nathaniel-daniel/httpx@a2f327f
