Originally opened by @nathaniel-daniel on 2025-05-07 23:19:21 in encode/httpx
I got no response, so I'm opening this issue for more visibility.
OS: Windows 11
python --version: Python 3.12.8
httpx version: 0.28.1
I believe | should be percent-encoded in paths, which is not currently the case. If I'm understanding RFC 3986 correctly, path characters are pchar, which can be unreserved, pct-encoded, sub-delims, ":", or "@". unreserved is ALPHA, DIGIT, "-", ".", "_", or "~". pct-encoded is a percent-encoded sequence ("%" followed by two hex digits). sub-delims is "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", or "=". The | character appears nowhere in this set, meaning it has to be percent-encoded.
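The grammar above can be written out as a small membership check. This is only a sketch of my reading of RFC 3986 (sections 2.2, 2.3 and 3.3); the names `PCHAR_LITERALS` and `is_literal_pchar` are made up for illustration and are not part of httpx.

```python
import string

# Character classes from RFC 3986, sections 2.2 (sub-delims) and 2.3 (unreserved).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"   (section 3.3)
# pct-encoded is a three-character sequence ("%" HEXDIG HEXDIG), so it is
# not represented as a single literal character in this set.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | set(":@")


def is_literal_pchar(ch: str) -> bool:
    """Hypothetical helper: True if ch may appear unencoded in a path segment."""
    return ch in PCHAR_LITERALS


print(is_literal_pchar("|"))  # False
print(is_literal_pchar("~"))  # True
```

Under this reading, | fails the check just like a space does, so both would need to be percent-encoded.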
Simplifying my problem, httpx seems to call its internal urlparse function to process URLs. Here's an example using that function, which normally percent-encodes characters as needed, such as spaces:
httpx._urlparse.urlparse('http://example.com/ ')
will return
ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/%20', query=None, fragment=None)
However, this does not happen for |:
httpx._urlparse.urlparse('http://example.com/|')
will return
ParseResult(scheme='http', userinfo='', host='example.com', port=None, path='/|', query=None, fragment=None)
In Firefox and Google Chrome, | is percent-encoded:
encodeURI('http://example.com/|')
will return
"http://example.com/%7C"
In the requests library, | is also percent-encoded:
requests.utils.requote_uri('http://example.com/|')
will return
'http://example.com/%7C'
The rfc3986 library also percent-encodes |:
rfc3986.urlparse('http://example.com/|')
will return
ParseResult(scheme='http', userinfo=None, host='example.com', port=None, path='/%7C', query=None, fragment=None)
Using urllib itself, | also seems to be percent-encoded for path components. The result is:
'/%7C'
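The exact urllib call is not shown above. A minimal way to check the behaviour with the standard library, assuming `urllib.parse.quote` is the relevant function:

```python
from urllib.parse import quote

# quote() leaves "/" alone by default (safe="/") and percent-encodes
# characters outside its always-safe set, including "|" and the space.
print(quote("/|"))  # /%7C
print(quote("/ "))  # /%20
```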
I'm fairly certain that I've interpreted the RFC correctly, and I think that | should be excluded from the PATH_SAFE set. Here is its current value: "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~".
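To illustrate the effect of the proposed change, here is a rough sketch. It is not httpx's actual encoding code: `percent_encode_path` is a hypothetical helper, and `PATH_SAFE` is rebuilt from the value quoted in this issue, on the assumption that the uppercase letters sit between "@" and "[".

```python
import string

# Assumed value of PATH_SAFE, rebuilt from the string quoted in this issue.
PATH_SAFE = (
    "!$%&'()*+,-./0123456789:;=@"
    + string.ascii_uppercase
    + "[\\]^_"
    + string.ascii_lowercase
    + "|~"
)

# Proposed change: drop "|" so that it gets percent-encoded.
PATH_SAFE_FIXED = PATH_SAFE.replace("|", "")


def percent_encode_path(path: str, safe: str) -> str:
    """Hypothetical helper: percent-encode every character not in `safe`."""
    out = []
    for ch in path:
        if ch in safe:
            out.append(ch)
        else:
            # Encode each UTF-8 byte as %XX with uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode_path("/|", PATH_SAFE))        # /|
print(percent_encode_path("/|", PATH_SAFE_FIXED))  # /%7C
print(percent_encode_path("/ ", PATH_SAFE_FIXED))  # /%20
```

With | removed from the safe set, the sketch produces the same '/%7C' that the other libraries return.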
Potential Fix: nathaniel-daniel/httpx@a2f327f