Python's CGI and JavaScript's uriEncode: A disconnect.

**Andrew Clover** · Jul 18 '05, 12:19 AM

Re: Python's CGI and JavaScript's uriEncode: A disconnect.

Elf M. Sternberg <[email protected]> wrote:

> Netscape (and possibly even earlier browsers like Mosaic) used the
> plus symbol '+' as a substitute for the space in the last part of
> the URI

This is correct in a query parameter: e.g. in ...?foo=abc+def, the '+'
stands for a space.

This comes from the specification of the media type
application/x-www-form-urlencoded, which is defined by HTML itself
(section 17.13.4.1 of the HTML 4.01 spec). That section says spaces
should be encoded as '+'; however, '%20' decodes to a space just as well
and causes less confusion, so that's what newer browsers (and I) use.

Elsewhere in a URI (in the path, for example), '+' is just a literal plus
sign, so spaces must be encoded as '%20' there, never as '+'.
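To make the two rules concrete, here is a small JavaScript sketch. decodeFormValue is a hypothetical helper, not a built-in. It only mimics the '+' and %XX handling of a form-urlencoded decoder such as the one behind Python's cgi module, not every detail of it. It is shown next to plain percent-decoding, which is what applies to other parts of a URI.

```javascript
// Hypothetical helper (not a built-in): decode one value using
// application/x-www-form-urlencoded rules. '+' becomes a space first,
// then the %XX escapes are decoded.
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, ' '));
}

// In a query value, '+' and '%20' both mean a space...
console.log(decodeFormValue('abc+def'));   // "abc def"
console.log(decodeFormValue('abc%20def')); // "abc def"
// ...so a literal plus sign has to arrive as %2B.
console.log(decodeFormValue('a%2Bb'));     // "a+b"

// Outside the query (e.g. in a path), '+' is an ordinary character, and
// plain percent-decoding leaves it alone.
console.log(decodeURIComponent('/files/a+b.txt')); // "/files/a+b.txt"
```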

The reasoning behind this original choice is unclear. Presumably it was
meant to improve readability, but URIs with query parts are rarely very
readable anyway.

> The ECMA-262 "Javascript" standard now supported by both Netscape and
> Internet Explorer honor RFC 2396, translating spaces into their hex
> equivalent %20 and leaving pluses alone.

That depends on which function you mean. The built-in 'escape' and
'encodeURI' functions are not designed to encode a single query parameter
value; they're designed to encode larger chunks of a URI. As such, they
leave plus characters unencoded (and encodeURI also leaves '&', '=' and
the other reserved characters alone).
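Here is a quick check of what 'escape' and 'encodeURI' produce for values containing '+', '&' and '='. The outputs follow from the ECMAScript definitions of these two functions and are what a modern engine such as Node prints.

```javascript
// Neither escape() nor encodeURI() touches '+', so a value containing a
// plus sign reaches a form decoder as a space.
var value = 'a+b c';
console.log(escape(value));    // "a+b%20c"
console.log(encodeURI(value)); // "a+b%20c"

// encodeURI() also keeps '&' and '=', which would split the parameter;
// escape() does encode those two.
console.log(encodeURI('a&b=c')); // "a&b=c"
console.log(escape('a&b=c'));    // "a%26b%3Dc"
```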

The encodeURIComponent function *does* encode them (as %2B), and it is the
function you should use if you want some JavaScript code to submit a query
parameter.
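This is a minimal sketch of building a query string with encodeURIComponent. buildQuery is a hypothetical helper name, and the sketch assumes a plain object whose values are strings.

```javascript
// Hypothetical helper: build a query string from a plain object, encoding
// each name and value with encodeURIComponent so that '+', '&', '=' and
// spaces survive the trip to a form-urlencoded decoder.
function buildQuery(params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(encodeURIComponent(name) + '=' +
                 encodeURIComponent(params[name]));
    }
  }
  return parts.join('&');
}

console.log(buildQuery({ q: '1+1 = 2', lang: 'en' }));
// "q=1%2B1%20%3D%202&lang=en"
```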

The only drawback is that encodeURIComponent is relatively new, so you
won't find it in medium-old browsers like Netscape 4 and IE 5.0. (The same
goes for encodeURI; older browsers only have 'escape'.)

> The Python library cgi.FieldStorage decodes it backwards, expecting
> pluses to be spaces and %2b to represent pluses.

The Python library is correct per the spec. If your scripts are not
encoding plus symbols in query parameters as %2B, they are at fault (and
will go equally wrong with a server written in any other language).
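To see the failure end to end, the sketch below encodes the same value two ways and then decodes it with URLSearchParams. URLSearchParams is a standard form-urlencoded parser in modern browsers and Node and is much newer than this post; it is used here only because it follows the same '+'-means-space rule as cgi.FieldStorage. formDecode is a hypothetical wrapper.

```javascript
// Hypothetical wrapper: decode one value with
// application/x-www-form-urlencoded rules via URLSearchParams.
function formDecode(encoded) {
  return new URLSearchParams('v=' + encoded).get('v');
}

var original = '1+1=2';

// escape() leaves '+' alone, so the decoder turns it into a space.
var viaEscape = formDecode(escape(original));
// encodeURIComponent() sends '+' as %2B, so it comes back intact.
var viaComponent = formDecode(encodeURIComponent(original));

console.log(viaEscape);    // "1 1=2"
console.log(viaComponent); // "1+1=2"
```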

Possible solutions:

a. Use encodeURIComponent() instead. This is the best option, but it won't
   work in those older browsers.
b. Use escape(), then replace any pluses in its output with %2B. This is
   OK, but escape() writes non-ASCII characters as %XX (Latin-1) or the
   non-standard %uXXXX form instead of UTF-8, so Unicode won't be handled
   properly or predictably. (Note: in IE, encodeURI() also fails to handle
   Unicode predictably.)
c. Roll your own encodeURIComponent function.
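Options (a) and (b) can be combined with feature detection. In this sketch, escapeWithPlus and encodeQueryValue are hypothetical names. The typeof test is safe even in browsers where encodeURIComponent was never defined, because typeof on an undeclared identifier returns 'undefined' instead of throwing.

```javascript
// Option (b): escape() already writes spaces as %20; only '+' needs fixing.
// Caveat: non-ASCII comes out as %XX (Latin-1) or %uXXXX, not UTF-8.
function escapeWithPlus(s) {
  return escape(s).replace(/\+/g, '%2B');
}

// Options (a) and (b) together: use encodeURIComponent where the browser
// provides it, and fall back to the patched escape() otherwise.
function encodeQueryValue(s) {
  if (typeof encodeURIComponent === 'function') {
    return encodeURIComponent(s);
  }
  return escapeWithPlus(s);
}

console.log(escapeWithPlus('a+b c'));    // "a%2Bb%20c"
console.log(encodeQueryValue('a+b c'));  // "a%2Bb%20c"
console.log(escapeWithPlus('\u00e9'));   // "%E9"    (Latin-1, not UTF-8)
console.log(encodeQueryValue('\u00e9')); // "%C3%A9" (UTF-8)
```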

It's a bit off-topic for c.l.py, but here's an implementation of option (c)
that I've used before:

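// Encode one query parameter value: convert it to UTF-8, then %-escape
// every byte that is not in encPar_OK. '+' is not in that set, so it is
// sent as %2B and survives a form-urlencoded decoder.
function encPar(wide) {
  var narrow= encUtf8(wide);
  var enc= '';
  for (var i= 0; i<narrow.length; i++) {
    if (encPar_OK.indexOf(narrow.charAt(i))==-1)
      enc= enc+encHex2(narrow.charCodeAt(i));
    else
      enc= enc+narrow.charAt(i);
  }
  return enc;
}
// Characters that are safe to leave unescaped in a parameter value.
var encPar_OK= 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'+
  '0123456789*@-_./';

// Format a byte value (0-255) as '%XX'.
function encHex2(v) {
  return '%'+encHex2_DIGITS.charAt(v>>>4)+encHex2_DIGITS.charAt(v&0xF);
}
var encHex2_DIGITS= '0123456789ABCDEF';

// Convert a JavaScript (UTF-16) string into a "narrow" string holding its
// UTF-8 bytes, one character (code 0-255) per byte. Unpaired surrogates
// are dropped.
function encUtf8(wide) {
  var c, s;
  var enc= '';
  var i= 0;
  while(i<wide.length) {
    c= wide.charCodeAt(i++);
    // handle UTF-16 surrogates
    if (c>=0xDC00 && c<0xE000) continue; // stray low surrogate: drop it
    if (c>=0xD800 && c<0xDC00) {         // high surrogate: needs a low one next
      if (i>=wide.length) continue;
      s= wide.charCodeAt(i);
      // s must be a low surrogate (0xDC00-0xDFFF); if it isn't, drop c and
      // let s be handled on the next pass of the loop.
      if (s<0xDC00 || s>=0xE000) continue;
      i++;
      c= ((c-0xD800)<<10)+(s-0xDC00)+0x10000;
    }
    // output value as 1-4 UTF-8 bytes
    if (c<0x80) enc+=
      String.fromCharCode(c);
    else if (c<0x800) enc+=
      String.fromCharCode(0xC0+(c>>6), 0x80+(c&0x3F));
    else if (c<0x10000) enc+=
      String.fromCharCode(0xE0+(c>>12), 0x80+(c>>6&0x3F), 0x80+(c&0x3F));
    else enc+=
      String.fromCharCode(0xF0+(c>>18), 0x80+(c>>12&0x3F),
                          0x80+(c>>6&0x3F), 0x80+(c&0x3F));
  }
  return enc;
}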
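In environments that provide TextEncoder (modern browsers, Node 11 and later), the hand-written UTF-8 step can be delegated to the built-in. The sketch below is a swapped-in alternative to encUtf8 and is not part of the original code. encParModern, SAFE and HEX are hypothetical names, and SAFE reuses the same safe set as encPar_OK. One difference from encUtf8: TextEncoder replaces an unpaired surrogate with U+FFFD (bytes EF BF BD) instead of dropping it.

```javascript
// Same safe set as encPar_OK; all of these are ASCII.
var SAFE = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' +
           '0123456789*@-_./';
var HEX = '0123456789ABCDEF';

// Hypothetical modern variant of encPar: TextEncoder does the UTF-16 ->
// UTF-8 conversion (including surrogate pairs) that encUtf8 does by hand.
function encParModern(wide) {
  var bytes = new TextEncoder().encode(wide); // Uint8Array of UTF-8 bytes
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var ch = String.fromCharCode(b);
    // Only ASCII bytes can be in SAFE, so every byte of a multi-byte
    // sequence (0x80 and up) is always escaped.
    if (b < 0x80 && SAFE.indexOf(ch) !== -1) {
      out += ch;
    } else {
      out += '%' + HEX.charAt(b >> 4) + HEX.charAt(b & 0xF);
    }
  }
  return out;
}

console.log(encParModern('a b+c'));        // "a%20b%2Bc"
console.log(encParModern('\u00e9'));       // "%C3%A9"
console.log(encParModern('\uD83D\uDE00')); // "%F0%9F%98%80"
```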

Hope that's of some use.

Kind of sucks having to do this, eh?

--
Andrew Clover
mailto:[email protected]

http://www.doxdesk.com/
