<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/python-packages/paste-git.git/tests, branch 2.0.2</title>
<subtitle>github.com: cdent/paste.git
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/paste-git.git/'/>
<entry>
<title>test_wsgirequest_charset: Use UTF-8 instead of iso-8859-1</title>
<updated>2015-05-01T00:39:24+00:00</updated>
<author>
<name>Marc Abramowitz</name>
<email>marc@marc-abramowitz.com</email>
</author>
<published>2015-05-01T00:39:24+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/python-packages/paste-git.git/commit/?id=fa100c92c06d3a8a61a0dda1a2e06018437b09c6'/>
<id>fa100c92c06d3a8a61a0dda1a2e06018437b09c6</id>
<content type='text'>
because it seems that the de facto standard for encoding URIs is to use UTF-8.

I've been reading about URL encoding, and it seems that using an encoding
other than UTF-8 is non-standard and not well supported (this test is trying
to use `iso-8859-1`).

From http://en.wikipedia.org/wiki/Percent-encoding

&gt; For a non-ASCII character, it is typically converted to its byte sequence in
&gt; UTF-8, and then each byte value is represented as above.

&gt; The generic URI syntax mandates that new URI schemes that provide for the
&gt; representation of character data in a URI must, in effect, represent
&gt; characters from the unreserved set without translation, and should convert
&gt; all other characters to bytes according to UTF-8, and then percent-encode
&gt; those values. This requirement was introduced in January 2005 with the
&gt; publication of RFC 3986

From http://tools.ietf.org/html/rfc3986:

&gt; Non-ASCII characters must first be encoded according to UTF-8 [STD63], and
&gt; then each octet of the corresponding UTF-8 sequence must be percent-encoded
&gt; to be represented as URI characters.  URI producing applications must not use
&gt; percent-encoding in host unless it is used to represent a UTF-8 character
&gt; sequence.

From http://tools.ietf.org/html/rfc3987:

&gt; Conversions from URIs to IRIs MUST NOT use any character encoding other than
&gt; UTF-8 in steps 3 and 4, even if it might be possible to guess from the
&gt; context that another character encoding than UTF-8 was used in the URI.  For
&gt; example, the URI "http://www.example.org/r%E9sum%E9.html" might with some
&gt; guessing be interpreted to contain two e-acute characters encoded as
&gt; iso-8859-1. It must not be converted to an IRI containing these e-acute
&gt; characters.  Otherwise, in the future the IRI will be mapped to
&gt; "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different URI from
&gt; "http://www.example.org/r%E9sum%E9.html".

See issue #7, which I think this at least partially fixes.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
because it seems that the de facto standard for encoding URIs is to use UTF-8.

I've been reading about URL encoding, and it seems that using an encoding
other than UTF-8 is non-standard and not well supported (this test is trying
to use `iso-8859-1`).

From http://en.wikipedia.org/wiki/Percent-encoding

&gt; For a non-ASCII character, it is typically converted to its byte sequence in
&gt; UTF-8, and then each byte value is represented as above.

&gt; The generic URI syntax mandates that new URI schemes that provide for the
&gt; representation of character data in a URI must, in effect, represent
&gt; characters from the unreserved set without translation, and should convert
&gt; all other characters to bytes according to UTF-8, and then percent-encode
&gt; those values. This requirement was introduced in January 2005 with the
&gt; publication of RFC 3986

From http://tools.ietf.org/html/rfc3986:

&gt; Non-ASCII characters must first be encoded according to UTF-8 [STD63], and
&gt; then each octet of the corresponding UTF-8 sequence must be percent-encoded
&gt; to be represented as URI characters.  URI producing applications must not use
&gt; percent-encoding in host unless it is used to represent a UTF-8 character
&gt; sequence.

From http://tools.ietf.org/html/rfc3987:

&gt; Conversions from URIs to IRIs MUST NOT use any character encoding other than
&gt; UTF-8 in steps 3 and 4, even if it might be possible to guess from the
&gt; context that another character encoding than UTF-8 was used in the URI.  For
&gt; example, the URI "http://www.example.org/r%E9sum%E9.html" might with some
&gt; guessing be interpreted to contain two e-acute characters encoded as
&gt; iso-8859-1. It must not be converted to an IRI containing these e-acute
&gt; characters.  Otherwise, in the future the IRI will be mapped to
&gt; "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different URI from
&gt; "http://www.example.org/r%E9sum%E9.html".

See issue #7, which I think this at least partially fixes.
</pre>
</div>
</content>
</entry>
</feed>
