
Mailing List Archive: Zope: Dev

Non-ASCII characters in URLs

 

 



limi at plone

Apr 6, 2008, 4:37 PM

Post #1 of 15 (491 views)
Non-ASCII characters in URLs

Hi,

Is there a good technical explanation for why Zope doesn't allow non-ASCII
characters in URLs?

I'd like to be able to let URLs work like this example from Wikipedia:

http://ja.wikipedia.org/wiki/メインページ

When I try adding an object with ID "メインページ" in Zope 2, I get the
following error message:

Error Type: BadRequest
Error Value: The id
"メインページ"
contains characters illegal in URLs.

Is there a fundamental reason (ie. Python objects can only be ASCII) or is
it simply bugs that need to be fixed?


Curiously yours,

--
Alexander Limi · http://limi.net

_______________________________________________
Zope-Dev maillist - Zope-Dev[at]zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )


slinkp at gmail

Apr 6, 2008, 7:51 PM

Post #2 of 15 (470 views)
Re: Non-ASCII characters in URLs [In reply to]

On Sun, Apr 06, 2008 at 04:37:22PM -0700, Alexander Limi wrote:
> Hi,
>
> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> characters in URLs?

I suspect it's only for hysterical raisins. The code in question is
in OFS/ObjectManager.py, in the checkValidId() function. Non-ASCII
characters trigger a match on the bad_id regular expression search.
As I recall, if you look at the revision history, that code is very
old.

There might even be an existing bug filed about this; I don't
remember.
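
For readers who have not looked at that code, the check can be pictured roughly like this. It is a simplified sketch written for illustration, not the actual Zope source: the character class in `BAD_ID` and the `BadRequest` class below are assumptions, and only the names `checkValidId` and `bad_id` come from the description above. It also assumes Python 3 string semantics.

```python
import re

# Hypothetical whitelist: any character outside it counts as "bad".
# The real pattern in OFS/ObjectManager.py may differ.
BAD_ID = re.compile(r"[^a-zA-Z0-9_~,.$()# @-]")


class BadRequest(Exception):
    """Stand-in for the exception raised on an invalid id."""


def check_valid_id(object_id):
    """Raise BadRequest if object_id has a character outside the whitelist."""
    if not object_id:
        raise BadRequest("The id is empty.")
    if BAD_ID.search(object_id) is not None:
        raise BadRequest(
            'The id "%s" contains characters illegal in URLs.' % object_id
        )
    return object_id
```

With a whitelist of this shape, every non-ASCII character is rejected simply because it is not listed, which is consistent with the error Alexander reported.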

--

Paul Winkler
http://www.slinkp.com


lists at zopyx

Apr 6, 2008, 9:24 PM

Post #3 of 15 (473 views)
Re: Non-ASCII characters in URLs [In reply to]

--On 6. April 2008 16:37:22 -0700 Alexander Limi <limi[at]plone.org> wrote:

> Hi,
>
> Is there a good technical explanation for why Zope doesn't allow
> non-ASCII characters in URLs?
>
> I'd like to be able to let URLs work like this example from Wikipedia:
>
> http://ja.wikipedia.org/wiki/メインページ
>
> When I try adding an object with ID "メインページ" in Zope 2, I get
> the following error message:
>
> Error Type: BadRequest
> Error Value: The id
> "メインページ"
> contains characters illegal in URLs.
>
> Is there a fundamental reason (ie. Python objects can only be ASCII) or
> is it simply bugs that need to be fixed?
>

As Paul indicated, the issue dates back to the times when there was only
ASCII in the URL world. Object IDs in particular have to be ASCII. Well... Zope
came from the US :-)

Andreas


mj at zopatista

Apr 7, 2008, 1:39 AM

Post #4 of 15 (469 views)
Re: Non-ASCII characters in URLs [In reply to]

On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <limi[at]plone.org> wrote:
> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> characters in URLs?

Because URLs don't allow non-ASCII characters?

> I'd like to be able to let URLs work like this example from Wikipedia:
>
> http://ja.wikipedia.org/wiki/メインページ

Your browser translates that into
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
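
That translation can be reproduced with Python's standard library. A small sketch, assuming Python 3 and UTF-8 as the encoding (the variable names are only illustrative):

```python
from urllib.parse import quote, unquote

page_id = "メインページ"

# quote() encodes the text as UTF-8 by default, then percent-escapes each byte.
escaped = quote(page_id)
url = "http://ja.wikipedia.org/wiki/" + escaped
print(url)

# unquote() reverses the process, again assuming UTF-8.
assert unquote(escaped) == page_id
```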

> Is there a fundamental reason (ie. Python objects can only be ASCII) or is
> it simply bugs that need to be fixed?

RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ascii
characters in URLs.

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.
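
As a rough illustration of the octet ranges named in that passage, here is a minimal Python sketch. The function name `must_be_encoded` is made up for this example, and it covers only the ranges quoted above; RFC 1738 lists further "unsafe" and "reserved" characters that are not handled here.

```python
def must_be_encoded(octet):
    """Return True for octets the quoted passage says must be encoded."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet: %r" % (octet,))
    # 00-1F and 7F are control characters; 80-FF are outside US-ASCII.
    return octet <= 0x1F or octet == 0x7F or octet >= 0x80


utf8_bytes = "メ".encode("utf-8")
print([must_be_encoded(b) for b in utf8_bytes])
print(must_be_encoded(ord("a")))
```

Every byte of a UTF-8 encoded non-ASCII character falls in the 80-FF range, so all of them need escaping.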

Now, Zope could well support UTF-8 ids, and translate URLs
appropriately, but in the meantime you could use the same scheme?

--
Martijn Pieters


dev101 at magma

Apr 7, 2008, 5:32 AM

Post #5 of 15 (462 views)
Re: Non-ASCII characters in URLs [In reply to]

----- Original Message -----
From: "Martijn Pieters" <mj[at]zopatista.com>
To: "Alexander Limi" <limi[at]plone.org>
Cc: <zope-dev[at]zope.org>
Sent: Monday, April 07, 2008 4:39 AM
Subject: Re: [Zope-dev] Non-ASCII characters in URLs


> On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <limi[at]plone.org> wrote:
>> Is there a good technical explanation for why Zope doesn't allow
>> non-ASCII
>> characters in URLs?
>
> Because URLs don't allow non-ASCII characters?
>
>> I'd like to be able to let URLs work like this example from Wikipedia:
>>
>> http://ja.wikipedia.org/wiki/メインページ
>
> Your browser translates that into
> http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
>
>> Is there a fundamental reason (ie. Python objects can only be ASCII) or
>> is
>> it simply bugs that need to be fixed?
>
> RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ascii
> characters in URLs.
>
> No corresponding graphic US-ASCII:
>
> URLs are written only with the graphic printable characters of the
> US-ASCII coded character set. The octets 80-FF hexadecimal are not
> used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
> control characters; these must be encoded.
>
> Now, Zope could well support UTF-8 ids, and translate URLs
> appropriately, but in the meantime you could use the same scheme?

IDNA (http://www.ietf.org/rfc/rfc3490.txt) and Punycode
(http://www.faqs.org/rfcs/rfc3492.html) may be of some use.
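
For reference, a short sketch of what IDNA does, using the `idna` codec built into Python (Python 3 assumed). The host name `bücher.example` is a made-up example. Note that this conversion applies to the host part of a URL, not to the path.

```python
# Python's built-in "idna" codec implements the RFC 3490 ToASCII conversion.
host = "bücher.example"
ascii_host = host.encode("idna")
print(ascii_host)

# Decoding restores the Unicode form of the host name.
assert ascii_host.decode("idna") == host
```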

Jonathan




dieter at handshake

Apr 7, 2008, 11:38 AM

Post #6 of 15 (462 views)
Re: Non-ASCII characters in URLs [In reply to]

Martijn Pieters wrote at 2008-4-7 10:39 +0200:
>On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <limi[at]plone.org> wrote:
>> Is there a good technical explanation for why Zope doesn't allow non-ASCII
>> characters in URLs?
>
>Because URLs don't allow non-ASCII characters?

Almost surely, Alexander wants to ask why Zope does not allow
non-ASCII characters in ids.

And, in fact, there are only two reasons:

* laziness of the Zope developers:

without the restriction to ASCII characters
careful quoting (and unquoting) is necessary
in order to adhere to RFC 2396 (the modern uri syntax specification)

* there is no way to specify the encoding used for non-ASCII characters.

HTML 4 suggests converting non-ASCII characters first to
UTF-8 and then url-escaping the result,
but most HTTP clients do not follow this suggestion.
Instead, they use the charset found on the page
that caused them to construct the uri.

I have observed that MS WebDAV, for some WebDAV commands,
transfers the url as given and, for some other
commands, recodes it into utf-8.

Thus, supporting non-ASCII ids may occasionally cause
surprises.
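
To make the encoding problem concrete, here is a minimal sketch of how a server might guess the charset of an incoming path segment. It assumes Python 3; the helper name `decode_segment` and the choice of windows-1252 as fallback are assumptions for illustration, not anything Zope does.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(escaped, fallback="windows-1252"):
    """Unescape a path segment and guess its charset: UTF-8 first, then fallback."""
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the client used the fallback charset.
        return raw.decode(fallback)


print(decode_segment("t%C3%BCv"))  # client that escaped UTF-8 bytes
print(decode_segment("t%FCv"))     # client that escaped windows-1252 bytes
```

This is only a heuristic: some byte sequences are valid in both encodings, in which case the guess can be wrong, and that is exactly the kind of surprise meant here.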



--
Dieter


wichert at wiggy

Apr 7, 2008, 11:45 AM

Post #7 of 15 (462 views)
Re: Non-ASCII characters in URLs [In reply to]

Previously Dieter Maurer wrote:
> Martijn Pieters wrote at 2008-4-7 10:39 +0200:
> >On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <limi[at]plone.org> wrote:
> >> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> >> characters in URLs?
> >
> >Because URLs don't allow non-ASCII characters?
>
> Almost surely, Alexander wants to ask why Zope does not allow
> non-ASCII characters in ids.
>
> And, in fact, there are only two reasons:
>
> * laziness of the Zope developers:
>
> without the restriction to ASCII characters
> careful quoting (and unquoting) is necessary
> in order to adhere to RFC 2396 (the modern uri syntax specification)

This is becoming increasingly painful: it means we can't really use Active
Directory's ObjectGUID as userid, and it breaks with LDAP DNs with
non-ASCII characters (all too common). I really wish Zope IDs were
either binary strings or unicode strings.

> * there is no way to specify the encoding used for non-ASCII characters.
>
> HTML 4 suggests converting non-ASCII characters first to
> UTF-8 and then url-escaping the result,
> but most HTTP clients do not follow this suggestion.
> Instead, they use the charset found on the page
> that caused them to construct the uri.
>
> I have observed that MS WebDAV, for some WebDAV commands,
> transfers the url as given and, for some other
> commands, recodes it into utf-8.
>
> Thus, supporting non-ASCII ids may occasionally cause
> surprises.

You mean non ASCII URI's, not non ASCII ids here I suspect. Somehow I'm
not surprised those are painful :(

Wichert.

--
Wichert Akkerman <wichert[at]wiggy.net> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.


dieter at handshake

Apr 7, 2008, 12:45 PM

Post #8 of 15 (456 views)
Re: Non-ASCII characters in URLs [In reply to]

Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
> ...
>> Almost surely, Alexander wants to ask why Zope does not allow
>> non-ASCII characters in ids.
>>
>> And, in fact, there are only two reasons:
>>
>> * laziness of the Zope developers:
>>
>> without the restriction to ASCII characters
>> careful quoting (and unquoting) is necessary
>> in order to adhere to RFC 2396 (the modern uri syntax specification)
>
>This is becoming increasingly painful

I will soon have a patch against Zope 2.11b1
which gets rid of this restriction.

If there is consensus, I can add it to the Zope repository.

> ...
>> * there is no way to specify the encoding used for non-ASCII characters.
>>
>> HTML 4 suggests converting non-ASCII characters first to
>> UTF-8 and then url-escaping the result,
>> but most HTTP clients do not follow this suggestion.
>> Instead, they use the charset found on the page
>> that caused them to construct the uri.
>>
>> I have observed that MS WebDAV, for some WebDAV commands,
>> transfers the url as given and, for some other
>> commands, recodes it into utf-8.
>>
>> Thus, supporting non-ASCII ids may occasionally cause
>> surprises.
>
>You mean non ASCII URI's, not non ASCII ids here I suspect. Somehow I'm
>not surprised those are painful :(

No, I mean non-ASCII ids.

They lead to uris with some escaped characters, and for some commands MS WebDAV
unescapes the uris, interprets them in some default charset ("windows-1252"
in our case), recodes them in utf-8,
escapes them again and then uses them in the commands.
Examples are the COPY and MOVE commands. If an object has
a non-ASCII character in its id, say "tüv", its url
may look like "http:.../t%FCv". Used in a "COPY" or "MOVE",
it is however represented as "http:.../t%C3%BCv".
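
The recoding steps described above can be simulated in a few lines. This is a sketch assuming Python 3; the function name `recode_like_client` is hypothetical and only mimics the observed behaviour, it is not taken from any WebDAV client.

```python
from urllib.parse import quote, unquote


def recode_like_client(escaped, assumed_charset="windows-1252"):
    """Unescape, interpret in assumed_charset, then re-escape as UTF-8."""
    text = unquote(escaped, encoding=assumed_charset)
    return quote(text, encoding="utf-8")


# The escaped id as the server published it (windows-1252 bytes).
original = quote("tüv", encoding="windows-1252")
# The same id after the client has recoded it.
recoded = recode_like_client(original)
print(original, "->", recoded)
```

The server then sees two different escaped forms for the same object id and has to recognise both.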



--
Dieter


limi at plone

Apr 8, 2008, 5:14 PM

Post #9 of 15 (450 views)
Re: Non-ASCII characters in URLs [In reply to]

On Mon, 07 Apr 2008 05:32:17 -0700, Jonathan <dev101[at]magma.ca> wrote:

> IDNA (http://www.ietf.org/rfc/rfc3490.txt) and Punycode
> (http://www.faqs.org/rfcs/rfc3492.html) may be of some use.

I'm not looking for non-ASCII domain names, just object IDs. :)

--
Alexander Limi · http://limi.net



limi at plone

Apr 8, 2008, 5:15 PM

Post #10 of 15 (448 views)
Re: Non-ASCII characters in URLs [In reply to]

On Mon, 07 Apr 2008 12:45:00 -0700, Dieter Maurer <dieter[at]handshake.de>
wrote:

> Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
>> This is becoming increasingly painful
>
> I will soon have a patch against Zope 2.11b1
> which gets rid of this restriction.
>
> If there is consensus, I can add it to the Zope repository.

I would love to see support for non-ASCII object IDs, +1. (obviously not
based on any technical understanding from my side :)

--
Alexander Limi · http://limi.net



tino at wildenhain

Apr 9, 2008, 3:43 AM

Post #11 of 15 (447 views)
Re: Non-ASCII characters in URLs [In reply to]

Dieter Maurer wrote:
> Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
>> ...
>>> Almost surely, Alexander wants to ask why Zope does not allow
>>> non-ASCII characters in ids.
>>>
>>> And, in fact, there are only two reasons:
>>>
>>> * laziness of the Zope developers:
>>>
>>> without the restriction to ASCII characters
>>> careful quoting (and unquoting) is necessary
>>> in order to adhere to RFC 2396 (the modern uri syntax specification)
>> This is becoming increasingly painful
>
> I will soon have a patch against Zope 2.11b1
> which gets rid of this restriction.
>
> If there is consensus, I can add it to the Zope repository.

+1 from my side. Saves me the work to cleanup my own dirty
patch :-))



tseaver at palladion

Apr 12, 2008, 6:43 PM

Post #12 of 15 (435 views)
Re: Non-ASCII characters in URLs [In reply to]


Tino Wildenhain wrote:
> Dieter Maurer wrote:
>> Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
>>> ...
>>>> Almost surely, Alexander wants to ask why Zope does not allow
>>>> non-ASCII characters in ids.
>>>>
>>>> And, in fact, there are only two reasons:
>>>>
>>>> * laziness of the Zope developers:
>>>>
>>>> without the restriction to ASCII characters
>>>> careful quoting (and unquoting) is necessary
>>>> in order to adhere to RFC 2396 (the modern uri syntax specification)
>>> This is becoming increasingly painful
>> I will soon have a patch against Zope 2.11b1
>> which gets rid of this restriction.
>>
>> If there is consensus, I can add it to the Zope repository.
>
> +1 from my side. Saves me the work to cleanup my own dirty
> patch :-))

-1 without *careful* analysis of how the patch is going to break
existing applications which rely on the fact that IDs are only ASCII
(and therefore don't need to be quoted). At a minimum, this kind of
change is going to require documenting the risks, and getting some
feedback, before any merge to a production release.
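
To illustrate the kind of breakage at stake, here is a small sketch of application code that builds URLs from ids, assuming Python 3. The helper `build_url` and the `example.org` host are made up for this example.

```python
from urllib.parse import quote


def build_url(base, ids):
    """Join object ids into a URL path, percent-escaping each id separately."""
    # safe="" also escapes "/" inside an id, so it cannot split a segment.
    return base.rstrip("/") + "/" + "/".join(quote(i, safe="") for i in ids)


ascii_ids = ["folder", "index_html"]
mixed_ids = ["folder", "tüv"]

# With these ASCII-only ids, naive joining and quoted joining agree...
naive = "http://example.org/" + "/".join(ascii_ids)
assert naive == build_url("http://example.org", ascii_ids)

# ...but with a non-ASCII id the naive result contains a raw "ü".
print("http://example.org/" + "/".join(mixed_ids))
print(build_url("http://example.org", mixed_ids))
```

Code that has always used the naive form keeps working only as long as ids stay ASCII, which is why existing applications would need to be audited.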

Please check the patch in on a "private" branch and ask for comments here.


Tres.
--
===================================================================
Tres Seaver +1 540-429-0999 tseaver[at]palladion.com
Palladion Software "Excellence by Design" http://palladion.com


lists at zopyx

Apr 12, 2008, 11:30 PM

Post #14 of 15 (432 views)
Re: Re: Non-ASCII characters in URLs [In reply to]

--On 12. April 2008 21:43:40 -0400 Tres Seaver <tseaver[at]palladion.com>
wrote:


>>>> This is becoming increasingly painful
>>> I will soon have a patch against Zope 2.11b1
>>> which gets rid of this restriction.
>>>
>>> If there is consensus, I can add it to the Zope repository.
>>
>> +1 from my side. Saves me the work to cleanup my own dirty
>> patch :-))
>
> -1 without *careful* analysis of how the patch is going to break
> existing applications which rely on the fact that IDs are only ASCII
> (and therefore don't need to be quoted). At a minimum, this kind of
> change is going to require documenting the risks, and getting some
> feedback, before any merge to a production release.
>
> Please check the patch in on a "private" branch and ask for comments here.

@Dieter: please create a branch for this (rather than attaching a patch on Launchpad).

The patch has been working for a long time (possibly several years) within our
private Zope, so I would not expect many problems. Of course it needs
testing and documentation.

Andreas


dieter at handshake

Apr 30, 2008, 6:19 AM

Post #15 of 15 (263 views)
Re: Non-ASCII characters in URLs [In reply to]

Tres Seaver wrote at 2008-4-12 21:43 -0400:
> ...
>> Dieter Maurer wrote:
>>> Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
>>>> ...
>>>>> Almost surely, Alexander wants to ask why Zope does not allow
>>>>> non-ASCII characters in ids.
>>>>>
>>>>> And, in fact, there are only two reasons:
>>>>>
>>>>> * laziness of the Zope developers:
>>>>>
>>>>> without the restriction to ASCII characters
>>>>> careful quoting (and unquoting) is necessary
>>>>> in order to adhere to RFC 2396 (the modern uri syntax specification)
>>>> This is becoming increasingly painful
>>> I will soon have a patch against Zope 2.11b1
>>> which gets rid of this restriction.
>>>
>>> If there is consensus, I can add it to the Zope repository.
>>
>> +1 from my side. Saves me the work to cleanup my own dirty
>> patch :-))
>
>-1 without *careful* analysis of how the patch is going to break
>existing applications which rely on the fact that IDs are only ASCII
>(and therefore don't need to be quoted). At a minimum, this kind of
>change is going to require documenting the risks, and getting some
>feedback, before any merge to a production release.
>
>Please check the patch in on a "private" branch and ask for comments here.

Implemented on "http://svn.zope.org/Zope/branches/dm-arbitrary-ids/".



--
Dieter