
Mailing List Archive: exim: users

IDN, UTF-8 and Punycode curiosity

 

 



mje at posix

Jan 3, 2012, 1:11 AM

Post #1 of 8
IDN, UTF-8 and Punycode curiosity

I'm an ISP in South Africa. I have recently converted all my databases
(MySQL) and web systems and sites (Apache + PHP 5.3) to UTF-8. My
'virtual web' system understands UTF-8 in a name and does all the (so
far) correct translation to and from Punycode, both for DNS and Apache.
I discovered that in Apache I also needed a "ServerAlias" for the
Punycode version of the domain, as in (hypothetical domain name; 'co.za'
does not support IDN yet)...

(Virtual host config snippet)
ServerAdmin webmaster@café.co.za
DocumentRoot /home/www/café.co.za/web/
ServerName café.co.za
ServerAlias xn--caf-dma.co.za
ServerAlias www.café.co.za
ServerAlias www.xn--caf-dma.co.za
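
The alias pairs in the snippet above can be generated rather than typed by hand. What follows is only a minimal Java sketch, assuming the Unicode name is the input and using the standard java.net.IDN class. The class name VhostAliases and the helper directives() are made up for illustration, and the domain is the hypothetical one from the post (written with a \u escape so the source compiles regardless of file encoding).

```java
import java.net.IDN;
import java.util.ArrayList;
import java.util.List;

public class VhostAliases {
    // Hypothetical helper: builds the ServerName/ServerAlias lines
    // for one Unicode domain name, with and without the "www." prefix.
    static List<String> directives(String unicodeDomain) {
        // IDN.toASCII converts each non-ASCII label to its "xn--" form.
        String ascii = IDN.toASCII(unicodeDomain);
        List<String> lines = new ArrayList<>();
        lines.add("ServerName " + unicodeDomain);
        lines.add("ServerAlias " + ascii);
        lines.add("ServerAlias www." + unicodeDomain);
        lines.add("ServerAlias www." + ascii);
        return lines;
    }

    public static void main(String[] args) {
        // "caf\u00e9.co.za" is café.co.za
        for (String line : directives("caf\u00e9.co.za")) {
            System.out.println(line);
        }
    }
}
```

For a name that is already pure ASCII the two forms are identical, so a real generator would probably want to skip the duplicate alias lines.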

I'm somewhat stuck with e-mail though. What do people usually do???

So from a backend point of view - do I use 'xn--caf-dma.co.za' or
'café.co.za' for e-mail?


What about UTF-8 support for the LHS of the '@' sign - ie can I allow
'rölf [at] something'?

So far, mailx and evolution seem to dislike UTF-8 in the address fields.

--
. . ___. .__ Posix Systems - (South) Africa
/| /| / /__ mje [at] posix - Mark J Elkins, Cisco CCIE
/ |/ |ARK \_/ /__ LKINS Tel: +27 12 807 0590 Cell: +27 82 601 0496


cyborg2 at benderirc

Jan 3, 2012, 1:40 AM

Post #2 of 8
Re: IDN, UTF-8 and Punycode curiosity

On 03.01.2012 10:11, Mark Elkins wrote:
>
> I'm somewhat stuck with e-mail though. What do people usually do???
>
> So from a backend point of view - do I use 'xn--caf-dma.co.za' or
> 'café.co.za' for e-mail?
>
>
>

You have to use xn--caf-dma.co.za in your backend .. nothing else will
work.

Marius

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


warren at decoy

Jan 3, 2012, 2:49 AM

Post #3 of 8
Re: IDN, UTF-8 and Punycode curiosity

On Tue, Jan 3, 2012 at 11:40 AM, Cyborg <cyborg2 [at] benderirc> wrote:
> On 03.01.2012 10:11, Mark Elkins wrote:
>
> You have to use  xn--caf-dma.co.za in your backend .. nothing else will
> work.


I have never used it, but doesn't setting allow_utf8_domains and
adjusting dns_check_names_pattern do the job?


--
.warren



hs at schlittermann

Jan 3, 2012, 3:52 AM

Post #4 of 8
Re: IDN, UTF-8 and Punycode curiosity

Hi Mark,

Mark Elkins <mje [at] posix> (Tue 03 Jan 2012 10:11:29 CET):
> I'm somewhat stuck with e-mail though. What do people usually do???
> So from a backend point of view - do I use 'xn--caf-dma.co.za' or
> 'café.co.za' for e-mail?

Never used it, but from my POV it's the job of the MUA to translate

…@café.co.za

into xn--caf-dma.co.za and do all the routing based on the punycode,
since DNS does not serve anything else. In case the MUA is too stupid,
the user has to translate it and then use the punycode address.
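
The translation step described above (keep the local part, convert only the domain) can be sketched in a few lines of Java. This is an illustration under assumptions, not what any particular MUA does: the class AddressToAscii, the helper toAsciiAddress() and the local part "user" are hypothetical, and the address is split at the last '@' without any further syntax checking.

```java
import java.net.IDN;

public class AddressToAscii {
    // Hypothetical helper: converts only the domain part of an address
    // to its Punycode (ASCII) form and leaves the local part untouched.
    static String toAsciiAddress(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("no '@' in address: " + address);
        }
        String localPart = address.substring(0, at);
        String domain = address.substring(at + 1);
        // IDN.toASCII returns pure-ASCII domains unchanged.
        return localPart + "@" + IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // "caf\u00e9.co.za" is café.co.za
        System.out.println(toAsciiAddress("user@caf\u00e9.co.za"));
        System.out.println(toAsciiAddress("user@example.org"));
    }
}
```

The local part is deliberately passed through as-is; Punycode only applies to domain labels and says nothing about the left-hand side of the '@'.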

> What about UTF-8 support for the LHS of the '@' sign - ie can I allow
> 'rölf [at] something'?

Here I'm clueless. RFC 822, 2822 and 5322 probably give some information
about the allowed charset for the local part and how to translate it
from other charsets.

But, probably I'm wrong.

--
Heiko :: dresden : linux : SCHLITTERMANN.de
GPG Key 48D0359B : 3061 CFBF 2D88 F034 E8D2 7E92 EE4E AC98 48D0 359B


pdp at exim

Jan 3, 2012, 5:24 PM

Post #5 of 8
Re: IDN, UTF-8 and Punycode curiosity

On 2012-01-03 at 12:49 +0200, Warren Baker wrote:
> On Tue, Jan 3, 2012 at 11:40 AM, Cyborg <cyborg2 [at] benderirc> wrote:
> > On 03.01.2012 10:11, Mark Elkins wrote:
> >
> > You have to use  xn--caf-dma.co.za in your backend .. nothing else will
> > work.
>
>
> I have never used it, but doesn't setting allow_utf8_domains and
> adjusting dns_check_names_pattern do the job?

It depends upon the client software; the official IETF approach is to
use Punycode, but another approach is to just put UTF-8 straight into
DNS -- the argument (which I agree with) is that those who want to work
with that part of the world which needs this will upgrade DNS and those
who don't won't, and it's a lot less work to support UTF-8 than to
support Punycode translations everywhere. Mind, normalisation is still
needed.
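
To illustrate why normalisation matters even with raw UTF-8 in DNS: the same visible name can be typed with a precomposed "é" or with "e" plus a combining accent, and the two strings do not compare equal until they are normalised. The following is a small Java sketch using java.text.Normalizer; the class name NormaliseDomain and the helper nfc() are made up here, and choosing NFC as the form is an assumption for the example, not something the post specifies.

```java
import java.net.IDN;
import java.text.Normalizer;

public class NormaliseDomain {
    // Hypothetical helper: normalises a name to Unicode NFC.
    static String nfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9.co.za";     // precomposed é (U+00E9)
        String decomposed = "cafe\u0301.co.za";  // e + combining acute (U+0301)

        // Same visible name, different code point sequences.
        System.out.println(composed.equals(decomposed));
        // After normalisation the two spellings compare equal.
        System.out.println(composed.equals(nfc(decomposed)));
        // The normalised name converts to the Punycode form used in the thread.
        System.out.println(IDN.toASCII(nfc(decomposed)));
    }
}
```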

I personally run with those two settings, but to my knowledge I've never
sent or received mail which depended upon them. One of the TODO items
on my plate is Punycode support in Exim, to better play with the IETF
vision of how complex the world should be.

I haven't yet had time to do so.

In practice, for reasons of "that's what my employer cares about", any
work I put into internationalised domains in Exim will focus first on
whatever consensus has formed amongst real-world operators in Japan.
For me, everything else is a nice extra. Other Exim developers may have
other priorities.
--
https://twitter.com/syscomet



cyborg2 at benderirc

Jan 4, 2012, 3:32 AM

Post #6 of 8
Re: IDN, UTF-8 and Punycode curiosity

On 04.01.2012 02:24, Phil Pennock wrote:
> You have to use xn--caf-dma.co.za in your backend .. nothing else will
> work.
>>
>> I have never used it, but doesn't setting allow_utf8_domains and
>> adjusting dns_check_names_pattern do the job?
> It depends upon the client software; the official IETF approach is to
> use Punycode, but another approach is to just put UTF-8 straight into
> DNS -- the argument (which I agree with) is that those who want to work
> with that part of the world which needs this will upgrade DNS and those
> who don't won't, and it's a lot less work to support UTF-8 than to
> support Punycode translations everywhere.
Personally, I would only let the "Dans working group"* work on DNS;
they know what they are doing most of the time.

You don't need to put Punycode everywhere to work with it; that's the
brilliance of it. If your favourite mail client is not aware of UTF-8,
use the Punycode version of the domain name you want and it still works.
It is uncomfortable that way, but hey, it works without any additional
problems on the levels below or with any other software around.
It even works in Japan :)

> I personally run with those two settings, but to my knowledge I've never
> sent or received mail which depended upon them. One of the TODO items
> on my plate is Punycode support in Exim, to better play with the IETF
> vision of how complex the world should be.
>
> I haven't yet had time to do so.
>
>
If I may,

UTF-8 breaks DNS in so many ways on so many different levels that it's
not worth thinking about. The IETF's approach to the problem is gigantic
on the inside, but very simple on the outside, and it works with
everyone, everywhere and even back in time.
I made the first port for the Amiga, and that's an old system, without
vendor support or whatever. If it's possible there, any Japanese
engineer should be able to port it to modern systems and software
instead of making his own dirty solution to the problem.

The ideal implementation results in two simple-to-use functions, as in
the Java 1.6 implementation. You just wrap every domain name on input
and output in one of those functions, one for encoding to IDN and one
for decoding from it, and that's it. Those functions even check whether
they need to do anything at all ;). I upgraded an entire ISP
(webapp, DNS, daemon configs, etc.) in less than 10 minutes. With a good
choice of toolkit, you will ask yourself why you waited so long :)

What I wonder is why Exim should get into this mess at all. It's an MTA,
and as such it's not involved in converting, displaying or encoding
domain names; that's the job of the mail client. I loved Exim for being
the most RFC-guided MTA, and now I read it's operating in sendmail
style ;)

best regards,
Marius

*) Dan Bernstein and Dan Kaminsky :)

IDN Example:

http://docs.oracle.com/javase/6/docs/api/java/net/IDN.html

import java.net.IDN;

public class Test {
    public static void main(String[] args) {
        // Punycode (ASCII) form back to the Unicode form
        System.out.println(IDN.toUnicode("xn--sf-bcher-95a.de"));
        // Unicode form to the Punycode (ASCII) form
        System.out.println(IDN.toASCII("sf-bücher.de"));
    }
}









mje at posix

Jan 4, 2012, 8:33 AM

Post #7 of 8
Re: IDN, UTF-8 and Punycode curiosity

On Tue, 2012-01-03 at 20:24 -0500, Phil Pennock wrote:
> On 2012-01-03 at 12:49 +0200, Warren Baker wrote:
> > On Tue, Jan 3, 2012 at 11:40 AM, Cyborg <cyborg2 [at] benderirc> wrote:
> > > On 03.01.2012 10:11, Mark Elkins wrote:
> > >
> > > You have to use xn--caf-dma.co.za in your backend .. nothing else will
> > > work.
> >
> >
> > I have never used it, but doesn't setting allow_utf8_domains and
> > adjusting dns_check_names_pattern do the job?
>
> It depends upon the client software; the official IETF approach is to
> use Punycode, but another approach is to just put UTF-8 straight into
> DNS -- the argument (which I agree with) is that those who want to work
> with that part of the world which needs this will upgrade DNS and those
> who don't won't, and it's a lot less work to support UTF-8 than to
> support Punycode translations everywhere. Mind, normalisation is still
> needed.
>
> I personally run with those two settings, but to my knowledge I've never
> sent or received mail which depended upon them. One of the TODO items
> on my plate is Punycode support in Exim, to better play with the IETF
> vision of how complex the world should be.

I relented and added a punycode version of the domain name to my MySQL
tables. Also registered (mje@) 'pösix.eu' for proper testing purposes
with e-mail.

Phil:
What exactly do you have 'allow_utf8_domains' and
'dns_check_names_pattern' set to, and exactly where should they be
included in the config file? I'm running exim-4.77 (Gentoo).
I'd just like to get it right the first time around.

Are there mail clients (MUA) which translate between UTF-8 and PunyCode
or is this the job of the MTA to try and sort out? Thunderbird and
Evolution both fail with a similar

It would probably be useful if exim could at least 'translate' Punycode
to UTF-8 in the final delivery stage (eg: delivery by mysql) - but I
might be able to fudge that by having both versions of the name in my DB
table - ie end up with e-mail delivered to my exim via Punycode being
appropriately placed in the correct UTF-8 directory with UTF-8 headers.
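
The "fudge" of mapping a Punycode recipient domain back to a UTF-8 directory name can also be done outside the MTA. This is a rough Java sketch of that lookup, assuming a per-domain directory layout: the class MailDirLookup, the helper mailDir(), the base directory /home/mail and the local part "user" are all hypothetical and not taken from the setup described above.

```java
import java.net.IDN;

public class MailDirLookup {
    // Hypothetical helper: maps the ASCII (Punycode) recipient domain to a
    // directory named after the Unicode form of the domain.
    // Note: this sketch does not validate localPart in any way.
    static String mailDir(String base, String asciiDomain, String localPart) {
        // IDN.toUnicode turns "xn--" labels back into Unicode and
        // returns other labels unchanged.
        String unicodeDomain = IDN.toUnicode(asciiDomain);
        return base + "/" + unicodeDomain + "/" + localPart;
    }

    public static void main(String[] args) {
        System.out.println(mailDir("/home/mail", "xn--caf-dma.co.za", "user"));
    }
}
```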

Some native UTF-8/Punycode would be preferred though.
--
. . ___. .__ Posix Systems - (South) Africa
/| /| / /__ mje [at] posix - Mark J Elkins, Cisco CCIE
/ |/ |ARK \_/ /__ LKINS Tel: +27 12 807 0590 Cell: +27 82 601 0496


pdp at exim

Jan 4, 2012, 5:34 PM

Post #8 of 8
Re: IDN, UTF-8 and Punycode curiosity

On 2012-01-04 at 18:33 +0200, Mark Elkins wrote:
> Phil:
> What exactly do you have 'allow_utf8_domains' and
> 'dns_check_names_pattern' set to and exactly where should they be
> included in the config file. I'm running exim-4.77 (Gentoo).
> Just like to get it right first time around.

In the main section of the config file, after the macros, before the
first "begin" line, I have:
----------------------------8< cut here >8------------------------------
allow_utf8_domains
# this pattern is straight from spec.txt for allow_utf8_domains:
dns_check_names_pattern = (?i)^(?>(?(1)\.|())[a-z0-9\xc0-\xff]\
(?>[-a-z0-9\x80-\xff]*[a-z0-9\x80-\xbf])?)+$
----------------------------8< cut here >8------------------------------
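
For anyone who wants to see which names that pattern lets through, here is an approximate Java translation, offered as a sketch only. Java's regex engine has no PCRE-style conditional "(?(1)...)", so the "dot between labels" rule is written out explicitly; the byte ranges are copied from the pattern above. The sketch assumes the pattern is applied to the raw UTF-8 bytes of the name, which is emulated by viewing each byte as one ISO-8859-1 character. The class DnsNamePatternSketch and the helper accepts() are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class DnsNamePatternSketch {
    // One label: an allowed first byte, optionally followed by more bytes
    // that must end in an allowed last byte (so no trailing hyphen).
    private static final String LABEL =
        "[a-z0-9\\xc0-\\xff](?:[-a-z0-9\\x80-\\xff]*[a-z0-9\\x80-\\xbf])?";

    // Labels separated by single dots; matches() anchors at both ends,
    // so no explicit ^ and $ are needed.
    private static final Pattern NAME =
        Pattern.compile(LABEL + "(?:\\." + LABEL + ")*", Pattern.CASE_INSENSITIVE);

    static boolean accepts(String domain) {
        // View each UTF-8 byte as one char so the byte ranges apply.
        byte[] utf8 = domain.getBytes(StandardCharsets.UTF_8);
        String bytesAsChars = new String(utf8, StandardCharsets.ISO_8859_1);
        return NAME.matcher(bytesAsChars).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("caf\u00e9.co.za"));      // UTF-8 name
        System.out.println(accepts("xn--caf-dma.co.za"));    // Punycode name
        System.out.println(accepts("-bad.example"));         // leading hyphen
        System.out.println(accepts("bad_name.example"));     // underscore
    }
}
```

This only checks the byte-level shape of a name; it says nothing about whether the bytes are valid UTF-8 or whether the name is normalised.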

> Are there mail clients (MUA) which translate between UTF-8 and PunyCode
> or is this the job of the MTA to try and sort out? Thunderbird and
> Evolution both fail with a similar

For internationalised email, done correctly, we're almost at the point
where we can say "the SMTP server advertises an extension in response to
EHLO which tells the client it can send UTF-8 data, and the SMTP server
is responsible for performing IDNA lookups".
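
On the client side, "the server advertises an extension in response to EHLO" comes down to scanning the multi-line 250 reply for a keyword. The following is a minimal Java sketch of that check; the class EhloExtensions, the helper advertises() and the sample reply (including the host name) are hypothetical. The keyword is passed in as a parameter; UTF8SMTP is the one defined by the Experimental RFC 5336 mentioned in this thread.

```java
public class EhloExtensions {
    // Hypothetical helper: returns true if the EHLO reply lists the keyword.
    // The first reply line is the server greeting, so it is skipped.
    static boolean advertises(String ehloReply, String keyword) {
        boolean firstLine = true;
        for (String line : ehloReply.split("\r?\n")) {
            if (firstLine) {
                firstLine = false;
                continue;
            }
            // Expect "250-" on continuation lines and "250 " on the last one.
            if (line.length() < 4 || !line.startsWith("250")) continue;
            char sep = line.charAt(3);
            if (sep != '-' && sep != ' ') continue;
            String rest = line.substring(4).trim();
            if (rest.isEmpty()) continue;
            // The keyword is the first token; parameters may follow it.
            String first = rest.split("\\s+")[0];
            if (first.equalsIgnoreCase(keyword)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String reply = "250-mail.example.org Hello\r\n"
                     + "250-SIZE 52428800\r\n"
                     + "250-8BITMIME\r\n"
                     + "250 UTF8SMTP";
        System.out.println(advertises(reply, "UTF8SMTP"));
        System.out.println(advertises(reply, "PIPELINING"));
    }
}
```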

RFC 5336 is _Experimental_; the IETF has worked on a Standards-track
document to replace it, visible at:
http://datatracker.ietf.org/doc/draft-ietf-eai-rfc5336bis/

On the 22nd November, the ietf-announce mailing-list noted that
draft-ietf-eai-rfc5336bis-16.txt was approved as a Proposed Standard.
However, that has not yet been published as an RFC.

The most likely course of action for Exim is to wait until whatever the
hold-up is has been resolved, then implement to the shiny new standards,
leaving the standards as default, while making sure that there are
escape hatches for folk who want to do something different. The first
release will probably have EXPERIMENTAL_* build-time guards.

Generally speaking, Exim is strongly biased towards adhering to the IETF
standards, but not committed to implementing all of them, and willing to
make the default behaviour not follow the standards published from
there, but only if there is a very compelling argument.

Eg, http://bugs.exim.org/817 where we're pretty much agreed to turn on
accept_8bitmime by default, because it's what most other systems do in
practice and *not* advertising 8BITMIME is causing more operational
problems than would be caused by advertising it and failing to
down-convert/bounce.

> It would probably be useful if exim could at least 'translate' puny to
> UTF in the final deliver stage (eg: deliver by mysql) - but I might be
> able to fudge that by having both versions of the name in my DB table -
> ie end up with e-mail delivered to my exim via punycode being
> appropriately placed in the correct UTF directory with UTF headers.
>
> Some native UTF-8/Punycode would be preferred though.

Yes, I went so far as to register a punycode domain so that I could test
any changes made to Exim for this, but I then changed employer and
things got hectic, so I failed to make any progress.

Regards,
-Phil
