Mailing List Archive: Forrest: User

UTF-8 setting for Japanese characters

 

 


praveen.bhatia at sumpurn

Dec 14, 2009, 12:58 AM

Post #1 of 5
UTF-8 setting for Japanese characters

Hello,
I have configured my Forrest 0.8 based website for UTF-8 in order to publish a
Japanese site.
On local Tomcat and Jetty it works fine and the Japanese characters display
correctly. (My machine runs Japanese Windows Vista.)

The problem appears once the site is uploaded to the shared server (Linux with
Apache and Tomcat): the browser does not treat the pages as UTF-8 encoded.
The Japanese characters display correctly only if the browser encoding is
manually set to UTF-8, and this has to be repeated for EACH page. The generated
HTML file does contain the following meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
So generation seems to be OK up to this point.

This behavior can be seen on my Forrest website www.sumpurn.com (or
www.sumpurn.com/com.sumpurn.web/index.html): the text first appears garbled,
but becomes readable once the browser encoding is set to UTF-8 for EACH page.
(The content is entirely Japanese, encoded as UTF-8.)

I followed all the UTF-8 instructions given for Forrest and those on the
Cocoon website, http://cocoon.apache.org/2.2/1366_1_1.html#theory. However, I
still cannot make it work. My gut feeling is that the Apache server's HTTP
header is sending a non-UTF-8 encoding to the browser, and that this needs to
be set via Forrest/Cocoon/Apache Tomcat.

Could someone please guide me on what other settings are required?

Thanks
Best wishes
Praveen


praveen.bhatia at sumpurn

Dec 14, 2009, 2:34 AM

Post #2 of 5
RE: UTF-8 setting for Japanese characters [In reply to]

Hello,
Here is some further information I gathered. I wrote a program to read the
response header from the server, and this is the result:
Message = GET http://www.sumpurn.com/com.sumpurn.web/index.html HTTP/1.0
HTTP/1.1 200 OK
Date: Mon, 14 Dec 2009 10:30:06 GMT
Server: Apache/2.0.63 (CentOS)
X-Cocoon-Version: 2.2.0-dev
Set-Cookie: JSESSIONID=07213F0EE92415A0E5B8B4D3BCDA0107; Path=/com.sumpurn.web
Content-Length: 9665
Connection: close
Content-Type: text/html; charset=ISO-8859-1

Clearly, the charset in the response header is ISO-8859-1, not UTF-8, despite
the settings I made in forrest.properties, web.xml, forrest.xconf, and
sitemap.xmap (xml serializer and html serializer).
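
The comparison described above (charset in the HTTP header versus charset in the
generated meta tag) can be sketched in Python. This is a minimal sketch, not the
program used in this thread: the helper names `header_charset`, `meta_charset`,
`charsets_conflict` and `fetch_and_check` are hypothetical, and the meta tag
lookup is a simple regular expression rather than a full HTML parser.

```python
import re
from email.message import Message
from typing import Optional
from urllib.request import urlopen


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: str) -> Optional[str]:
    """Return the lowercased charset declared in an HTML meta tag, if any."""
    match = re.search(r"<meta[^>]*charset=[\"']?([\w-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charsets_conflict(content_type: str, html: str) -> bool:
    """True when header and meta tag both declare a charset and they differ."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html)
    return None not in (from_header, from_meta) and from_header != from_meta


def fetch_and_check(url: str) -> bool:
    """Fetch a live page and compare its header charset with its meta tag."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        # Decode leniently: only the ASCII meta tag matters for this check.
        html = response.read().decode("ascii", errors="replace")
    return charsets_conflict(content_type, html)
```

With the header and meta tag quoted in this thread, `charsets_conflict` reports
a mismatch; `fetch_and_check` applies the same test to a live URL.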

What settings could I be missing?

Best wishes
Praveen


crossley at apache

Dec 14, 2009, 1:54 PM

Post #3 of 5
Re: UTF-8 setting for Japanese characters [In reply to]

Dr. Praveen Bhatia wrote:
>
> Clearly, the charset is not getting set to UTF-8 in spite of settings that I
> did in forrest.properties, web.xml forrest.xconf, sitemap.xmap (xml
> serializer and html serializer).
>
> What settings I could be missing?

Some time in the past we had similar issues for our forrest.a.o site.

See $FORREST_HOME/site-author/content/.htaccess

#----------------
# FIXME: Do we still need this? See FOR-877
AddDefaultCharset UTF-8
#----------------

That issue links to some other issues which might provide
some background.

-David


crossley at apache

Dec 14, 2009, 2:24 PM

Post #4 of 5
Re: UTF-8 setting for Japanese characters [In reply to]

David Crossley wrote:
> Dr. Praveen Bhatia wrote:
> >
> > Clearly, the charset is not getting set to UTF-8 in spite of settings that I
> > did in forrest.properties, web.xml forrest.xconf, sitemap.xmap (xml
> > serializer and html serializer).
> >
> > What settings I could be missing?
>
> Some time in the past we had similar issues for our forrest.a.o site.
>
> See $FORREST_HOME/site-author/content/.htaccess
>
> #----------------
> # FIXME: Do we still need this? See FOR-877
> AddDefaultCharset UTF-8
> #----------------
>
> That issue links to some other issues which might provide
> some background.

Ah, following through from
http://issues.apache.org/jira/browse/FOR-877
to the linked issue:
https://issues.apache.org/bugzilla/show_bug.cgi?id=23421
provides very educational reading on this matter.

-David


praveen.bhatia at sumpurn

Dec 14, 2009, 7:53 PM

Post #5 of 5
RE: UTF-8 setting for Japanese characters [In reply to]

Thanks, David. The links discussing these UTF-8 issues were very educational
indeed. Taking a cue from them, I contacted our web host and asked them to
change the server's default charset=ISO-8859-1 setting, which was overriding
ALL meta tag charset settings.

Things work now, and UTF-8 is recognized correctly.
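
Why the server default mattered can be illustrated with a short Python sketch.
When the HTTP Content-Type header declares a charset, browsers generally give
it precedence over the meta tag, so the UTF-8 bytes were being decoded as
ISO-8859-1. The sample text, the function names and the fallback value below
are assumptions made for illustration; real browsers pick a locale-dependent
default when nothing is declared.

```python
def decode_as(raw: bytes, declared_charset: str) -> str:
    """Decode bytes the way a client would if it trusted the declared charset."""
    return raw.decode(declared_charset)


def effective_charset(header_charset, meta_charset, fallback="windows-1252"):
    """Pick a charset by precedence: HTTP header, then meta tag, then fallback.

    The fallback value is an assumption for this sketch only.
    """
    return header_charset or meta_charset or fallback


page_text = "日本語"                # sample Japanese text (illustrative)
raw = page_text.encode("utf-8")    # bytes as generated by the site

# Before the fix: the header says ISO-8859-1 and wins over the meta tag.
garbled = decode_as(raw, effective_charset("iso-8859-1", "utf-8"))

# After the fix: no conflicting header charset, so the meta tag applies.
correct = decode_as(raw, effective_charset(None, "utf-8"))
```

Here `garbled` holds nine Latin-1 characters instead of three Japanese ones,
while `correct` matches the original text, which is consistent with the pages
becoming readable only after the browser encoding was forced to UTF-8.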

Thanks for the information and help.

Best wishes
Praveen

