Gossamer Forum: Products: DBMan: Discussions: unicode questions

Aug 13, 2003, 1:48 PM

kellner

Enthusiast (606 posts)

Post #1 of 5

unicode questions

Hello again (haven't been here for quite a while ...),

Here's a thing that keeps bothering me. I have databases where I need to use characters from the Unicode ranges "Latin Extended" and "Latin Extended Additional". If I enter the characters as numeric HTML entities, like &#7717;, they display fine with the charset of the HTML pages set to ISO-8859-1. Only, entering characters and especially searching is bothersome - I don't really want to memorize all these numbers, thank you.
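For illustration, here is a minimal sketch of how a script could produce those numbers so nobody has to memorize them. It is written in Python purely as an example (DBMan itself is Perl), and the helper name `to_entities` is made up for this sketch. It assumes the goal is to keep ISO-8859-1 characters such as umlauts as they are and to turn everything else into a decimal numeric entity.

```python
def to_entities(text: str) -> str:
    """Replace every character outside ISO-8859-1 with a decimal HTML entity.

    Hypothetical helper for illustration; not part of DBMan.
    """
    # "xmlcharrefreplace" turns characters that cannot be encoded
    # into references of the form &#NNNN;.
    encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")
    # Decode again to get an ordinary string; characters that exist in
    # ISO-8859-1 (for example the German umlauts) come back unchanged.
    return encoded.decode("iso-8859-1")
```

With this, `to_entities("\u1e25")` (h with dot below) gives `&#7717;`, the same number as in the example above, while an umlaut is passed through untouched.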

So I thought I could just find a way to enter the characters straight into the web form (using a utility like Keyman on Windows, or just configuring my .Xmodmap on my Linux box), and switch the charset of the HTML pages to UTF-8.

Fine. Only, once I've used this setup for a while, things get messy: German umlauts are kind of "melted" together with the three or four letters after them, and in some cases I can only guess what was actually in the database before. Is it possible that Perl (or DBMan) does something weird when adding or modifying records?
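One possible explanation, offered as a guess rather than a diagnosis: a record saved while the pages were still ISO-8859-1 stores an umlaut as a single byte (0xE4 for "ä"), and under UTF-8 that byte looks like the start of a multi-byte sequence, so a browser may swallow the letters that follow it. The Python sketch below assumes the database holds such a mix of old and new records; the function name `looks_like_utf8` is hypothetical.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same word as it would be stored from an ISO-8859-1 page:
# the umlaut is the single byte 0xE4.
latin1_record = "M\u00e4dchen".encode("iso-8859-1")

# ... and as it would be stored from a UTF-8 page:
# the umlaut is the two bytes 0xC3 0xA4.
utf8_record = "M\u00e4dchen".encode("utf-8")
```

Here `looks_like_utf8(utf8_record)` is true and `looks_like_utf8(latin1_record)` is false. Pure ASCII records pass the check either way, so only records containing special characters can reveal which encoding they were saved in.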

On top of that, Linux and Windows seem to have different ways of entering Unicode characters into web forms. As a result, when I enter something into the database on my Windows machine at work and try to search for it at home on my Linux box, I can't find it.
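A possible cause, stated as an assumption: one input method may send a precomposed character (a single code point) while the other sends a base letter followed by a combining mark. Both look identical on screen but do not compare as equal. The Python sketch below shows how normalizing both the stored text and the search term to the same form (NFC here) would make them match; `normalize_for_search` is a hypothetical helper name.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Bring text into Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# "h with dot below" as one precomposed code point (U+1E25) ...
precomposed = "\u1e25"
# ... and as "h" followed by COMBINING DOT BELOW (U+0323).
decomposed = "h\u0323"
```

The two strings differ when compared directly, but `normalize_for_search(decomposed)` equals `precomposed`, so a search that normalizes both sides would find the record regardless of which machine it was typed on.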

I guess it would be better to switch back to ISO-8859-1 as the encoding of the HTML pages, and find some operating-system-independent way of entering special characters that leaves them as numeric HTML entities in the database. Still, I'd really like to understand what's going on here ...

Thanks !
kellner

Aug 14, 2003, 1:19 PM

joematt

User (287 posts)

Post #2 of 5

Re: [kellner] unicode questions

Well, I'd be happy to know as much as you have forgotten. :)

Maybe this would help: it's a sort of helper application. Of course, you would need to make some changes.

http://www.gossamer-threads.com/...i?post=243465#243465

Aug 14, 2003, 1:58 PM

kellner

Enthusiast (606 posts)

Post #3 of 5

Re: [joematt] unicode questions

Um, thanks, but I don't really understand how that thread relates to my question ...
kellner

Aug 14, 2003, 2:07 PM

joematt

User (287 posts)

Post #4 of 5

Re: [kellner] unicode questions

Quote:

Only, entering characters and especially searching is bothersome - I don't really want to memorize all these numbers, thank you.

Use the JavaScript from that thread to "remember" the appropriate HTML entities for you.

I'm not sure how many characters you would need, but it seems like pushing a button to insert the HTML code would be faster than typing the numbers.

Sorry if this is off base.

Aug 14, 2003, 3:15 PM

kellner

Enthusiast (606 posts)

Post #5 of 5

Re: [joematt] unicode questions

Ah, thanks, now I get it. Buttons are not really a solution, though. Often every second character in a word is a special character, and switching from keyboard to mouse would be too tiresome in the long run. A colleague once wrote some JavaScript for such purposes: you could enter special characters by holding CTRL down and pressing some key. The problem was that it only worked with Internet Explorer, and the special characters were always added at the beginning of the input field or textarea, not where you needed them.

Another option is to pick an otherwise unused character, like #, define each special character as a combination of a letter and that character (e.g. "a" and "#"), and convert these strings into numeric HTML entities when adding, modifying or searching for records. At the moment this seems the simplest.
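A rough sketch of that idea, in Python for illustration only (in DBMan the same substitution would have to be done in Perl). The mapping table is hypothetical: the three letter-plus-# pairs and the characters they stand for are just examples, and `expand_shortcuts` is a made-up name.

```python
import re

# Hypothetical shortcut table: "letter + #" -> Unicode code point.
# The real table would list whatever characters the database needs.
SPECIAL = {
    "a#": 0x0101,  # a with macron
    "h#": 0x1E25,  # h with dot below
    "s#": 0x1E63,  # s with dot below
}

# One pattern that matches any of the shortcuts literally.
_PATTERN = re.compile("|".join(re.escape(key) for key in SPECIAL))


def expand_shortcuts(text: str) -> str:
    """Replace each known shortcut with its decimal numeric HTML entity."""
    return _PATTERN.sub(lambda m: f"&#{SPECIAL[m.group(0)]};", text)
```

For example, `expand_shortcuts("h#")` gives `&#7717;`. Running the same function over the form input when adding or modifying a record and over the search term would keep the two consistent, whichever operating system the text was typed on. A "#" that follows a letter not in the table is left alone.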

Anyway, thanks for your input.
kellner