
knight at teksavvy
Mar 22, 2012, 11:55 PM
Re: Themestrings has been updated for 0.25 - It's time to start translating! :)
Hi!

On 3/22/2012 2:24 PM, Kenni Lund wrote:
> 2012/3/22 Nicolas Riendeau <knight [at] teksavvy>
>> On 3/22/2012 12:57 PM, Nick Morrott wrote:
>>> I noticed some UTF-8 weirdness today after updating for the en_gb
>>> translations. The XML element generated by lupdate containing the
>>> description text for the Steppes theme (it contains "Français") was
>>> not generated with valid UTF-8, but rather each of the two bytes
>>> representing the "ç" character (should be C3 A7) was further
>>> re-encoded into UTF-8 so that 4 bytes in total (C3 83 C2 A7) were
>>> output for the character in the file.
>
> Heh...that bug just won't die :) First the encoding issue appeared in
> the theme downloader generation script, then in the themestrings tool
> and now in lupdate...

(Kenni, I know you most likely know a good deal of this, but since I'm posting it to the mailing list I might as well document the problem we had...)

I can only assume that all of these scripts/programs assumed the strings were all in US-ASCII, or that when they were written there wasn't anything to test them with to make sure they produced the expected results.

In the case of the themestrings tool, what was most likely happening is that the output, which was forced to be UTF-8, was later re-encoded by the QTextStream into our local character set (which is most likely UTF-8 for many if not all of us).

This time the problem is slightly different... By default lupdate assumes that the source files are in ISO-8859-1** (when what we are trying to make it process is in UTF-8). So it takes the "ç", which is encoded using two bytes in UTF-8, assumes those two bytes are actually two characters in ISO-8859-1, and proceeds to re-encode each of them into UTF-8 to store them in the translation file, with catastrophic results.

** lupdate's default encoding, also known under the name Latin1.
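A minimal Python sketch of the lupdate failure described above. It models only the byte-level effect and does not run lupdate; the helper name `misread_as_latin1` is hypothetical. Decoding UTF-8 source bytes as ISO-8859-1 and re-encoding the result as UTF-8 turns "ç" (C3 A7) into exactly the four bytes Nick reported, while a pure US-ASCII string passes through the same mistaken round trip unchanged.

```python
def misread_as_latin1(source_bytes: bytes) -> bytes:
    """Hypothetical helper: treat UTF-8 bytes as ISO-8859-1 (Latin1) text,
    then re-encode that text as UTF-8, i.e. the wrong-source-codec mistake."""
    return source_bytes.decode("iso-8859-1").encode("utf-8")

# "ç" is code point U+00E7; UTF-8 stores it as two bytes.
correct = "ç".encode("utf-8")
print(correct.hex(" ").upper())           # C3 A7

# Each of those two bytes becomes its own Latin1 character (Ã and §),
# and each of those needs two bytes in UTF-8: four bytes in total.
mangled = misread_as_latin1(correct)
print(mangled.hex(" ").upper())           # C3 83 C2 A7
print(misread_as_latin1("Français".encode("utf-8")).decode("utf-8"))  # FranÃ§ais

# Pure US-ASCII text has identical bytes in both encodings, so it survives.
print(misread_as_latin1(b"Steppes") == b"Steppes")  # True
```

This is also why the problem stayed hidden for so long: the mistake is invisible until a string contains a character outside US-ASCII.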
The reason why this never caused problems before is that our strings are normally in US-ASCII, and both ISO-8859-1 and UTF-8 are supersets of US-ASCII. This means that as long as the original text only contains US-ASCII characters, its encoding is *identical* in ISO-8859-1 and UTF-8. While both are supersets of US-ASCII, non-US-ASCII characters are not encoded the same way in the two (even when the character values match: "ç" is 0xE7 in both, but it is stored as the single byte E7 in ISO-8859-1 and as the two bytes C3 A7 in UTF-8). So as long as everything was in US-ASCII, none of these encoding problems popped up...

>> I think I have an idea how to fix it (assuming lupdate is actually able
>> to extract UTF-8 correctly)
>
> Ok, good, I haven't looked at it yet.

I'll do a few more spot checks tomorrow (it's pretty late here now), but unless I find a problem with the fix (I'm not expecting to find any, though) I'll commit it to every file except the one for the programs under mythtv/, since fixing that one right now would actually add a new string. (The fix will be added there at a later time...)

After applying the fix, the resulting translation file is encoded correctly and displays correctly in the main translation window, but Qt Linguist is still unable to display the source file correctly (a bug in Qt Linguist).

>
>> freeze. If we get reports from any of the translators that some strings
>> are untranslatable we *might* temporarily break the string freeze in
>> order to correct these issues and fix this at the same time..
>
> Yep, if it's the only string that needs fixing, let's just fix it
> through the translations. If we want to, we can always fix the source
> string as well as the single character in all of the translations, on
> the day before the release of 0.25.

Yep, the problem is quite harmless and doesn't justify adding a new string at this time...

Have a nice day!
Nicolas
_______________________________________________
Mythtv-translators mailing list
Mythtv-translators [at] mythtv
http://www.mythtv.org/mailman/listinfo/mythtv-translators