
noreply at sourceforge
Nov 12, 2000, 5:17 AM
Bug #119960, was updated on 2000-Oct-31 13:38
Here is a current snapshot of the bug.

Project: Python
Category: Tkinter
Status: Open
Resolution: None
Bug Group: None
Priority: 3
Summary: Encoding bugs.

Details: Win98, Python2.0final.

1. I can't type Cyrillic letters in the IDLE editor. I tried to figure out what happened and found that Tcl has an 'encoding' command. I typed in the IDLE shell:

    >>> from Tkinter import *
    >>> root = Tk()
    >>> root.tk.call("encoding", "names")
    'utf-8 identity unicode'
    >>> root.tk.call("encoding", "system")
    'identity'

But Tcl has numerous encodings in 'tcl\tcl8.3\encodings', including 'cp1251'! Then I installed Tk separately and removed tcl83.dll and tk83.dll from DLLs:

    >>> from Tkinter import *
    >>> root = Tk()
    >>> root.tk.call("encoding", "names")
    'cp860 cp861 [.........] cp857 unicode'
    >>> root.tk.call("encoding", "system")
    'cp1251'

So, when the Tcl/Tk DLLs are in the Python\DLLs directory, Tcl can't load all its encodings.

But this is not the end. I typed in the IDLE shell:

    >>> print "hello <in russian>"   # all chars look correct.

and got:

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
        return apply(self.func, args)
      File "C:\PYTHON20\Tools\idle\PyShell.py", line 579, in enter_callback
        self.runit()
      File "C:\PYTHON20\Tools\idle\PyShell.py", line 598, in runit
        more = self.interp.runsource(line)
      File "C:\PYTHON20\Tools\idle\PyShell.py", line 183, in runsource
        return InteractiveInterpreter.runsource(self, source, filename)
      File "c:\python20\lib\code.py", line 61, in runsource
        code = compile_command(source, filename, symbol)
      File "c:\python20\lib\codeop.py", line 61, in compile_command
        code = compile(source, filename, symbol)
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    print "[the same characters]"

Then, when I pressed Enter again, I got the same error message. I stopped this by pressing C-Break.

[1/2 hour later] I fixed this by editing site.py:

    if 1: # was: if 0
        # Enable to support locale aware default string encodings.

I typed again:

    >>> print "hello <in russian>"

and got:

    <some strange letters>
    >>> print unicode("hello <in russian>")
    <some strange letters>

[2 hours later] Looking at the source of _tkinter.c:

    static Tcl_Obj* AsObj(PyObject *value)
    {
        if type(value) is StringType:
            return Tcl_NewStringObj(value)
        elif type(value) is UnicodeType:
            ...
        ...
    }

But I read in <http://dev.scriptics.com/doc/howto/i18n.html> that all Tcl functions require all strings to be passed in UTF-8. So this code must look like:

    if type(value) is StringType:
        if TCL_Version >= 8.1:
            return Tcl_NewStringObj(<value converted to UTF-8 string using sys.getdefaultencoding()>)
        else:
            return Tcl_NewStringObj(value)

And when I typed:

    >>> print unicode("hello <in russian>").encode('utf-8')

I got:

    hello <in russian>

This is the end.

P.S. Sorry for my bad English, but I really want to use IDLE and Tkinter in our school, so I can't wait for somebody else to write a bug report.

Follow-Ups:

Date: 2000-Nov-01 08:00
By: jhylton

Comment:
I am not entirely sure what the bug is, though I agree that it can be confusing to deal with Unicode strings.

-------------------------------------------------------

Date: 2000-Nov-01 12:47
By: lemburg

Comment:
AFAIK, the _tkinter.c code automatically converts Unicode to UTF-8 and then passes this to Tcl/Tk. So basically the following should get you correct results...

    print unicode("hello <in russian>", "cp1251")

Alternatively, you can set your default encoding to "cp1251" in the way you describe and then write:

    print unicode("hello <in russian>")

I am not too familiar with Tcl/Tk, so I can't judge whether trying to recode normal 8-bit strings into UTF-8 is a good idea in general for the _tkinter.c interface. It would easily be possible using:

    utf8 = string.encode('utf-8')

since 8-bit strings support the .encode() method too.

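Editorial illustration: the explicit-encoding approach suggested in the comment above can be sketched in present-day Python 3, where an 8-bit string is a `bytes` object. The thread itself is about Python 2.0, so this is only an analogy; the helper names `recode_to_utf8` and `fails_as_ascii` and the sample word are assumptions made for this example, not part of Tkinter or of the bug report.

```python
# Illustrative sketch in Python 3 (the thread concerns Python 2.0).
# recode_to_utf8 and fails_as_ascii are hypothetical helpers.


def recode_to_utf8(raw, source_encoding="cp1251"):
    """Decode 8-bit text from source_encoding and re-encode it as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def fails_as_ascii(raw):
    """Return True if raw cannot be decoded with the 'ascii' codec."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# The Russian word for "hello", written with escapes so the source file
# stays pure ASCII, then stored the way a cp1251 8-bit string would hold it.
hello = "\u043f\u0440\u0438\u0432\u0435\u0442"
cp1251_hello = hello.encode("cp1251")

utf8_hello = recode_to_utf8(cp1251_hello)   # bytes a UTF-8 consumer expects
ascii_fails = fails_as_ascii(cp1251_hello)  # the 'ascii' default cannot decode it
```

The same six characters take six bytes in cp1251 and twelve bytes in UTF-8, which is why passing the cp1251 bytes through unchanged to a consumer that expects UTF-8 cannot display correctly.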
-------------------------------------------------------

Date: 2000-Nov-01 13:16
By: kirill_simonov

Comment:
1. print unicode("<cyrillic>") in IDLE doesn't work! The mechanics (I think) are:
   a) print unicode_string encodes the Unicode string to a normal string using the default encoding and passes it to sys.stdout.
   b) sys.stdout is intercepted by IDLE. IDLE sends this string to Tkinter.
   c) Tkinter passes this string (not Unicode but cp1251!) to Tcl, but Tcl expects a UTF-8 string!!!
   d) I see messy characters on screen.

2. You break compatibility! In 1.5 I can write

    Button(root, text="<cyrillic>")

and this works. Writing unicode("<>", 'cp1251') is UGLY and ANNOYING! Tcl requires strings in UTF-8. All Python strings are in the sys.getdefaultencoding() encoding. So, we have to recode all strings to UTF-8.

3. Tcl in DLLs can't find its encodings in tcl\tk8.3\encodings! I don't know why. So, I can't write in Tkinter.Text in Russian.

-------------------------------------------------------

Date: 2000-Nov-03 12:49
By: gvanrossum

Comment:
Assigned to Marc-Andre, since I have no idea what to do about this... :-(

-------------------------------------------------------

Date: 2000-Nov-09 02:00
By: lemburg

Comment:
Ok, as we've found out in discussions on python-dev, the cause of the problem is (partially) the fact that "print obj" does an implicit str(obj), so any Unicode object printed will turn out as a default-encoded string no matter how hard we try. To fix this, we'll need to tweak the current "print" mechanism a bit to allow Unicode to pass through to the receiving end (sys.stdout in this case).

About the problem that Tcl/Tk needs UTF-8 strings: we could have _tkinter.c recode the strings for you in case sys.getdefaultencoding() returns anything other than 'ascii' or 'utf-8'. That way you can use a different default encoding in Python while Tcl/Tk will always get a true UTF-8 string.

Would this be a solution?

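Editorial illustration: the recoding rule proposed in the comment above (recode only when the default encoding is something other than 'ascii' or 'utf-8') might look roughly like the following. This is a Python 3 sketch of the idea only, not the actual _tkinter.c change; the function name `prepare_for_tcl` and its parameters are hypothetical.

```python
import codecs


def prepare_for_tcl(value, default_encoding):
    """Return bytes for a consumer that expects UTF-8 (hypothetical helper).

    value            -- 8-bit string (bytes) in default_encoding
    default_encoding -- the name sys.getdefaultencoding() would report
    """
    # Normalize spellings such as "UTF8" or "us-ascii" to the codec's
    # canonical name before comparing.
    canonical = codecs.lookup(default_encoding).name
    if canonical in ("ascii", "utf-8"):
        # ASCII is a subset of UTF-8, so no recoding is needed.
        return value
    # Any other default encoding: go through Unicode to reach UTF-8.
    return value.decode(default_encoding).encode("utf-8")


# Sample Cyrillic text, written with escapes to keep the source ASCII-only.
sample = "\u043f\u0440\u0438\u0432\u0435\u0442"

recoded = prepare_for_tcl(sample.encode("cp1251"), "cp1251")
untouched = prepare_for_tcl(b"hello", "ascii")
```

With this rule, a program whose default encoding is cp1251 hands the consumer UTF-8 bytes, while ASCII and UTF-8 input passes through unchanged.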
-------------------------------------------------------

Date: 2000-Nov-10 10:53
By: kirill_simonov

Comment:
Yes, this is a solution. But don't forget that Tcl can't load its encodings at startup. Look at FixTk.py:

    import sys, os, _tkinter
    [...]
    os.environ["TCL_LIBRARY"] = v

But 'import _tkinter' loads _tkinter.pyd; _tkinter.pyd loads tcl83.dll; tcl83.dll tries to load its encodings at startup and fails, because TCL_LIBRARY is not defined! I can fix this:

    #import sys, os, _tkinter
    import sys, os
    #ver = str(_tkinter.TCL_VERSION)
    ver = "8.3"
    [...]

-------------------------------------------------------

Date: 2000-Nov-12 03:30
By: loewis

Comment:
It should be no problem that Tcl can't find its encodings. When used with Tkinter, Tcl can only expect Unicode strings, or strings in sys.getdefaultencoding() (i.e. 'ascii'). Therefore, Tk never needs any other encoding.

If you want to make use of the Tcl system encoding (which is apparently not supported in Tkinter), you probably need to set the TCL_LIBRARY environment variable.

-------------------------------------------------------

Date: 2000-Nov-12 04:17
By: kirill_simonov

Comment:
No, you are wrong! The Entry and Text widgets depend on the Tcl system encoding. If Tcl can't find the Cyrillic encoding (cp1251), then I can't enter Cyrillic characters.

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=119960&group_id=5470
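
Editorial illustration: the import-order problem raised in the 2000-Nov-10 comment is that TCL_LIBRARY has to be set before the Tcl library is first loaded. The sketch below shows that ordering only. The function name `configure_tcl_library`, its parameters, and the `<prefix>/tcl/tcl<version>` layout (taken from the 'tcl\tcl8.3\encodings' path in the report) are assumptions; the elided parts of FixTk.py are not reproduced here.

```python
import os


def configure_tcl_library(prefix, version="8.3", environ=None):
    """Set TCL_LIBRARY to <prefix>/tcl/tcl<version> if that directory exists.

    Hypothetical helper. Returns the path that was set, or None if the
    variable was already defined or the directory was not found.
    """
    if environ is None:
        environ = os.environ
    candidate = os.path.join(prefix, "tcl", "tcl" + version)
    if "TCL_LIBRARY" in environ:
        return None  # respect an existing setting
    if not os.path.isdir(candidate):
        return None  # nothing to point at
    environ["TCL_LIBRARY"] = candidate
    return candidate
```

The point of the design is the call order: something like `configure_tcl_library(sys.prefix)` would have to run before the first import of the Tk binding, so that Tcl sees the variable when it initializes and looks for its encodings directory.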