
Mailing List Archive: Python: Bugs

[issue14811] compile fails - UTF-8 character decoding

 

 



report at bugs

May 14, 2012, 9:31 PM

Post #1 of 10 (153 views)
[issue14811] compile fails - UTF-8 character decoding

New submission from Glenn Linderman <v+python [at] g>:

t33a.py demonstrates a compilation problem. OK, it has a long line, but making it one space longer (add a space after the left parenthesis) makes it work... so it must not be line length alone. Rather, since the error is about a bad UTF-8 character starting with \xc3, it seems that the UTF-8 decoder might play a role. I was surprised that I could reduce the test case by removing all the lines before and after these 3: the original failure was in a much longer file to which I added this line.

Originally detected in 3.2.2, I upgraded to 3.2.3 and the problem still occurred.

----------
components: Interpreter Core
files: t33a.py
messages: 160679
nosy: v+python
priority: normal
severity: normal
status: open
title: compile fails - UTF-8 character decoding
type: compile error
versions: Python 3.2
Added file: http://bugs.python.org/file25593/t33a.py

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue14811>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

May 14, 2012, 11:25 PM

Post #2 of 10 (153 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Glenn Linderman <v+python [at] g> added the comment:

Forgot to mention that I was running on Windows, 64-bit.

----------



report at bugs

May 14, 2012, 11:32 PM

Post #3 of 10 (154 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Changes by Ezio Melotti <ezio.melotti [at] gmail>:


----------
components: +Unicode
nosy: +ezio.melotti



report at bugs

May 14, 2012, 11:45 PM

Post #4 of 10 (153 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Hynek Schlawack <hs [at] ox> added the comment:

Would you mind adding more information like the full traceback? By saying "compilation error", I presume you mean the compilation of the t33a.py file into byte code (and not compilation of Python itself)?

I can't reproduce it with either the vanilla 3.2.3 on OS X or Ubuntu's 3.2.

My only suspicion is that the platform default encoding has bitten you. Does it also crash if you add "# -*- coding: utf-8 -*-" as the first line?

----------
nosy: +hynek



report at bugs

May 15, 2012, 1:54 AM

Post #5 of 10 (149 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Glenn Linderman <v+python [at] g> added the comment:

There is no traceback. Here is the text of the Syntax error.

d:\my\im\infiles>c:\python32\python.exe d:\my\py\t33a.py -h
File "d:\my\py\t33a.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xc3' in file d:\my\py\t33a.py on line 3, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

My understanding is Python 3 uses utf-8 as the default encoding for source files -- unless there is an encoding line; and I've set my emacs to save all .py files as utf-8-unix (meaning with no CR, if you aren't an emacs user).

I verified with a hex dump that the encoding in the file is UTF-8, but you are welcome to check as well; that is the file I uploaded.
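As an alternative to reading a hex dump by eye, the same check can be sketched in a few lines of Python. This is only an illustration: the helper names first_invalid_utf8_offset and check_file are hypothetical and do not come from the report or from CPython.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first undecodable sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a source file as raw bytes and validate the whole content as UTF-8."""
    with open(path, "rb") as fh:
        return first_invalid_utf8_offset(fh.read())


# A well-formed UTF-8 line and a line containing a lone lead byte.
print(first_invalid_utf8_offset("é = 1\n".encode("utf-8")))
print(first_invalid_utf8_offset(b"x = '\xc3'\n"))
```

If check_file returns None for the whole file, the bytes on disk are valid UTF-8, which points away from the editor or the file itself as the cause.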

So your testing would seem to indicate it is a platform specific bug. Try running it on Windows, then.

Further, if it were the platform default encoding, adding a space wouldn't cure it... the encoding of the file would still be UTF-8, and the platform default encoding would still be the same whatever you think it might be (but I think it is UTF-8 for source text), so adding a space would not affect an encoding mismatch.

----------



report at bugs

May 15, 2012, 2:45 AM

Post #6 of 10 (147 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Hynek Schlawack <hs [at] ox> added the comment:

You are right: it is the file system encoding that is platform dependent, not the source file encoding.

This space-after-parenthesis trigger is odd; I'm adding the Windows guys to the ticket. Please also tell us your exact version of Windows.

----------
components: -Interpreter Core
nosy: +brian.curtin, tim.golden
type: compile error -> behavior



report at bugs

May 15, 2012, 3:23 AM

Post #7 of 10 (144 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

I tried to reproduce but failed to compile a Windows Python - see issue14813.

----------
components: +Windows
nosy: +pitrou
versions: +Python 3.3



report at bugs

May 15, 2012, 3:40 AM

Post #8 of 10 (145 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

I can reproduce it on Linux. Minimal example:

$ ./python -c "open('longline.py', 'w').write('#' + repr('\u00A1' * 4096) + '\n')"
$ ./python longline.py
File "longline.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xc2' in file longline.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

----------
nosy: +storchaka



report at bugs

May 15, 2012, 3:42 AM

Post #9 of 10 (145 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

It reproduces on Python 2.7 too.

----------
versions: +Python 2.7



report at bugs

May 15, 2012, 3:49 AM

Post #10 of 10 (143 views)
[issue14811] compile fails - UTF-8 character decoding [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

The function decoding_fgets (Parser/tokenizer.c) reads each line into a fixed-size buffer of 8192 bytes (so the line is truncated to 8191 bytes) and then fails because the line is cut in the middle of a multibyte UTF-8 character.
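The effect can be imitated in pure Python without touching the tokenizer. The sketch below is not the C code from Parser/tokenizer.c; read_chunk and is_valid_utf8 are hypothetical helpers that only mimic a buffer holding at most 8191 bytes of a line, using the same test line as the minimal example earlier in the thread. It also shows why adding a single space hides the problem: the cut then falls between two characters instead of inside one.

```python
BUF_SIZE = 8192  # buffer size quoted in the report; one byte is reserved for the terminator


def read_chunk(raw_line: bytes, buf_size: int = BUF_SIZE) -> bytes:
    """Return what fits in a fixed-size buffer: at most buf_size - 1 bytes."""
    return raw_line[: buf_size - 1]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Same content as longline.py: a comment holding 4096 two-byte characters.
line = ("#" + repr("\u00a1" * 4096) + "\n").encode("utf-8")
chunk = read_chunk(line)

print(len(line), len(chunk))                      # full line vs. truncated chunk
print(is_valid_utf8(line), is_valid_utf8(chunk))  # whole line decodes, chunk does not
print(chunk[-1:])                                 # the dangling lead byte

# One extra space shifts every character by one byte, so the cut lands
# on a character boundary and the truncated chunk decodes cleanly.
shifted = ("# " + repr("\u00a1" * 4096) + "\n").encode("utf-8")
print(is_valid_utf8(read_chunk(shifted)))
```

The truncated chunk ends with the lead byte 0xC2 and no continuation byte, which matches the '\xc2' named in the SyntaxError above.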

----------

