
Mailing List Archive: Python: Bugs

[issue15546] Iteration breaks with bz2.open(filename,'rt')

 

 



report at bugs

Aug 3, 2012, 2:04 AM

Post #1 of 16 (162 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt')

New submission from David Beazley:

The bz2 module in Python 3.3b1 doesn't properly support iteration in text mode. Example:

>>> import bz2
>>> f = bz2.open('access-log-0108.bz2')
>>> next(f) # Works
b'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'

>>> g = bz2.open('access-log-0108.bz2', 'rt')
>>> next(g) # Fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
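
For comparison, here is a self-contained sketch of the behaviour text-mode iteration is expected to have. It compresses made-up data in memory rather than using the attached log file, so it only shows what a working interpreter should do; it is not claimed to trigger the failure on 3.3b1. The names `lines`, `compressed`, `first` and `rest` are illustrative.

```python
import bz2
import io

# Made-up sample data; the real report used an access log file.
lines = ["line %d\n" % i for i in range(1000)]
compressed = bz2.compress("".join(lines).encode("ascii"))

# bz2.open() accepts an existing file object as well as a filename.
with bz2.open(io.BytesIO(compressed), "rt", encoding="ascii") as f:
    first = next(f)   # first decoded line
    rest = list(f)    # iteration continues through the remaining lines
```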

----------
components: Library (Lib)
messages: 167299
nosy: dabeaz
priority: normal
severity: normal
status: open
title: Iteration breaks with bz2.open(filename,'rt')
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue15546>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Aug 3, 2012, 3:37 AM

Post #2 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Changes by Antoine Pitrou <pitrou [at] free>:


----------
nosy: +nadeem.vawda



report at bugs

Aug 3, 2012, 4:29 AM

Post #3 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

I can't seem to reproduce this with an up-to-date checkout from Mercurial:

>>> import bz2
>>> g = bz2.open('access-log-0108.bz2','rt')
>>> next(g)
'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'

(where 'access-log-0108.bz2' is a file I created with the output above as
its first line, and a couple of other lines of random junk following that)

Would it be possible for you to upload the file you used to trigger this
bug?

----------



report at bugs

Aug 3, 2012, 4:52 AM

Post #4 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

David Beazley added the comment:

File attached. The file can be read in its entirety in binary mode.

----------
Added file: http://bugs.python.org/file26673/access-log-0108.bz2



report at bugs

Aug 3, 2012, 3:27 PM

Post #5 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

The cause of this problem is that BZ2File.read1() sometimes returns b"", even though
the file is not at EOF. This happens when the underlying BZ2Decompressor cannot produce
any decompressed data from just the block passed to it in _fill_buffer(); in this case, it needs to read more of the compressed stream to make progress.
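
A minimal illustration of that behaviour, assuming a small stream produced by bz2.compress(). The names `payload`, `first` and `rest` are illustrative and the 10-byte split point is arbitrary; the only point is that a decompressor fed an incomplete block can return b"" well before the end of the stream.

```python
import bz2

payload = b"example log line\n" * 1000
compressed = bz2.compress(payload)

decomp = bz2.BZ2Decompressor()

# Feed only the first few bytes: not enough to decode a whole block.
first = decomp.decompress(compressed[:10])
eof_after_first = decomp.eof

# Feeding the remainder lets the decompressor make progress.
rest = decomp.decompress(compressed[10:])
eof_after_rest = decomp.eof
```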

It would seem that BZ2File cannot satisfy the contract of the read1() method - we
can't guarantee that a single call to the read() method of the underlying file will
allow us to return a non-empty result, whereas returning b"" is reserved for the
case where we have reached EOF.

Removing the read1() method would simply trade this problem for a bigger one
(resurrecting issue 10791), so I propose amending BZ2File.read1() to make as many reads
from the underlying file as necessary to return a non-empty result.
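
The proposal can be sketched roughly as follows. This is not the BZ2File code: `LoopingReader` and its `chunk_size` parameter are hypothetical, and the sketch assumes a single-stream file with no trailing data.

```python
import bz2
import io


class LoopingReader:
    """Hypothetical reader illustrating the proposed read1() behaviour."""

    def __init__(self, fileobj, chunk_size=16):
        self._fp = fileobj
        self._decomp = bz2.BZ2Decompressor()
        self._chunk_size = chunk_size

    def read1(self):
        # Keep reading compressed chunks until some output appears.
        # b"" is returned only once the decompressor reports end of stream.
        while not self._decomp.eof:
            raw = self._fp.read(self._chunk_size)
            if not raw:
                raise EOFError("compressed file ended before the "
                               "end-of-stream marker was reached")
            out = self._decomp.decompress(raw)
            if out:
                return out
        return b""


payload = b"line\n" * 5000
reader = LoopingReader(io.BytesIO(bz2.compress(payload)), chunk_size=8)

chunks = []
while True:
    chunk = reader.read1()
    if not chunk:
        break
    chunks.append(chunk)
```

With a deliberately tiny chunk size, most underlying reads yield no output, yet every value returned before EOF is non-empty.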

Antoine, what do you think of this?

----------
nosy: +pitrou



report at bugs

Aug 3, 2012, 3:29 PM

Post #6 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Antoine Pitrou added the comment:

> I propose amending BZ2File.read1() to make as many reads
> from the underlying file as necessary to return a non-empty result.

Agreed. IMO, read1()'s contract should be read as a best-effort thing, not an absolute guarantee. Returning an empty string when there is still data available is wrong.

----------



report at bugs

Aug 3, 2012, 11:53 PM

Post #7 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Serhiy Storchaka added the comment:

I encountered this when implementing bzip2 support in zipfile (issue14371). I also solved it by rewriting read() and read1() to make as many reads from the underlying file as necessary to return a non-empty result.

----------
nosy: +storchaka



report at bugs

Aug 4, 2012, 6:39 AM

Post #8 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Roundup Robot added the comment:

New changeset cdf27a213bd2 by Nadeem Vawda in branch 'default':
#15546: Fix BZ2File.read1()'s handling of pathological input data.
http://hg.python.org/cpython/rev/cdf27a213bd2

----------
nosy: +python-dev



report at bugs

Aug 4, 2012, 6:41 AM

Post #9 of 16 (153 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

OK, BZ2File should now be fixed. It looks like LZMAFile and GzipFile may
be susceptible to the same problem; I'll push fixes for them shortly.

----------



report at bugs

Aug 4, 2012, 5:19 PM

Post #10 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Roundup Robot added the comment:

New changeset 5284e65e865b by Nadeem Vawda in branch 'default':
#15546: Fix {GzipFile,LZMAFile}.read1()'s handling of pathological input data.
http://hg.python.org/cpython/rev/5284e65e865b

----------



report at bugs

Aug 4, 2012, 5:28 PM

Post #11 of 16 (154 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

Done.

Thanks for the bug report, David.

----------
resolution: -> fixed
stage: -> committed/rejected
status: open -> closed



report at bugs

Aug 4, 2012, 11:20 PM

Post #12 of 16 (153 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Serhiy Storchaka added the comment:

What about peek()?

----------



report at bugs

Aug 5, 2012, 5:11 AM

Post #13 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

Before these fixes, it looks like all three classes' peek() methods were susceptible
to the same problem as read1().

The fixes for BZ2File.read1() and LZMAFile.read1() should have fixed peek() as well;
both methods are implemented in terms of _fill_buffer().
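
To show why fixing a shared helper covers both methods, here is a rough sketch. The class `SharedBufferReader` is hypothetical; only the method names `peek()`, `read1()` and `_fill_buffer()` come from the thread, and the bodies are an assumption, not the committed code.

```python
import bz2
import io


class SharedBufferReader:
    """Hypothetical reader whose peek() and read1() share one helper."""

    def __init__(self, fileobj, chunk_size=16):
        self._fp = fileobj
        self._decomp = bz2.BZ2Decompressor()
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill_buffer(self):
        # Loop until decompressed data is available or the stream has ended.
        # Returns True if the buffer holds data, False at end of stream.
        while not self._buffer:
            if self._decomp.eof:
                return False
            raw = self._fp.read(self._chunk_size)
            if not raw:
                raise EOFError("compressed file ended prematurely")
            self._buffer = self._decomp.decompress(raw)
        return True

    def peek(self):
        # Return buffered data without consuming it.
        if not self._fill_buffer():
            return b""
        return self._buffer

    def read1(self):
        # Return buffered data and consume it.
        if not self._fill_buffer():
            return b""
        data, self._buffer = self._buffer, b""
        return data


payload = b"abc\n" * 2000
reader = SharedBufferReader(io.BytesIO(bz2.compress(payload)), chunk_size=4)

peeked = reader.peek()
first_read = reader.read1()   # same bytes that peek() showed

collected = [first_read]
while True:
    chunk = reader.read1()
    if not chunk:
        break
    collected.append(chunk)
```

Because the retry loop lives in `_fill_buffer()`, neither caller can observe an empty result before end of stream.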

For GzipFile, peek() is still potentially broken - I'll push a fix shortly.

----------



report at bugs

Aug 5, 2012, 5:48 AM

Post #14 of 16 (153 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Roundup Robot added the comment:

New changeset 8c07ff7f882f by Nadeem Vawda in branch 'default':
#15546: Also fix GzipFile.peek().
http://hg.python.org/cpython/rev/8c07ff7f882f

----------



report at bugs

Aug 5, 2012, 6:27 AM

Post #15 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Serhiy Storchaka added the comment:

I have a doubt. Won't this become an infinite loop if the end of the compressed data coincides with the end of a read block? Maybe instead of "while self.extrasize <= 0:" it would be worth writing "while self.extrasize <= 0 and self.fileobj is not None:"?

----------



report at bugs

Aug 5, 2012, 7:25 AM

Post #16 of 16 (152 views)
[issue15546] Iteration breaks with bz2.open(filename,'rt') [In reply to]

Nadeem Vawda added the comment:

No, if _read() is called once the file is already at EOF, it raises an
EOFError (http://hg.python.org/cpython/file/8c07ff7f882f/Lib/gzip.py#l433),
which will then break out of the loop.
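
A toy model of that control flow, for illustration only. `ToyGzipLikeReader` is a made-up stand-in that replays a scripted list of chunks instead of decompressing anything; only the `extrasize` loop condition and the EOFError raised by `_read()` are taken from the discussion above.

```python
class ToyGzipLikeReader:
    """Toy stand-in for the loop under discussion; not gzip.GzipFile."""

    def __init__(self, chunks):
        # Each entry plays the role of the output of one _read() call.
        self._chunks = list(chunks)
        self.extrabuf = b""
        self.extrasize = 0

    def _read(self):
        # Once the scripted data is exhausted, signal EOF by raising.
        if not self._chunks:
            raise EOFError("Reached EOF")
        chunk = self._chunks.pop(0)
        self.extrabuf += chunk
        self.extrasize += len(chunk)

    def peek(self):
        try:
            # Retry while no decompressed data is buffered.
            while self.extrasize <= 0:
                self._read()
        except EOFError:
            # EOF ends the loop, so it cannot spin forever.
            pass
        return self.extrabuf

    def consume(self):
        # Hypothetical helper that empties the buffer.
        data, self.extrabuf, self.extrasize = self.extrabuf, b"", 0
        return data


reader = ToyGzipLikeReader([b"", b"", b"data"])
before_eof = reader.peek()    # loops past the two empty results
reader.consume()
at_eof = reader.peek()        # terminates via EOFError and returns b""
only_empty = ToyGzipLikeReader([b"", b""]).peek()
```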

----------
