
djc at object-craft
Jul 12, 2001, 7:10 AM
csv-0.4 (John Machin release) released
The CSV module provides a fast CSV parser which can split and join CSV records produced by Microsoft products such as Access and Excel. For some reason, on Python 2.0 it now outperforms string.split(). Of course, the CSV parser can also handle much more complex records than string.split() can...

This is a bugfix release.

My thanks to Skip Montanaro for providing most of the following example.

CSV files can be syntactically more complex than simply inserting commas between fields. For example, if a field contains a comma, it must be quoted:

    1,2,3,"I think, therefore I am",5,6

The fields returned by this example are:

    ['1', '2', '3', 'I think, therefore I am', '5', '6']

Since fields are quoted using quotation marks, you also need a way to escape them. In Microsoft-created CSV files this is done by doubling them:

    1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"

Excel and Access quite reasonably allow you to place newlines in cell and column data. When this is exported as CSV data, the output file contains fields with embedded newlines:

    1,2,3,"""I see,""
    said the blind man","as he picked up his
    hammer and saw"

A single record is split over three lines, with text fields containing embedded newlines. This is what happens when you pass that data line by line to the CSV parser:

    ferret:/home/djc% python
    Python 2.0 (#0, Apr 14 2001, 21:24:22)
    [GCC 2.95.3 20010219 (prerelease)] on linux2
    Type "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> p = csv.parser()
    >>> p.parse('1,2,3,"""I see,""')
    >>> p.parse('said the blind man","as he picked up his')
    >>> p.parse('hammer and saw"')
    ['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']

Note that the parser only returns a list of fields when the record is complete. Until then parse() returns None, which is why the interpreter echoes nothing for the first two lines. (\012 is the octal escape for the embedded newline.)

The changes in this release are:

1- Exception raising was leaking the error message. Thanks to John Machin for fixing this.
2- When a parsing exception is raised during parse(), the parser will automatically call clear() to discard accumulated fields and state the next time you call parse(). The old behaviour can be restored either by passing zero as the auto_clear constructor keyword argument, or by setting the auto_clear parser attribute to zero. As well as raising an exception, a parsing error will also set the read-only parser attribute had_parse_error to 1. This is reset the next time you call parse() or clear(). Thanks again to John Machin for suggesting this.

3- An obscure parsing bug has been fixed. The old parser swallowed the comma after the stray quote into the third field; the new parser treats it as a delimiter, producing a trailing empty field. The old behaviour:

    >>> p.parse('12,12,1",')
    ['12', '12', '1",']

The new behaviour:

    >>> p.parse('12,12,1",')
    ['12', '12', '1"', '']

I am still of two minds about whether I should raise an exception when I encounter text like that...

The module homepage:

    http://www.object-craft.com.au/projects/csv/

For people who do not have a C compiler on Windows, I have put a Python 2.1 binary up here:

    http://www.object-craft.com.au/projects/csv/csv.pyd

- Dave

-- 
http://www.object-craft.com.au
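For readers on a later Python, the records discussed in this announcement can be checked against the standard library's csv module. That module was added in Python 2.3 and has a different API: it has no parser() object. The sketch below illustrates the behaviour under that substitution and is not csv-0.4 itself. The helper names parse_record and join_record are hypothetical. It relies on two properties of the stdlib module. csv.reader rebuilds a record whose quoted fields span several lines, provided each line keeps its newline. csv.writer quotes fields and doubles embedded quotes when joining.

```python
import csv
import io

# Uses the standard-library csv module (Python 2.3+), NOT the csv-0.4
# parser() API described above. parse_record/join_record are hypothetical
# helper names for this illustration.

examples = [
    # A quoted field containing a comma.
    '1,2,3,"I think, therefore I am",5,6\n',
    # Embedded quotes escaped by doubling them.
    '1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"\n',
    # One record spread over three physical lines (embedded newlines).
    '1,2,3,"""I see,""\nsaid the blind man","as he picked up his\nhammer and saw"\n',
    # The stray-quote case from change 3.
    '12,12,1",\n',
]


def parse_record(text):
    """Return the first record in text as a list of fields."""
    # Iterating a StringIO yields lines that keep their '\n', which is what
    # lets csv.reader rebuild quoted fields that span several lines.
    return next(csv.reader(io.StringIO(text)))


def join_record(fields):
    """Join fields into one CSV line, quoting and doubling quotes as needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerow(fields)
    return buf.getvalue()


for text in examples:
    print(parse_record(text))
```

With the default dialect, the stdlib reader accepts the stray quote in '12,12,1",' without complaint. It yields the same four fields as the new csv-0.4 behaviour.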
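A small pure-Python state machine can illustrate the line-at-a-time behaviour and the auto_clear/had_parse_error semantics described in the announcement. This is a hypothetical sketch and not the module's C implementation. Several parts are assumptions made for illustration: the class IncrementalParser, the ParseError exception, and the rule that only a comma or another quote may follow a closing quote. In this sketch had_parse_error is an ordinary attribute rather than a read-only one. It also assumes lines are passed without their line terminators, as in the interpreter session.

```python
# Hypothetical sketch: NOT the Object Craft csv module's C implementation.
START_FIELD, IN_FIELD, IN_QUOTED, QUOTE_IN_QUOTED = range(4)


class ParseError(Exception):
    """Hypothetical error raised on malformed input (an assumption)."""


class IncrementalParser:
    """Line-at-a-time CSV parser mimicking the behaviour described above."""

    def __init__(self, auto_clear=1):
        self.auto_clear = auto_clear
        self.clear()

    def clear(self):
        """Discard accumulated fields and parser state."""
        self.fields = []
        self.field = []
        self.state = START_FIELD
        self.had_parse_error = 0

    def parse(self, line):
        """Feed one line (without its newline).

        Returns the list of fields once the record is complete, else None.
        """
        if self.had_parse_error and self.auto_clear:
            self.clear()  # recover automatically after a previous error
        self.had_parse_error = 0
        for ch in line:
            if self.state == START_FIELD:
                if ch == '"':
                    self.state = IN_QUOTED
                elif ch == ',':
                    self._end_field()
                else:
                    self.field.append(ch)
                    self.state = IN_FIELD
            elif self.state == IN_FIELD:
                if ch == ',':
                    self._end_field()
                else:
                    self.field.append(ch)  # a stray quote is kept literally
            elif self.state == IN_QUOTED:
                if ch == '"':
                    self.state = QUOTE_IN_QUOTED
                else:
                    self.field.append(ch)
            else:  # QUOTE_IN_QUOTED: just saw a quote inside a quoted field
                if ch == '"':
                    self.field.append('"')  # doubled quote -> literal quote
                    self.state = IN_QUOTED
                elif ch == ',':
                    self._end_field()
                else:
                    self.had_parse_error = 1
                    raise ParseError('unexpected %r after closing quote' % ch)
        if self.state == IN_QUOTED:
            # The quoted field continues on the next line: keep the newline.
            self.field.append('\n')
            return None
        self._end_field()
        record, self.fields = self.fields, []
        return record

    def _end_field(self):
        self.fields.append(''.join(self.field))
        self.field = []
        self.state = START_FIELD
```

Feeding the sketch the three lines from the session returns None twice. On the third line it returns the five fields, with '\n' where Python 2.0 displayed \012. Feeding it '12,12,1",' yields ['12', '12', '1"', ''], which matches the new behaviour.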