XML expat error

 

 



dirkheld at gmail

Feb 27, 2008, 7:26 AM

Post #1 of 7

Hi,

I have written a piece of code that reads all XML files in a directory
in order to retrieve one element from each of these files. All files
have the same XML structure. After file 123 I receive the following
error:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20

I guess that the element I try to read or the XML (which would be
strange since they have been created with the same code) can't be
retrieved.

Two questions:
1. Is there a way to fix this problem so that I can retrieve the
element?
2. Is there a way to skip the invalid file after such an error, so
that the program continues reading the subsequent files? Some sort of
error handling?

Here is the code I use:

from xml.dom import minidom
import os
path = "/Documents/programming/data/xml/"

# List every entry in the directory, then write each file's 'uri' to test.txt
dirList = os.listdir(path)
url_file=open('/Documents/programming/data/xml/test.txt','w')
for file in dirList:
    # Parse the file and take the first <webpage> element
    xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
    xml_elem = xmldoc.getElementsByTagName('webpage')
    web_elem = xml_elem[0]
    url = web_elem.attributes['uri']
    url_file.write(url.value + '\n')
url_file.close()
--
http://mail.python.org/mailman/listinfo/python-list


R.Brodie at rl

Feb 27, 2008, 8:18 AM

Post #2 of 7
Re: XML expat error

"dirkheld" <dirkheld [at] gmail> wrote in message
news:babb6775-311d-4f7a-bc03-90f249e34180 [at] s19g2000prg

> xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> 554, column 20
>
> I guess that the element I try to read or the XML (which would be
> strange since they have been created with the same code) can't be
> retrieved.

It's fairly easy to write non-robust XML generating code, and also
quick to test if one file is always bad. Drop it into a text editor or
Firefox, and take a quick look at line 554. Most likely some random
control character has sneaked in; it only takes (for example) one NUL
to make the document ill-formed.
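
As an alternative to eyeballing the file, the check for stray control characters can be scripted. The following is only a sketch for Python 3: the helper name `find_illegal_bytes` is made up for illustration, and it assumes the files use an ASCII-compatible encoding such as UTF-8 (it would give false positives on UTF-16).

```python
import re

# Bytes that XML 1.0 never allows in a document: the C0 control codes
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_bytes(data):
    """Return a (line, column, byte value) tuple for each illegal control byte.

    Lines are counted from 1 and columns from 0, by counting b"\\n" bytes.
    """
    hits = []
    for match in ILLEGAL_XML_BYTES.finditer(data):
        start = match.start()
        line = data.count(b"\n", 0, start) + 1
        # rfind returns -1 when there is no earlier newline, so the +1
        # makes the first line start at offset 0.
        column = start - (data.rfind(b"\n", 0, start) + 1)
        hits.append((line, column, data[start]))
    return hits
```

To use it, read a suspect file in binary mode and pass the bytes in, for example `find_illegal_bytes(open(name, "rb").read())`. An empty list means none of these bytes were found; the file can still be ill-formed for other reasons.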





dirkheld at gmail

Feb 27, 2008, 2:02 PM

Post #3 of 7
Re: XML expat error

On 27 feb, 17:18, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> "dirkheld" <dirkh...@gmail.com> wrote in message
>
> news:babb6775-311d-4f7a-bc03-90f249e34180 [at] s19g2000prg
>
> > xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> > 554, column 20
>
> > I guess that the element I try to read or the XML (which would be
> > strange since they have been created with the same code) can't be
> > retrieved.
>
> It's fairly easy to write non-robust XML generating code, and also
> quick to test if one file is always bad. Drop it into a text editor or
> Firefox, and take a quick look at line 554. Most likely some random
> control character has sneaked in; it only takes (for example) one NUL
> to make the document ill-formed.

Something strange here. The XML file causing the problem has only 361
lines. Isn't there a way to catch this error, ignore it and continue
with the rest of the files?
This is the full error report:

Traceback (most recent call last):
  File "xmltest.py", line 10, in <module>
    xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/minidom.py", line 1913, in parse
    return expatbuilder.parse(file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20
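
A reported line number that is larger than the line count of the file inspected by hand suggests checking which file the parser was actually reading. Note that `os.listdir` returns every entry in the directory, which here can include `test.txt` from an earlier run. The sketch below (Python 3; the helper name `describe_parse_failure` and the returned dictionary keys are invented for illustration) parses one file and, on failure, reports the position from the `lineno` and `offset` attributes of `ExpatError` next to the file's own line count.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def describe_parse_failure(path):
    """Return details about a parse failure, or None if the file parses."""
    try:
        minidom.parse(path)
    except ExpatError as exc:
        # Count lines the simple way, by counting newline bytes.
        with open(path, "rb") as handle:
            line_count = handle.read().count(b"\n") + 1
        return {
            "name": os.path.basename(path),
            "error_line": exc.lineno,
            "error_column": exc.offset,
            "lines_in_file": line_count,
        }
    return None
```

Calling this for every entry returned by `os.listdir(path)` and printing the non-`None` results shows the name of the file that fails, rather than relying on a count of how many files were processed.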


bj_666 at gmx

Feb 27, 2008, 11:18 PM

Post #4 of 7
Re: XML expat error

On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:

> Something strange here. The XML file causing the problem has only 361
> lines. Isn't there a way to catch this error, ignore it and continue
> with the rest of the files?

Yes of course: handle the exception instead of letting it propagate to the
top level and ending the program.
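
A minimal sketch of what that could look like for the loop in the original post, written for Python 3. The function name `collect_uris` is invented for illustration, and two choices here are assumptions rather than part of the original script: only names ending in `.xml` are parsed (so that `test.txt` and other stray entries are ignored), and files without a `webpage` element or `uri` attribute are silently skipped.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def collect_uris(directory):
    """Return (uris, failures) for the .xml files found in directory.

    uris is a list of 'uri' attribute values, one per parsable file;
    failures is a list of (file name, error message) tuples.
    """
    uris = []
    failures = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # assumption: only .xml files are of interest
        full_path = os.path.join(directory, name)
        try:
            xmldoc = minidom.parse(full_path)
        except ExpatError as exc:
            # Remember the bad file and carry on with the next one.
            failures.append((name, str(exc)))
            continue
        elements = xmldoc.getElementsByTagName("webpage")
        if elements and elements[0].hasAttribute("uri"):
            uris.append(elements[0].getAttribute("uri"))
    return uris, failures
```

The caller can then write `uris` to the output file and print `failures` to see which files need a closer look. Catching only `ExpatError` keeps unrelated problems, such as a missing directory, visible instead of hiding them.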

Ciao,
Marc 'BlackJack' Rintsch


dirkheld at gmail

Feb 28, 2008, 12:37 PM

Post #5 of 7
Re: XML expat error

On 28 feb, 08:18, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
> > Something strange here. The XML file causing the problem has only 361
> > lines. Isn't there a way to catch this error, ignore it and continue
> > with the rest of the files?
>
> Yes of course: handle the exception instead of letting it propagate to the
> top level and ending the program.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Ehm, maybe a stupid question... how? I'm rather new to Python and I
have never used error handling.


stefan_ml at behnel

Feb 28, 2008, 12:53 PM

Post #6 of 7
Re: XML expat error

dirkheld wrote:
> On 28 feb, 08:18, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
>>> Something strange here. The XML file causing the problem has only 361
>>> lines. Isn't there a way to catch this error, ignore it and continue
>>> with the rest of the files?
>> Yes of course: handle the exception instead of letting it propagate to the
>> top level and ending the program.
>>
>> Ciao,
>> Marc 'BlackJack' Rintsch
>
> Ehm, maybe a stupid question... how? I'm rather new to Python and I
> have never used error handling.

Care to read the tutorial?

Stefan


bj_666 at gmx

Feb 28, 2008, 1:03 PM

Post #7 of 7
Re: XML expat error

On Thu, 28 Feb 2008 12:37:10 -0800, dirkheld wrote:

>> Yes of course: handle the exception instead of letting it propagate to the
>> top level and ending the program.
>
> Ehm, maybe a stupid question... how? I'm rather new to Python and I
> have never used error handling.

Then you should work through the tutorial in the docs, at least until
section 8.3 Handling Exceptions:

http://docs.python.org/tut/node10.html#SECTION0010300000000000000000

Ciao,
Marc 'BlackJack' Rintsch
