
Mailing List Archive: Python: Python

Ignoring XML Namespaces with ElementTree

 

 



news at redlamb

Dec 3, 2009, 10:21 AM

Post #1 of 3
Ignoring XML Namespaces with ElementTree

Is there any way to configure ElementTree to ignore XML namespaces?
For the past couple of months, I've been using minidom to parse an XML
file that is generated by a unit within my organization that can't
stick with a standard. This hasn't been a problem until recently, when
the script was given a 30MB file that, once parsed, increased the
Python memory footprint by 1.0GB, and now I'm running into
MemoryErrors. Based on Google searches and testing, it looks like ElementTree
is much more efficient with memory and I'd like to switch; however, I'd
like to be able to ignore the namespaces. These XML files tend to
randomly switch the namespace for no reason, and ignoring
namespaces would help the script adapt to the changes. Any help on
this would be greatly appreciated. I'm having a hard time finding the
answer.

Additionally, does anyone know how ElementTree handles XML elements that
include Unicode?
--
http://mail.python.org/mailman/listinfo/python-list


stefan_ml at behnel

Dec 3, 2009, 11:55 AM

Post #2 of 3
Re: Ignoring XML Namespaces with ElementTree

Pete, 03.12.2009 19:21:
> Is there any way to configure ElementTree to ignore XML namespaces?
> For the past couple of months, I've been using minidom to parse an XML
> file that is generated by a unit within my organization that can't
> stick with a standard. This hasn't been a problem until recently, when
> the script was given a 30MB file that, once parsed, increased the
> Python memory footprint by 1.0GB, and now I'm running into
> MemoryErrors. Based on Google searches and testing, it looks like ElementTree
> is much more efficient with memory and I'd like to switch,

Make sure you use cElementTree; then that's certainly the right choice to make.
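
For reference, a minimal sketch of what that could look like. The fallback import is an assumption for newer Python versions, where cElementTree was removed (3.9) and plain ElementTree picks up the C accelerator on its own. The name count_elements and the sample data are made up for illustration. Streaming with iterparse() and clear() is one common way to keep memory down on large files; it is not something the original script is known to use.

```python
import io

try:
    # Older Pythons: ask for the C implementation explicitly.
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+: cElementTree is gone; ElementTree uses the C accelerator itself.
    import xml.etree.ElementTree as ET


def count_elements(source):
    """Stream through an XML source and count elements without keeping their content."""
    count = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        count += 1
        # clear() drops the finished element's children, text and attributes.
        elem.clear()
    return count


# Hypothetical sample input; a real script would pass a file name or file object.
sample = io.BytesIO(b"<root><item>1</item><item>2</item></root>")
print(count_elements(sample))
```

In a real script the loop body would do the actual work on each element before clearing it.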


> however I'd
> like to be able to ignore the namespaces. These XML files tend to
> randomly switch the namespace for no reason and ignoring these
> namespaces would help the script adapt to the changes. Any help on
> this would be greatly appreciated. I'm having a hard time finding the
> answer.

ET uses namespace URIs as part of the tag name, so if you want to ignore
namespaces, just strip the leading "{...}" (if any) from the tag and work
with the rest (so-called "local name").
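
A rough sketch of that idea, in case it helps. The helper name local_name, the sample document and its namespace URI are all invented for illustration; only the "{uri}tag" naming scheme comes from ElementTree itself.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag without a leading "{namespace-uri}" part, if there is one."""
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


# Hypothetical sample: one namespaced entry and one plain entry.
doc = ET.fromstring(
    '<r:report xmlns:r="http://example.com/v2">'
    "<r:entry>a</r:entry><entry>b</entry>"
    "</r:report>"
)

# Rewrite every tag in place so later lookups can use plain local names.
for elem in doc.iter():
    elem.tag = local_name(elem.tag)

print(doc.tag)
print([e.text for e in doc.findall("entry")])
```

Rewriting the tags once up front keeps the rest of the script free of namespace handling. The alternative is to leave the tree alone and compare local_name(elem.tag) wherever a tag is checked.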


> Additionally, does anyone know how ElementTree handles XML elements that
> include Unicode?

It's an XML parser, so the answer is: without any difficulties.
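
A small sketch to illustrate, assuming the input is UTF-8 encoded bytes with a matching XML declaration. The sample content is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: UTF-8 bytes containing non-ASCII characters.
data = u'<?xml version="1.0" encoding="UTF-8"?><name>Zo\u00eb \u2603</name>'.encode("utf-8")

root = ET.fromstring(data)

# The parser decodes the bytes; the text comes back as a Unicode string.
print(root.text == u"Zo\u00eb \u2603")
```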

Stefan


news at redlamb

Dec 3, 2009, 1:39 PM

Post #3 of 3
Re: Ignoring XML Namespaces with ElementTree

On Dec 3, 2:55 pm, Stefan Behnel <stefan...@behnel.de> wrote:
> ET uses namespace URIs as part of the tag name, so if you want to ignore
> namespaces, just strip the leading "{...}" (if any) from the tag and work
> with the rest (so-called "local name").

Perfect... I can work with that. Thanks.
