
Mailing List Archive: Python: Python

validating XML

 

 



andrea.crotti.0 at gmail

Jun 13, 2012, 3:06 AM

Post #1 of 6 (261 views)
validating XML

Hello Python friends, I have to validate some XML files against some XSD
schema files, but unfortunately I can't use a cool library such as libxml.

A pure-Python validator would also be fine, but all the projects I've
seen are partial or seem dead.

Since we define the schema ourselves, I am allowed to implement only
the parts of the huge XML definition that we actually need.

Now I'm not quite sure how to do the validation myself. Any suggestions?

I thought that I could first parse the schema and store it in memory, then
iterate over the XML I need to validate and do the needed sanity checks
there.

So I might use either minidom.parse or ElementTree.parse; both look
fine, though I'm not sure which one is more suitable.
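
For comparison, here is a minimal sketch of the same small document read with both standard-library modules. The sample document is made up for illustration; only the module calls themselves come from the standard library.

```python
from xml.dom import minidom
from xml.etree import ElementTree

# Hypothetical sample document, used only for illustration.
SAMPLE = "<SETUP><COMMENT>hello</COMMENT><OTHER id='1'/></SETUP>"

# ElementTree: an element behaves like a list of its child elements.
et_root = ElementTree.fromstring(SAMPLE)
et_children = [child.tag for child in et_root]

# minidom: childNodes can also contain text nodes, so filter for elements.
dom_root = minidom.parseString(SAMPLE).documentElement
dom_children = [
    node.tagName
    for node in dom_root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

print(et_children)   # ['COMMENT', 'OTHER']
print(dom_children)  # ['COMMENT', 'OTHER']
```

Both give the same information here. ElementTree tends to need less code for a tree walk, while minidom exposes the full DOM node types.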

Another thing: I would like my validation to stay correct if the schema
changes, so I would need some sort of meta-schema
that declares all the XML constructs that I'm allowed to use.
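
One possible reading of the "meta-schema" idea, sketched under assumptions: keep a whitelist of the XSD constructs the home-grown validator supports, and reject any schema that uses something else. The whitelist contents and the sample schema below are hypothetical.

```python
from xml.etree import ElementTree

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical whitelist: the only XSD constructs the validator supports.
SUPPORTED = {"schema", "element", "complexType", "sequence", "attribute"}


def unsupported_constructs(xsd_text):
    """Return the sorted names of XSD constructs outside the whitelist."""
    root = ElementTree.fromstring(xsd_text)
    found = set()
    for el in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if el.tag.startswith("{" + XSD_NS + "}"):
            name = el.tag.split("}", 1)[1]
            if name not in SUPPORTED:
                found.add(name)
    return sorted(found)


# Hypothetical schema that uses xs:choice, which is not in the whitelist.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SETUP">
    <xs:complexType>
      <xs:choice>
        <xs:element name="COMMENT" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

print(unsupported_constructs(SCHEMA))  # ['choice']
```

A check like this only guards against unsupported element names in the schema; it says nothing about unsupported attributes or built-in types.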

Has anyone done something like this?
Thanks,
Andrea


andrea.crotti.0 at gmail

Jun 13, 2012, 8:45 AM

Post #2 of 6 (246 views)
Re: validating XML [In reply to]

As far as I understand, what I should do is the following: go through my
own XML, keeping track of the full path of every element, for example

<SETUP>
<SETUP/COMMENT>
<SETUP/OTHER>

and so on. Then, for every entry found in this iteration, check the schema
to make sure that that particular construct is allowed
at that level of the tree.
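
A minimal sketch of that path-tracking walk, assuming the schema has already been reduced to a plain set of allowed paths. The set below is hand-written and hypothetical; producing it from a real XSD is the hard part and is not shown.

```python
from xml.etree import ElementTree

# Hypothetical set of allowed paths, standing in for a parsed schema.
ALLOWED_PATHS = {"SETUP", "SETUP/COMMENT", "SETUP/OTHER"}


def iter_paths(element, prefix=""):
    """Yield the slash-separated path of element and of all its descendants."""
    path = prefix + "/" + element.tag if prefix else element.tag
    yield path
    for child in element:
        yield from iter_paths(child, path)


def disallowed_paths(xml_text):
    """Return every path in the document that is not in ALLOWED_PATHS."""
    root = ElementTree.fromstring(xml_text)
    return [p for p in iter_paths(root) if p not in ALLOWED_PATHS]


print(disallowed_paths("<SETUP><COMMENT/><EXTRA/></SETUP>"))  # ['SETUP/EXTRA']
```

A set of paths catches elements in the wrong place, but it cannot express ordering, required elements, or how often an element may occur.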

I have something like this, for example, which creates a dictionary from an
element tree.
Does it make sense, or am I going in the wrong direction?


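from xml.etree import ElementTree


def etree_to_dict(xml_file):
    """Parse xml_file and map every element to the list of its children."""
    dic = {}
    # parse() accepts a file name directly; getroot() returns the root element
    root = ElementTree.parse(xml_file).getroot()
    queue = [root]

    while queue:
        el = queue.pop()
        # list(el) replaces getchildren(), which was removed in Python 3.9
        childs = list(el)
        queue += childs
        dic[el] = childs

    return dic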


dieter at handshake

Jun 13, 2012, 9:16 AM

Post #3 of 6 (244 views)
Re: validating XML [In reply to]

andrea crotti <andrea.crotti.0 [at] gmail> writes:

> Hello Python friends, I have to validate some xml files against some xsd
> schema files, but I can't use any cool library as libxml unfortunately.

Why?
It does not seem very rational to implement a complex task (such as
XML-Schema validation) when there are ready-made solutions around.

> A Python-only validator might be also fine, but all the projects I've
> seen are partial or seem dead..
> So since we define the schema ourselves, I was allowed to only implement
> the parts of the huge XML definition that we actually need.
> Now I'm not quite sure how to do the validation myself, any suggestions?

I would look for a command line tool available
on your platform which performs the validation and
call this from Python.
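
As a rough sketch of that approach, assuming the tool available on the platform happens to be xmllint (which ships with libxml2); the file paths are whatever the caller passes in. `--noout` and `--schema` are real xmllint options, and the two helper names are made up for this example.

```python
import subprocess


def xmllint_command(xsd_path, xml_path):
    """Build the xmllint invocation that validates xml_path against xsd_path."""
    # --noout suppresses echoing the document; --schema selects the XSD file.
    return ["xmllint", "--noout", "--schema", xsd_path, xml_path]


def is_valid(xsd_path, xml_path):
    """Return True if xmllint exits with status 0, i.e. the file validates."""
    result = subprocess.run(
        xmllint_command(xsd_path, xml_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

If xmllint is not installed, `subprocess.run` raises `FileNotFoundError`, so a caller that must run on many platforms would need to handle that case.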

--
Dieter

--
http://mail.python.org/mailman/listinfo/python-list


stefan_ml at behnel

Jun 13, 2012, 10:11 AM

Post #4 of 6 (245 views)
Re: validating XML [In reply to]

andrea crotti, 13.06.2012 12:06:
> Hello Python friends, I have to validate some xml files against some xsd
> schema files, but I can't use any cool library as libxml unfortunately.

Any reason for that? Because the canonical answer to your question would be
lxml, which uses libxml2.

Stefan



andrea.crotti.0 at gmail

Jun 14, 2012, 1:20 AM

Post #5 of 6 (245 views)
Re: validating XML [In reply to]

2012/6/13 Stefan Behnel <stefan_ml [at] behnel>

> andrea crotti, 13.06.2012 12:06:
> > Hello Python friends, I have to validate some xml files against some xsd
> > schema files, but I can't use any cool library as libxml unfortunately.
>
> Any reason for that? Because the canonical answer to your question would be
> lxml, which uses libxml2.


Yes, sure, and I completely agree with you; the more I look into this
task, the less I want to do it ;)

The reason is that it has to work on many platforms and without any C
module installed. I'm not sure yet what the reason for that is,
but I'm not going to complain again...

Anyway, in a sense it's also quite interesting, and I don't need to
implement the whole XML, so it should be fine.
What I haven't found yet is an explanation of a possible algorithm to use
for the validation that I could then implement.

Thanks,
Andrea


dieter at handshake

Jun 14, 2012, 12:19 PM

Post #6 of 6 (242 views)
Re: validating XML [In reply to]

andrea crotti <andrea.crotti.0 [at] gmail> writes:
> ...
> The reason is that it has to work on many platforms and without any c module
> installed, the reason of that

Searching for a pure Python solution, you might have a look at "PyXB".

It has not been designed to validate XML instances against XML-Schema
(but to map between XML instances and Python objects based on
an XML-Schema description) but it detects many problems in the
XML instances. It does not introduce its own C extensions
(but relies on an XML parser shipped with Python).

> Anyway in a sense it's also quite interesting, and I don't need to implement
> the whole XML, so it should be fine.

The XML is the lesser problem. The big problem is XML-Schema: it is
*very* complex with structure definitions (elements, attributes and
"#PCDATA"), inheritance, redefinition, grouping, scoping rules, inclusion,
data types with restrictions and extensions.

Thus if you want to implement a reliable algorithm which, for a
given XML-schema and XML-instance, checks whether the instance is
valid with respect to the schema, then you have a really big task.

Maybe, you have a fixed (and quite simple) schema. Then
you may be able to implement a validator (for the fixed schema).
But I do not understand why you would want such a validation.
If you generate the XML instances, then thoroughly test your
generation process (using any available validator) and then trust it.
If the XML instances come from somewhere else and must be interpreted
by your application, then the important thing is that they are
understood by your application, not that they are valid.
If you get a complaint that your application cannot handle a specific
XML instance, then you validate it in your development environment
(again with any validator available) and if the validation fails,
you have good arguments.


> What I haven't found yet is an explanation of a possible algorithm to use for
> the validation, that I could then implement..

You parse the XML (and get a tree) and then recursively check
that the elements, attributes and text nodes in the tree
conform to the schema (in an abstract sense,
the schema is a collection of content models for the various elements;
each content model tells you what the element content and attributes
should look like).
For a simple schema, this is straightforward. If the schema starts
to include foreign schemas, uses extensions, restrictions or "redefine"s,
then it gets considerably more difficult.
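
The recursive check described above might be sketched as follows for the simple case. The content models here are hand-written and hypothetical (one per element name: allowed attributes, allowed children, whether text is allowed); in a real validator they would be derived from the schema. Ordering, cardinality and data types are deliberately left out.

```python
from xml.etree import ElementTree

# Hypothetical content models, written by hand instead of read from an XSD.
CONTENT_MODELS = {
    "SETUP": {"attributes": {"version"}, "children": {"COMMENT", "OTHER"}, "text": False},
    "COMMENT": {"attributes": set(), "children": set(), "text": True},
    "OTHER": {"attributes": {"id"}, "children": set(), "text": False},
}


def validate(element, path=""):
    """Recursively collect error messages for element and its descendants."""
    path = path + "/" + element.tag
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return [path + ": unknown element"]
    errors = []
    # Check attributes against the content model.
    for name in element.attrib:
        if name not in model["attributes"]:
            errors.append(path + ": unexpected attribute " + name)
    # Check leading text (text after child elements is ignored in this sketch).
    if not model["text"] and (element.text or "").strip():
        errors.append(path + ": unexpected text")
    # Check children, recursing only into the allowed ones.
    for child in element:
        if child.tag not in model["children"]:
            errors.append(path + ": unexpected child " + child.tag)
        else:
            errors.extend(validate(child, path))
    return errors


doc = ElementTree.fromstring(
    '<SETUP version="1"><COMMENT>ok</COMMENT><OTHER name="x"/><EXTRA/></SETUP>'
)
for message in validate(doc):
    print(message)
```

For this sample the sketch reports the unexpected `name` attribute on OTHER and the unexpected EXTRA child of SETUP. Keying the models by element name alone assumes every element has one global definition, which stops being true once a schema uses local element declarations.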


--
Dieter
