Mailing List Archive: Python: Bugs

[issue7139] ElementTree: Incorrect serialization of end-of-line characters in attribute values


report at bugs

Nov 2, 2009, 8:12 AM

Post #1 of 4

Moriyoshi Koizumi <mozo+python[at]mozo.jp> added the comment:

Looks like a duplicate of #6492

----------

_______________________________________
Python tracker <report[at]bugs.python.org>
<http://bugs.python.org/issue7139>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Nov 2, 2009, 2:06 PM

Post #2 of 4

Ezio Melotti <ezio.melotti[at]gmail.com> added the comment:

If I understand correctly, the correct behavior when reading is:
* literal newlines (\n or \r) and tabs (\t) should be normalized, i.e.
converted to a space
* newlines (&#xA; or &#xD;) and tabs (&#x9;) written as character
references should be converted to their literal equivalents (\n, \r and \t)

(See http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html#charescaping)

This should already work correctly in both xml.dom.minidom and ElementTree.
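The reading behavior described above can be checked with a short sketch, assuming a modern Python 3 standard library (which postdates this 2009 thread, so the Python 2.x releases under discussion may differ). Both xml.etree.ElementTree and xml.dom.minidom parse through Expat; the sample document and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Attribute "a" holds a literal tab, LF and CR LF; attribute "b" holds the
# same kinds of characters written as character references.
doc = '<root a="x\ty\nz\r\nw" b="x&#x9;y&#xA;z&#xD;w"/>'

# ElementTree: literal whitespace is normalized to spaces, while the
# character references come back as the literal characters.
root = ET.fromstring(doc)
et_a, et_b = root.get("a"), root.get("b")

# xml.dom.minidom goes through Expat as well and should agree.
dom_root = minidom.parseString(doc).documentElement
dom_a, dom_b = dom_root.getAttribute("a"), dom_root.getAttribute("b")

print(repr(et_a))  # 'x y z w'
print(repr(et_b))  # 'x\ty\nz\rw'
```

Note that the CR LF pair in attribute "a" yields a single space, because end-of-line handling turns it into one LF before attribute-value normalization runs.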


When writing, however, literal newlines and tabs that are written as
they are (\n, \r and \t) cannot be recovered when the document is
parsed again, because the parser normalizes them to spaces. They
should therefore be escaped as character references (&#xA;, &#xD; and
&#x9;) automatically, but this could be incompatible with the current
behavior (i.e. \n, \r or \t characters that are currently written
literally, and turned into spaces on parsing, would become significant).
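A minimal sketch of the proposed writer-side fix, assuming Python 3. The escape_attr_value helper is hypothetical and not part of ElementTree's API; it only illustrates escaping tabs and line breaks as character references so that they survive attribute-value normalization when the output is parsed again.

```python
import xml.etree.ElementTree as ET

def escape_attr_value(value: str) -> str:
    # Hypothetical helper (not part of ElementTree): escape a string for a
    # double-quoted attribute so tabs, LFs and CRs survive re-parsing.
    # "&" goes first so the references added afterwards are not re-escaped.
    for char, ref in (("&", "&amp;"), ("<", "&lt;"), ('"', "&quot;"),
                      ("\t", "&#x9;"), ("\n", "&#xA;"), ("\r", "&#xD;")):
        value = value.replace(char, ref)
    return value

original = "line one\r\nline two\tend"

# Characters written literally: the parser normalizes them to spaces.
naive = ET.fromstring('<e a="%s"/>' % original).get("a")

# Characters written as references: the value round-trips unchanged.
escaped = ET.fromstring('<e a="%s"/>' % escape_attr_value(original)).get("a")

print(repr(naive))       # 'line one line two end'
print(escaped == original)  # True
```

The order of replacements matters: escaping "&" last would turn the freshly inserted &#xA; into &amp;#xA;.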

Moriyoshi, can you confirm that what I said is correct and that the
problem is similar to the one described in #5752?
I also closed #6492 as a duplicate of this one.

----------
nosy: +devon, ezio.melotti
versions: +Python 2.7


report at bugs

Nov 2, 2009, 2:27 PM

Post #3 of 4

Fredrik Lundh <fredrik[at]effbot.org> added the comment:

The real problem here is that XML attributes weren't really designed
to hold data that doesn't survive normalization. One would have
thought that making it difficult to do that, and easy to store such
things as character data, would have made people think a bit before
designing XML formats that do things the other way around, but
apparently some people find it hard to use their brain when
designing things...

FWIW, the current ET 1.3 beta escapes newlines but not tabs or
carriage returns; I don't really mind adding tabs, but I'm less sure
about carriage returns -- XML pretty much treats CR as a junk character
outside attributes as well, and escaping it in all contexts would just
be silly.
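To illustrate why escaping only newlines is not enough, here is a sketch with a hypothetical escape_newline_only helper that approximates the ET 1.3 beta behavior described above (it is not ElementTree's actual serializer code). Re-parsing uses Python 3's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def escape_newline_only(value: str) -> str:
    # Hypothetical helper approximating the described behavior: markup
    # characters and LF are escaped, while tabs and carriage returns are
    # written literally. This is not ElementTree's actual code.
    value = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    return value.replace("\n", "&#xA;")

original = "col1\tcol2\r\nnext"
serialized = '<e a="%s"/>' % escape_newline_only(original)
parsed = ET.fromstring(serialized).get("a")

# The LF survives, but the tab and the now-unpaired CR become spaces.
print(repr(parsed))  # 'col1 col2 \nnext'
```

The lone CR left in front of &#xA; is treated as a line break by XML's end-of-line handling (section 2.11) and then normalized to a space, so the original value cannot be recovered.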

----------


report at bugs

Nov 11, 2009, 9:38 AM

Post #4 of 4

Moriyoshi Koizumi <mozo+python[at]mozo.jp> added the comment:

@ezio.melotti

Yes, parsing already works flawlessly.

Fixing this would actually break the current behavior, but I believe
this is how it should work.

It seems #5752 pretty much says the same thing.

@effbot

As specified in section 2.11, End-of-Line Handling [2], every variant
of the end-of-line sequence (#xD #xA or a lone #xD) is normalized to a
single #xA before the document is actually parsed, so a bare #xD
character can never appear as-is among the parsed information items.
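A short check of the end-of-line rule cited above, assuming Python 3's xml.etree.ElementTree (any conforming XML parser should behave the same way). A carriage return only reaches the application when it is written as a character reference.

```python
import xml.etree.ElementTree as ET

# Character data: CR LF and a lone CR both reach the application as LF.
text = ET.fromstring("<e>a\r\nb\rc</e>").text

# A carriage return survives only when written as a character reference.
text_ref = ET.fromstring("<e>a&#xD;b</e>").text

# In an attribute value, the normalized LF is then replaced by a space.
attr = ET.fromstring('<e a="a\r\nb"/>').get("a")

print(repr(text))      # 'a\nb\nc'
print(repr(text_ref))  # 'a\rb'
print(repr(attr))      # 'a b'
```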


[2] http://www.w3.org/TR/xml/#sec-line-ends

----------

_______________________________________
Python tracker <report[at]bugs.python.org>
<http://bugs.python.org/issue7139>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com

Python bugs RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.