marius at gedmin
Feb 24, 2012, 3:18 PM
Post #7 of 14
Re: Content Type Meta tag stripping in zope.pagetemplate
On Fri, Feb 24, 2012 at 09:57:57PM +0100, Charlie Clark wrote:
> On 24.02.2012 at 09:47, Miano Njoka <mianonjoka [at] gmail> wrote:
> >While it is not essential, it is necessary in some cases where the
> >finished document will be read from disk or is used by other
> >applications, e.g. Deliverance [http://packages.python.org/Deliverance/].
> >In fact, the W3C's HTML validator throws a warning that one should
> >declare the character encoding in the document itself if it is missing.
> This is actually what the validator says:
> No character encoding information was found within the document,
> either in an HTML meta element or an XML declaration. It is often
> recommended to declare the character encoding in the document
> itself, especially if there is a chance that the document will be
> read from or saved to disk, CD, etc.
> As ZPT produces XHTML, the proper place for any encoding declaration
> is in the XML declaration, defaulting to UTF-8, which should throw a
> validation error if incorrect.
A strong -1 for zope.pagetemplate adding <?xml ... ?> declarations
> Like much of the HTML standard, the
> meta tags were never really thought through and, because they are
> invisible to the user, were all too often copied mindlessly from one
> project to another: I have customers today with completely invalid and
> misleading meta tags of which they and the rest of the world are
> blissfully unaware. As a result, browsers (the main consumers of
> the format) were made fault tolerant; after all, the user often had
> no idea what was causing the problem or how to rectify it. I have
> seen many examples of the server saying one thing and the meta
> something else entirely. I think nearly all browsers believe what
> the server says over what's in the meta tag.
The HTML spec requires that:
"To sum up, conforming user agents must observe the following
priorities when determining a document's character encoding (from
highest priority to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a
value set for "charset".
3. The charset attribute set on an element that designates an
external resource."
(Aside: The rationale for this ordering, IIRC, is that it allows HTTP
servers to do on-the-fly charset conversion from one 8-bit charset to a
different one, without having to parse HTML and modify the charset name
in the <meta> declaration.)
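To make the first two priorities concrete, here is a minimal sketch in
Python using only the standard library. It is not how any browser or
zope.pagetemplate does it; the names charset_from_content_type,
MetaCharsetFinder and resolve_charset are made up for this example, and
the third priority (charset attributes on linking elements) is left out.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content_type(attrs.get("content") or "")


def resolve_charset(http_content_type, html_text, default=None):
    """Hypothetical helper: HTTP header first, then the META declaration."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset or default
```

With a header of "text/html; charset=ISO-8859-1" and a <meta> that says
utf-8, this sketch returns the header's value, which is the ordering the
spec asks for. Without the header it falls back to the <meta>.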
> According to MAMA, which was instrumental in developing HTML 5 based
> on what has actually been written, the charset was set in the
> HTTP headers over 99% of the time. Unfortunately, it doesn't contain
> any stats on discrepancies between the HTTP header and the meta.
> While there is apparently a possible security risk when not
> declaring the charset, I think the Pythonic principle of "there
> should be preferably one obvious way to do something" should apply
> within Zope when trying to decide the charset of a file, and that
> should be well documented. I'd suggest keeping the stripping but
> implementing a more rigorous approach such as you suggest.
I'm not a big fan of the stripping.
Consider people using wget to mirror websites (or some equivalent way --
hitting Save As in a browser and selecting "Web Page (original)" instead
of "Web Page (complete)"). The Content-Type header is not going to be
saved on disk.
Why should zope.pagetemplate forbid programmers from duplicating the
charset information in the <meta> element, at least as long as that
information is correct (i.e. matches the content type)?
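As a rough illustration of what "matches" could mean, here is a small
Python sketch that compares the charset in the HTTP Content-Type value
with the one in the <meta> content attribute. This is not existing
zope.pagetemplate code; meta_is_consistent and its helpers are
hypothetical names, and the sketch only compares the charset, not the
media type.

```python
import codecs
from email.message import Message


def _charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def _canonical(name):
    """Map aliases such as 'utf8' and 'UTF-8' to one codec name when Python knows them."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def meta_is_consistent(header_content_type, meta_content):
    """True if both values declare a charset and the two charsets agree."""
    header_cs = _charset(header_content_type)
    meta_cs = _charset(meta_content)
    if header_cs is None or meta_cs is None:
        return False
    return _canonical(header_cs) == _canonical(meta_cs)
```

A template engine following this idea could keep the <meta> element when
the check passes, and strip or rewrite it only when the two disagree.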
http://pov.lt/ -- Zope 3/BlueBream consulting and development