Mailing List Archive: Wikipedia: Wikitech

Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

 

 



robla at wikimedia

Jun 28, 2012, 2:11 PM

Post #1 of 9 (655 views)
Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

Hi all,

We have a longstanding request to enable HTML5 on all sites:
https://bugzilla.wikimedia.org/show_bug.cgi?id=27478

We've had it enabled on mediawiki.org for ages, with minimal death and
mayhem. There are two issues listed as blockers:

Bug 30525: Search bar icon/button slightly lower when html5 mode is enabled
https://bugzilla.wikimedia.org/show_bug.cgi?id=30525

Bug 36495: Sanitizer incorrectly converts align="right" for elements
that are not table-cells
https://bugzilla.wikimedia.org/show_bug.cgi?id=36495

Bug 30525 doesn't seem like a blocker to me (but patches definitely
welcome). Bug 36495 seems more likely to cause problems, though I'd
like to nudge Krinkle to explain Comment 9.

Assuming we can either get these fixed, or agree they aren't blockers,
I say we set a date and go. Should we plan on sometime in July (say a
week or two after Wikimania)?

Rob

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


maxsem.wiki at gmail

Jun 28, 2012, 3:11 PM

Post #2 of 9 (628 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

On 29.06.2012, 1:11 Rob wrote:

> We have a longstanding request to enable HTML5 on all sites:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=27478

> We've had it enabled on mediawiki.org for ages, with minimal death and
> mayhem. There are two issues listed as blockers:

> Bug 30525: Search bar icon/button slightly lower when html5 mode is enabled
> https://bugzilla.wikimedia.org/show_bug.cgi?id=30525

Doesn't look that scary.

> Bug 36495: Sanitizer incorrectly converts align="right" for elements
> that are not table-cells
> https://bugzilla.wikimedia.org/show_bug.cgi?id=36495

I could poke at it as part of my 20% time.

> Bug 30525 doesn't seem like a blocker to me (but patches definitely
> welcome). Bug 36495 seems more likely to cause problems, though I'd
> like to nudge Krinkle to explain Comment 9.

> Assuming we can either get these fixed, or agree they aren't blockers,
> I say we set a date and go. Should we plan on sometime in July (say a
> week or two after Wikimania)?

I say go for it. Some people are always going to whine, but we
shouldn't wait forever for a few XML-loving bots to upgrade. We
recently added IPv6 support, yet Wikipedia didn't die in pain while
some anti-vandalism tools were broken. Same thing here.


--
Best regards,
Max Semenik ([[User:MaxSem]])




p858snake at gmail

Jun 28, 2012, 3:51 PM

Post #3 of 9 (629 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

On Fri, Jun 29, 2012 at 7:11 AM, Rob Lanphier <robla [at] wikimedia> wrote:
> We've had it enabled on mediawiki.org for ages, with minimal death and
> mayhem.  There are two issues listed as blockers:

We don't have half of the <del>automated crap</del><ins>Anti-vandal
and random other tools</ins> running around on mw wiki.

As far as I'm concerned, we have given them enough warnings. If they
still aren't fixed to use the API (and/or not suck in HTML5 mode), it's
their loss and not ours if they break.
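
For tool authors wondering what "use the API" might look like instead of
scraping page markup, here is a minimal sketch. It only builds the request
URL and makes no network call. The endpoint shown and the helper name
build_parse_request are illustrative assumptions; action=parse with
prop=text and format=json is one way to ask for rendered page content
as JSON.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_parse_request(api_endpoint, page_title):
    """Build an api.php URL asking for a page's rendered text as JSON.

    Hypothetical helper for illustration; it does not send the request.
    """
    params = {
        "action": "parse",   # render a page through the parser
        "page": page_title,  # title of the page to render
        "prop": "text",      # only the rendered text is wanted
        "format": "json",    # machine-readable output, no HTML/XML scraping
    }
    return f"{api_endpoint}?{urlencode(params)}"


# Example endpoint; substitute the wiki the tool actually targets.
url = build_parse_request("https://en.wikipedia.org/w/api.php", "Main Page")
query = parse_qs(urlsplit(url).query)
```

A tool built this way does not depend on the page's doctype or on the
output being well-formed XML, so a switch to HTML5 output should not
affect it.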

> Bug 30525: Search bar icon/button slightly lower when html5 mode is enabled
> https://bugzilla.wikimedia.org/show_bug.cgi?id=30525

> Bug 30525 doesn't seem like a blocker to me (but patches definitely
> welcome).

It isn't a "blocker"; that is just the best way to link stuff in BZ. I
filed it just as a "note" bug to look at when we change to the world
of tomorrow (HTML5).



helder.wiki at gmail

Jun 28, 2012, 5:26 PM

Post #4 of 9 (635 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

Anomie pointed out on enwiki's Village Pump[1] the problem with the
Cite extension mentioned on
https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c12

Will $wgExperimentalHtmlIds be set to false?
How is it configured on mw.org?

Best regards,
Helder

[1] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)?oldid=499831134#HTML5_is_comming_.28again.29



robla at wikimedia

Jun 29, 2012, 10:25 AM

Post #5 of 9 (620 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

Hi Helder,

Thanks for posting this to VPT, and relaying things back here.
Comments inline...

On Thu, Jun 28, 2012 at 5:26 PM, Helder . <helder.wiki [at] gmail> wrote:
> Anomie pointed out on enwiki's Village Pump[1] the problem with the
> Cite extension mentioned on
> https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c12

I'm not sure. I replied on VPT though. It would be great if someone
could repro this problem on test2, and then, if it's still a problem,
file a separate bug.

> Will $wgExperimentalHtmlIds be set to false?
> How is it configured on mw.org?

Doesn't seem to be explicitly mentioned in our site config, so I think
this is false.

Rob

[1] http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#HTML5_is_comming_.28again.29



z at mzmcbride

Jun 29, 2012, 5:22 PM

Post #6 of 9 (623 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

Rob Lanphier wrote:
> Assuming we can either get these fixed, or agree they aren't blockers,
> I say we set a date and go. Should we plan on sometime in July (say a
> week or two after Wikimania)?

Your e-mail was unclear to me. It's difficult to tell whether you just
looked at the blockers of bug 27478 or if you read (all of) the bug's
comments (and the related previous mailing list discussions about this).

Are you following the deployment plan outlined by Roan here:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c18>? (It was a
follow-up to Aryeh's post here:
<http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html>.)

As I understand it, the "enable HTML5 on Wikimedia wikis" goal has become a
bit murky. There's $wgHtml5, but that's distinct from setting the doctype
(which is what I think most people consider to be the most relevant part).
It's unclear how many features (or more pointedly how many lines of
additional code) are dependent on this configuration variable, which was
part of the reason Aryeh laid out the deployment plan he did.

It's also unclear whether every issue reported in the comments of bug 27478
was filed as a separate bug. In particular, I'm unsure if Cite was ever
properly fixed (or if Aryeh's mentioned alternate, stop-gap solution was
implemented). As I recall, the Cite breakage was breaking links in articles.

MZMcBride





robla at wikimedia

Jul 2, 2012, 9:36 AM

Post #7 of 9 (617 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

On Fri, Jun 29, 2012 at 5:22 PM, MZMcBride <z [at] mzmcbride> wrote:
> Are you following the deployment plan outlined by Roan here:
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c18>? (It was a
> follow-up to Aryeh's post here:
> <http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html>.

That plan may be more conservative than we need to be, given it's been
enabled on mediawiki.org for so long. At the time Aryeh wrote that,
the feature hadn't been as well tested as it is now. That's not to
say that we won't find bugs, but that I don't think there will be as
many, that they aren't likely to be severe, and it seems we're in a
better position to address them quickly than we were when that was
written. I wouldn't mind going that route if a lot of other people
feel we should, but it seems likely to me that we might accidentally
introduce production glitches in the process of implementing the
interim steps, and that there could very well be bugs in the interim
states that don't occur in the final stage.

> As I understand it, the "enable HTML5 on Wikimedia wikis" goal has become a
> bit murky. There's $wgHtml5, but that's distinct from setting the doctype
> (which is what I think most people consider to be the most relevant part).

Are you sure that $wgHtml5 is distinct from the doctype? It looks
like mediawiki.org also has the doctype set, and it looks as though
Html.php sets it based on that variable.

> It's also unclear whether every issue reported in the comments of bug 27478
> were filed as separate bugs. In particular, I'm unsure if Cite was ever
> properly fixed (or if Aryeh's mentioned alternate, stop-gap solution was
> implemented). As I recall, the Cite breakage was breaking links in articles.

This is what I'm hoping we can get some clarity on. How many of those
comments are still relevant?

FWIW, I'm not in a big rush to enable this; it's just that it seems
like we're running out of good reasons not to just do it already.

Rob



z at mzmcbride

Jul 2, 2012, 11:51 PM

Post #8 of 9 (650 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

Rob Lanphier wrote:
> On Fri, Jun 29, 2012 at 5:22 PM, MZMcBride <z [at] mzmcbride> wrote:
>> Are you following the deployment plan outlined by Roan here:
>> <https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c18>? (It was a
>> follow-up to Aryeh's post here:
>> <http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html>.
>
> That plan may be more conservative than we need to be, given it's been
> enabled on mediawiki.org for so long. At the time Aryeh wrote that,
> the feature hadn't been as well tested as it is now. That's not to
> say that we won't find bugs, but that I don't think there will be as
> many, that they aren't likely to be severe, and it seems we're in a
> better position to address them quickly than we were when that was
> written. I wouldn't mind going that route if a lot of other people
> feel we should, but it seems likely to me that we might accidentally
> introduce production glitches in the process of implementing the
> interim steps, and that there could very well be bugs in the interim
> states that don't occur in the final stage.

I agree that it's more conservative and likely needlessly so. Aryeh has
since clarified that the source of most of the previous breakage
($wgExperimentalHtmlIds) was enabled and then re-disabled by default:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=27694#c6>. As long as
$wgExperimentalHtmlIds stays disabled, the issues with Cite, etc. shouldn't
re-appear and $wgHtml5 should be safe to enable.

>> As I understand it, the "enable HTML5 on Wikimedia wikis" goal has become a
>> bit murky. There's $wgHtml5, but that's distinct from setting the doctype
>> (which is what I think most people consider to be the most relevant part).
>
> Are you sure that $wgHtml5 is distinct from the doctype? It looks
> like mediawiki.org also has the doctype set, and it looks as though
> Html.php sets it based on that variable.

Sorry, I was a little unclear here. I was talking about $wgDocType and
$wgDTD, as discussed by Roan here:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=27478#c18>.

By default, the DOCTYPE is automatically set to "<!DOCTYPE html>\n" when
$wgHtml5 is set to true (from includes/Html.php):

---
if ( $wgHtml5 ) {
	$ret .= "<!DOCTYPE html>\n";

	if ( $wgHtml5Version ) {
		$attribs['version'] = $wgHtml5Version;
	}
}
---

Roan's plan called for adjusting the DOCTYPE and/or DTD before setting
$wgHtml5 to true. This is probably unnecessary to do, as you say. My point
was that for most people, the DOCTYPE is the most important/relevant piece
and that setting $wgDocType = '<!doctype html>\n' is (or can be, rather)
distinct from setting $wgHtml5 = true. Depending on how much new and
untested code is reliant on $wgHtml5, setting only the DOCTYPE might be a
good interim solution if issues arise with $wgHtml5 but you still want to
output an HTML5 DOCTYPE.
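
To make the relationship between the two knobs concrete, here is a rough
Python model of the branch quoted above. It is a sketch and not MediaWiki
code: the function name doctype_line is hypothetical, and the non-HTML5
branch (a PUBLIC doctype built from a public identifier and a DTD URL,
defaulting to XHTML 1.0 Transitional) is an assumption about what the
surrounding PHP does.

```python
# Standard W3C identifiers for XHTML 1.0 Transitional, used here as
# assumed defaults for the legacy (non-HTML5) branch.
XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN"
XHTML_DTD_URL = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"


def doctype_line(html5, public_id=XHTML_PUBLIC_ID, dtd_url=XHTML_DTD_URL):
    """Model of the doctype choice: the HTML5 switch wins over the DTD settings."""
    if html5:
        # Mirrors the quoted snippet: a short doctype with no DTD reference.
        return "<!DOCTYPE html>\n"
    # Assumed legacy branch: doctype assembled from the two settings.
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{dtd_url}">\n'


html5_doctype = doctype_line(html5=True)
legacy_doctype = doctype_line(html5=False)
```

In this model the public identifier and DTD URL have no effect once the
HTML5 switch is on, which is why changing only the doctype and flipping
the switch are different operations.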

>> It's also unclear whether every issue reported in the comments of bug 27478
>> were filed as separate bugs. In particular, I'm unsure if Cite was ever
>> properly fixed (or if Aryeh's mentioned alternate, stop-gap solution was
>> implemented). As I recall, the Cite breakage was breaking links in articles.
>
> This is what I'm hoping we can get some clarity on. How many of those
> comments are still relevant?
>
> FWIW, I'm not in a big rush to enable this; it's just that it seems
> like we're running out of good reasons not to just do it already.

I believe not enabling $wgHtml5 is holding up other development efforts
(based on some of the comments at bug 27478, e.g., comments 15 and 21). I
also don't see (m)any good reasons to not just do it already. :-)

MZMcBride





ayg at aryeh

Jul 3, 2012, 2:44 AM

Post #9 of 9 (615 views)
Re: Technical hurdles for enabling $wgHtml5 on Wikimedia sites?

On Mon, Jul 2, 2012 at 7:36 PM, Rob Lanphier <robla [at] wikimedia> wrote:
> That plan may be more conservative than we need to be, given it's been
> enabled on mediawiki.org for so long.  At the time Aryeh wrote that,
> the feature hadn't been as well tested as it is now.  That's not to
> say that we won't find bugs, but that I don't think there will be as
> many, that they aren't likely to be severe, and it seems we're in a
> better position to address them quickly than we were when that was
> written.  I wouldn't mind going that route if a lot of other people
> feel we should, but it seems likely to me that we might accidentally
> introduce production glitches in the process of implementing the
> interim steps, and that there could very well be bugs in the interim
> states that don't occur in the final stage.

Just to clarify the history here, I originally suggested just turning
it on. I expected (and expect) that there will be a bit of fallout,
but not a lot -- it should be quickly fixable. The stuff that carries
bigger compatibility risks is behind separate switches such as
$wgWellFormedXml and $wgExperimentalHtmlIds.

> Are you sure that $wgHtml5 is distinct from the doctype?  It looks
> like mediawiki.org also has the doctype set, and it looks as though
> Html.php sets it based on that variable.

IIRC, I added a separate variable that allows changing the doctype
separately from $wgHtml5 in case anyone wanted to experiment with
changing the doctype and rest of the page separately. This is because
changing the doctype will affect rendering in certain cases, moving
from "almost-standards" to "standards" rendering, while changing the
rest of the markup might have unrelated effects. But the doctype
should change along with $wgHtml5 if you don't override it.

>> It's also unclear whether every issue reported in the comments of bug 27478
>> were filed as separate bugs. In particular, I'm unsure if Cite was ever
>> properly fixed (or if Aryeh's mentioned alternate, stop-gap solution was
>> implemented). As I recall, the Cite breakage was breaking links in articles.
>
> This is what I'm hoping we can get some clarity on.  How many of those
> comments are still relevant?

Comments 0-5 are still relevant. r82413 will likely need to be
reinstated and enforced in review if you don't want to break XML
processors. Named entities like &nbsp; will no longer work in XML
parsers with no DTD in the doctype -- except for the core &amp; &lt;
&gt; &quot; &apos;. This is likely to be a big issue, because it will
be a headache to make sure extensions don't output such entities in
raw HTML. (The parser/sanitizer will already take care of them in
user input or parsed HTML, though.) If auditing isn't put into place,
I'd expect that XML parsers would break as soon as the change is
deployed, and regularly break thereafter as people accidentally
introduce new entities.
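
The named-entities failure is easy to reproduce with a strict XML parser.
The following is a minimal sketch using Python's standard
xml.etree.ElementTree; the helper name parses_as_xml and the sample
markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The HTML5 doctype references no DTD, so no named entities are declared.
HTML5_DOCTYPE = "<!DOCTYPE html>\n"

named = HTML5_DOCTYPE + "<p>10&nbsp;km</p>"        # HTML-only named entity
core = HTML5_DOCTYPE + "<p>a &amp; b &lt; c</p>"   # entities predefined by XML
numeric = HTML5_DOCTYPE + "<p>10&#160;km</p>"      # numeric character reference

named_ok = parses_as_xml(named)
core_ok = parses_as_xml(core)
numeric_ok = parses_as_xml(numeric)
```

Here the &nbsp; document is rejected, while the ones using only the five
predefined entities or a numeric reference are accepted. Emitting &#160;
in place of &nbsp; is therefore one way for raw HTML output to stay
XML-safe.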

The way around this would be either to use a non-HTML5 doctype (see
end of post), or just give up on XML scrapers and tell them that their
bots will break until they switch to an HTML5 parser or the API. In
the latter case, $wgWellFormedXml can be set to false also, if people
like.

Comment 12 is no longer relevant, because $wgExperimentalHtmlIds was
turned off by default.

http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html
is still a good summary of possible issues, particularly the emphasis
on issue 2.

I don't know if comment 27 is still relevant -- probably, but it
should be trivial to fix. There are likely to be some pages using
table-based layout and images that will start displaying badly and
that users will have to add a few extra style rules to fix.


The major issue that I see is still the named-entities problem, which
is what led to rapid disabling both previous times $wgHtml5 was turned
on. To avoid breaking XML tools, the doctype could be set to XHTML
1.0 Strict or such with $wgHtml5 on, so HTML5 features would still
work. This would make the page valid HTML5, since HTML5 allows some
legacy doctypes that do specify a DTD:

http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#obsolete-permitted-doctype-string

The issue is it would confuse validator.w3.org into trying to validate
as XHTML 1.0 etc., which would make people complain the pages are
invalid. You would have to set it specifically to validate as HTML5
for it to pass. (HTML5 validators are generally much pickier, though,
so expect a lot of pages not to validate as HTML5 either.)

The alternative, as I said, would be to just let XML screen-scraper
bots break. Most languages provide some type of HTML parser that they
could be switched to, I do believe. Python has a particularly good
HTML5 parser, I think, which will parse the page the same as browsers.
In this case, switching off $wgWellFormedXml won't hurt anything and
will decrease page size slightly.
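
As a rough illustration of what switching a bot to an HTML parser could
look like, here is a sketch using html.parser from the Python standard
library. This is a tolerant HTML parser, not a full HTML5-spec parser,
and it stands in here only because it needs no third-party install; the
class name, helper name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTextCollector(HTMLParser):
    """Collect link targets and text without requiring well-formed XML."""

    def __init__(self):
        # convert_charrefs=True decodes &nbsp; and friends into characters.
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def scrape(page):
    """Return (links, text) extracted from an HTML page."""
    collector = LinkAndTextCollector()
    collector.feed(page)
    collector.close()
    return collector.links, "".join(collector.text_parts)


# Not well-formed XML: named entity, unquoted attribute, unclosed tags.
PAGE = "<!DOCTYPE html>\n<p>Hello&nbsp;world <a href=/wiki/Foo>Foo</a><br>"
links, text = scrape(PAGE)
```

The sample page would fail in an XML parser on several counts, yet the
link and the text (with &nbsp; decoded to a non-breaking space) are still
recovered.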

