
Mailing List Archive: Wikipedia: Wikitech

Cutting MediaWiki loose from wikitext

 

 



daniel at brightbyte

Mar 26, 2012, 7:45 AM

Post #1 of 17
Cutting MediaWiki loose from wikitext

Hi all. I have a bold proposal (read: evil plan).

To put it briefly: I want to remove the assumption that MediaWiki pages always
contain wikitext. Instead, I propose a pluggable handler system for different
types of content, similar to what we have for file uploads. So, I propose to
associate a "content model" identifier with each page, and have handlers for
each model that provide serialization, rendering, an editor, etc.
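
To make the shape of such a handler system concrete, here is a minimal sketch in Python (MediaWiki itself is PHP; Python is used purely for illustration). The method names, the registry functions register_handler and handler_for, and the model identifiers "wikitext" and "json-data" are assumptions made for this example and are not taken from the prototype branch. The editor part of a handler is left out.

```python
import html
import json
from abc import ABC, abstractmethod


class ContentHandler(ABC):
    """Hypothetical base class: one handler instance per content model."""

    model_id: str

    @abstractmethod
    def serialize(self, content) -> str:
        """Turn in-memory content into the string that gets stored."""

    @abstractmethod
    def unserialize(self, blob: str):
        """Turn a stored string back into in-memory content."""

    @abstractmethod
    def render_html(self, content) -> str:
        """Produce HTML for page views."""


class WikitextHandler(ContentHandler):
    model_id = "wikitext"

    def serialize(self, content):
        return content

    def unserialize(self, blob):
        return blob

    def render_html(self, content):
        # Placeholder only: a real handler would call the wikitext parser.
        return "<p>" + html.escape(content) + "</p>"


class JsonDataHandler(ContentHandler):
    model_id = "json-data"  # assumed identifier

    def serialize(self, content):
        return json.dumps(content, sort_keys=True)

    def unserialize(self, blob):
        return json.loads(blob)

    def render_html(self, content):
        rows = "".join(
            "<tr><th>%s</th><td>%s</td></tr>"
            % (html.escape(str(key)), html.escape(str(value)))
            for key, value in sorted(content.items())
        )
        return "<table>" + rows + "</table>"


HANDLERS = {}


def register_handler(handler):
    """Make a handler available under its content model identifier."""
    HANDLERS[handler.model_id] = handler


def handler_for(model_id):
    """Look up the handler for a page's content model."""
    try:
        return HANDLERS[model_id]
    except KeyError:
        raise LookupError("no handler registered for model %r" % model_id) from None


register_handler(WikitextHandler())
register_handler(JsonDataHandler())
```

Code that displays or saves a page would then ask handler_for() for the page's model instead of assuming wikitext.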

The background is that the Wikidata project needs a way to store structured data
(JSON) on wiki pages instead of wikitext. Having a pluggable system would solve
that problem along with several others: doing away with the special cases
for JS/CSS, maintaining categories etc. separately from the body text,
managing Gadgets sanely on a wiki page, and several other things (see the link below).

I have described my plans in more detail on meta:

http://meta.wikimedia.org/wiki/Wikidata/Notes/ContentHandler

A very rough prototype is in a dev branch here:

http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

Please let me know what you think (here on the list, preferably, not on the talk
page there, at least for now).

Note that we *definitely* need this ability for Wikidata. We could do it
differently, but I think this would be the cleanest solution, and would have a
lot of mid- and long-term benefits, even if it's a short-term pain. I'm
presenting my plan here to find out if I'm on the right track, and whether it is
feasible to put this on the road map for 1.20. It would be my (and the Wikidata
team's) priority to implement this and see it through before Wikimania. I'm
convinced we have the manpower to get it done.

Cheers,
Daniel

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


alex.brollo at gmail

Mar 26, 2012, 9:18 AM

Post #2 of 17
Re: Cutting MediaWiki loose from wikitext

I agree that it's ironic to work on a powerful database-backed project
and yet have neither the access nor the encouragement to organize our data
as it should be organized. But we do use normal pages as a data repository
too, simply by marking specific areas of pages as "data areas". Moreover, we
use the same page both as a normal wikitext container and as a "data
container". Why not?

Alex brollo (it.source)


jeblad at gmail

Mar 26, 2012, 10:18 AM

Post #3 of 17
Re: Cutting MediaWiki loose from wikitext

I like this idea, it solves a lot of problems.
John



brion at pobox

Mar 26, 2012, 1:02 PM

Post #4 of 17
Re: Cutting MediaWiki loose from wikitext

I'm generally in favor of this plan. I haven't looked over the specific
code experiments yet but the plan sounds solid. A few notes:

* over time we'll want to do things like migrate File: pages from 'plain
wikitext that happens to have an associated file' to 'structured data about
a file'. This will be magnificent.

* I wouldn't overemphasize things like "oh you could have pages in
markdown or tex!", though it does sound neat and all. :)

* we need to make sure that import/export round-trips things consistently,
including for "non-wikitext" stuff. Either that means making import/export
content-aware, or shipping the serialized form through the export XML?


As for timing; Daniel's hoping for something in the neighborhood of an
August deployment. I think if we keep things minimal that should be
feasible; it's somewhat similar to the migration of Image stuff with
MediaHandler classes.

I'm a bit uncertain about the idea of 'multipart' pages, though attached
data YES YES in some clean way is needed.

-- brion




daniel at brightbyte

Mar 26, 2012, 1:26 PM

Post #5 of 17
Re: Cutting MediaWiki loose from wikitext

On 26.03.2012 22:02, Brion Vibber wrote:
> I'm generally in favor of this plan. I haven't looked over the specific
> code experiments yet but the plan sounds solid.

YAY!

> * over time we'll want to do things like migrate File: pages from 'plain
> wikitext that happens to have an associated file' to 'structured data about
> a file'. This will be magnificent.

I hope to get the WMNL guys excited about this idea, this would really rock for
GLAM applications.

> * I wouldn't overmuch emphasize things like "oh you could have pages in
> markdown or tex!", though it does sound neat and all. :)

Yes. For the record, I do *not* want to move Wikipedia to another
syntax. (Well, I wish it *used* another syntax, but that's a completely separate
discussion).

> * we need to make sure that import/export round-trips things consistently,
> including for "non-wikitext" stuff. Either that means making import/export
> content-aware, or shipping the serialized form through the export XML?

I intend the importer/exporter to use the serialized form, and to be aware only
of the additional revision attributes specifying the content model and
serialization format.

How a wiki should react when importing content for an unknown handler is an open
issue, though. Fail? Import a blank page? Import as wikitext?...
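
The three options floated above can be sketched as an import policy. This is only an illustration in Python, not the importer's actual design: the ExportedRevision fields, the UnknownModelPolicy names, the format strings and the set of known models are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class UnknownModelPolicy(Enum):
    """Hypothetical choices for content whose handler is not installed."""
    FAIL = "fail"
    BLANK = "blank"
    AS_WIKITEXT = "as_wikitext"


@dataclass
class ExportedRevision:
    """One revision as it might travel through the export XML (assumed fields)."""
    title: str
    model: str   # content model identifier
    format: str  # serialization format
    text: str    # serialized form, opaque to the importer


KNOWN_MODELS = {"wikitext"}  # models this (imaginary) wiki has handlers for


def import_revision(rev, policy):
    """Import a revision, applying the policy when its model is unknown."""
    if rev.model in KNOWN_MODELS:
        return rev
    if policy is UnknownModelPolicy.FAIL:
        raise ValueError("no handler for content model %r" % rev.model)
    if policy is UnknownModelPolicy.BLANK:
        return ExportedRevision(rev.title, "wikitext", "text/x-wiki", "")
    # AS_WIKITEXT: keep the serialized text but treat it as plain wikitext.
    return ExportedRevision(rev.title, "wikitext", "text/x-wiki", rev.text)
```

Note that the importer never looks inside the text; it only reads the two extra attributes, which matches the intent that import/export stays unaware of content internals.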

But we don't need to solve that here and now.

> As for timing; Daniel's hoping for something in the neighborhood of an
> August deployment. I think if we keep things minimal that should be
> feasible; it's somewhat similar to the migration of Image stuff with
> MediaHandler classes.

This is because of Wikidata's tight timeline. We'll be working hard on getting
this ready soon.

> I'm a bit uncertain about the idea of 'multipart' pages, though attached
> data YES YES in some clean way is needed.

That bit is mostly idle musing - "multipart" and "attachments" are *not* needed
for Wikidata, though they open up several neat use cases.

Thanks for the feedback Brion!

-- daniel



daniel at brightbyte

Mar 26, 2012, 1:51 PM

Post #6 of 17
Re: Cutting MediaWiki loose from wikitext

On 26.03.2012 18:18, Alex Brollo wrote:
> I agree that it's ironic to work on a powerful database-backed project
> and yet have neither the access nor the encouragement to organize our data
> as it should be organized. But we do use normal pages as a data repository
> too, simply by marking specific areas of pages as "data areas". Moreover, we
> use the same page both as a normal wikitext container and as a "data
> container". Why not?

Because it is not sufficient. There is no way to query such data efficiently,
there is no standard web API to access this data, and there are no URLs to
reference it (without the text around it).

The proposal allows for structured data as page content, as well as any other
type of page content, and it also potentially allows multiple types of data to
exist as part of the same page (using some mechanism of "attachment" or
"multipart").

-- daniel




Platonides at gmail

Mar 26, 2012, 3:09 PM

Post #7 of 17
Re: Cutting MediaWiki loose from wikitext

I like the general idea (haven't gone through the detailed pages).


> On 26.03.2012 22:02, Brion Vibber wrote:
>> * over time we'll want to do things like migrate File: pages from 'plain
>> wikitext that happens to have an associated file' to 'structured data about
>> a file'. This will be magnificent.
I think that File: pages that happen to be svg is a much easier approach.


>> I'm a bit uncertain about the idea of 'multipart' pages, though attached
>> data YES YES in some clean way is needed.
>
> That bit is mostly idle musing - "multipart" and "attachments" are *not* needed
> for Wikidata, though they open up several neat use cases.

It's just something to take into account when designing the extensibility.


> A very rough prototype is in a dev branch here:
>
> http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

It looks really evil publishing that svn branch just days after the git
migration :)
I think that branch (created months ago) should be migrated to git, so
we could all despair..^W benefit from git's wonderful branching abilities.

Best regards




tstarling at wikimedia

Mar 26, 2012, 3:33 PM

Post #8 of 17
Re: Cutting MediaWiki loose from wikitext

On 27/03/12 01:45, Daniel Kinzler wrote:
> Hi all. I have a bold proposal (read: evil plan).
>
> To put it briefly: I want to remove the assumption that MediaWiki pages contain
> always wikitext. Instead, I propose a pluggable handler system for different
> types of content, similar to what we have for file uploads. So, I propose to
> associate a "content model" identifier with each page, and have handlers for
> each model that provide serialization, rendering, an editor, etc.

For the record: we've discussed this previously and I'm fine with it.
It's a well thought-out proposal, and the only request I had was to
ensure that the DB schema supports some similar projects that we have
in the idea pile, like multiple parser versions.

On 27/03/12 09:37, MZMcBride wrote:
> For example, would the diff engine need to be rewritten so that people can
> monitor these pages for vandalism? Will these pages be editable in the same
> way as current wikitext pages? If not, will there be special editors for the
> various data types?

These questions are all answered on the notes page that Daniel linked
to. The answers are yes, no and yes.

-- Tim Starling




z at mzmcbride

Mar 26, 2012, 3:37 PM

Post #9 of 17
Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler wrote:
> To put it briefly: I want to remove the assumption that MediaWiki pages
> contain always wikitext. Instead, I propose a pluggable handler system for
> different types of content, similar to what we have for file uploads. So, I
> propose to associate a "content model" identifier with each page, and have
> handlers for each model that provide serialization, rendering, an editor, etc.

It's an ancient assumption that's built in to many parts of MediaWiki (and
many outside tools and scripts). Is there any kind of assessment about the
level of impact this would have?

For example, would the diff engine need to be rewritten so that people can
monitor these pages for vandalism? Will these pages be editable in the same
way as current wikitext pages? If not, will there be special editors for the
various data types? What other parts of the MediaWiki codebase will be
affected and to what extent? Will text still go in the text table or will
separate tables and infrastructure be used?

I'm reminded a little of LiquidThreads for some reason. This idea sounds
good, but I'm worried about the implementation details, particularly as the
assumption you seek to upend is so old and ingrained.

> The background is that the Wikidata project needs a way to store structured
> data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
> solve that problem along with several others, like doing away with the special
> cases for JS/CSS, the ability to maintain categories etc separate from body
> text, manage Gadgets sanely on a wiki page, or several other things (see the
> link below).

How would this affect categories being stored in wikitext (alongside the
rest of the page content text)? That part doesn't make any sense to me.

MZMcBride





daniel at brightbyte

Mar 26, 2012, 11:39 PM

Post #10 of 17
Re: Cutting MediaWiki loose from wikitext

On 27.03.2012 00:09, Platonides wrote:
> It looks really evil publishing that svn branch just days after git
> migration :)
> I think that branch -created months ago- should be migrated to git, so
> we could all despair..^W benefit from git wonderful branching abilities.

Indeed - when I asked Chad about that, he said "ask me again once the dust has
settled". I'd be happy to have this in git.

Or... well, maybe I'll just make a patch from that branch, make a fresh branch
in git, and cherry pick the changes, trying to keep things minimal. Yea, that's
probably the best thing to do.

-- daniel



daniel at brightbyte

Mar 27, 2012, 12:18 AM

Post #11 of 17
Re: Cutting MediaWiki loose from wikitext

On 27.03.2012 00:33, Tim Starling wrote:
> For the record: we've discussed this previously and I'm fine with it.
> It's a well thought-out proposal, and the only request I had was to
> ensure that the DB schema supports some similar projects that we have
> in the idea pile, like multiple parser versions.

Thanks Tim! The one important bit I'd like to hear from you is... do you think
it is feasible to get this not only implemented but also reviewed and deployed
by August?... We are on a tight schedule with Wikidata, and this functionality
is a major blocker.

I think implementing ContentHandlers for MediaWiki would have a lot of benefits
for the future, but if it's not feasible to get it in quickly, I have to think
of an alternative way to implement structured data storage.

Thanks
Daniel




daniel at brightbyte

Mar 27, 2012, 12:35 AM

Post #12 of 17
Re: Cutting MediaWiki loose from wikitext

On 27.03.2012 00:37, MZMcBride wrote:
> It's an ancient assumption that's built in to many parts of MediaWiki (and
> many outside tools and scripts). Is there any kind of assessment about the
> level of impact this would have?

Not formally, just my own poking at the code base. There are a lot of places in
the code that access revision text and do something with it; not all of them can
easily be found or changed (this is especially true for extensions).

My proposal covers a compatibility layer that will cause legacy code to just see
an empty page when trying to access the contents of a non-wikitext page. Only
code aware of content models will see any non-wikitext content. This should
avoid most problems, and should ensure that things will work as before at least
for everything that is wikitext.
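
The compatibility behaviour described above can be pictured with a small Python sketch. The class and method names here (Revision, get_text, get_content) are assumptions chosen for the illustration, not the names used in the prototype.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    """Hypothetical revision object carrying a model and a serialized blob."""
    model: str
    blob: str

    def get_text(self) -> str:
        """Legacy accessor: callers unaware of content models only ever
        see wikitext; anything else looks like an empty page."""
        return self.blob if self.model == "wikitext" else ""

    def get_content(self):
        """Model-aware accessor: returns the model together with the blob,
        so the caller can pick the matching handler."""
        return self.model, self.blob
```

Old extensions that call the legacy accessor keep working on wikitext pages and simply find nothing to process on data pages.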

> For example, would the diff engine need to be rewritten so that people can
> monitor these pages for vandalism?

A diff engine needs to be implemented for each content model. The existing
engine(s) do not need to be rewritten; they will still be used for all wikitext
pages.
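
As a rough idea of what a model-specific diff could look like, here is a Python sketch that compares two flat data records field by field. The function name diff_records and the flat key/value record layout are assumptions for the example; the actual data model for Wikidata is not specified in this thread.

```python
def diff_records(old: dict, new: dict) -> dict:
    """Field-level diff between two flat records (hypothetical data model)."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

A result like this could be rendered as "field X changed from A to B", which is easier to patrol for vandalism than a line diff of raw JSON.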

> Will these pages be editable in the same
> way as current wikitext pages?

No. The entire point of this proposal is to be able to neatly supply specialized
display, editing and diffing of different kinds of content.

> If not, will there be special editors for the
> various data types?

Indeed.

> What other parts of the MediaWiki codebase will be
> affected and to what extent?

A few classes (like Revision or WikiPage) need some major additions or changes,
see the proposal on meta. Lots of places should eventually be changed to become
aware of content models, but don't need to be adapted immediately (see above).

> Will text still go in the text table or will
> separate tables and infrastructure be used?

Uh, did you read the proposal?...

All content is serialized just before storing it. It is stored into the text
table using the same code as before. The content model and serialization format
are recorded in the revision table.

Secondary data (index data, analogous to the link tables) may be extracted from
the content and stored in separate database tables, or in some other service, as
needed.
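
The storage flow just described can be sketched with an in-memory SQLite database in Python. The two-table layout follows the description above, but the column names, the JSON serialization and the helper names save_revision and load_revision are assumptions made for this illustration, not the proposed schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "text" (
        old_id   INTEGER PRIMARY KEY,
        old_text TEXT NOT NULL
    );
    CREATE TABLE revision (
        rev_id             INTEGER PRIMARY KEY,
        rev_text_id        INTEGER NOT NULL REFERENCES "text"(old_id),
        rev_content_model  TEXT NOT NULL,  -- assumed column name
        rev_content_format TEXT NOT NULL   -- assumed column name
    );
    """
)


def save_revision(content, model, fmt):
    """Serialize content, store the blob, and record model and format."""
    # Strings are stored as-is; anything else is serialized as JSON here.
    blob = content if isinstance(content, str) else json.dumps(content, sort_keys=True)
    text_id = conn.execute(
        'INSERT INTO "text" (old_text) VALUES (?)', (blob,)
    ).lastrowid
    return conn.execute(
        "INSERT INTO revision (rev_text_id, rev_content_model, rev_content_format)"
        " VALUES (?, ?, ?)",
        (text_id, model, fmt),
    ).lastrowid


def load_revision(rev_id):
    """Return (model, format, serialized blob) for a revision id."""
    return conn.execute(
        "SELECT r.rev_content_model, r.rev_content_format, t.old_text"
        ' FROM revision r JOIN "text" t ON t.old_id = r.rev_text_id'
        " WHERE r.rev_id = ?",
        (rev_id,),
    ).fetchone()
```

The text table stays a dumb blob store; only the two extra revision columns tell the reader which handler should unserialize the blob.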

> I'm reminded a little of LiquidThreads for some reason. This idea sounds
> good, but I'm worried about the implementation details, particularly as the
> assumption you seek to upend is so old and ingrained.

It's more like the transition to using MediaHandlers instead of assuming
uploaded files to be images: existing concepts and actions are generalized to
apply to more types of content.

LiquidThreads introduces new concepts (threads, conversations) and interactions
(re-arranging, summarizing, etc.) and tries to integrate them with the concepts
used for wiki pages. This seems far more complicated to me.

>> The background is that the Wikidata project needs a way to store structured
>> data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
>> solve that problem along with several others, like doing away with the special
>> cases for JS/CSS, the ability to maintain categories etc separate from body
>> text, manage Gadgets sanely on a wiki page, or several other things (see the
>> link below).
>
> How would this affect categories being stored in wikitext (alongside the
> rest of the page content text)? That part doesn't make any sense to me.

Imagine a data model that works like mime/multipart email: you have a wrapper
that contains the "main" text as well as "attachments". The whole shebang gets
serialized and stored in the text table, as usual. For displaying, editing and
visualizing, you have code that is aware of the multipart nature of the content,
and puts the parts together nicely.
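
A very small Python sketch of that wrapper idea follows. The JSON envelope with "main" and "attachments" keys and the function names are assumptions for the example; the thread does not fix any particular serialization for multipart content.

```python
import json


def pack_multipart(main_text, attachments):
    """Wrap the main text and named attachments into one serialized blob."""
    return json.dumps({"main": main_text, "attachments": attachments}, sort_keys=True)


def unpack_multipart(blob):
    """Split a serialized blob back into main text and attachments."""
    data = json.loads(blob)
    return data["main"], data["attachments"]
```

For instance, categories could travel as an attachment named "categories" and be edited through their own form, while the body text is edited as usual.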

However, the category stuff is a use case I'm just mentioning because it has been
requested so often in the past (namely, editing categories, interlanguage links,
etc. separately from the wiki text); this mechanism is not essential to the
concept of ContentHandlers, and not something I plan to implement for the
Wikidata project. It's just something that will become much easier once we have
ContentHandlers.

-- daniel



alex.brollo at gmail

Mar 27, 2012, 12:47 AM

Post #13 of 17
Re: Cutting MediaWiki loose from wikitext

I can't follow the details of this discussion, but if you like, take a look at
the raw code of any ns0 page on it.wikisource and consider that the "area dati"
is removed from the wikitext as soon as a user opens the page in edit mode,
and rebuilt when the user saves it; or take a look here:
http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used for
editing automation/help are collected as JS objects.


Alex brollo


daniel at brightbyte

Mar 27, 2012, 12:56 AM

Post #14 of 17
Re: Cutting MediaWiki loose from wikitext

On 27.03.2012 09:47, Alex Brollo wrote:
> I can't follow the details of this discussion, but if you like, take a look at
> the raw code of any ns0 page on it.wikisource and consider that the "area dati"
> is removed from the wikitext as soon as a user opens the page in edit mode,
> and rebuilt when the user saves it; or take a look here:
> http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used for
> editing automation/help are collected as JS objects.

Yes. Basically, the ContentHandler proposal would introduce native support for
this kind of thing into MediaWiki, instead of implementing it as a hack with
JavaScript. Wouldn't it be nice to get input forms for this data, or have nice
diffs of the structure, or good search results for data records?... Not to
mention the ability to actually query for individual data fields :)

-- daniel



hashar+wmf at free

Mar 27, 2012, 2:26 AM

Post #15 of 17
Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler wrote:
> A very rough prototype is in a dev branch here:
>
> http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

I guess we could have that migrated to Gerrit and review the project there.

--
Antoine "hashar" Musso




daniel at brightbyte

Mar 27, 2012, 2:43 AM

Post #16 of 17
Re: Cutting MediaWiki loose from wikitext

On 27.03.2012 11:26, Antoine Musso wrote:
> Daniel Kinzler wrote:
>> A very rough prototype is in a dev branch here:
>>
>> http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/
>
> I guess we could have that migrated to Gerrit and review the project there.

Sure, fine with me :) Though I will likely make a new branch and merge my
changes again more cleanly. What's there now is really a proof of concept. But
sure, have a look!

-- daniel




daniel at brightbyte

Apr 30, 2012, 1:05 AM

Post #17 of 17
Re: Cutting MediaWiki loose from wikitext

Hi all

Moving forward, I have just committed a first patch for review:

https://gerrit.wikimedia.org/r/#change,6101

Please have a look if you are interested.

-- daniel
