
Mailing List Archive: Wikipedia: Wikitech

We need to make it easy to fork and leave

 

 



dgerard at gmail

Aug 12, 2011, 3:55 AM

Post #1 of 8 (593 views)
Permalink
We need to make it easy to fork and leave

[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere]


THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to
fork the projects, so as to preserve them.

This is the single point of failure problem. The reasons for it having
happened are obvious, but it's still a problem. Blog posts (please
excuse me linking these yet again):

* http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/
* http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/

I dream of the encyclopedia being meaningfully backed up. This will
require technical attention specifically to making the projects -
particularly that huge encyclopedia in English - meaningfully
forkable.

Yes, we should be making ourselves forkable. That way people don't
*have* to trust us.

We're digital natives - we know the most effective way to keep
something safe is to make sure there's lots of copies around.

How easy is it to set up a copy of English Wikipedia - all text, all
pictures, all software, all extensions and customisations to the
software? What bits are hard? If a sizable chunk of the community
wanted to fork, how can we make it *easy* for them to do so?

And I ask all this knowing that we don't have the paid tech resources
to look into it - tech is a huge chunk of the WMF budget and we're
still flat-out just keeping the lights on. But I do think it needs
serious consideration for long-term preservation of all this work.


- d.

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


jj5 at jj5

Aug 12, 2011, 4:36 AM

Post #2 of 8 (581 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On 12/08/2011 8:55 PM, David Gerard wrote:
> THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to
> fork the projects, so as to preserve them.

I have an idea that might be practical and go some way toward solving
your problem.

Wikipedia is an impressive undertaking, and as you mentioned on your
blog it has become part of the background as a venerable institution;
however, it is still dwarfed by the institution that is the World Wide
Web (which, by the way, runs on web standards like HTML5 :).

To give a little context concerning the state of the art: a bit over a
week ago I decided to start a club. Within a matter of days I had a
fully functioning web site for my club, with two CMSes (a wiki and
a blog) and a number of other administrative facilities, all thanks to the
power and availability of open-source software. As time goes by there
are only going to be more, not fewer, people like me: people who have the
capacity to run their own content management systems out of their own
garages (mine's actually in a slicehost.net datacenter, but it *used* to
be in my garage, and by rights it could be, except that I don't actually
*have* a garage any more, but that's another story).

The thing about me is that there can be hundreds of thousands of people
like me, and when you add up all our contributions, you have a
formidable force. I can't host Wikipedia, but there could be facilities
in place for me to be able to easily mirror the parts of it that are
relevant to me. For instance, on my Network administration page, I have
a number of links to other sites, several of which are links to Wikipedia:

http://www.progclub.org/wiki/Network_administration#Links

Links such as:

http://en.wikipedia.org/wiki/Subversion

Now by rights there could be a registry in my MediaWiki installation
that recorded en.wikipedia.org as being another wiki with a particular
content distribution policy, such as a policy permitting local
mirroring. MediaWiki, when it noticed that I had linked to such a
facility, could replace the link, changing it to a link on my local
system, e.g.

http://www.progclub.org/wiki/Wikipedia:Subversion

There could then be a facility in place to periodically update the
mirrored copies in my own system. Attribution for these copies would be
given to a 'system user', such as the 'Interwiki Update Service'. The
edit history for the version on my system would only show versions for
each time the update service had updated the content. Links for the
'edit' button could be wired up so that when someone tried to edit,

http://www.progclub.org/wiki/Wikipedia:Subversion

on my server, they were redirected to the Wikipedia edit facility,
assuming that such a facility was still available. In the case that
Wikipedia was no more, it would be possible to turn off mirroring, and
in that case the 'edit' facility would allow for edits of the local content.
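A minimal sketch of the registry lookup and link rewriting described above. All names here are hypothetical -- MediaWiki has no such mirroring registry today -- so this is only an illustration of the idea, not an existing facility:

```python
# Hypothetical sketch of the proposed mirroring registry.
# None of these names are real MediaWiki APIs; they are illustrative only.
from urllib.parse import urlparse

MIRROR_REGISTRY = {
    # remote wiki host -> (local title prefix, mirroring permitted?)
    "en.wikipedia.org": ("Wikipedia", True),
}

def rewrite_link(url, local_base="http://www.progclub.org/wiki"):
    """Rewrite a link to a registered remote wiki into a local mirror link.

    Returns the URL unchanged if the host is not registered, does not
    permit mirroring, or the path is not a wiki page title.
    """
    parsed = urlparse(url)
    entry = MIRROR_REGISTRY.get(parsed.netloc)
    if not entry or not entry[1]:
        return url
    prefix, _permitted = entry
    if not parsed.path.startswith("/wiki/"):
        return url
    title = parsed.path[len("/wiki/"):]
    return "%s/%s:%s" % (local_base, prefix, title)

print(rewrite_link("http://en.wikipedia.org/wiki/Subversion"))
# → http://www.progclub.org/wiki/Wikipedia:Subversion
```

The same lookup, run at link-render time, is what would let MediaWiki "notice" a link to a mirrorable wiki and substitute the local copy.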

That's probably a far more practical approach to take than say,
something like distributing the entire English database via BitTorrent.
By all means do that too, but I'd suggest that if you're looking for an
anarchically-scalable distributed hypermedia solution, you won't have to
look much past the web.

John.



brion at pobox

Aug 12, 2011, 4:44 AM

Post #3 of 8 (578 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On Fri, Aug 12, 2011 at 6:55 AM, David Gerard <dgerard [at] gmail> wrote:
>
> [posted to foundation-l and wikitech-l, thread fork of a discussion
> elsewhere]
>
>
> THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to
> fork the projects, so as to preserve them.
>
> This is the single point of failure problem. The reasons for it having
> happened are obvious, but it's still a problem. Blog posts (please
> excuse me linking these yet again):
>
> * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/
> * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/
>
> I dream of the encyclopedia being meaningfully backed up. This will
> require technical attention specifically to making the projects -
> particularly that huge encyclopedia in English - meaningfully
> forkable.
>
> Yes, we should be making ourselves forkable. That way people don't
> *have* to trust us.
>
> We're digital natives - we know the most effective way to keep
> something safe is to make sure there's lots of copies around.
>
> How easy is it to set up a copy of English Wikipedia - all text, all
> pictures, all software, all extensions and customisations to the
> software? What bits are hard? If a sizable chunk of the community
> wanted to fork, how can we make it *easy* for them to do so?

Software and customizations are pretty easy -- that's all in SVN, and most
of the config files are also made visible on noc.wikimedia.org.

If you're running a large site there'll be more 'tips and tricks' in the
actual setup that you may need to learn; most documentation on the setups
should be on wikitech.wikimedia.org, and do feel free to ask for details on
anything that might seem missing -- it should be reasonably complete. But to
just keep a data set, it's mostly a matter of disk space, bandwidth, and
getting timely updates.

For data there are three parts:

* page data -- everything that's not deleted/oversighted is in the public
dumps at download.wikimedia.org, but it may be a bit slow to build/process
due to the dump system's history; it doesn't scale as well as we'd like
at current data sizes.

More to the point, getting data isn't enough for a "working" fork - a wiki
without a community is an empty thing, so being able to move data around
between different sites (merging changes, distributing new articles) would
be a big plus.

This is a bit awkward with today's MediaWiki (though I think I've seen some
extensions aiming to help); DVCSs like git show good ways to do this sort of
thing -- forking a project on/from a git hoster like github or gitorious is
usually the first step to contributing upstream! This is healthy and should
be encouraged for wikis, too.

* media files -- these are freely copiable, but I'm not sure of the state of
easily obtaining them in bulk. As the data set moved into terabytes it became
impractical to just build .tar dumps. There are batch downloader tools
available, and the metadata's all in the dumps and the API.

* user data -- watchlists, emails, passwords, prefs are not exported in
bulk, but you can always obtain your own info so an account migration tool
would not be hard to devise.
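As a concrete illustration of the page-data route, the standard api.php query
interface can serve the current wikitext of a page. A minimal sketch of
building such a request follows; error handling and query continuation are
omitted, and the exact response fields vary by MediaWiki version:

```python
# Sketch: build an api.php URL that fetches the latest revision text of a
# page via the standard MediaWiki query API (action=query, prop=revisions).
from urllib.parse import urlencode

def page_export_url(title, host="en.wikipedia.org"):
    """Return an api.php URL for the latest revision of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|timestamp|user",  # wikitext plus attribution info
        "titles": title,
        "format": "json",
    }
    # Sorted for a deterministic query string.
    return "http://%s/w/api.php?%s" % (host, urlencode(sorted(params.items())))

# Fetching it is then one call, e.g.:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(page_export_url("Subversion")))
```

For bulk work the dumps are the right tool; per-page API pulls like this are
what a small mirror keeping a handful of articles fresh would use.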

> And I ask all this knowing that we don't have the paid tech resources
> to look into it - tech is a huge chunk of the WMF budget and we're
> still flat-out just keeping the lights on. But I do think it needs
> serious consideration for long-term preservation of all this work.

This is part of WMF's purpose, actually, so I'll disagree on that point.
That's why for instance we insist on using so much open source -- we *want*
everything we do to be able to be reused or rebuilt independently of us.

-- brion

>
>
> - d.
>


dgerard at gmail

Aug 12, 2011, 5:29 AM

Post #4 of 8 (581 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On 12 August 2011 12:44, Brion Vibber <brion [at] pobox> wrote:
> On Fri, Aug 12, 2011 at 6:55 AM, David Gerard <dgerard [at] gmail> wrote:

>> And I ask all this knowing that we don't have the paid tech resources
>> to look into it - tech is a huge chunk of the WMF budget and we're
>> still flat-out just keeping the lights on. But I do think it needs
>> serious consideration for long-term preservation of all this work.

> This is part of WMF's purpose, actually, so I'll disagree on that point.
> That's why for instance we insist on using so much open source -- we *want*
> everything we do to be able to be reused or rebuilt independently of us.


I'm speaking of making it happen, not whether it's an acknowledged
need, which I know it is :-) It's an obvious Right Thing. But we have
X dollars to do everything with, so more to this means less to
somewhere else. And this is a variety of technical debt, and tends to
get put in an eternal to-do list with the rest of the technical debt.

So it would need someone actively pushing it. I'm not even absolutely
sure myself it's a priority item that someone should take up as a
cause. I do think the communities need reminding of it from time to
time, however.


- d.



dgerard at gmail

Aug 12, 2011, 5:31 AM

Post #5 of 8 (580 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On 12 August 2011 12:44, Brion Vibber <brion [at] pobox> wrote:

> * user data -- watchlists, emails, passwords, prefs are not exported in
> bulk, but you can always obtain your own info so an account migration tool
> would not be hard to devise.


This one's tricky, because that's not free content, for good reason.
It would need to be present for correct attribution at the least. I
don't see anything intrinsically hard about that - have I missed
anything about it that makes it hard?


- d.



jj5 at jj5

Aug 12, 2011, 5:44 AM

Post #6 of 8 (580 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On 12/08/2011 10:31 PM, David Gerard wrote:
> This one's tricky, because that's not free content, for good reason.
> It would need to be present for correct attribution at the least. I
> don't see anything intrinsically hard about that - have I missed
> anything about it that makes it hard?

Well, you'd need to have namespaces for usernames, and that's about it.
Or you could pursue something like OpenID, as you mentioned.

Of course, if you used the user database as-is and pursued my proposed
model for content mirroring, you could have an 'Attribution' tab for
mirrored content up near the 'Page' and 'Discussion' tabs, and on that
page show a list of everyone who had contributed to the content. You
could update this list from time to time, at the same time as you did
your mirroring. You could go as far as mentioning the number of edits
particular users had made. It wouldn't be the same type of "blow by
blow" attribution where you can see a log of specifically which
contributions particular users had made, but it would be suitable
attribution nonetheless, similar to the attribution at:

http://en.wikipedia.org/wiki/Special:Version
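The aggregate contributor list for such an 'Attribution' tab could be computed
from a page's revision history along these lines. The revision format shown is
illustrative, loosely modeled on the 'user' field the MediaWiki API returns
per revision:

```python
# Sketch: aggregate per-user edit counts from a revision history, as an
# 'Attribution' tab might display them. The input format is illustrative.
from collections import Counter

def attribution_summary(revisions):
    """Return (username, edit_count) pairs, most prolific first."""
    counts = Counter(rev["user"] for rev in revisions)
    return counts.most_common()

history = [
    {"user": "Alice"}, {"user": "Bob"}, {"user": "Alice"},
]
print(attribution_summary(history))  # → [('Alice', 2), ('Bob', 1)]
```

Refreshing this list at mirror-update time would keep attribution current
without storing the full remote edit history locally.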



jj5 at jj5

Aug 12, 2011, 5:54 AM

Post #7 of 8 (581 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

On 12/08/2011 10:44 PM, John Elliot wrote:
> It wouldn't be the same type of "blow by blow" attribution that you get
> where you can see a log of specifically what contributions particular
> users had made

Although I guess it would be possible to go all out and support that
too. You could leave the local user database as-is, and introduce a
remote user database that included a namespace, such as
en.wikipedia.org, for usernames. For mirrored content you'd reference
the remote user database, and for local content reference the local user
database.



egil at wp

Aug 12, 2011, 12:20 PM

Post #8 of 8 (577 views)
Permalink
Re: We need to make it easy to fork and leave [In reply to]

John Elliot (2011-08-12 13:36):
> [...]
> The thing about me, is that there can be hundreds of thousands of people
> like me, and when you add up all our contributions, you have a
> formidable force. I can't host Wikipedia, but there could be facilities
> in place for me to be able to easily mirror the parts of it that are
> relevant to me. For instance, on my Network administration page, I have
> a number of links to other sites, several of which are links to Wikipedia:
>
> http://www.progclub.org/wiki/Network_administration#Links
>
> Links such as:
>
> http://en.wikipedia.org/wiki/Subversion
>
> Now by rights there could be a registry in my MediaWiki installation
> that recorded en.wikipedia.org as being another wiki with a particular
> content distribution policy, such as a policy permitting local
> mirroring. MediaWiki, when it noticed that I had linked to such a
> facility, could replace the link, changing it to a link on my local
> system, e.g.
>
> http://www.progclub.org/wiki/Wikipedia:Subversion
>
> ...
>
>

That's a very interesting idea... And it shouldn't be really hard to do.

Let's say you linked the Subversion article and you've set up the address
http://en.wikipedia.org/wiki/$1
to be hosted as
http://www.progclub.org/wiki/en-wiki:...

Now each time a user clicks such a link, the page gets registered in your
installation as "to be downloaded", and after a given number of clicks,
a given number of resources, and/or at a given time, it gets downloaded
to your site.
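A minimal sketch of that click-counting and download-queueing step. The class,
names, and threshold are all hypothetical -- no such extension exists -- and a
real one would persist this state in the wiki's database:

```python
# Hypothetical sketch of click-triggered mirroring: count clicks on remote
# links and queue a page for download once it proves popular enough.

CLICK_THRESHOLD = 3  # downloads begin after this many clicks (arbitrary)

class MirrorQueue:
    def __init__(self, threshold=CLICK_THRESHOLD):
        self.threshold = threshold
        self.clicks = {}      # page title -> click count so far
        self.pending = set()  # titles queued for the next download pass

    def register_click(self, title):
        """Record a click on a remote link; queue the page at the threshold."""
        self.clicks[title] = self.clicks.get(title, 0) + 1
        if self.clicks[title] >= self.threshold:
            self.pending.add(title)

q = MirrorQueue()
for _ in range(3):
    q.register_click("Subversion")
q.register_click("Apache")      # only one click: not queued yet
print(sorted(q.pending))        # → ['Subversion']
```

A periodic job would then drain `pending`, fetch each page (and, as noted
below, its templates), and write the copies locally.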

The tricky part is that you need not only the article itself but also
its templates, and that can be quite a lot for the first articles you
fetch. Furthermore, this extension would probably need to let users
opt out of downloading images, and maybe host rendered HTML instead of
wikicode so that you don't really need to host templates at all.

And speaking of images - the problem with any of these solutions is: who
would really want to spend money hosting all this data? There were times
when Wikipedia had frequent outages, but now I feel your own server is
more likely to choke on the data than Wikipedia's servers are. Maybe ads
added to self-hosted articles would make it worth it, but I rather doubt
anyone would want to host images unless they had to.

BTW, I think a dynamic fork has already been made by France Telecom. They
fork the Polish Wikipedia and update articles within a matter of minutes
(or at least they did the last time I checked - they even host talk pages,
so it was easy to test). You can see the fork here:
http://wikipedia.wp.pl/

Note that they don't host images, though they do host image pages.

Regards,
Nux.

