Mailing List Archive: Wikipedia: Foundation

Wikipedia meets git

 

 



jamesmikedupont at googlemail

Oct 15, 2009, 11:55 AM

Post #1 of 37 (11268 views)
Permalink
Wikipedia meets git

Hello,
I have imported the Wikipedia article for Kosovo into git. It is fast,
distributed, highly compressed, redundant, branchable and usable.

The blame function will show you who edited what, and in which revision.

Here is blame on the up-to-date Kosovo article:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blame/master/Wiki/Kosovo/article.xml

I have checked in all the code used to produce this here:
https://code.launchpad.net/~jamesmikedupont/+junk/wikiatransfer
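
To try the blame view locally, something along these lines should work
(the git:// clone URL for the same repo appears later in this thread):

git clone git://github.com/h4ck3rm1k3/KosovoWikipedia.git
cd KosovoWikipedia
git blame Wiki/Kosovo/article.xml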

thanks,
mike



gmaxwell at gmail

Oct 15, 2009, 1:16 PM

Post #2 of 37 (11084 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Thu, Oct 15, 2009 at 2:55 PM, jamesmikedupont [at] googlemail
<jamesmikedupont [at] googlemail> wrote:
> Hallo,
> I have gotten the wikipedia article for Kosovo in git.
> It is fast, distributed, highly compressed, redundant, branchable and usable.
>
> The blame function will show you who edited what version.
>
> Here Blame on the up to date kosovo article!
> http://github.com/h4ck3rm1k3/KosovoWikipedia/blame/master/Wiki/Kosovo/article.xml
>
> I have checked in all the code to produce this here :
> https://code.launchpad.net/~jamesmikedupont/+junk/wikiatransfer

It is cool that you get the complete history.

But it's a bit uncool that it's about 14 MB when the article is
100 KB; understandable, given that the expanded uncompressed history
is about 337 MB...

I repacked the repository using
git-pack-objects --progress --window=40000 --depth=40000
--compression=9 --all --delta-base-offset

(git-repack doesn't repack, really)

And now I have:

4168915 2009-10-15 16:12 KosovoWikipedia-ae859bbf9446ddcde4b17e09c99c28dcf594da89.pack

which is more reasonable.

The number of revisions to a single article is a little bit outside of
the normal usage of git. ;)
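
(For anyone reproducing this with the porcelain rather than the
plumbing, something like the following should be roughly equivalent;
the -f forces recomputing the deltas, which plain git repack skips.)

git repack -a -d -f --window=40000 --depth=40000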



jamesmikedupont at googlemail

Oct 15, 2009, 1:38 PM

Post #3 of 37 (11077 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Thu, Oct 15, 2009 at 10:16 PM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
> It is cool that you get the complete history.
>
> But— it's a bit uncool that its about 14mbytes when the article is
> 100k; understandable given that the expanded uncompressed history is
> about 337mbytes...

I have the uncompressed history here at about 556 MB:

du -h history/
556M history/

If I bzip2 this, it is:
29M 2009-10-15 22:35 total.tar.bz
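
For reference, that tarball would come from something along these
lines (the exact invocation is assumed):

tar -cjf total.tar.bz history/
ls -lh total.tar.bz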

The 14 MB git pack is still smaller, and the upload is faster!
>
> The number of revisions to a single article is a little bit outside of
> the normal usage of git. ;)

There are ways to optimize all of this. Most users will not want to
download the full history.

This is just one day's work using git; we will be able to optimize
all of this.

I will be able to find other examples of large repositories:

http://laserjock.wordpress.com/2008/05/09/bzr-git-and-hg-performance-on-the-linux-tree/


mike



gmaxwell at gmail

Oct 15, 2009, 2:33 PM

Post #4 of 37 (11072 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Thu, Oct 15, 2009 at 4:38 PM, jamesmikedupont [at] googlemail
<jamesmikedupont [at] googlemail> wrote:
> There are ways to optimize all of this. Most users will not want to
> download the full history.

Then why are you using git?



jamesmikedupont at googlemail

Oct 15, 2009, 9:40 PM

Post #5 of 37 (11089 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Thu, Oct 15, 2009 at 11:33 PM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
> On Thu, Oct 15, 2009 at 4:38 PM, jamesmikedupont [at] googlemail
> <jamesmikedupont [at] googlemail> wrote:
>> There are ways to optimize all of this. Most users will not want to
>> download the full history.
>
> Then why are you using git?

I am not most users. I am using git because I think it is the best way
forward to implement many of the ideas discussed in the strategy wiki.



jamesmikedupont at googlemail

Oct 15, 2009, 9:45 PM

Post #6 of 37 (11084 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 6:40 AM, jamesmikedupont [at] googlemail
<jamesmikedupont [at] googlemail> wrote:
> On Thu, Oct 15, 2009 at 11:33 PM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
>> On Thu, Oct 15, 2009 at 4:38 PM, jamesmikedupont [at] googlemail
>> <jamesmikedupont [at] googlemail> wrote:
>>> There are ways to optimize all of this. Most users will not want to
>>> download the full history.
>>
>> Then why are you using git?
>
> I am not most users. I am using git because I think it is the best way
> forward to implement many of the ideas discussed in the strategy wiki.


If you want only the last 3 revisions checked out, it takes about 10
seconds and produces about 300 KB of data.

git clone --depth 3 git://github.com/h4ck3rm1k3/KosovoWikipedia.git

du -h gittest/
252K gittest/

Log file :

Initialized empty Git repository in
/home_data2/2009/10/KosovoWikipedia/gittest/KosovoWikipedia/.git/
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 21 (delta 3), reused 20 (delta 3)
Receiving objects: 100% (21/21), 40.98 KiB, done.
Resolving deltas: 100% (3/3), done.
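
If you later want more of the history, the shallow clone can be
deepened in place, for example (the --unshallow flag is from git
versions newer than the one used here):

git fetch --depth=100
git fetch --unshallow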



jamesmikedupont at googlemail

Oct 15, 2009, 10:18 PM

Post #7 of 37 (11099 views)
Permalink
Re: Wikipedia meets git [In reply to]

>> On Thu, Oct 15, 2009 at 11:33 PM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
>>> Then why are you using git?

It turns out there are a few wikis built on top of git:

1. git-wiki:
http://atonie.org/2008/02/git-wiki
http://github.com/jeffbski/git-wiki
git-wiki is a wiki that relies on git to keep pages' history and on
Sinatra to serve them. (Ruby)

It supports these markups:
* Creole: a Creole-to-HTML converter for Creole, the lightweight
markup language (http://wikicreole.org/)
* Markdown: Discount, a Markdown processor for Ruby
(http://github.com/rtomayko/rdiscount)
* Textile: RedCloth, a module for using the Textile markup language
in Ruby (http://redcloth.org/)

2. gitit
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/gitit
gitit: a wiki using happstack, git or darcs, and pandoc. (Haskell)

3. ikiwiki
http://ikiwiki.info/ (ikiwiki is a wiki compiler)
http://ikiwiki.info/ikiwiki/formatting/

4. wigit: the PHP git wiki
http://el-tramo.be/software/wigit



joshuagay at gmail

Oct 15, 2009, 10:19 PM

Post #8 of 37 (11100 views)
Permalink
Re: Wikipedia meets git [In reply to]

This is very awesome. I am in the early stages of trying to scope out
a small side project to do a MediaWiki <-> git bridge; it is very
challenging. Being able to download the complete edit history in this
fashion is extremely useful. Thank you very much for sharing this work.

-Josh

On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail <
jamesmikedupont [at] googlemail> wrote:

> On Fri, Oct 16, 2009 at 6:40 AM, jamesmikedupont [at] googlemail
> <jamesmikedupont [at] googlemail> wrote:
> > On Thu, Oct 15, 2009 at 11:33 PM, Gregory Maxwell <gmaxwell [at] gmail>
> wrote:
> >> On Thu, Oct 15, 2009 at 4:38 PM, jamesmikedupont [at] googlemail
> >> <jamesmikedupont [at] googlemail> wrote:
> >>> There are ways to optimize all of this. Most users will not want to
> >>> download the full history.
> >>
> >> Then why are you using git?
> >
> > I am not most users. I am using git because I think it is the best way
> > forward to implement many of the ideas discussed in the strategy wiki.
>
>
> if you want only the last 3 revisions checked out , it takes about 10
> seconds and produces 300k of data.
>
> git clone --depth 3 git://github.com/h4ck3rm1k3/KosovoWikipedia.git
>
> du -h gittest/
> 252K gittest/
>
> Log file :
>
> Initialized empty Git repository in
> /home_data2/2009/10/KosovoWikipedia/gittest/KosovoWikipedia/.git/
> remote: Counting objects: 21, done.
> remote: Compressing objects: 100% (10/10), done.
> remote: Total 21 (delta 3), reused 20 (delta 3)
> Receiving objects: 100% (21/21), 40.98 KiB, done.
> Resolving deltas: 100% (3/3), done.
>



--
I am running a marathon for the Leukemia & Lymphoma Society. Can you help me
reach my fundraising goals? Visit
http://pages.teamintraining.org/ma/pfchangs10/joshuagay


denny.vrandecic at kit

Oct 16, 2009, 12:45 AM

Post #9 of 37 (11073 views)
Permalink
Re: Wikipedia meets git [In reply to]

That is pretty cool. But wouldn't it make more sense to have a
finer-grained blame, like the one in WikiTrust, down to the character
level?

cheers,
denny


On Oct 15, 2009, at 20:55, jamesmikedupont [at] googlemail wrote:

> Hallo,
> I have gotten the wikipedia article for Kosovo in git.
> It is fast, distributed, highly compressed, redundant, branchable
> and usable.
>
> The blame function will show you who edited what version.
>
> Here Blame on the up to date kosovo article!
> http://github.com/h4ck3rm1k3/KosovoWikipedia/blame/master/Wiki/Kosovo/article.xml
>
> I have checked in all the code to produce this here :
> https://code.launchpad.net/~jamesmikedupont/+junk/wikiatransfer
>
> thanks,
> mike
>




jamesmikedupont at googlemail

Oct 16, 2009, 1:30 AM

Post #10 of 37 (11073 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
<denny.vrandecic [at] kit> wrote:
> That is pretty cool. But wouldn't it make more sense to have a more-
> fine grained blame, like the one in wikitrust, down to the character
> level?

I don't know all these wiki tools, but if the feature is missing from
git, then adding it will benefit all projects that use git.

My fascination with using a real distributed version control system is
that it provides the features that we are missing in MediaWiki.

We can use standard tools to do good things, and not have to reinvent
the world all the time.

We don't need to have a centralized repository and only one point of
view; using a real VCS means that we can have multiple hosts, multiple
points of view and a failsafe system.

My next steps are to work on the reader tool, creating LaTeX and
espeak output of the articles; I am adding Unicode character support
right now. I would like to get that up to speed, to allow PDF / audio
rendering of the articles.

I will continue to work with selected articles and improve the import
feature. It should be easy to have an import tool, fed by an RSS feed
for some articles, that imports them on a regular basis.
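
A minimal sketch of such a loop, reusing the GetRevisions.pl script
from later in this thread; the watchlist.txt file and the loop itself
are hypothetical:

#!/usr/bin/perl
# Re-import each watched article and publish; run from inside the git
# repo, e.g. from cron, with one article name per line in
# watchlist.txt.
use strict;
use warnings;

open my $fh, '<', 'watchlist.txt' or die "watchlist.txt: $!";
while (my $article = <$fh>) {
    chomp $article;
    next unless length $article;
    system('perl', 'GetRevisions.pl', $article) == 0
        or warn "import failed for $article\n";
}
system('git', 'push', 'origin', 'master');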

Mike



denny.vrandecic at kit

Oct 16, 2009, 3:33 AM

Post #11 of 37 (11072 views)
Permalink
Re: Wikipedia meets git [In reply to]

Just another pointer: here is a distributed MediaWiki system developed
at INRIA. I haven't looked into it too deeply yet, but their
evaluation looked very promising.

<http://m3p.gforge.inria.fr/pmwiki/pmwiki.php>

Best,
denny

On Oct 16, 2009, at 10:30, jamesmikedupont [at] googlemail wrote:

> On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
> <denny.vrandecic [at] kit> wrote:
>> That is pretty cool. But wouldn't it make more sense to have a more-
>> fine grained blame, like the one in wikitrust, down to the character
>> level?
>
> I don't know all these wikitools, but if the feature is missing from
> git, then it will benefit all projects using it.
>
> My fascination with using a real distribute version control system is
> that it provides the features that we are missing from the mediawiki.
>
> We can use standard tools to do good things, and not have to reinvent
> the world all the time.
>
> We don't need to have a centralized repository and only one point of
> view, using a real VCS means that we can multiple hosts, multiple
> points of view and a failsafe system.
>
> My next steps are to work on the reader tool in creating latex output
> and espeak output of the articles, I am adding in the unicode
> character support right now. I would like to get that up to speed, to
> use PDF / Audio rendering of the articles.
>
> I will continue to just work with selected articles and improve the
> import feature. It should be easy to have an import tool feed by an
> rss feed for some articles that imports them on a regular basis.
>
> Mike
>




jamesmikedupont at googlemail

Oct 16, 2009, 3:39 AM

Post #12 of 37 (11063 views)
Permalink
Re: Wikipedia meets git [In reply to]

> On Oct 16, 2009, at 10:30, jamesmikedupont [at] googlemail wrote:

I have made two short vlogs about what I did and why:

http://www.youtube.com/watch?v=jc9jo1ZFLqk

http://www.youtube.com/watch?v=7WfRuEuvIso

Mike



jamesmikedupont at googlemail

Oct 16, 2009, 4:40 AM

Post #13 of 37 (11044 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
<denny.vrandecic [at] kit> wrote:
> That is pretty cool. But wouldn't it make more sense to have a more-
> fine grained blame, like the one in wikitrust, down to the character
> level?

Can you please provide some example pages of WikiTrust? They seem to
be AWOL. The home page says, "In the meantime, you can look at our
list of colored pages," but
http://wikitrust.soe.ucsc.edu/index.php/Colored_pages -> Page not found

Thanks,
mike



gerard.meijssen at gmail

Oct 16, 2009, 5:08 AM

Post #14 of 37 (11067 views)
Permalink
Re: Wikipedia meets git [In reply to]

Hoi,
After a minute of googling I find http://wikitrust.soe.ucsc.edu/home .
I am sure it is there for you as well.
Thanks,
GerardM

2009/10/16 jamesmikedupont [at] googlemail <jamesmikedupont [at] googlemail>

> On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
> <denny.vrandecic [at] kit> wrote:
> > That is pretty cool. But wouldn't it make more sense to have a more-
> > fine grained blame, like the one in wikitrust, down to the character
> > level?
>
> Can you please provide some example pages of wikitrust?
> they seem to be AWOL:
>
> In the meantime, you can look at our list of colored pages,
> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages -> Page not found
>
> Thanks,
> mike
>


jamesmikedupont at googlemail

Oct 16, 2009, 5:17 AM

Post #15 of 37 (11049 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 2:08 PM, Gerard Meijssen
<gerard.meijssen [at] gmail> wrote:
> Hoi,
> After a minute of googling I find http://wikitrust.soe.ucsc.edu/home .. I am
> sure it is there for you as well.


Yes, the page is there, and it seems to be a good idea.

But I am missing some HTML pages that show what it looks like, a
word-level blame; the colorized pages are missing.

On this page: http://wikitrust.soe.ucsc.edu/home
it says: "In the meantime, you can look at our list of colored pages,
or look at screenshots of English Wikipedia pages analyzed by
WikiTrust." The "colored pages" link goes to
http://wikitrust.soe.ucsc.edu/index.php/Colored_pages, which is
missing.

mike



wikimail at inbox

Oct 16, 2009, 7:31 AM

Post #16 of 37 (11075 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail
> if you want only the last 3 revisions checked out , it takes about 10
> seconds and produces 300k of data.

10 seconds? That's horrible. Have you tried using svn?



jamesmikedupont at googlemail

Oct 16, 2009, 7:37 AM

Post #17 of 37 (11075 views)
Permalink
Re: Wikipedia meets git [In reply to]

I did not mean that literally; let me check the exact time for you:
1.258s.

time git clone --depth 3 git://github.com/h4ck3rm1k3/KosovoWikipedia.git
Initialized empty Git repository in
/home_data2/2009/10/KosovoWikipedia/gittest2/KosovoWikipedia/.git/
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 21 (delta 3), reused 20 (delta 3)
Receiving objects: 100% (21/21), 40.99 KiB, done.
Resolving deltas: 100% (3/3), done.

real 0m1.258s
user 0m0.024s
sys 0m0.024s

On Fri, Oct 16, 2009 at 4:31 PM, Anthony <wikimail [at] inbox> wrote:
> On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail
>> if you want only the last 3 revisions checked out , it takes about 10
>> seconds and produces 300k of data.
>
> 10 seconds?  That's horrible.  Have you tried using svn?
>



gmaxwell at gmail

Oct 17, 2009, 1:18 AM

Post #18 of 37 (11037 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Fri, Oct 16, 2009 at 10:31 AM, Anthony <wikimail [at] inbox> wrote:
> On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail
>> if you want only the last 3 revisions checked out , it takes about 10
>> seconds and produces 300k of data.
>
> 10 seconds?  That's horrible.  Have you tried using svn?

On a reasonably fast network it actually takes only about 10 seconds
to pull the entire edit history from his repo; it would take less if
the history had been repacked as I described, but that kind of tight
repacking makes it take longer when you only want a portion of the
history.

Still, many of the neat things that can be done by having the article
in git are only possible if you have the complete history; for
example, generating a blame map needs the entire history.

It would be nice if the git archival format were more efficient for
the kinds of changes made in Wikipedia articles. Source code tends to
have short lines, and changes tend to touch a significant portion of
the lines, while edits on Wikipedia are far more likely to change only
part of a very long line (really, a paragraph). So working with
line-level deltas is efficient for source code but inefficient for
Wikipedia data.

On this repository, git fast-export --all | lzma -9 produces a 900 KB
output (505783 bytes if you want to be silly and use PAQ8HP12, which
is pretty much the state of the art for English text, instead of
LZMA). These methods don't provide fast random access, but it's still
clear that there is a lot of room for improvement. ;) I'm not sure if
anyone is working on improved compression for git for these kinds of
documents.

Getting the entire history of a frequently edited article like this
down to ~1-2 MB is roughly the point where I think it becomes
reasonable for someone doing continued non-trivial work on the article
to fetch the entire history, and thus gain access to functionality
that needs most of the history.
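
Spelled out, that measurement is just the following (the output file
name is invented):

git fast-export --all | lzma -9 > history.fe.lzma
ls -l history.fe.lzma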



jamesmikedupont at googlemail

Oct 17, 2009, 1:40 AM

Post #19 of 37 (11037 views)
Permalink
Re: Wikipedia meets git [In reply to]


On Sat, Oct 17, 2009 at 10:18 AM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
> On Fri, Oct 16, 2009 at 10:31 AM, Anthony <wikimail [at] inbox> wrote:
>> On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail
>>> if you want only the last 3 revisions checked out , it takes about 10
>>> seconds and produces 300k of data.
>>
>> 10 seconds?  That's horrible.  Have you tried using svn?
>

> Still— much of the neat things that can be done by having the article
> in git are only possible if you have the complete history, for
> example: generating a blame map needs the entire history.

Yes; and if you just want to view and edit, then you need only one
revision. If you want to do more, you can pull the history.

>
> It would be nice if the git archival format was more efficient for the
> kinds of changes made in Wikipedia articles: Source code changes tends
> to have short lines and changes tend to change a significant portion
> of the lines, while edits on Wikipedia are far more likely to change
> only part of a very long line (really, a paragraph).... so working
> with line level deltas is efficient for source code while inefficient
> for Wikipedia data.

I have started to work on the blame code, to bring it down to the
character level and learn about it. I am willing to invest some time
to learn how to make git better for the WMF; it is much more
interesting than hacking PHP code.

Also, I have been able to use the mw-render code on the git archive;
you can see the results of the new version of my reader script here,
2 hours of reading the full article:

http://www.archive.org/details/KosovoWikipediaArticlesVideo

I am thinking of storing the Wikipedia articles in the intermediate
XML parse-tree format from mw-render, if that would help the diff
tools.

Another idea would be to allow editing of the articles with
OpenOffice, for example, and provide traceability in the document
structure back to the original article. It could be marked up with
blame information; even more, the blame information could be embedded
in each word, with an XML attribute. That would allow exact tracking
of where the edits come from.
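
A hypothetical fragment of such markup; the element and attribute
names and the revision numbers are invented:

<sentence>
  <w rev="320113123" user="ExampleUser">Kosovo</w>
  <w rev="319887771" user="AnotherUser">declared</w>
</sentence>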

mike



wikimail at inbox

Oct 17, 2009, 7:05 AM

Post #20 of 37 (11014 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Sat, Oct 17, 2009 at 4:40 AM, jamesmikedupont [at] googlemail
<jamesmikedupont [at] googlemail> wrote:
>> It would be nice if the git archival format was more efficient for the
>> kinds of changes made in Wikipedia articles: Source code changes tends
>> to have short lines and changes tend to change a significant portion
>> of the lines, while edits on Wikipedia are far more likely to change
>> only part of a very long line (really, a paragraph).... so working
>> with line level deltas is efficient for source code while inefficient
>> for Wikipedia data.
>
> I have started to work on the blame code
> to bring it down to the char level and learn about it.

Char level would probably make it too inefficient to merge deltas.
Treating a period followed by a space as a line separator would
probably be more efficient.
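
(In the word-splitting code shown later in this thread, that would be
a one-line change to the substitution; a sketch:)

s/\.\s+/.\n/g;   # start a new line after each sentence-ending period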

The key to efficiency is to use skip deltas, though. You build a
binary tree so accessing any revision requires the application of only
log(n) deltas.

I asked whether or not you tried svn, because svn already uses skip deltas.

Is the idea that the entire file would need to be transferred over the
Internet, though? If so, I guess you wouldn't want to use skip deltas:
they greatly reduce access time to early revisions, but at a slight
space penalty.
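
For reference, the usual skip-delta scheme picks each revision's delta
base by clearing the lowest set bit of the revision number, so
rebuilding revision n applies one delta per 1-bit of n, i.e. O(log n)
deltas. A sketch of the base chain (not svn's actual code):

use strict;
use warnings;

# Delta base of revision $r: clear the lowest set bit, so the chain
# r -> base -> ... -> 0 has at most one hop per 1-bit of r.
sub skip_delta_base { my ($r) = @_; return $r & ($r - 1); }

my $r = 54;
my @chain = ($r);
push @chain, $r = skip_delta_base($r) while $r > 0;
print "@chain\n";    # 54 52 48 32 0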



jayvdb at gmail

Oct 17, 2009, 8:04 AM

Post #21 of 37 (11028 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Sun, Oct 18, 2009 at 1:05 AM, Anthony <wikimail [at] inbox> wrote:
> On Sat, Oct 17, 2009 at 4:40 AM, jamesmikedupont [at] googlemail
> <jamesmikedupont [at] googlemail> wrote:
>>> It would be nice if the git archival format was more efficient for the
>>> kinds of changes made in Wikipedia articles: Source code changes tends
>>> to have short lines and changes tend to change a significant portion
>>> of the lines, while edits on Wikipedia are far more likely to change
>>> only part of a very long line (really, a paragraph).... so working
>>> with line level deltas is efficient for source code while inefficient
>>> for Wikipedia data.
>>
>> I have started to work on the blame code
>> to bring it down to the char level and learn about it.
>
> Char level would probably make it too inefficient to merge deltas.
> Treating a period followed by a space as a line separator would
> probably be more efficient.
>
> The key to efficiency is to use skip deltas, though.  You build a
> binary tree so accessing any revision requires the application of only
> log(n) deltas.
>
> I asked whether or not you tried svn, because svn already uses skip deltas.

svn would be daft, for so many reasons.

> Is the idea that the entire file would need to be transferred over the
> Internet, though?  If so, I guess you wouldn't want to use skip deltas
> - they greatly increase access time to early revisions, but at a
> slight space penalty.

With git, parts of the checkout can be shallow clones.

--
John Vandenberg



wikimail at inbox

Oct 17, 2009, 8:23 AM

Post #22 of 37 (11009 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Sat, Oct 17, 2009 at 11:04 AM, John Vandenberg <jayvdb [at] gmail> wrote:
> On Sun, Oct 18, 2009 at 1:05 AM, Anthony <wikimail [at] inbox> wrote:
>> I asked whether or not you tried svn, because svn already uses skip deltas.
>
> svn would be daft, for so many reasons.

Doesn't mean you can't learn from it.



jamesmikedupont at googlemail

Oct 17, 2009, 9:39 AM

Post #23 of 37 (11016 views)
Permalink
Re: Wikipedia meets git [In reply to]

See my new blog post, word-level blaming for Wikipedia via git and Perl:
http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html


The next step is ready:

1. I have a single script that will pull a given article and check the
revisions into git. It is not perfect, but it works.

http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8
You run it like this, from inside a git repo:

perl GetRevisions.pl "Article_Name"

git blame Article_Name/Article.xml
git push origin master

The code that splits up the lines is in ProcessFile; it splits all
spaces into newlines, and that way we get a word-level blame.

if ($insidetext)
{
    # Split on spaces: each space becomes a backslash followed by a
    # newline, so every word lands on its own line and git's
    # line-oriented blame works at the word level.
    s/ /\\\n/g;

    print OUT $_;
}


The Article is here:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/article.xml


Here are the blame results:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/wordblame.txt


The problem is that GitHub does not like this amount of processor
power being used and kills the process; but you can run git blame
locally.
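
Locally, that is just (with the path from the repo above):

git blame Wiki/2008_Kosovo_declaration_of_independence/article.xml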

Now we have a tool to easily create a repository from Wikipedia, or
any other export-enabled MediaWiki.

mike



jayvdb at gmail

Oct 17, 2009, 9:53 AM

Post #24 of 37 (11011 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Sun, Oct 18, 2009 at 3:39 AM, jamesmikedupont [at] googlemail
<jamesmikedupont [at] googlemail> wrote:
> see my new blogpost word leve blaming for wikipedia via git and perl ...
> http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html
> ...
> Problem is that github does not like this amount of processor power
> begin used and kills the process, you can do a local git blame.
>
> Now we have the tool to easily create a repository from wikipedia, or
> any other export enabled mediawiki.

Fantastic!

If you need more processing power, the toolserver may be willing to
give you an account in order to host it, if you can keep the repo
small enough, especially if you can provide a wikiblame tool which is
usable.

http://meta.wikimedia.org/wiki/Toolserver

https://wiki.toolserver.org/view/Account_approval_process

--
John Vandenberg



jamesmikedupont at googlemail

Oct 17, 2009, 10:11 AM

Post #25 of 37 (11015 views)
Permalink
Re: Wikipedia meets git [In reply to]

Thanks,
I will apply for an account when it is ready for integration.

This is still in experimentation mode; the git repository replaces the
MySQL database. But there is a lot more work to do to make this
viable.

Thanks for all your encouragement and support.

mike


On Sat, Oct 17, 2009 at 6:53 PM, John Vandenberg <jayvdb [at] gmail> wrote:
> On Sun, Oct 18, 2009 at 3:39 AM, jamesmikedupont [at] googlemail
> <jamesmikedupont [at] googlemail> wrote:
>> see my new blogpost word leve blaming for wikipedia via git and perl ...
>> http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html
>> ...
>> Problem is that github does not like this amount of processor power
>> begin used and kills the process, you can do a local git blame.
>>
>> Now we have the tool to easily create a repository from wikipedia, or
>> any other export enabled mediawiki.
>
> Fantastic!
>
> If you need more processing power, the toolserver may be willing to
> give you an account in order to host it, if you can keep the repo
> small enough, especially if you can provide a wikiblame tool which is
> usable.
>
> http://meta.wikimedia.org/wiki/Toolserver
>
> https://wiki.toolserver.org/view/Account_approval_process
>
> --
> John Vandenberg
>



luca at dealfaro

Oct 17, 2009, 4:15 PM

Post #26 of 37 (3680 views)
Permalink
Re: Wikipedia meets git [In reply to]

Yes, I just sent a message to the Quality mailing list; please see it
for more information. Install the WikiTrust add-on, and visit
it.wikipedia.org, for instance; pt.wikipedia.org should also work.

Luca

On Fri, Oct 16, 2009 at 4:40 AM, jamesmikedupont [at] googlemail <
jamesmikedupont [at] googlemail> wrote:

> On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
> <denny.vrandecic [at] kit> wrote:
> > That is pretty cool. But wouldn't it make more sense to have a more-
> > fine grained blame, like the one in wikitrust, down to the character
> > level?
>
> Can you please provide some example pages of wikitrust?
> they seem to be AWOL:
>
> In the meantime, you can look at our list of colored pages,
> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages -> Page not found
>
> Thanks,
> mike
>


luca at dealfaro

Oct 17, 2009, 4:19 PM

Post #27 of 37 (3657 views)
Permalink
Re: Wikipedia meets git [In reply to]

I am very sorry. We needed to reconfigure a server, so we moved the
WikiTrust home page out and put it on Google Sites while we redo the
server configuration. There is a CNAME, but if you are caching the old
name, the DNS change may not have propagated to you yet. In that case,
please go to https://sites.google.com/site/ucscwikitrust/

Best,

Luca

On Fri, Oct 16, 2009 at 4:40 AM, jamesmikedupont [at] googlemail <
jamesmikedupont [at] googlemail> wrote:

> On Fri, Oct 16, 2009 at 9:45 AM, Denny Vrandecic
> <denny.vrandecic [at] kit> wrote:
> > That is pretty cool. But wouldn't it make more sense to have a more-
> > fine grained blame, like the one in wikitrust, down to the character
> > level?
>
> Can you please provide some example pages of wikitrust?
> they seem to be AWOL:
>
> In the meantime, you can look at our list of colored pages,
> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages -> Page not found
>
> Thanks,
> mike
>


luca at dealfaro

Oct 17, 2009, 4:48 PM

Post #28 of 37 (3665 views)
Permalink
Re: Wikipedia meets git [In reply to]

Dear James,

you are absolutely right that we were lacking demos: we worked flat out to
produce some, and if you visit http://wikitrust.soe.ucsc.edu/ , you can see
that there are now a couple of Wikipedias on which you can try this.

We wrote our own text analysis engine. The reason is that the typical diff
algorithms you find in git, svn, etc, are very fragile for the analysis of
wiki text:

- They are typically not able to deal with text reordering. If you swap
the order of two paragraphs, it will look to them as if you deleted one
paragraph and inserted it elsewhere (see the small demo after this
list). We wanted to be able to trace text across block moves.
- They typically analyze text across the two last revisions only. We
wanted to be able to remember which text used to be present, and has
subsequently been deleted, so that if the text is later reinserted, we can
still correctly attribute it to the original author. Otherwise, if I want
to look like the author of text, I can simply delete (or replace) the
content of a page, do a few quick-fire edits to confuse the system, and then
reinsert the content with some minor changes.
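
A tiny illustration of the reordering point, using a plain line diff
(file names arbitrary):

printf 'Alpha paragraph.\nBeta paragraph.\n' > v1.txt
printf 'Beta paragraph.\nAlpha paragraph.\n' > v2.txt
diff v1.txt v2.txt
# The moved paragraph shows up as a delete plus an insert, so a
# line-oriented tool cannot tell that it was merely moved.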

We took great pains to make sure that the text attribution system
works robustly with respect to these kinds of phenomena. I am sure it
is not perfect yet, and we welcome all feedback.

Luca

On Fri, Oct 16, 2009 at 5:17 AM, jamesmikedupont [at] googlemail <
jamesmikedupont [at] googlemail> wrote:

> On Fri, Oct 16, 2009 at 2:08 PM, Gerard Meijssen
> <gerard.meijssen [at] gmail> wrote:
> > Hoi,
> > After a minute of googling I find http://wikitrust.soe.ucsc.edu/home ..
> I am
> > sure it is there for you as well.
>
>
> Yes the page is there, it seems to be a good idea.
>
> only I am missing some html pages so that we can see what it looks
> like, a wordlevel blame.
> the colorized pages are missing.
>
> On this page: http://wikitrust.soe.ucsc.edu/home
> it says : "In the meantime, you can look at our list of colored pages,
> or look at screenshots of English Wikipedia pages analyzed by
> WikiTrust. " and the colored pages link to
> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages which are
> missing....
>
> mike
>


luca at dealfaro

Oct 17, 2009, 5:26 PM

Post #29 of 37 (3675 views)
Permalink
Re: Wikipedia meets git [In reply to]

Whoops, sorry: due to a glitch in a DNS setting, the demo on
pt.wikipedia.org will be up later today or tomorrow. The demo on
it.wikipedia.org is up now.

Luca

On Sat, Oct 17, 2009 at 4:48 PM, Luca de Alfaro <luca [at] dealfaro> wrote:

> Dear James,
>
> you are absolutely right that we were lacking demos: we worked flat out to
> produce some, and if you visit http://wikitrust.soe.ucsc.edu/ , you can
> see that there are now a couple of Wikipedias on which you can try this.
>
> We wrote our own text analysis engine. The reason is that the typical diff
> algorithms you find in git, svn, etc, are very fragile for the analysis of
> wiki text:
>
> - They are typically not able to deal with text reordering. If you
> swap the order of two paragraphs, it will look to them as if you inserted
> one of the two paragraphs. We wanted to be able to trace text across block
> moves.
> - They typically analyze text across the two last revisions only. We
> wanted to be able to remember which text used to be present, and has
> subsequently been deleted, so that if the text is later reinserted, we can
> still correctly attribute it to the original author. Otherwise, if I want
> to look like the author of text, I can simply delete (or replace) the
> content of a page, do a few quick-fire edits to confuse the system, and then
> reinsert the content with some minor changes.
>
> We took a lot of pain to make sure that the text attribution system works
> in a robust way with respect to these kind of phenomena. I am sure it is
> not perfect yet, and we welcome all feedback.
>
> Luca
>
>
> On Fri, Oct 16, 2009 at 5:17 AM, jamesmikedupont [at] googlemail <
> jamesmikedupont [at] googlemail> wrote:
>
>> On Fri, Oct 16, 2009 at 2:08 PM, Gerard Meijssen
>> <gerard.meijssen [at] gmail> wrote:
>> > Hoi,
>> > After a minute of googling I find http://wikitrust.soe.ucsc.edu/home ..
>> I am
>> > sure it is there for you as well.
>>
>>
>> Yes the page is there, it seems to be a good idea.
>>
>> only I am missing some html pages so that we can see what it looks
>> like, a wordlevel blame.
>> the colorized pages are missing.
>>
>> On this page: http://wikitrust.soe.ucsc.edu/home
>> it says : "In the meantime, you can look at our list of colored pages,
>> or look at screenshots of English Wikipedia pages analyzed by
>> WikiTrust. " and the colored pages link to
>> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages which are
>> missing....
>>
>> mike
>>
>
>


joshuagay at gmail

Oct 19, 2009, 8:30 AM

Post #30 of 37 (3649 views)
Permalink
Re: Wikipedia meets git [In reply to]

> I will apply for an account when It is ready for integration.
>
> this is still in experimentation mode.
> The git replaces the mysql database.
>
> But there is alot more work to do to make this viable.
>
> thanks for all your encouragement and support.
>
>
Since there are other people out there, perhaps we can start a
mediawiki-git discussion list and/or a wiki discussion page? I'd love
to post the work I'm doing, too, as it starts to come together.

-Josh


meta.sj at gmail

Oct 21, 2009, 5:43 AM

Post #31 of 37 (3620 views)
Permalink
Re: Wikipedia meets git [In reply to]

That sounds like a great idea. I know a few other people who have
worked on git-based wikis and toyed with making them compatible with
MediaWiki (copying Bernie Innocenti, one of the most eloquent :).

SJ

On Mon, Oct 19, 2009 at 11:30 AM, Joshua Gay <joshuagay [at] gmail> wrote:

> > I will apply for an account when It is ready for integration.
> >
> > this is still in experimentation mode.
> > The git replaces the mysql database.
> >
> > But there is alot more work to do to make this viable.
> >
> > thanks for all your encouragement and support.
> >
> >
> Since there are other people out there, perhaps we can start a
> mediawiki-git
> discussion list and/or wiki discussion page? I'd love to post the work I'm
> doing, too as it starts to come together.
>
> -Josh


jamesmikedupont at googlemail

Oct 21, 2009, 6:03 AM

Post #32 of 37 (3631 views)
Permalink
Re: Wikipedia meets git [In reply to]

On Mon, Oct 19, 2009 at 5:30 PM, Joshua Gay <joshuagay [at] gmail> wrote:
> Since there are other people out there, perhaps we can start a mediawiki-git
> discussion list and/or wiki discussion page? I'd love to post the work I'm
> doing, too as it starts to come together.
Sounds great, Josh; let's collaborate.

We can set up a Google group in a minute...

mike



jamesmikedupont at googlemail

Oct 21, 2009, 1:08 PM

Post #33 of 37 (3599 views)
Permalink
Re: Wikipedia meets git [In reply to]

Wow, I am impressed.

Let me remind you of one thing: most people are working on very small
subsets of the data. Very few people will want to have all the data;
think about fetching all the versions from all the git repos, it would
be the same situation.

My idea is for smaller chapters, or towns and regions, that want to
get started easily to host their own branches of the data relevant to
them. Given a world full of such servers, the sum would be great, but
the individual branches needed at any one time would be small.

mike

On Wed, Oct 21, 2009 at 9:49 PM, Bernie Innocenti <bernie [at] codewiz> wrote:
> [cc+=git [at] vger]
>
> On Wed, 21-10-2009 at 08:43 -0400, Samuel Klein wrote:
>> That sounds like a great idea.  I know a few other people who have
>> worked on git-based wikis and toyed with making them compatible with
>> mediawiki (copying bernie innocenti, one of the most eloquent :).
>
> Then I'll do my best to sound as eloquent as expected :)
>
> While I think git's internal structure is wonderfully simple and
> elegant, I'm a little worried about its scalability in the wiki use case.
>
> The scenario for which git's repository format was designed is "patch
> oriented" revision control of a filesystem tree. The central object of a
> git tree is the "commit", which represents a set of changes on multiple
> files. I'll disregard all the juicy details on how the changes are
> actually packed together to save disk space, making git's repository
> format amazingly compact.
>
> Commits are linked to each other in order to represent the history. Git
> can efficiently represent a highly non-linear history with thousands of
> branches, each containing hundreds of thousands revisions. Branching and
> merging huge trees is so fast that one is left wondering if anything has
> happened at all.
>
> So far, so good. This commit-oriented design is great if you want to
> track the history of *the whole tree* at once, applying related
> changes to multiple files atomically. In Git, as well as most other
> version control systems, there's no such thing as a *file* revision!
> Git manages entire trees. Trees are assigned unique revision numbers
> (in fact, ugly sha-1 hashes), and can optionally be tagged or
> branched at will.
>
> And here's the catch: the history of individual files is not
> directly represented in a git repository. It is typically scattered
> across thousands of commit objects, with no direct links to help find
> them. If you want to retrieve the log of a file that was changed only 6
> times in the entire history of the Linux kernel, you'd have to dig
> through *all* of the 170K revisions in the "master" branch.
>
> And it takes some time even if git is blazingly fast:
>
>  bernie [at] giskar:~/src/kernel/linux-2.6$ time git log  --pretty=oneline REPORTING-BUGS  | wc -l
>  6
>
>  real   0m1.668s
>  user   0m1.416s
>  sys    0m0.210s
>
> (my laptop has a low-power CPU. A fast server would be 8-10x faster).
>
>
> Now, the English Wikipedia seems to have slightly more than 3M
> articles, with--how many?--tens of millions of revisions for sure.
> Going through them *every time* one needs to consult the history of a
> file would be 100x slower. Tens of seconds. Not acceptable, huh?
>
> It seems to me that the typical usage pattern of an encyclopedia is to
> change each article individually. Perhaps I'm underestimating the role
> of bots here. Anyway, there's no consistency *requirement* for mass
> changes to be applied atomically throughout all the encyclopedia, right?
>
> In conclusion, the "tree at a time" design is going to be a performance
> bottleneck for a large wiki, with no useful application. Unless of
> course the concept of changesets was exposed in the UI, which would be
> an interesting idea to explore.
>
> Mercurial (Hg) seems to have a better repository layout for the "one
> file at a time" access pattern... Unfortunately, it's also much slower
> than git for almost any other purpose, sometimes by an order of
> magnitude. I'm not even sure how well Hg would cope with a repository
> containing 3M files and some 30M revisions. The largest Hg tree I've
> dealt with is the "mozilla central" repo, which is already unbearably
> slow to work with.
>
> It would be interesting to compare notes with the other DSCM hackers,
> too.
>
> --
>   // Bernie Innocenti - http://codewiz.org/
>  \X/  Sugar Labs       - http://sugarlabs.org/
>



dgerard at gmail

Oct 21, 2009, 4:36 PM

Post #34 of 37 (3608 views)
Permalink
Re: Wikipedia meets git [In reply to]

2009/10/21 jamesmikedupont [at] googlemail <jamesmikedupont [at] googlemail>:

> most people are working on very small subsets of the data. Very few
> people will want to have all the data, think about getting all the
> versions from all the git repos, it would be the same.
> My idea is for smaller chapters who want to get started easily, or
> towns, regions to host their own branches of relevant data.
> Given a world full of such servers, the sum would be great but the
> individual branches needed at one time would be small.


A distributed backend is a nice idea anyway - imagine a meteor hitting
the Florida data centres ...

And there are third-party users who could benefit from a highly
distributed backend, such as Wikileaks.

This thread should probably move to mediawiki-l ...


- d.



jamesmikedupont at googlemail

Oct 21, 2009, 11:27 PM

Post #35 of 37 (3632 views)
Permalink
Re: Wikipedia meets git [In reply to]

Ok, I have started a Google group called mediawiki-vcs:

http://groups.google.com/group/mediawiki-vcs

We should just move the discussion there. Additionally, I did not name
it git but vcs, because we should support multiple backends via a
plugin. I am interested in using git because I think git is great, but
others should be free to use cvs if they feel it is needed.

mike


On Thu, Oct 22, 2009 at 1:36 AM, David Gerard <dgerard [at] gmail> wrote:
> 2009/10/21 jamesmikedupont [at] googlemail <jamesmikedupont [at] googlemail>:
>
>> most people are working on very small subsets of the data. Very few
>> people will want to have all the data, think about getting all the
>> versions from all the git repos, it would be the same.
>> My idea is for smaller chapters who want to get started easily, or
>> towns, regions to host their own branches of relevant data.
>> Given a world full of such servers, the sum would be great but the
>> individual branches needed at one time would be small.
>
>
> A distributed backend is a nice idea anyway - imagine a meteor hitting
> the Florida data centres ...
>
> And there are third-party users who could benefit from a highly
> distributed backend, such as Wikileaks.
>
> This thread should probably move to mediawiki-l ...
>
>
> - d.



midom.lists at gmail

Oct 23, 2009, 1:02 AM

Post #36 of 37 (3574 views)
Permalink
Re: Wikipedia meets git [In reply to]

> A distributed backend is a nice idea anyway - imagine a meteor hitting
> the Florida data centres ...

give me that stuff you all just had, I want it too.

Domas



millosh at gmail

Oct 23, 2009, 7:22 AM

Post #37 of 37 (3566 views)
Permalink
Re: Wikipedia meets git [In reply to]

For a couple of years I have had an idea for another approach: use
MediaWiki as a software repository. Actually, I am already doing so on
my local MediaWiki installation. I even had some Python scripts (based
on pywikipediabot) which dealt with importing and exporting source
code (but I don't know where they are now).

It is a perfect choice as long as we are talking about classical
implementations of VCSs. With distributed systems in the game, a more
clever method for using MediaWiki for that purpose is needed.
Actually, there is no need for a lot of coding: by using flagged
revisions and some control pages on the wiki, it is possible to have a
coherent system for version control.

For example, I may list, on a separate page, the page revisions which
make up version 3.43.77-millosh, and then download exactly that
version of some software.
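
A hypothetical control page for such a scheme; the page names and
revision numbers here are invented:

== Version 3.43.77-millosh ==
* [[Module:Parser]], revision 1234567
* [[Module:Main]], revision 1234890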

I think that it is a much better approach than using git or whatever
for software development purposes. Coding may be done with some editor
or IDE on the local system, but everything around it (writing
documentation, comments, discussion and similar) may be done on the
wiki. And I think that we have all the necessary tools for giving
MediaWiki such a purpose, and that we don't need a lot of time for
that: a couple of months at most. Also, there wouldn't be a need to
make an API for git for the basic purposes.

It would solve the distributed Wikipedia issue, too. By using some
naming system, code/text may have "dynamic" (on a MediaWiki
installation) and "static" (on a file system) repositories (actually,
both of them would be dynamic).

It is possible that I've missed something. If so, please let me know :)

