
Mailing List Archive: Wikipedia: Wikitech

Article revision numbers

 

 



awight at wikimedia

Jul 16, 2012, 3:22 PM

Post #1 of 15 (1126 views)
Permalink
Article revision numbers

Hello comrades,
I've run into a challenge too interesting to keep to myself ;) My
immediate goal is to prototype an "offline" wikipedia, similar to Kiwix,
which allows the end-user to make edits and synchronize them back to a
central repository like enwiki.

The catch is, how to insert these changes without edit conflicts? With
linear revision numbering, I can't imagine a natural representation of
the data, only some kind of ad-hoc sandbox solution.

Extending the article revision numbering to represent a branching
history would be the natural way to handle optimistic replication.

Non-linear revisioning might also facilitate simpler models for page
protection, and would allow the formation of multiple, independent
consensuses.

-Adam Wight

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Platonides at gmail

Jul 16, 2012, 4:10 PM

Post #2 of 15 (1053 views)
Permalink
Re: Article revision numbers [In reply to]

On 17/07/12 00:22, Adam Wight wrote:
> Hello comrades,
> I've run into a challenge too interesting to keep to myself ;) My
> immediate goal is to prototype an "offline" wikipedia, similar to Kiwix,
> which allows the end-user to make edits and synchronize them back to a
> central repository like enwiki.
>
> The catch is, how to insert these changes without edit conflicts? With
> linear revision numbering, I can't imagine a natural representation of
> the data, only some kind of ad-hoc sandbox solution.
>
> Extending the article revision numbering to represent a branching
> history would be the natural way to handle optimistic replication.
>
> Non-linear revisioning might also facilitate simpler models for page
> protection, and would allow the formation of multiple, independent
> consensuses.
>
> -Adam Wight

Actually, the revision table allows for non-linear development (it
stores which revision you edited the article from, in rev_parent_id).
You could even make a revision other than the one with the latest
timestamp "win" (by changing page_latest).
You will need to change the way history is viewed, however, and add a
system to keep track of "heads" and "merges".
There may be some assumptions across the codebase about the latest
revision being the active one, too.
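
As a rough illustration of what tracking "heads" could mean, here is a minimal Python sketch. It is not MediaWiki code: the Revision class and find_heads function are hypothetical names, and the only thing borrowed from the schema is the idea that each revision records its parent in rev_parent_id. A head is simply a revision that no other revision names as its parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for one row of a revision table."""
    rev_id: int
    rev_parent_id: Optional[int]  # None models "no parent" (first revision)


def find_heads(revisions: List[Revision]) -> List[int]:
    """Return the rev_ids that no other revision names as its parent."""
    parents = {r.rev_parent_id for r in revisions if r.rev_parent_id is not None}
    return sorted(r.rev_id for r in revisions if r.rev_id not in parents)


# Revision 1 is the root; 2 and 3 were both edited from 1; 4 continues from 2.
history = [
    Revision(1, None),
    Revision(2, 1),
    Revision(3, 1),
    Revision(4, 2),
]
heads = find_heads(history)  # two heads: the branch ending at 3 and the one ending at 4
```

With a linear history there is exactly one head, so the same function also gives a cheap check for whether a page has diverged.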




awight at wikimedia

Jul 16, 2012, 4:49 PM

Post #3 of 15 (1052 views)
Permalink
Re: Article revision numbers [In reply to]

On 07/16/2012 04:10 PM, Platonides wrote:
> On 17/07/12 00:22, Adam Wight wrote:
>> Hello comrades,
>> I've run into a challenge too interesting to keep to myself ;) My
>> immediate goal is to prototype an "offline" wikipedia, similar to Kiwix,
>> which allows the end-user to make edits and synchronize them back to a
>> central repository like enwiki.
>>
>> The catch is, how to insert these changes without edit conflicts? With
>> linear revision numbering, I can't imagine a natural representation of
>> the data, only some kind of ad-hoc sandbox solution.
>>
>> Extending the article revision numbering to represent a branching
>> history would be the natural way to handle optimistic replication.
>>
>> Non-linear revisioning might also facilitate simpler models for page
>> protection, and would allow the formation of multiple, independent
>> consensuses.
>>
>> -Adam Wight
> Actually, the revision table allows for non-linear development (it
> stores from which version you edited the article). You could even make
> to "win" a version different than the one with the latest timestamp (by
> changing page_rev) one.
> You will need to change the way of viewing history, however, and add a
> system to keep track of "heads" and "merges".
> There may be some assumtions accross the codebase about the latest
> revision being the active one, too.
>
Cool! That's a nice solution because it's transparent to the end-user's
system. However, if we use the current schema as you're describing, we
would have to reconcile rev_id conflicts during the merge. This seems
like a nasty problem if the merge is asynchronous, for example a batched
changeset sent in email.
-adam



Platonides at gmail

Jul 16, 2012, 5:08 PM

Post #4 of 15 (1049 views)
Permalink
Re: Article revision numbers [In reply to]

On 17/07/12 01:49, Adam Wight wrote:
>> Actually, the revision table allows for non-linear development (it
>> stores from which version you edited the article). You could even make
>> to "win" a version different than the one with the latest timestamp (by
>> changing page_rev) one.
>> You will need to change the way of viewing history, however, and add a
>> system to keep track of "heads" and "merges".
>> There may be some assumtions accross the codebase about the latest
>> revision being the active one, too.
>>
> Cool! That's a nice solution because it's transparent to the end-user's
> system. However, if we use the current schema as you're describing, we
> would have to reconcile rev_id conflicts during the merge. This seems
> like a nasty problem if the merge is asynchronous, for example a batched
> changeset sent in email.
> -adam

Not really. The source rev_ids would be lost in favour of the target
ones. You keep a list of the rev_ids in the source wiki and the ones
they get in the target wiki, adjusting each subsequent rev_parent_id to
the target wiki's numbers.
It could be a problem for merges after the first one, but it's good
enough for the first version.
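
The renumbering described above might be sketched like this in Python. Everything here is an assumption made for illustration: import_revisions, the (rev_id, rev_parent_id) tuples, and the itertools.count stand-in for the target wiki's id allocator are hypothetical, and the source revisions are assumed to arrive with parents before children.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Tuple

# (source_rev_id, source_rev_parent_id), ordered so parents precede children.
SourceRev = Tuple[int, Optional[int]]


def import_revisions(
    source_revs: List[SourceRev],
    known: Dict[int, int],
    next_target_id: Iterator[int],
) -> List[Tuple[int, Optional[int]]]:
    """Assign target rev_ids and rewrite rev_parent_id to target numbering.

    `known` maps source rev_ids to target rev_ids for revisions both wikis
    already share. It is updated in place as new revisions are imported.
    """
    imported = []
    for source_id, source_parent in source_revs:
        target_id = next(next_target_id)
        known[source_id] = target_id
        target_parent = None if source_parent is None else known[source_parent]
        imported.append((target_id, target_parent))
    return imported


# The offline copy branched from revision 500, which both wikis share.
known = {500: 500}
# Offline edits were numbered 501 and 502 locally; the target wiki has moved
# on in the meantime and hands out ids starting at 510.
result = import_revisions([(501, 500), (502, 501)], known, itertools.count(510))
# result is [(510, 500), (511, 510)]
```

Keeping the `known` mapping around between synchronizations is one possible way to approach later merges, since revisions imported earlier can then still be resolved as parents.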

The nasty problem I see is how to determine the winner in a version
conflict:
  B
 /
A
 \
  C

B and C both are revisions with common parent A. How do you handle the
merge? What revision should be shown in the title?
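
Detecting this situation is the easier half of the problem. The Python sketch below finds the common parent of two revisions by walking parent pointers; it deliberately does not decide which revision wins. The names ancestors, common_ancestor and is_conflict are hypothetical, and the sketch assumes every revision has at most one parent, so it does not yet cover merge revisions.

```python
from typing import Dict, List, Optional


def ancestors(rev: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return rev followed by its chain of parents, nearest first."""
    chain = []
    current: Optional[str] = rev
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def common_ancestor(
    a: str, b: str, parent_of: Dict[str, Optional[str]]
) -> Optional[str]:
    """Return the nearest revision that both a and b descend from, if any."""
    seen = set(ancestors(a, parent_of))
    for rev in ancestors(b, parent_of):
        if rev in seen:
            return rev
    return None


def is_conflict(a: str, b: str, parent_of: Dict[str, Optional[str]]) -> bool:
    """True when neither revision is an ancestor of the other."""
    base = common_ancestor(a, b, parent_of)
    return base != a and base != b


# The situation from the diagram: B and C both have parent A.
parent_of = {"A": None, "B": "A", "C": "A"}
base = common_ancestor("B", "C", parent_of)
```

When is_conflict returns False, one revision simply continues from the other and can be adopted without a merge; only the True case needs a policy for picking or combining the two.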




datzrott at alizeepathology

Jul 17, 2012, 4:32 AM

Post #5 of 15 (1047 views)
Permalink
Re: Article revision numbers [In reply to]

>> Actually, the revision table allows for non-linear development (it
>> stores from which version you edited the article). You could even
>> make to "win" a version different than the one with the latest
>> timestamp (by changing page_rev) one.
>> You will need to change the way of viewing history, however, and add
>> a system to keep track of "heads" and "merges".
>> There may be some assumtions accross the codebase about the latest
>> revision being the active one, too.
>>
> Cool! That's a nice solution because it's transparent to the
> end-user's system. However, if we use the current schema as you're
> describing, we would have to reconcile rev_id conflicts during the
> merge. This seems like a nasty problem if the merge is asynchronous,
> for example a batched changeset sent in email.
> -adam

This is all a fantastic idea. Distributing Wikipedia in a fashion similar
to git will make it a lot easier to use in areas where Internet connections
are not so common.

I wonder whether this sort of feature could be implemented in the
existing Kiwix codebase. That would be ideal, I think.

Thank you,
Derric Atzrott




cmcmahon at wikimedia

Jul 17, 2012, 6:58 AM

Post #6 of 15 (1051 views)
Permalink
Re: Article revision numbers [In reply to]

>
>
> This is all a fantastic idea. Distributing Wikipedia in a fashion similar
> to git will make it a lot easier to use in areas where Internet connections
> are not so common.
>
> I wonder could this sort of feature be implemented in the existing Kiwix
> codebase? That would be ideal I think.
>
>
Ward is working on it. :) http://wardcunningham.github.com/
https://github.com/WardCunningham/Smallest-Federated-Wiki


tbayer at wikimedia

Jul 17, 2012, 8:48 AM

Post #7 of 15 (1047 views)
Permalink
Re: Article revision numbers [In reply to]

On Tue, Jul 17, 2012 at 4:32 AM, Derric Atzrott <
datzrott [at] alizeepathology> wrote:

> >> Actually, the revision table allows for non-linear development (it
> >> stores from which version you edited the article). You could even
> >> make to "win" a version different than the one with the latest
> >> timestamp (by changing page_rev) one.
> >> You will need to change the way of viewing history, however, and add
> >> a system to keep track of "heads" and "merges".
> >> There may be some assumtions accross the codebase about the latest
> >> revision being the active one, too.
> >>
> > Cool! That's a nice solution because it's transparent to the
> > end-user's system. However, if we use the current schema as you're
> > describing, we would have to reconcile rev_id conflicts during the
> > merge. This seems like a nasty problem if the merge is asynchronous,
> > for example a batched changeset sent in email.
> > -adam
>
> This is all a fantastic idea. Distributing Wikipedia in a fashion similar
> to git will make it a lot easier to use in areas where Internet connections
> are not so common.
>
I have added this thread to
https://en.wikipedia.org/wiki/User:HaeB/Timeline_of_distributed_Wikipedia_proposals
.

>
> I wonder could this sort of feature be implemented in the existing Kiwix
> codebase? That would be ideal I think.
>
> Thank you,
> Derric Atzrott
>
>



--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB


wicke at wikidev

Jul 17, 2012, 11:06 AM

Post #8 of 15 (1048 views)
Permalink
Re: Article revision numbers [In reply to]

On 07/16/2012 04:49 PM, Adam Wight wrote:
> Cool! That's a nice solution because it's transparent to the end-user's
> system. However, if we use the current schema as you're describing, we
> would have to reconcile rev_id conflicts during the merge. This seems
> like a nasty problem if the merge is asynchronous, for example a batched
> changeset sent in email.

And that would be the core problem of asynchronous optimistic
replication ;) Simple last-write-wins or union (for shopping carts..)
strategies are still manageable, but merging textual changes is harder.
Manual intervention will often be needed.

The editor rather than some unsuspecting reader should be best equipped
to resolve these conflicts, so some degree of synchrony in the 'push'
stage might make sense to provide an opportunity for editor-guided merging.
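
To make the contrast between these strategies concrete, here is a small Python sketch. The function names are hypothetical, and the text merge is deliberately simplified to whole-text granularity: it only resolves the cases where at most one side changed, and returns None when both sides changed differently, which is where manual intervention would come in. Real merge tools work at the level of lines or hunks rather than whole texts.

```python
from typing import Optional, Set, Tuple


def last_write_wins(a: Tuple[float, str], b: Tuple[float, str]) -> str:
    """Each argument is (timestamp, value); the value written later is kept."""
    return max(a, b, key=lambda write: write[0])[1]


def union_merge(a: Set[str], b: Set[str]) -> Set[str]:
    """Shopping-cart style merge: keep everything either side added."""
    return a | b


def three_way_text_merge(base: str, ours: str, theirs: str) -> Optional[str]:
    """Merge two edits of `base`; None means a human has to resolve it."""
    if ours == theirs:
        return ours    # both sides made the same change (or none)
    if ours == base:
        return theirs  # only the other side changed
    if theirs == base:
        return ours    # only our side changed
    return None        # both sides changed differently: a real conflict


newest = last_write_wins((1.0, "draft"), (2.0, "final"))
cart = union_merge({"milk"}, {"eggs"})
clean = three_way_text_merge("old text", "old text", "new text")
conflict = three_way_text_merge("old text", "my text", "their text")
```

The first two strategies always produce a result without asking anyone, which is why they stay manageable. The third can fail, and the question raised here is who should be asked when it does.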

Gabriel




spam at ludd

Jul 20, 2012, 10:01 AM

Post #9 of 15 (1037 views)
Permalink
Re: Article revision numbers [In reply to]

wicke [at] wikidev:
> On 07/16/2012 04:49 PM, Adam Wight wrote:
> > Cool! That's a nice solution because it's transparent to the end-user's
> > system. However, if we use the current schema as you're describing, we
> > would have to reconcile rev_id conflicts during the merge. This seems
> > like a nasty problem if the merge is asynchronous, for example a batched
> > changeset sent in email.
>
> And that would be the core problem of asynchronous optimistic
> replication ;) Simple last-write-wins or union (for shopping carts..)
> strategies are still manageable, but merging textual changes is harder.
> Manual intervention will often be needed.
>
> The editor rather than some unsuspecting reader should be best equipped
> to resolve these conflicts, so some degree of synchrony in the 'push'
> stage might make sense to provide an opportunity for editor-guided merging.
>
> Gabriel

Although it might be simpler for the original editor to merge their
own changes, that's not always what we want. The most flexible
arrangement would be to separate the process into three workflows:
edit, synchronize, and merge. Different people could perform each
stage, or they can be folded together when appropriate.

On protected pages, for example, we specifically want some amount of
peer review before deciding to merge. This could be seen as positive
feedback also, if each successfully merged change comes with a bit of
validation by the community.

Even a simple branching model will offer some delicious low-hanging
fruit. For example, editors could "Save Draft" for any article and
resume editing later.

-adam



lars at aronsson

Jul 20, 2012, 2:22 PM

Post #10 of 15 (1037 views)
Permalink
Re: Article revision numbers [In reply to]

On 2012-07-17 07:32, Derric Atzrott wrote:
> This is all a fantastic idea. Distributing Wikipedia in a fashion
> similar to git will make it a lot easier to use in areas where
> Internet connections are not so common.

It always surprises me when people express enthusiasm for
this kind of idea, since my instinctive assumption is the exact
opposite: that this couldn't possibly be feasible or practical.

Just out of curiosity, how large are the git-managed projects
that you have successfully handled this way? Number of
files, lines of code, bytes or commits per day? Did you ever
run into a software project where a fully decentralized git
solution was impractical, e.g. because pulling in the daily
updates took more than an hour on your available bandwidth?


--
Lars Aronsson (lars [at] aronsson)
Aronsson Datateknik - http://aronsson.se




datzrott at alizeepathology

Jul 23, 2012, 4:25 AM

Post #11 of 15 (1023 views)
Permalink
Re: Article revision numbers [In reply to]

>> This is all a fantastic idea. Distributing Wikipedia in a fashion
>> similar to git will make it a lot easier to use in areas where
>> Internet connections are not so common.
>
>It always surprises me when people express enthusiasm for
>this kind of idea, since my instinct assumption is the exact
>opposite: that this couldn't possibly be feasible or practical.
>
>Just out of curiosity, how large are the git-managed projects
>that you have successfully handled this way? Number of files,
>lines of code, bytes or commits per day? Did you ever run into
>a software project where a fully decentralized git solution was
>impractical, e.g. because pulling in the daily updates took
>more than an hour on your available bandwidth?

I can't say that I've handled any large git-managed projects this way, but I
am given to understand that this is the very thing for which git was designed.
Given this, I would hope that a git-like model would be good for
decentralized editing.

Thank you,
Derric Atzrott




innocentkiller at gmail

Jul 23, 2012, 4:32 AM

Post #12 of 15 (1019 views)
Permalink
Re: Article revision numbers [In reply to]

On Mon, Jul 23, 2012 at 7:25 AM, Derric Atzrott
<datzrott [at] alizeepathology> wrote:
>>> This is all a fantastic idea. Distributing Wikipedia in a fashion
>>> similar to git will make it a lot easier to use in areas where
>>> Internet connections are not so common.
>>
>>It always surprises me when people express enthusiasm for
>>this kind of idea, since my instinct assumption is the exact
>>opposite: that this couldn't possibly be feasible or practical.
>>
>>Just out of curiosity, how large are the git-managed projects
>>that you have successfully handled this way? Number of files,
>>lines of code, bytes or commits per day? Did you ever run into
>>a software project where a fully decentralized git solution was
>>impractical, e.g. because pulling in the daily updates took
>>more than an hour on your available bandwidth?
>
> I can't say that I've handled an large git-managed projects this way, but I
> am to understand that this is the very thing for which git was designed.
> Given this I would hope that a git like model would be good for
> decentralized editing.
>

It's really not. Things that are (relatively) simple in the database tend
to require walking the entire revision tree in Git in order to figure the
same data out.

Git is awesome for software development, but trying to use it as an
article development tool is really a bad solution in search of a
problem. We could've had the same argument years ago and said
"why use a database, SVN stores information in a linear history
that's useful for articles." Having diverging articles may be cool/
desired, but using Git is not the answer.

-Chad



datzrott at alizeepathology

Jul 23, 2012, 5:13 AM

Post #13 of 15 (1017 views)
Permalink
Re: Article revision numbers [In reply to]

>It's really not. Things that are (relatively) simple in the
>database tend to require walking the entire revision
>tree in Git in order to figure the same data out.
>
>Git is awesome for software development, but trying
>to use it as an article development tool is really a bad
>solution in search of a problem. We could've had the
>same argument years ago and said "why use a database,
>SVN stores information in a linear history that's useful
>for articles." Having diverging articles may be cool/
>desired, but using Git is not the answer.
>
>-Chad

Fair enough. I learn something new every day. I definitely think that
distributed article editing is a great idea, even if a git-like system is
not the answer to it.

Thank you,
Derric Atzrott




spam at ludd

Jul 23, 2012, 9:59 AM

Post #14 of 15 (1016 views)
Permalink
Re: Article revision numbers [In reply to]

> >It's really not. Things that are (relatively) simple in the
> >database tend to require walking the entire revision
> >tree in Git in order to figure the same data out.
> >
> >Git is awesome for software development, but trying
> >to use it as an article development tool is really a bad
> >solution in search of a problem. We could've had the
> >same argument years ago and said "why use a database,
> >SVN stores information in a linear history that's useful
> >for articles." Having diverging articles may be cool/
> >desired, but using Git is not the answer.
> >
> >-Chad
>
> Fair enough. I learn something new every day. I definitely think that
> distributed article editing is a great idea, even if a git-like system is
> not the answer to it.
>
> Thank you,
> Derric Atzrott

Git is almost never used in a truly decentralized fashion, so it isn't
optimized for that type of use. See git "hub", for example.
Actual peer-to-peer is infinitely more scalable ;) because you don't
have one poor enterprise Java server getting hit by everyone in the
world; instead, individuals distribute the load among themselves.

That would be a difficult model for Wikipedia however, because
maintaining an authoritative edition would require centralized
cryptography, at the least.

Allowing articles on our central server to diverge temporarily is
easily achievable, with very little overhead. In fact, when you
consider the savings in revert wars, maybe there is a net gain.

I'm interested in writing a mediawiki extension to allow us to
experiment with this idea.

-Adam



lists at asheesh

Jul 25, 2012, 11:37 PM

Post #15 of 15 (1002 views)
Permalink
Re: Article revision numbers [In reply to]

Excerpts from Adam Wight's message of Mon Jul 16 18:22:22 -0400 2012:
> Hello comrades,
> I've run into a challenge too interesting to keep to myself ;) My
> immediate goal is to prototype an "offline" wikipedia, similar to Kiwix,
> which allows the end-user to make edits and synchronize them back to a
> central repository like enwiki.
>
> The catch is, how to insert these changes without edit conflicts? With
> linear revision numbering, I can't imagine a natural representation of
> the data, only some kind of ad-hoc sandbox solution.
>
> Extending the article revision numbering to represent a branching
> history would be the natural way to handle optimistic replication.
>
> Non-linear revisioning might also facilitate simpler models for page
> protection, and would allow the formation of multiple, independent
> consensuses.

There is a tool for managing non-linear history in mediawiki data sets.
It's actually a combination of git, the version control system, and
the MediaWiki API. It's called git-remote-mediawiki.

First, I'll quote its documentation:

<quote>
Getting started with Git-Mediawiki

Then, the first operation you should do is cloning the remote mediawiki. To do so, run the command

git clone mediawiki::http://yourwikiadress.com

You can commit your changes locally as usual with the command

git commit
</quote>

You can read more here: https://github.com/Bibzball/Git-Mediawiki/wiki/User-manual

I've been enjoying it lately, though it has some rough edges. It is under
periodic development, and in the near future I plan to make more of a user
community around it.

It is probably entirely unwieldy to use on English Wikipedia directly, but it
could be adjusted to permit the importing of database dumps, and then let people
branch off those.

-- Asheesh.

