
jamesmikedupont at googlemail
Oct 17, 2009, 1:40 AM
Post #19 of 37
On Sat, Oct 17, 2009 at 10:18 AM, Gregory Maxwell <gmaxwell [at] gmail> wrote:
> On Fri, Oct 16, 2009 at 10:31 AM, Anthony <wikimail [at] inbox> wrote:
>> On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedupont [at] googlemail
>>> if you want only the last 3 revisions checked out , it takes about 10
>>> seconds and produces 300k of data.
>>
>> 10 seconds? That's horrible. Have you tried using svn?
>
> Still, much of the neat things that can be done by having the article
> in git are only possible if you have the complete history, for
> example: generating a blame map needs the entire history.

Yes, and if you just want to view and edit, you only need one revision.
If you want to do more, you can pull the history.

>
> It would be nice if the git archival format was more efficient for the
> kinds of changes made in Wikipedia articles: Source code changes tends
> to have short lines and changes tend to change a significant portion
> of the lines, while edits on Wikipedia are far more likely to change
> only part of a very long line (really, a paragraph).... so working
> with line level deltas is efficient for source code while inefficient
> for Wikipedia data.

I have started to work on the blame code, to bring it down to the
character level and to learn about it. I am willing to invest some time
to learn how to make git better for WMF. It is much more interesting
than hacking PHP code.

Also, I have been able to use the mw-render code on the git archive. You
can see the results of the new version of my reader script here (2 hours
of reading the full article):
http://www.archive.org/details/KosovoWikipediaArticlesVideo

I am thinking of storing the Wikipedia articles in the intermediate XML
parse tree format from mw-render, if that would help the diff tools.

Another idea would be to allow editing of the articles with OpenOffice,
for example, and to provide traceability in the document structure back
to the original article. The document could be marked up with blame
information. Going further, the blame information could be embedded in
each word with an XML attribute. That would allow exact tracking of
where each edit comes from.

mike

_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
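
A rough sketch of the per-word blame idea from the message above, assuming revisions are available as plain (author, text) pairs, oldest first. This is not git's blame code and not the mw-render parse tree format: the function names, the <article> and <w> elements and the author attribute are all made up for illustration, and the word diff comes from Python's standard difflib.

```python
import difflib
import xml.etree.ElementTree as ET


def word_blame(revisions):
    """Attribute every word of the latest revision to an author.

    revisions: list of (author, text) pairs, oldest first (assumed input shape).
    Returns a list of (word, author) pairs for the newest text.
    """
    blamed = []  # (word, author) pairs for the revision processed so far
    for author, text in revisions:
        new_words = text.split()
        old_words = [word for word, _ in blamed]
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep whoever introduced them.
                updated.extend(blamed[i1:i2])
            else:
                # Inserted or replaced words belong to this revision's author;
                # for a pure deletion the range j1:j2 is empty.
                updated.extend((word, author) for word in new_words[j1:j2])
        blamed = updated
    return blamed


def blame_to_xml(blamed):
    """Serialise the blame as XML, one hypothetical <w author="..."> per word."""
    root = ET.Element("article")
    for word, author in blamed:
        element = ET.SubElement(root, "w", {"author": author})
        element.text = word
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    history = [
        ("alice", "Kosovo is a region in the Balkans"),
        ("bob", "Kosovo is a disputed region in the Balkans"),
    ]
    print(blame_to_xml(word_blame(history)))
```

Diffing on words instead of lines means a one-word change inside a long paragraph only reassigns that word, which is the case raised in the quoted text about line-level deltas. A character-level version could pass the raw strings to the matcher instead of word lists, at a higher cost per revision.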