Mailing List Archive: Wikipedia: Foundation

[Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

 

 


kim at bruning

May 16, 2012, 7:01 PM

Post #1 of 13 (384 views)
[Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

>
> We at Archive Team are attempting to download all the 700,000 Knols.[3] For
> the sake of history. Join us, #archiveteam EFNET.
>

I did some followup. I'm not sure I can help out with Knol
anymore, but I discovered that AT is having some trouble
making good archives of wikimedia sites.

Theoretically, wikipedia et al SHOULD be easy to
reconstitute, right? That's why we're using CC licenses
and all. Else if we drop the ball, WP will be gone.
This seems like a priority to me!

The main problem seems to be obtaining commons images:
http://archiveteam.org/index.php?title=Wikiteam

So at the very least, we don't appear to have very good
documentation. Who could best help Archive Team out? Has
anyone done/written documentation on completely restoring 1
or more wikimedia wikis from 'public backup' [1]?

What can we do to help them?

sincerely,
Kim Bruning

[1] "Real Men don't make backups. They upload it via ftp and
let the world mirror it." - Linus Torvalds

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


phoenixoverride at gmail

May 16, 2012, 8:11 PM

Post #2 of 13 (378 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

I know from experience that a wiki can be rebuilt from any one of the
dumps that are provided; pages-meta-current, for example, contains
everything needed to reboot a site except its user database
(names/passwords etc.). See
http://www.mediawiki.org/wiki/Manual:Moving_a_wiki
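
To make the point concrete, here is a minimal sketch of how the page titles and wikitext in an XML export can be read back out with nothing but the Python standard library. The helper names (`local`, `iter_pages`) and the two-page sample are illustrative assumptions, not part of MediaWiki; the sample only mimics the general shape of an export file with one revision per page, as in pages-meta-current.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag such as '{ns}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element in an export stream.

    Hypothetical helper: assumes one <revision> per page.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children


# Tiny stand-in for a real dump file (assumed shape, made-up content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Example</title>
    <revision><text>Hello, wiki.</text></revision>
  </page>
  <page>
    <title>Second</title>
    <revision><text>More text.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, len(page_text))
```

For a real compressed dump, the same generator could be fed a file object from `bz2.open(path, "rb")`. Actually rebuilding a site would still go through MediaWiki's own import tooling as described in the manual page above; this only shows that the content is all there.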



jayvdb at gmail

May 16, 2012, 8:14 PM

Post #3 of 13 (382 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thu, May 17, 2012 at 1:11 PM, John <phoenixoverride [at] gmail> wrote:
>
> I know from experience that a wiki can be rebuilt from any one of the
> dumps that are provided; pages-meta-current, for example, contains
> everything needed to reboot a site except its user database
> (names/passwords etc.). See
> http://www.mediawiki.org/wiki/Manual:Moving_a_wiki

How would we regain control of our existing usernames in the event
that the user database was lost in the move?

--
John Vandenberg



p858snake at gmail

May 16, 2012, 10:03 PM

Post #4 of 13 (375 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thu, May 17, 2012 at 1:14 PM, John Vandenberg <jayvdb [at] gmail> wrote:
> How would we regain control of our existing usernames in the event
> that the user database was lost in the move?

That would be up to the end project to decide, although ideally they
shouldn't allow it unless you can somehow prove it was you; otherwise there
are possible issues with misattribution if someone else managed to regain
the account.



jamesofur at gmail

May 16, 2012, 10:10 PM

Post #5 of 13 (374 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

If both are accessible I've seen an extension that allowed you to claim
your username. Saw it in action when Wowpedia forked from the Wikia Wowwiki
and they let people claim their old usernames with an edit (and code in
edit summary iirc) on the other wiki.

James



emijrp at gmail

May 17, 2012, 5:01 AM

Post #6 of 13 (370 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

The only issues for preservation/forking of Wikimedia projects by third
parties are the missing image dumps (which started being created a few
days ago, thanks Ariel) and the usernames/passwords table (not a big
problem in an apocalyptic scenario, where articles and images have top
priority).

We at WikiTeam are uploading wiki dumps to Internet Archive, and recently
some official mirrors of Wikimedia dumps (articles + images) are being
created around the globe (currently in 3 different locations).

I think we have taken great steps in the last year.


--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/


emijrp at gmail

May 17, 2012, 5:22 AM

Post #7 of 13 (369 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

They are XML dumps. Why did you say they are semi-useless?

I'm not sure if all the MediaWiki revision table parameters are available
in the XML dumps, but most of them are.

2012/5/17 Anthony <wikimail [at] inbox>

> On Thu, May 17, 2012 at 8:01 AM, emijrp <emijrp [at] gmail> wrote:
> > We at WikiTeam are uploading wiki dumps to Internet Archive, and recently
> > some official mirrors of Wikimedia dumps (articles + images) are being
> > created around the globe (currently in 3 different locations).
>
> Are these actual database dumps, or are they those semi-useless XML dumps?
>





tom at tommorris

May 17, 2012, 5:31 AM

Post #8 of 13 (369 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

WARNING: The following post is a work of technical fantasy rather than practical reality.

On the usernames and passwords thing, if we imagine our doomsday scenario (meteor hits the WMF data centre, the Foundation turn into evil psychopathic Nazis, whatever), one thing that might be useful, and that archive-oriented developers might want to consider, would be some way of 'namespacing' usernames. That way, a fork/new version could specify that, say, all the usernames on all the existing content are usernames on en.wikipedia.org, and distinguish those from the usernames on post-apocalyptic Wikipedia. That way we can keep the attribution chain to the old usernames without the issue of identity theft.

It'd also be a good step towards attribution in distributed wikis. This might be for something like a future attempt at Citizendium (or perhaps someone wants to make a version of Wikipedia with pending changes or the image filter or one of the other many things the community cannot agree on).

In addition, it would be useful to be able to distinguish between usernames on sites that reuse Commons images (if I upload an image to Commons with the username 'Tom Morris' and then some non-WMF wiki reuses it, it may be attributing it to the local user 'Tom Morris' rather than the Commons user).

Finally, it'd be potentially useful for wikis which use some Wikipedia content combined with some local content. For instance, I know wikiqueer.org uses Wikipedia content with attribution, and combines the encyclopaedic content of Wikipedia with non-encyclopaedic community content that wouldn't fit Wikipedia's mission or NPOV (they have the supposedly very controversial POV that LGBT people deserve equal rights).

In all these cases, as well as our potential doomsday scenario, being able to clearly distinguish between local usernames and usernames on other wikis might be quite useful. The inner semantic web dork suggests that perhaps we could consider using something like a uniform resource identifier (URI) to identify users. ;-)
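
A rough sketch of what such namespaced usernames could look like, purely as an illustration: the `QualifiedUser` class, the `/wiki/User:` URI layout and the `fork.example.org` host are assumptions made up for this example, not an existing MediaWiki feature.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote, urlsplit

USER_PREFIX = "/wiki/User:"  # assumed URI layout, for illustration only


@dataclass(frozen=True)
class QualifiedUser:
    """A username scoped to the wiki it was registered on (hypothetical)."""
    wiki: str  # host name of the originating wiki
    name: str  # username as displayed on that wiki

    def to_uri(self):
        # Spaces become underscores, as in wiki page titles.
        return "https://{}{}{}".format(
            self.wiki, USER_PREFIX, quote(self.name.replace(" ", "_"))
        )

    @classmethod
    def from_uri(cls, uri):
        parts = urlsplit(uri)
        if not parts.path.startswith(USER_PREFIX):
            raise ValueError("not a user URI: " + uri)
        name = unquote(parts.path[len(USER_PREFIX):]).replace("_", " ")
        return cls(parts.netloc, name)


# Same display name, two different identities.
old = QualifiedUser("en.wikipedia.org", "Tom Morris")
new = QualifiedUser("fork.example.org", "Tom Morris")
print(old.to_uri())
print(old == new)
```

Because equality compares both the wiki and the name, edits imported from the old site stay attributed to the old identity even if somebody registers the same name on the fork.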

We could also consider the possibility of allowing users to use OpenID or OAuth or whatever the web identity mechanism du jour is to allow loose affiliation of usernames between MediaWiki installs. That way you can establish the link between identities across wikis (of course, if you don't want to, you don't have to).

--
Tom Morris
<http://tommorris.org/>





wikimail at inbox

May 17, 2012, 5:32 AM

Post #9 of 13 (370 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thu, May 17, 2012 at 8:22 AM, emijrp <emijrp [at] gmail> wrote:
> They are XML dumps. Why did you say they are semi-useless?

Because they are XML dumps, mainly. The data in the WMF database is
compressed in a format which can be easily randomly accessed. The
dump procedure is to uncompress it, convert it to XML, and then
recompress it in a format which can't be easily randomly accessed.
The import procedure is to uncompress the "dump", convert it from XML,
and then recompress it in a format which is easily randomly accessed.

There are some hacks to get around this with the bz2 version of the
"dump", but this is far less efficient than the format which the data
already is in before the "dump" process takes place.
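
As an illustration of what a workaround of this kind can look like (a sketch under assumptions, not necessarily the specific hacks meant here): if the data is written as several independent bz2 streams and the byte offset of each stream is recorded, a reader can jump to one offset and decompress only that stream. The function names and the three tiny chunks below are made up for the example.

```python
import bz2


def build_multistream(chunks):
    """Compress each chunk as its own bz2 stream; return (blob, offsets)."""
    blob = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(blob))  # where this stream starts in the file
        blob += bz2.compress(chunk)
    return bytes(blob), offsets


def read_stream(blob, offset):
    """Decompress only the single bz2 stream that begins at `offset`."""
    decompressor = bz2.BZ2Decompressor()
    # Bytes after the end of this stream are left in unused_data.
    return decompressor.decompress(blob[offset:])


chunks = [b"<page>A</page>", b"<page>B</page>", b"<page>C</page>"]
blob, offsets = build_multistream(chunks)

# Random access: fetch the second chunk without touching the first.
print(read_stream(blob, offsets[1]))
```

The cost is an extra index of offsets and somewhat worse compression, since each stream is compressed on its own; a plain sequential reader still sees one valid concatenated bz2 file.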

> I'm not sure if all the MediaWiki revision table parameters are available in
> the XML dumps, but most of them are.

The main problem is that they are compressed in a format which is
terrible for actual use. The missing information (mostly indexes)
is a secondary problem.



wikimail at inbox

May 17, 2012, 5:34 AM

Post #10 of 13 (370 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thu, May 17, 2012 at 8:31 AM, Tom Morris <tom [at] tommorris> wrote:
> We could also consider the possibility of allowing users to use OpenID or OAuth or whatever the web identity mechanism du jour is to allow loose affiliation of usernames between MediaWiki installs. That way you can establish the link between identities across wikis (of course, if you don't want to, you don't have to).

Also, there's http://en.wikipedia.org/wiki/Template:User_committed_identity

But most people don't seem to care about these things.
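
For anyone unfamiliar with the idea: a committed identity is a hash of a secret string published on your user page, so that you can later prove you are the account's owner by revealing the secret. A minimal sketch of the mechanism, assuming SHA-512 as the hash; the function names and the example secret are made up for illustration.

```python
import hashlib
import hmac


def commit(secret):
    """Return the hex digest to publish; the secret itself stays private."""
    return hashlib.sha512(secret.encode("utf-8")).hexdigest()


def verify(secret, published):
    """Check a revealed secret against a previously published commitment."""
    return hmac.compare_digest(commit(secret), published.lower())


# Illustrative secret only; a real one should be long and hard to guess.
secret = "My name is Example User, phone 555-0100, favourite colour teal."
published = commit(secret)

print(verify(secret, published))
print(verify("a guess by someone else", published))
```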



tom at tommorris

May 17, 2012, 5:37 AM

Post #11 of 13 (373 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thursday, 17 May 2012 at 13:34, Anthony wrote:
> On Thu, May 17, 2012 at 8:31 AM, Tom Morris <tom [at] tommorris> wrote:
> > We could also consider the possibility of allowing users to use OpenID or OAuth or whatever the web identity mechanism du jour is to allow loose affiliation of usernames between MediaWiki installs. That way you can establish the link between identities across wikis (of course, if you don't want to, you don't have to).
>
>
> Also, there's http://en.wikipedia.org/wiki/Template:User_committed_identity
>
> But most people don't seem to care about these things.

Sure, the use cases of Committed Identities are slightly different.

--
Tom Morris
<http://tommorris.org/>





thomas.dalton at gmail

May 17, 2012, 5:38 AM

Post #12 of 13 (372 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On 17 May 2012 13:32, Anthony <wikimail [at] inbox> wrote:
> Because they are XML dumps, mainly.  The data in the WMF database is
> compressed in a format which can be easily randomly accessed.

It's a dump. It's not supposed to be randomly accessed. We're talking
about archives, not mirrors.



wikimail at inbox

May 17, 2012, 5:42 AM

Post #13 of 13 (371 views)
Re: [Wikimedia-l] Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

On Thu, May 17, 2012 at 8:38 AM, Thomas Dalton <thomas.dalton [at] gmail> wrote:
> On 17 May 2012 13:32, Anthony <wikimail [at] inbox> wrote:
>> Because they are XML dumps, mainly.  The data in the WMF database is
>> compressed in a format which can be easily randomly accessed.
>
> It's a dump.

Not really. Yes, it's called that. And historically, it was that,
but the XML "dumps" aren't really dumps at all.

> It's not supposed to be randomly accessed. We're talking
> about archives, not mirrors.

That's why I said they're semi-useless (i.e. half-useless), not useless.
