
jamesmikedupont at googlemail
May 18, 2012, 3:12 AM
Post #28 of 28
Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )
There is no 10 GB limit, but according to my recent discussion with the archive.org team, 10 GB is the recommended bucket size if you want to split up a file. They have been helping me optimize the storage.

My idea is to make smaller blocks that can be fetched quickly, so that someone reading an article, for example, loads only the data needed to display it. That data would be available via JSON(P) or XML/text from a file.

We can host Wikipedia in read-only mode entirely on archive.org, without a database server: by encoding the search binary trees as JSON data, also stored on archive.org, the clients can perform the searches themselves.

That is my current research on fosm.org, and I hope it can apply to Wikipedia as well.

mike

On Fri, May 18, 2012 at 9:41 AM, emijrp <emijrp [at] gmail> wrote:
> There is no such 10GB limit,
> http://archive.org/details/ARCHIVETEAM-YV-6360017-6399947 (238 GB example)
>
> ArchiveTeam/WikiTeam is uploading some dumps to Internet Archive, if you
> want to join the effort use the mailing list
> https://groups.google.com/group/wikiteam-discuss to avoid wasting resources.
>
> 2012/5/18 Mike Dupont <jamesmikedupont [at] googlemail>
>
>> Hello People,
>> I have completed my first set in uploading the osm/fosm dataset (350gb
>> unpacked) to archive.org
>> http://osmopenlayers.blogspot.de/2012/05/upload-finished.html
>>
>> We can do something similar with wikipedia, the bucket size of
>> archive.org is 10gb, we need to split up the data in a way that it is
>> useful. I have done this by putting each object on one line and each
>> file contains the full data records and the parts that belong to the
>> previous block and next block, so you are able to process the blocks
>> almost stand alone.
>>
>> mike
>>
>> _______________________________________________
>> Wikimedia-l mailing list
>> Wikimedia-l [at] lists
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>
>
> --
> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
> Pre-doctoral student at the University of Cádiz (Spain)
> Projects: AVBOT <http://code.google.com/p/avbot/> |
> StatMediaWiki <http://statmediawiki.forja.rediris.es>
> | WikiEvidens <http://code.google.com/p/wikievidens/> |
> WikiPapers <http://wikipapers.referata.com>
> | WikiTeam <http://code.google.com/p/wikiteam/>
> Personal website: https://sites.google.com/site/emijrp/

--
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
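
The block-and-index idea in the message (one record per line, small blocks, and a search structure stored as JSON that the client walks itself) could look roughly like the sketch below. This is only an illustration under several assumptions, not the FOSM implementation: the function names (`split_into_blocks`, `build_index`, `find_block`, `lookup`) are hypothetical, records are assumed to be JSON lines sorted by a `title` field, an in-memory list stands in for block files that would be fetched from archive.org, a sorted array with binary search stands in for the binary search tree, and the overlap with the previous and next block is left out.

```python
import bisect
import json


def split_into_blocks(lines, max_bytes):
    """Group one-record-per-line data into blocks of at most max_bytes each.

    A single record larger than max_bytes still gets a block of its own.
    """
    blocks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if current and size + n > max_bytes:
            blocks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        blocks.append(current)
    return blocks


def build_index(blocks):
    """Return a JSON-serialisable index: the first title in each block.

    Assumes the records were sorted by title before splitting.
    """
    return [
        {"first_title": json.loads(block[0])["title"], "block": i}
        for i, block in enumerate(blocks)
    ]


def find_block(index, title):
    """Binary-search the index for the block that could hold this title."""
    first_titles = [entry["first_title"] for entry in index]
    pos = bisect.bisect_right(first_titles, title) - 1
    return index[max(pos, 0)]["block"]


def lookup(index_json, blocks, title):
    """Client side: parse the index, pick one block, scan only that block."""
    index = json.loads(index_json)
    for line in blocks[find_block(index, title)]:
        record = json.loads(line)
        if record["title"] == title:
            return record
    return None
```

A client would download the small index once (`json.dumps(build_index(blocks))` on the publishing side), then fetch and scan a single block per lookup, so no database server is involved in answering a query.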