
howachen at gmail
Apr 18, 2008, 11:35 PM
Post #4 of 6
Re: On good faith: why don't we use mysqldump?
Hi

On Sat, Apr 19, 2008 at 2:45 AM, Brion Vibber <brion[at]wikimedia.org> wrote:
> Felipe Ortega wrote:
> > Yesterday, I was moving around mysqldump files of our processed
> > databases from parsed Wikipedia dumps, and this simple question came
> > to my mind.
> >
> > Is there any special reason to use an "ad-hoc" XML schema for
> > Wikipedia dumps?
>
> 1) The format is relatively stable, unlike our database schema.
>
> 2) Our databases are spread over dozens of servers, in mixes of internal
> binary compression formats whose interpretation is dependent on our
> configuration and custom code.
>
> 3) Our internal databases mix public and private information, which we
> have to separate for external dumps. Thus only completely public tables
> are dumped with mysqldump.
>
> Thus, we use a stable, safe data schema for public page dumps. Dumping
> raw SQL of these tables would be unstable, insecure, and useless for
> most people.
>

I agree that dumping to SQL statements is not very useful, but how about CSV?

mysqldump allows you to dump to CSV files instead of raw SQL statements (you can specify the fields you want). They are pretty safe, and storage-efficient for download. Even better, mysqlimport can import those CSV files at very high speed.

Of course, many people are already using the XML files, so I am not asking you to change, but to provide another set of dumps in CSV format, which could save many people time on file downloading, XML parsing, etc.

What do you think?

Thanks.

Howard
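For illustration, the CSV round trip suggested above could look roughly like the following Python sketch. It only builds the command lines and does not run them. The database name `wikidb`, the table `page`, and the directory `/tmp/dump` are hypothetical placeholders, not taken from the Wikimedia setup. Two caveats: `mysqldump --tab` writes the data file on the database server host, so the server account needs permission to write files there, and it dumps whole tables (restricting the output to chosen columns would probably need a different approach, such as `SELECT ... INTO OUTFILE`).

```python
import shlex


def build_dump_command(database, table, out_dir):
    """Return argv for mysqldump in --tab mode.

    In this mode mysqldump writes <table>.sql (the CREATE TABLE statement)
    and <table>.txt (the delimited data) into out_dir.
    """
    return [
        "mysqldump",
        "--tab=" + out_dir,
        "--fields-terminated-by=,",          # comma-separated columns
        '--fields-optionally-enclosed-by="',  # quote string values
        database,
        table,
    ]


def build_import_command(database, data_file):
    """Return argv for mysqlimport loading a delimited file.

    mysqlimport derives the target table from the file's base name,
    so page.txt is loaded into the table named page.
    """
    return [
        "mysqlimport",
        "--local",                            # read the file from the client host
        "--fields-terminated-by=,",
        '--fields-optionally-enclosed-by="',
        database,
        data_file,
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    dump = build_dump_command("wikidb", "page", "/tmp/dump")
    load = build_import_command("wikidb", "/tmp/dump/page.txt")
    print(shlex.join(dump))
    print(shlex.join(load))
```

The field options passed to mysqlimport have to match the ones used for the dump, otherwise the columns will not line up on import.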