
Mailing List Archive: Wikipedia: Wikitech

On good faith: why don't we use mysqldump?

 

 



glimmer_phoenix at yahoo

Apr 18, 2008, 11:19 AM

Post #1 of 6
On good faith: why don't we use mysqldump?

Hello.

Yesterday, I was moving around mysqldump files of our processed databases from parsed Wikipedia dumps, and this simple question came to my mind.

Is there any special reason to use an "ad-hoc" XML schema for Wikipedia dumps?
Could a mysqldump on every language edition slow down the Wikipedia MySQL server?

I guess some problem could arise, and that's why we don't use it. Otherwise, perhaps we could consider creating such a mysqldump, to speed up the import process back to our local servers, instead of having to parse a huge XML file.

That's especially true for the very large meta-history.xml versions. And you can still filter out sensitive tables (user, etc.).

Regards,

Felipe.


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


brion at wikimedia

Apr 18, 2008, 11:45 AM

Post #2 of 6
Re: On good faith: why don't we use mysqldump?

Felipe Ortega wrote:
> Yesterday, I was moving around mysqldump files of our processed
> databases from parsed Wikipedia dumps, and this simple question came
> to my mind.
>
> Is there any special reason to use an "ad-hoc" XML schema for
> Wikipedia dumps?

1) The format is relatively stable, unlike our database schema.

2) Our databases are spread over dozens of servers, in mixes of internal
binary compression formats whose interpretation is dependent on our
configuration and custom code.

3) Our internal databases mix public and private information, which we
have to separate for external dumps. Thus only completely public tables
are dumped with mysqldump.

Thus, we use a stable, safe data schema for public page dumps. Dumping
raw SQL of these tables would be unstable, insecure, and useless for
most people.

-- brion vibber (brion @ wikimedia.org)


glimmer_phoenix at yahoo

Apr 18, 2008, 2:37 PM

Post #3 of 6
Re: On good faith: why don't we use mysqldump?

Brion Vibber <brion [at] wikimedia> wrote:
> 1) The format is relatively stable, unlike our database schema.

Sure, that's why we also have to update tables.sql (sometimes) to load the data back to the server :) .

> 2) Our databases are spread over dozens of servers, in mixes of internal
> binary compression formats whose interpretation is dependent on our
> configuration and custom code.
>
> 3) Our internal databases mix public and private information, which we
> have to separate for external dumps. Thus only completely public tables
> are dumped with mysqldump.

That makes sense, Brion. Thank you for this clarification.

Regards,

Felipe.




howachen at gmail

Apr 18, 2008, 11:35 PM

Post #4 of 6
Re: On good faith: why don't we use mysqldump?

Hi

On Sat, Apr 19, 2008 at 2:45 AM, Brion Vibber <brion [at] wikimedia> wrote:
> Felipe Ortega wrote:
> > Yesterday, I was moving around mysqldump files of our processed
> > databases from parsed Wikipedia dumps, and this simple question came
> > to my mind.
> >
> > Is there any special reason to use an "ad-hoc" XML schema for
> > Wikipedia dumps?
>
> 1) The format is relatively stable, unlike our database schema.
>
> 2) Our databases are spread over dozens of servers, in mixes of internal
> binary compression formats whose interpretation is dependent on our
> configuration and custom code.
>
> 3) Our internal databases mix public and private information, which we
> have to separate for external dumps. Thus only completely public tables
> are dumped with mysqldump.
>
> Thus, we use a stable, safe data schema for public page dumps. Dumping
> raw SQL of these tables would be unstable, insecure, and useless for
> most people.
>

I agree that dumping to SQL statements is of little use, but how about CSV?

mysqldump allows you to dump to CSV files instead of raw SQL statements
(you can specify the fields you want); they are pretty safe, and
storage-efficient for download.

Even better, mysqlimport can import those CSV at a very high speed.
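
For reference, the CSV route described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the database name "wikidb", the table "page" and the directory "/tmp/dump" are hypothetical, and nothing here addresses how a public dump would actually be produced. It relies on mysqldump's --tab option, which makes the server write one delimited data file per table into the given directory (so that directory must be writable on the server host), and on mysqlimport, which takes the table name from the data file's name. The sketch only builds and prints the command lines; it does not run them.

```python
import shlex

# Field options shared by both tools, so the data file is written
# and read with the same delimiters.
CSV_OPTIONS = [
    "--fields-terminated-by=,",
    '--fields-optionally-enclosed-by="',
]


def dump_command(database, table, out_dir):
    """Build a mysqldump call that writes <table>.txt (data) and <table>.sql (schema)."""
    return ["mysqldump", f"--tab={out_dir}", *CSV_OPTIONS, database, table]


def import_command(database, data_file):
    """Build a mysqlimport call; the target table is named after the data file."""
    return ["mysqlimport", *CSV_OPTIONS, database, data_file]


if __name__ == "__main__":
    # "wikidb", "page" and "/tmp/dump" are hypothetical names for illustration.
    print(shlex.join(dump_command("wikidb", "page", "/tmp/dump")))
    print(shlex.join(import_command("wikidb", "/tmp/dump/page.txt")))
```

Building the argument lists separately from running them keeps the delimiter options in one place, which matters because a mismatch between the dump and import settings would silently corrupt the loaded rows.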

Of course many people are already using the XML files, so I am
not asking you to change, but to provide another set of dumps in CSV
format, which could save many people effort in terms of file downloading, XML
parsing, etc.

What do you think?


Thanks.
Howard



minuteelectron at googlemail

Apr 19, 2008, 2:03 AM

Post #5 of 6
Re: On good faith: why don't we use mysqldump?

howard chen wrote:
> Hi
>
> On Sat, Apr 19, 2008 at 2:45 AM, Brion Vibber <brion [at] wikimedia> wrote:
>
>> Felipe Ortega wrote:
>> > Yesterday, I was moving around mysqldump files of our processed
>> > databases from parsed Wikipedia dumps, and this simple question came
>> > to my mind.
>> >
>> > Is there any special reason to use an "ad-hoc" XML schema for
>> > Wikipedia dumps?
>>
>> 1) The format is relatively stable, unlike our database schema.
>>
>> 2) Our databases are spread over dozens of servers, in mixes of internal
>> binary compression formats whose interpretation is dependent on our
>> configuration and custom code.
>>
>> 3) Our internal databases mix public and private information, which we
>> have to separate for external dumps. Thus only completely public tables
>> are dumped with mysqldump.
>>
>> Thus, we use a stable, safe data schema for public page dumps. Dumping
>> raw SQL of these tables would be unstable, insecure, and useless for
>> most people.
>>
>>
>
> I agree that dumping to SQL statements is of little use, but how about CSV?
>
> mysqldump allows you to dump to CSV files instead of raw SQL statements
> (you can specify the fields you want); they are pretty safe, and
> storage-efficient for download.
>
> Even better, mysqlimport can import those CSV at a very high speed.
>
Issues 1, 2 and 3, which apply to SQL, also apply to any other form of dump
done via MySQL, including CSV. There is no feasible way of providing a
CSV dump, for the same reasons that a SQL one cannot be provided. The problem
here is not the format, but the process by which it is created.

MinuteElectron.



leon at leonweber

Apr 19, 2008, 2:03 AM

Post #6 of 6
Re: On good faith: why don't we use mysqldump?

On 19.04.2008 14:35:18, howard chen wrote:
> I agree that dumping to SQL statements is of little use, but how about CSV?

Bah, CSV is deprecated. XML is a much more flexible, and even
human-readable, format. It is so flexible that if you want CSV, you can
easily transform the XML into CSV using XSLT.
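
A stylesheet is not the only way to do this. As a rough sketch, here is the same idea in Python rather than XSLT (a swapped-in technique, not what is proposed above), streaming the dump so that a very large file never has to fit in memory. The sample document is a heavily simplified stand-in for a real dump: it assumes page, title, id and revision elements and leaves out the XML namespace and most fields. The function name pages_to_csv and the chosen output columns are made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a dump file (assumption: real dumps carry
# an XML namespace and many more fields per page and revision).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Example, with comma</title>
    <id>1</id>
    <revision><id>10</id><text>first</text></revision>
    <revision><id>11</id><text>second</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from a tag name, if there is one."""
    return tag.rsplit("}", 1)[-1]


def pages_to_csv(xml_source, csv_out):
    """Stream pages from xml_source and write one CSV row per page."""
    writer = csv.writer(csv_out)
    writer.writerow(["page_id", "title", "revision_count"])
    pages = 0
    # The "end" event fires once an element has been read completely.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id, title, revisions = "", "", 0
        for child in elem:  # direct children of <page> only
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "id":
                page_id = child.text or ""
            elif name == "revision":
                revisions += 1
        writer.writerow([page_id, title, revisions])
        pages += 1
        elem.clear()  # discard the page's children to keep memory use low
    return pages


if __name__ == "__main__":
    out = io.StringIO()
    pages_to_csv(io.BytesIO(SAMPLE), out)
    print(out.getvalue(), end="")
```

The csv module takes care of quoting, so a title containing a comma still ends up in a single field.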

Leon

--
Leon Weber, leon [at] leonweber 0x8E04D7FC
blog: https://leonweber.de/blog
jabber: leon [at] jabber (icq: 261067046)
--
Sagt der Richter: Die Zeugin hat entbunden. Sie kann neu geladen werden.
