
michael.monnerie at is
Dec 21, 2008, 12:01 PM
Post #3 of 7
(645 views)
Permalink
|
|
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail
[In reply to]
|
|
On Mittwoch 17 Dezember 2008 Paul J Stevens wrote: > Michael Monnerie wrote: > > I'm trying to optimize the db using the CLUSTER command from > > postgres, which physically resorts records on disk according to an > > index. The goal is to physically place records together on disk > > which you might read together. > > cool. Well so far it seems response times improved. A lot less (almost no) slow queries logged now. These were the commands I used: CLUSTER dbmail_mailboxes_owner_name_idx ON dbmail_mailboxes ; CLUSTER dbmail_headername_1 ON dbmail_headername ; CLUSTER dbmail_messages_7 ON dbmail_messages ; CLUSTER dbmail_physmessage_pkey ON dbmail_physmessage ; CLUSTER dbmail_aliases_domain_alias ON dbmail_aliases ; CLUSTER dbmail_users_domain ON dbmail_users ; CLUSTER dbmail_subscription_pkey ON dbmail_subscription ; CLUSTER dbmail_acl_pkey ON dbmail_acl ; CLUSTER dbmail_datefield_2 ON dbmail_datefield ; # Not sure if this should be used. Is there ever any chance that # headervalues get read sequentially? I think no, so skip it: CLUSTER dbmail_headervalue_1 ON dbmail_headervalue ; # This one takes rather long: CLUSTER dbmail_messageblks_physmessage_is_header_idx ON dbmail_messageblks ; Maybe someone has a better idea of which tables are read in which order, then you can try with another index to cluster. The second thing is that after using cluster, a pg_dump throws out all rows sorted by the index, which makes it easier to find records manually if you need to. > > Best example is "dbmail_messageblks". > Well, they are always ordered by messageblk_idnr. > So, > select messageblk from dbmail_messageblks where physmessage_id=? > order by messageblk_idnr; > will give you the complete message. But there's no index (physmessage_id,messageblk_idnr), which you would need to use cluster. I think using (physmessage_id,is_header) as the sorting key is not too bad, as there are quite some searches for this index. And if a message is read, even if it's not sorted by messageblk_idnr, there's a big chance that RAID-controller readahead and/or Linux kernel readahead might have read that messageblock so we still have it all in cache and read very quick. Might it be worth to create an extra index just for the cluster command? Or other way round: Do you use above select to read a message? Then it could be effective to create such an index anyway, and then we could also let cluster sort by that index. What do you think Paul? Also, for stats: Is there a way to log from dbmail which DB accesses it starts? I would like to know the distribution which commands were run, so we can see which indexes are used and which not. > > messageblk > It's in bytea because it's basically raw 8bit data. Most of the time > it's nicely encoded text data (utf8, or latin1, or whatever), but > most of the time doesn't hack it. Also, the data needs to be stored > as-is, without being transcoded into whatever encoding your database > happens to be in. So, there you are. > > So we could > > DROP INDEX dbmail_messageblks_physmessage_idx ; > > and should receive higher speed and less disk space usage, right? > > Sounds good to me. Try it, and let us know how it works for you. I did, and then lots of these were logged which made me a bit nervous: query failed [DELETE FROM dbmail_physmessage WHERE id = 1676958] : [.ERROR: could not open relation with OID 102021 CONTEXT: SQL statement "DELETE FROM ONLY "public"."dbmail_messageblks" WHERE "physmessage_id" = $1" ] But I got the tip from the pgsql-admin mailing list, to just restart all db connections (aka restart dbmail), and this worked. So far nobody seems to miss that index. mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
|