Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: DBMail: users

cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail

 

 

DBMail users RSS feed   Index | Next | Previous | View Threaded


michael.monnerie at is

Dec 16, 2008, 5:13 PM

Post #1 of 7 (695 views)
Permalink
cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail

I'm trying to optimize the db using the CLUSTER command from postgres,
which physically resorts records on disk according to an index. The goal
is to physically place records together on disk which you might read
together.

Best example is "dbmail_messageblks". When you read a message, you read
all messageblks belonging to a message in order. So it's best to sort it
after physmessage_id. (Shouldn't there be a "piece number" counter or
how are the pieces placed in correct order? Via messageblk? Why is that
bytea? ). With that knowledge, you can run

CLUSTER dbmail_messageblks_physmessage_is_header_idx ON
dbmail_messageblks ;

And will have all message blocks in order physmessage_id, is_header.
That should result in higher speed. We'll see.

But, as a side point during my research, I found some indexes that can
be removed because they are duplicates.

# \d dbmail_messageblks_physmessage_idx
Index »public.dbmail_messageblks_physmessage_idx«
Spalte | Typ
----------------+--------
physmessage_id | bigint
btree, für Tabelle »public.dbmail_messageblks«

dbmail=# \d dbmail_messageblks_physmessage_is_header_idx
Index »public.dbmail_messageblks_physmessage_is_header_idx«
Spalte | Typ
----------------+----------
physmessage_id | bigint
is_header | smallint
btree, für Tabelle »public.dbmail_messageblks«

Those two indexes are nearly the same, they have the same primary sort
key. The _is_header_idx has an additional field in the end. So we could
savely remove the first index, right? That means that for every insert
we save an update to that index and the disk space, which should result
in higher speed at the end.

So we could
DROP INDEX dbmail_messageblks_physmessage_idx ;
and should receive higher speed and less disk space usage, right?

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net Key-ID: 1C1209B4
Attachments: signature.asc (0.19 KB)


paul at nfg

Dec 17, 2008, 12:20 AM

Post #2 of 7 (658 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

Michael Monnerie wrote:
> I'm trying to optimize the db using the CLUSTER command from postgres,
> which physically resorts records on disk according to an index. The goal
> is to physically place records together on disk which you might read
> together.

cool.

>
> Best example is "dbmail_messageblks". When you read a message, you read
> all messageblks belonging to a message in order. So it's best to sort it
> after physmessage_id. (Shouldn't there be a "piece number" counter or
> how are the pieces placed in correct order?

Well, they are always ordered by messageblk_idnr.
So,
select messageblk from dbmail_messageblks where physmessage_id=? order
by messageblk_idnr;
will give you the complete message.

> Via messageblk? Why is that
> bytea? ). With that knowledge, you can run

It's in bytea because it's basically raw 8bit data. Most of the time
it's nicely encoded text data (utf8, or latin1, or whatever), but most
of the time doesn't hack it. Also, the data needs to be stored as-is,
without being transcoded into whatever encoding your database happens to
be in. So, there you are.


> So we could
> DROP INDEX dbmail_messageblks_physmessage_idx ;
> and should receive higher speed and less disk space usage, right?

Sounds good to me. Try it, and let us know how it works for you.

---
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail


michael.monnerie at is

Dec 21, 2008, 12:01 PM

Post #3 of 7 (645 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

On Mittwoch 17 Dezember 2008 Paul J Stevens wrote:
> Michael Monnerie wrote:
> > I'm trying to optimize the db using the CLUSTER command from
> > postgres, which physically resorts records on disk according to an
> > index. The goal is to physically place records together on disk
> > which you might read together.
>
> cool.

Well so far it seems response times improved. A lot less (almost no)
slow queries logged now. These were the commands I used:

CLUSTER dbmail_mailboxes_owner_name_idx ON dbmail_mailboxes ;
CLUSTER dbmail_headername_1 ON dbmail_headername ;
CLUSTER dbmail_messages_7 ON dbmail_messages ;
CLUSTER dbmail_physmessage_pkey ON dbmail_physmessage ;
CLUSTER dbmail_aliases_domain_alias ON dbmail_aliases ;
CLUSTER dbmail_users_domain ON dbmail_users ;
CLUSTER dbmail_subscription_pkey ON dbmail_subscription ;
CLUSTER dbmail_acl_pkey ON dbmail_acl ;
CLUSTER dbmail_datefield_2 ON dbmail_datefield ;
# Not sure if this should be used. Is there ever any chance that
# headervalues get read sequentially? I think no, so skip it:
CLUSTER dbmail_headervalue_1 ON dbmail_headervalue ;
# This one takes rather long:
CLUSTER dbmail_messageblks_physmessage_is_header_idx ON
dbmail_messageblks ;

Maybe someone has a better idea of which tables are read in which order,
then you can try with another index to cluster.
The second thing is that after using cluster, a pg_dump throws out all
rows sorted by the index, which makes it easier to find records manually
if you need to.

> > Best example is "dbmail_messageblks".
> Well, they are always ordered by messageblk_idnr.
> So,
> select messageblk from dbmail_messageblks where physmessage_id=?
> order by messageblk_idnr;
> will give you the complete message.

But there's no index (physmessage_id,messageblk_idnr), which you would
need to use cluster. I think using (physmessage_id,is_header) as the
sorting key is not too bad, as there are quite some searches for this
index. And if a message is read, even if it's not sorted by
messageblk_idnr, there's a big chance that RAID-controller readahead
and/or Linux kernel readahead might have read that messageblock so we
still have it all in cache and read very quick.
Might it be worth to create an extra index just for the cluster command?
Or other way round: Do you use above select to read a message? Then it
could be effective to create such an index anyway, and then we could
also let cluster sort by that index. What do you think Paul?

Also, for stats: Is there a way to log from dbmail which DB accesses it
starts? I would like to know the distribution which commands were run,
so we can see which indexes are used and which not.

> > messageblk
> It's in bytea because it's basically raw 8bit data. Most of the time
> it's nicely encoded text data (utf8, or latin1, or whatever), but
> most of the time doesn't hack it. Also, the data needs to be stored
> as-is, without being transcoded into whatever encoding your database
> happens to be in. So, there you are.

> > So we could
> > DROP INDEX dbmail_messageblks_physmessage_idx ;
> > and should receive higher speed and less disk space usage, right?
>
> Sounds good to me. Try it, and let us know how it works for you.

I did, and then lots of these were logged which made me a bit nervous:
query failed [DELETE FROM dbmail_physmessage WHERE id = 1676958] :
[.ERROR: could not open relation with OID 102021 CONTEXT: SQL statement
"DELETE FROM ONLY "public"."dbmail_messageblks" WHERE "physmessage_id" =
$1" ]

But I got the tip from the pgsql-admin mailing list, to just restart all
db connections (aka restart dbmail), and this worked. So far nobody
seems to miss that index.

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Attachments: signature.asc (0.19 KB)


paul at nfg

Dec 22, 2008, 1:31 PM

Post #4 of 7 (644 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

Michael Monnerie wrote:
> On Mittwoch 17 Dezember 2008 Paul J Stevens wrote:
>> Michael Monnerie wrote:
>>> I'm trying to optimize the db using the CLUSTER command from
>>> postgres, which physically resorts records on disk according to an
>>> index. The goal is to physically place records together on disk
>>> which you might read together.
>> cool.
>
> Well so far it seems response times improved. A lot less (almost no)
> slow queries logged now. These were the commands I used:
>

Very good. Could you post it on the wiki? Makes it easier to find for
others. Any stats on how long these commands will lock the tables would
be nice. Perhaps something to include (optionally) in the
dbmail-maintenance framework.


>>> Best example is "dbmail_messageblks".
>> Well, they are always ordered by messageblk_idnr.
>> So,
>> select messageblk from dbmail_messageblks where physmessage_id=?
>> order by messageblk_idnr;
>> will give you the complete message.
>
> But there's no index (physmessage_id,messageblk_idnr), which you would
> need to use cluster. I think using (physmessage_id,is_header) as the
> sorting key is not too bad, as there are quite some searches for this
> index. And if a message is read, even if it's not sorted by
> messageblk_idnr, there's a big chance that RAID-controller readahead
> and/or Linux kernel readahead might have read that messageblock so we
> still have it all in cache and read very quick.
> Might it be worth to create an extra index just for the cluster command?
> Or other way round: Do you use above select to read a message? Then it
> could be effective to create such an index anyway, and then we could
> also let cluster sort by that index. What do you think Paul?

Well, at least for 2.2 that sounds viable to me. But my focus is on 2.3
which uses a different model. I'd be surprised if this could be done at
all for 2.3.

>
> Also, for stats: Is there a way to log from dbmail which DB accesses it
> starts? I would like to know the distribution which commands were run,
> so we can see which indexes are used and which not.

Why not use the stats available on the sql server side? Or maybe I
misread you.


>>> DROP INDEX dbmail_messageblks_physmessage_idx ;
>>> and should receive higher speed and less disk space usage, right?
>> Sounds good to me. Try it, and let us know how it works for you.
>
> I did, and then lots of these were logged which made me a bit nervous:
> query failed [DELETE FROM dbmail_physmessage WHERE id = 1676958] :
> [.ERROR: could not open relation with OID 102021 CONTEXT: SQL statement
> "DELETE FROM ONLY "public"."dbmail_messageblks" WHERE "physmessage_id" =
> $1" ]
>
> But I got the tip from the pgsql-admin mailing list, to just restart all
> db connections (aka restart dbmail), and this worked. So far nobody
> seems to miss that index.

Noted. More for that wiki entry perhaps?



--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail


michael.monnerie at is

Jan 12, 2009, 12:34 AM

Post #5 of 7 (547 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

On Montag 22 Dezember 2008 Paul J Stevens wrote:
> Very good. Could you post it on the wiki? Makes it easier to find for
> others. Any stats on how long these commands will lock the tables
> would be nice. Perhaps something to include (optionally) in the
> dbmail-maintenance framework.

I did release the article yesterday.
http://www.dbmail.org/dokuwiki/doku.php?id=performance

> > But there's no index (physmessage_id,messageblk_idnr), which you
> > would need to use cluster. I think using (physmessage_id,is_header)
> > as the sorting key is not too bad, as there are quite some searches
> > for this index. And if a message is read, even if it's not sorted
> > by messageblk_idnr, there's a big chance that RAID-controller
> > readahead and/or Linux kernel readahead might have read that
> > messageblock so we still have it all in cache and read very quick.
> > Might it be worth to create an extra index just for the cluster
> > command? Or other way round: Do you use above select to read a
> > message? Then it could be effective to create such an index anyway,
> > and then we could also let cluster sort by that index. What do you
> > think Paul?
>
> Well, at least for 2.2 that sounds viable to me. But my focus is on
> 2.3 which uses a different model. I'd be surprised if this could be
> done at all for 2.3.

CREATE UNIQUE INDEX dbmail_messageblks_1 ON dbmail_messageblks
(physmessage_id,messageblk_idnr) ;
CLUSTER dbmail_messageblks_1 ON dbmail_messageblks ;

Improved things a lot - no more "slow query" log entries for the whole
database.

> Noted. More for that wiki entry perhaps?

Yep, is in it.

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Attachments: signature.asc (0.19 KB)


paul at nfg

Jan 12, 2009, 11:20 AM

Post #6 of 7 (544 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

Michael Monnerie wrote:

>> Well, at least for 2.2 that sounds viable to me. But my focus is on
>> 2.3 which uses a different model. I'd be surprised if this could be
>> done at all for 2.3.
>
> CREATE UNIQUE INDEX dbmail_messageblks_1 ON dbmail_messageblks
> (physmessage_id,messageblk_idnr) ;
> CLUSTER dbmail_messageblks_1 ON dbmail_messageblks ;

2.3 doesn't use messageblks for new messages.

--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail


michael.monnerie at is

Jan 19, 2009, 6:59 AM

Post #7 of 7 (528 views)
Permalink
Re: cleaning up dbmail indexes, and using CLUSTER command on PostgreSQL with dbmail [In reply to]

On Montag 12 Januar 2009 Paul Stevens wrote:
> > CREATE UNIQUE INDEX dbmail_messageblks_1 ON dbmail_messageblks
> > (physmessage_id,messageblk_idnr) ;
> > CLUSTER dbmail_messageblks_1 ON dbmail_messageblks ;
>
> 2.3 doesn't use messageblks for new messages.

Of course I meant that for 2.2. I should have inserted the text in a
better place, it was confusing I see.

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4

_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail

DBMail users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.