Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: DBMail: users

analyzed: moving messages via IMAP to another folder

 

 

DBMail users RSS feed   Index | Next | Previous | View Threaded


michael.monnerie at is

Jan 16, 2009, 4:26 AM

Post #1 of 3 (327 views)
Permalink
analyzed: moving messages via IMAP to another folder

I migrated my internal server from cyrus to dbmail 2.2 this week. I
increased postgresql logs to see what happens when you move a single
message from one mailbox to another. It's a total of 72 (!) SQL
commands: http://zmi.at/x/dbmail-move-message-2-another-folder.log

I did move the mail via dbmail. The following IMAP commands are run from
kmail when moving a single message:
593 UID FETCH 3458457 (UID RFC822.SIZE FLAGS BODY.PEEK[])
594 LIST "" "zzz_OLD/trash/trash 2007"
595 UID COPY 3458453 "zzz_OLD/trash/trash 2007"
596 UID STORE 3458453 "zzz_OLD/trash/trash 2007"
597 LIST "" "zzz_OLD/trash/trash 2007"
598 NOOP
599 SELECT "zzz_OLD/trash/trash 2007"
600 UID FETCH 3612741:* (UID RFC822.SIZE FLAGS ENVELOPE
BODY.PEEK[HEADER.FIELDS (REFERENCES)])
601 NOOP

Maybe this could be done a bit nicer, faster, both in kmail and dbmail.
I only looked into the SQL part. For example, the third command is a
SELECT COUNT(*) AS nr_children FROM dbmail_mailboxes WHERE owner_idnr =
1527 AND name ILIKE 'zzz\\_OLD/trash/%'
and you should NEVER EVER do count(*), because that makes the database
read the whole row of data. A "SELECT COUNT(1)" is much faster. There
are other COUNT(*) commands later, they could be optimized as well.

Then there are 2 commands directly neighboured that ask the same table,
but only a single data each. Could be made one SELECT:
SELECT owner_idnr FROM dbmail_mailboxes WHERE mailbox_idnr = 3305
SELECT no_select FROM dbmail_mailboxes WHERE mailbox_idnr = 3305

Then the same for the message_idnr:
SELECT message_idnr+1 FROM dbmail_messages WHERE mailbox_idnr=3305 ORDER
BY message_idnr DESC LIMIT 1
SELECT message_idnr FROM dbmail_messages WHERE mailbox_idnr = 3305 AND
status IN (0,1) ORDER BY message_idnr

And why is there a SELECT with "message_idnr BETWEEN x AND y"? Doesn't
dbmail know the message_idnr exactly at that time?
SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag,
recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize,
message_idnr FROM dbmail_messages msg, dbmail_physmessage pm WHERE pm.id
= msg.physmessage_id AND message_idnr BETWEEN 3458410 AND 3596775 AND
mailbox_idnr = 3305 AND status IN (0,1,2) ORDER BY message_idnr ASC

But I don't know if I'm wasting my time as Paul is on 2.3 and nobody
else is coding for dbmail. I'm willing to help optimize the SQL part,
maybe there's a coder out there who could create patches for Paul? Or
would it be better I test things with 2.3? Would Paul want to have that
done? Or is it too early and you still need to code "the big thing" and
there's no time to optimize anyway, and commands will still be changed
greatly?

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4




_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail


paul at nfg

Jan 16, 2009, 4:48 AM

Post #2 of 3 (298 views)
Permalink
Re: analyzed: moving messages via IMAP to another folder [In reply to]

Michael Monnerie wrote:
> I migrated my internal server from cyrus to dbmail 2.2 this week. I
> increased postgresql logs to see what happens when you move a single
> message from one mailbox to another. It's a total of 72 (!) SQL
> commands: http://zmi.at/x/dbmail-move-message-2-another-folder.log

well, 72 is not that bad :-)

>
> I did move the mail via dbmail. The following IMAP commands are run from
> kmail when moving a single message:
> 593 UID FETCH 3458457 (UID RFC822.SIZE FLAGS BODY.PEEK[])
> 594 LIST "" "zzz_OLD/trash/trash 2007"
> 595 UID COPY 3458453 "zzz_OLD/trash/trash 2007"
> 596 UID STORE 3458453 "zzz_OLD/trash/trash 2007"
> 597 LIST "" "zzz_OLD/trash/trash 2007"
> 598 NOOP
> 599 SELECT "zzz_OLD/trash/trash 2007"
> 600 UID FETCH 3612741:* (UID RFC822.SIZE FLAGS ENVELOPE
> BODY.PEEK[HEADER.FIELDS (REFERENCES)])
> 601 NOOP
>
> Maybe this could be done a bit nicer, faster, both in kmail and dbmail.

Maybe, maybe not some much better.

> I only looked into the SQL part. For example, the third command is a
> SELECT COUNT(*) AS nr_children FROM dbmail_mailboxes WHERE owner_idnr =
> 1527 AND name ILIKE 'zzz\\_OLD/trash/%'
> and you should NEVER EVER do count(*), because that makes the database
> read the whole row of data. A "SELECT COUNT(1)" is much faster. There
> are other COUNT(*) commands later, they could be optimized as well.


dbmail=# explain analyze select count(*) from dbmail_messages;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.009..0.010
rows=1 loops=1)
-> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0)
(actual time=0.002..0.002 rows=0 loops=1)
Total runtime: 0.068 ms
(3 rows)

dbmail=# explain analyze select count(1) from dbmail_messages;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.008..0.009
rows=1 loops=1)
-> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0)
(actual time=0.002..0.002 rows=0 loops=1)
Total runtime: 0.065 ms
(3 rows)


looks the same to me. Ok, this is on a small database (development
setup), but how do they compare on a production database?



>
> Then there are 2 commands directly neighboured that ask the same table,
> but only a single data each. Could be made one SELECT:
> SELECT owner_idnr FROM dbmail_mailboxes WHERE mailbox_idnr = 3305
> SELECT no_select FROM dbmail_mailboxes WHERE mailbox_idnr = 3305

Two queries for accessing two distinct attributes. Maybe these can be
combined, but in general: creating maintainable code that is *not*
riddled with flavour of the day optimizations does have it's advantages.


> Then the same for the message_idnr:
> SELECT message_idnr+1 FROM dbmail_messages WHERE mailbox_idnr=3305 ORDER
> BY message_idnr DESC LIMIT 1
> SELECT message_idnr FROM dbmail_messages WHERE mailbox_idnr = 3305 AND
> status IN (0,1) ORDER BY message_idnr

two completely different queries: the first for UIDNEXT, the latter for
retrieving a list of all message ids.

>
> And why is there a SELECT with "message_idnr BETWEEN x AND y"? Doesn't
> dbmail know the message_idnr exactly at that time?
> SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag,
> recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize,
> message_idnr FROM dbmail_messages msg, dbmail_physmessage pm WHERE pm.id
> = msg.physmessage_id AND message_idnr BETWEEN 3458410 AND 3596775 AND
> mailbox_idnr = 3305 AND status IN (0,1,2) ORDER BY message_idnr ASC

explained by commands such as this:
600 UID FETCH 3612741:* (UID RFC822.SIZE FLAGS ENVELOPE
where a range of messages needs to be accessed.

>
> But I don't know if I'm wasting my time as Paul is on 2.3 and nobody
> else is coding for dbmail. I'm willing to help optimize the SQL part,
> maybe there's a coder out there who could create patches for Paul? Or
> would it be better I test things with 2.3? Would Paul want to have that
> done? Or is it too early and you still need to code "the big thing" and
> there's no time to optimize anyway, and commands will still be changed
> greatly?

Don't waste your time on 2.2 unless you're willing to take over
maintenance of that branch. And no: I will not release a 2.2 update that
has the remotest chance of breaking backward compatibility.

A lot of this stuff is still there in 2.3, albeit in different places,
using different (libzdb) interfaces, etc. But in general the imap code
hasn't changed fundamentally, iirc. Just a lot of cleanups, no real
optimizations.

And no: there are no major changes in the pipeline. Dbmail is not making
me any money. I just want to stabilize the current 2.3 codebase and get
2.4 out. After that, well, we'll see.


--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail


niki at guldbrand

Jan 16, 2009, 4:02 PM

Post #3 of 3 (297 views)
Permalink
Re: analyzed: moving messages via IMAP to another folder [In reply to]

On Fri, 2009-01-16 at 13:48 +0100, Paul J Stevens wrote:
>
> dbmail=# explain analyze select count(*) from dbmail_messages;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.009..0.010
> rows=1 loops=1)
> -> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0)
> (actual time=0.002..0.002 rows=0 loops=1)
> Total runtime: 0.068 ms
> (3 rows)
>
> dbmail=# explain analyze select count(1) from dbmail_messages;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.008..0.009
> rows=1 loops=1)
> -> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0)
> (actual time=0.002..0.002 rows=0 loops=1)
> Total runtime: 0.065 ms
> (3 rows)
>
>
> looks the same to me. Ok, this is on a small database (development
> setup), but how do they compare on a production database?
>

Here's a run on my system of the 2 you posted, 1 original , and 1
additional one:

system=# explain analyze select count(*) from dbmail_messages;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7955.91..7955.93 rows=1 width=0) (actual
time=728.540..728.541 rows=1 loops=1)
-> Seq Scan on dbmail_messages (cost=0.00..7216.13 rows=295913
width=0) (actual time=0.011..374.648 rows=298157 loops=1)
Total runtime: 728.587 ms
(3 rows)

system=# explain analyze select count(1) from dbmail_messages;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7955.91..7955.93 rows=1 width=0) (actual
time=732.474..732.475 rows=1 loops=1)
-> Seq Scan on dbmail_messages (cost=0.00..7216.13 rows=295913
width=0) (actual time=0.012..374.017 rows=298157 loops=1)
Total runtime: 732.519 ms
(3 rows)

system=# explain analyze SELECT message_idnr+1 FROM dbmail_messages
WHERE mailbox_idnr=2123212 ORDER BY message_idnr DESC LIMIT 1;

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..20.91 rows=1 width=8) (actual time=133.064..133.064
rows=0 loops=1)
-> Index Scan Backward using dbmail_messages_pkey on dbmail_messages
(cost=0.00..23626.15 rows=1130 width=8) (actual time=133.060..133.060
rows=0 loops=1)
Filter: (mailbox_idnr = 2123212)
Total runtime: 133.096 ms
(4 rows)

system=# explain analyze SELECT MAX(message_idnr)+1 FROM
dbmail_messages;

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.08..0.09 rows=1 width=0) (actual time=0.033..0.034
rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..0.08 rows=1 width=8) (actual
time=0.024..0.025 rows=1 loops=1)
-> Index Scan Backward using dbmail_messages_pkey on
dbmail_messages (cost=0.00..22883.54 rows=295913 width=8) (actual
time=0.021..0.021 rows=1 loops=1)
Filter: (message_idnr IS NOT NULL)
Total runtime: 0.067 ms
(6 rows)

This is with about 298161 rows in the table


Best regards

Niki Guldbrand
Attachments: signature.asc (0.82 KB)

DBMail users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.