
paul at nfg
Jan 16, 2009, 4:48 AM
Post #2 of 3
(297 views)
Permalink
|
|
Re: analyzed: moving messages via IMAP to another folder
[In reply to]
|
|
Michael Monnerie wrote: > I migrated my internal server from cyrus to dbmail 2.2 this week. I > increased postgresql logs to see what happens when you move a single > message from one mailbox to another. It's a total of 72 (!) SQL > commands: http://zmi.at/x/dbmail-move-message-2-another-folder.log well, 72 is not that bad :-) > > I did move the mail via dbmail. The following IMAP commands are run from > kmail when moving a single message: > 593 UID FETCH 3458457 (UID RFC822.SIZE FLAGS BODY.PEEK[]) > 594 LIST "" "zzz_OLD/trash/trash 2007" > 595 UID COPY 3458453 "zzz_OLD/trash/trash 2007" > 596 UID STORE 3458453 "zzz_OLD/trash/trash 2007" > 597 LIST "" "zzz_OLD/trash/trash 2007" > 598 NOOP > 599 SELECT "zzz_OLD/trash/trash 2007" > 600 UID FETCH 3612741:* (UID RFC822.SIZE FLAGS ENVELOPE > BODY.PEEK[HEADER.FIELDS (REFERENCES)]) > 601 NOOP > > Maybe this could be done a bit nicer, faster, both in kmail and dbmail. Maybe, maybe not some much better. > I only looked into the SQL part. For example, the third command is a > SELECT COUNT(*) AS nr_children FROM dbmail_mailboxes WHERE owner_idnr = > 1527 AND name ILIKE 'zzz\\_OLD/trash/%' > and you should NEVER EVER do count(*), because that makes the database > read the whole row of data. A "SELECT COUNT(1)" is much faster. There > are other COUNT(*) commands later, they could be optimized as well. dbmail=# explain analyze select count(*) from dbmail_messages; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.009..0.010 rows=1 loops=1) -> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0) (actual time=0.002..0.002 rows=0 loops=1) Total runtime: 0.068 ms (3 rows) dbmail=# explain analyze select count(1) from dbmail_messages; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Aggregate (cost=14.38..14.39 rows=1 width=0) (actual time=0.008..0.009 rows=1 loops=1) -> Seq Scan on dbmail_messages (cost=0.00..13.50 rows=350 width=0) (actual time=0.002..0.002 rows=0 loops=1) Total runtime: 0.065 ms (3 rows) looks the same to me. Ok, this is on a small database (development setup), but how do they compare on a production database? > > Then there are 2 commands directly neighboured that ask the same table, > but only a single data each. Could be made one SELECT: > SELECT owner_idnr FROM dbmail_mailboxes WHERE mailbox_idnr = 3305 > SELECT no_select FROM dbmail_mailboxes WHERE mailbox_idnr = 3305 Two queries for accessing two distinct attributes. Maybe these can be combined, but in general: creating maintainable code that is *not* riddled with flavour of the day optimizations does have it's advantages. > Then the same for the message_idnr: > SELECT message_idnr+1 FROM dbmail_messages WHERE mailbox_idnr=3305 ORDER > BY message_idnr DESC LIMIT 1 > SELECT message_idnr FROM dbmail_messages WHERE mailbox_idnr = 3305 AND > status IN (0,1) ORDER BY message_idnr two completely different queries: the first for UIDNEXT, the latter for retrieving a list of all message ids. > > And why is there a SELECT with "message_idnr BETWEEN x AND y"? Doesn't > dbmail know the message_idnr exactly at that time? > SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, > recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, > message_idnr FROM dbmail_messages msg, dbmail_physmessage pm WHERE pm.id > = msg.physmessage_id AND message_idnr BETWEEN 3458410 AND 3596775 AND > mailbox_idnr = 3305 AND status IN (0,1,2) ORDER BY message_idnr ASC explained by commands such as this: 600 UID FETCH 3612741:* (UID RFC822.SIZE FLAGS ENVELOPE where a range of messages needs to be accessed. > > But I don't know if I'm wasting my time as Paul is on 2.3 and nobody > else is coding for dbmail. I'm willing to help optimize the SQL part, > maybe there's a coder out there who could create patches for Paul? Or > would it be better I test things with 2.3? Would Paul want to have that > done? Or is it too early and you still need to code "the big thing" and > there's no time to optimize anyway, and commands will still be changed > greatly? Don't waste your time on 2.2 unless you're willing to take over maintenance of that branch. And no: I will not release a 2.2 update that has the remotest chance of breaking backward compatibility. A lot of this stuff is still there in 2.3, albeit in different places, using different (libzdb) interfaces, etc. But in general the imap code hasn't changed fundamentally, iirc. Just a lot of cleanups, no real optimizations. And no: there are no major changes in the pipeline. Dbmail is not making me any money. I just want to stabilize the current 2.3 codebase and get 2.4 out. After that, well, we'll see. -- ________________________________________________________________ Paul Stevens paul at nfg.nl NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31 The Netherlands________________________________http://www.nfg.nl _______________________________________________ DBmail mailing list DBmail[at]dbmail.org https://mailman.fastxs.nl/mailman/listinfo/dbmail
|