Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: DBMail: users
marking a mailbox as read is slow, suggesting improvement
 

Index | Next | Previous | View Flat


michael.monnerie at is

Mar 3, 2009, 3:47 PM


Views: 513
Permalink
marking a mailbox as read is slow, suggesting improvement

I just had a mailbox with 5k messages, about 4k unread, which was quite slow.
So I tested with a smaller mailbox: In kmail 1.10.3, I say
"mark whole mailbox as read". That generates these commands:

15232 UID STORE 3651501,3651521,3651555,3651557,3651579,3651706,3651745 -FLAGS.SILENT (\SEEN \ANSWERED \FLAGGED \DRAFT)
15233 UID STORE 3651501,3651521,3651555,3651557,3651579,3651706,3651745 +FLAGS.SILENT (\SEEN)

Those are the queries generated to Postgres:

2009-03-04 00:26:48 CET pid=4609 25/120872 idle: LOG: statement: SELECT message_idnr FROM dbmail_messages WHERE mailbox_idnr = 3324 AND status IN (0,1) ORDER BY message_idnr
2009-03-04 00:26:48 CET pid=4609 25/120873 idle: LOG: statement: SELECT message_idnr FROM dbmail_messages WHERE mailbox_idnr = 3324 AND status IN (0,1) ORDER BY message_idnr
2009-03-04 00:26:48 CET pid=4609 25/120874 idle: LOG: statement: SELECT permission,seen_flag,answered_flag,deleted_flag,flagged_flag,recent_flag,draft_flag FROM dbmail_mailboxes WHERE mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120875 idle: LOG: statement: SELECT 'a',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr=3324 AND (status < 2) UNION SELECT 'b',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr=3324 AND (status < 2)
AND seen_flag=1 UNION SELECT 'c',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr=3324 AND (status < 2) AND recent_flag=1
2009-03-04 00:26:48 CET pid=4609 25/120876 idle: LOG: statement: SELECT max(message_idnr)+1 FROM dbmail_messages WHERE mailbox_idnr=3324
2009-03-04 00:26:48 CET pid=4609 25/120877 idle: LOG: statement: SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_
messages msg, dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND message_idnr BETWEEN 3635587 AND 3651745 AND mailbox_idnr = 3324 AND status IN (0,1,2) ORDER BY message_idnr ASC
2009-03-04 00:26:48 CET pid=4609 25/120878 idle: LOG: statement: SELECT permission,seen_flag,answered_flag,deleted_flag,flagged_flag,recent_flag,draft_flag FROM dbmail_mailboxes WHERE mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120879 idle: LOG: statement: SELECT permission,seen_flag,answered_flag,deleted_flag,flagged_flag,recent_flag,draft_flag FROM dbmail_mailboxes WHERE mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120880 idle: LOG: statement: SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_
messages msg, dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND message_idnr BETWEEN 3635587 AND 3651745 AND mailbox_idnr = 3324 AND status IN (0,1,2) ORDER BY message_idnr ASC

(are all those SELECTs really neccessary until here?)

2009-03-04 00:26:48 CET pid=4609 25/120881 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651501 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120882 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651521 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120883 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651555 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120884 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651557 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120885 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651579 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120886 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651706 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120887 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr = 3651745 AND status < 2 AND mailbox_idnr = 3324
(here seems to be the 2nd commands start)
2009-03-04 00:26:48 CET pid=4609 25/120888 idle: LOG: statement: SELECT permission,seen_flag,answered_flag,deleted_flag,flagged_flag,recent_flag,draft_flag FROM dbmail_mailboxes WHERE mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120889 idle: LOG: statement: SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date, 'YYYY-MM-DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_
messages msg, dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND message_idnr BETWEEN 3635587 AND 3651745 AND mailbox_idnr = 3324 AND status IN (0,1,2) ORDER BY message_idnr ASC
(again, why are those 2 SELECTs needed?)
2009-03-04 00:26:48 CET pid=4609 25/120890 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651501 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120891 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651521 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120892 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651555 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120893 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651557 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120894 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651579 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120895 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651706 AND status < 2 AND mailbox_idnr = 3324
2009-03-04 00:26:48 CET pid=4609 25/120896 idle: LOG: statement: UPDATE dbmail_messages SET recent_flag=0, seen_flag=1 WHERE message_idnr = 3651745 AND status < 2 AND mailbox_idnr = 3324

Now I know that development to dbmail-2.2 has stopped, but I guess the
process is not a lot different in 2.3 so I want to suggest two improvements:

1) Currently for *every single* message a separate UPDATE is done,
this could easily be replaced by a multi-message UPDATE. And I
guess lots of SELECTs could be dropped also.

dbmail=> EXPLAIN UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr IN (3651501,3651521,3651555) AND status < 2 AND mailbox_idnr = 3324;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on dbmail_messages (cost=19.22..23.24 rows=1 width=67)
Recheck Cond: ((mailbox_idnr = 3324) AND (message_idnr = ANY ('{3651501,3651521,3651555}'::bigint[])))
Filter: (status < 2)
-> BitmapAnd (cost=19.22..19.22 rows=1 width=0)
-> Bitmap Index Scan on dbmail_messages_1 (cost=0.00..6.17 rows=253 width=0)
Index Cond: (mailbox_idnr = 3324)
-> Bitmap Index Scan on dbmail_messages_pkey (cost=0.00..12.80 rows=3 width=0)
Index Cond: (message_idnr = ANY ('{3651501,3651521,3651555}'::bigint[]))
(8 Zeilen)

But I found that leaving "AND mailbox_idnr = 3324" is not needed anyway, and makes the query slower,
so let it away:

dbmail=> EXPLAIN UPDATE dbmail_messages SET recent_flag=0, seen_flag=0, answered_flag=0, flagged_flag=0, draft_flag=0 WHERE message_idnr IN (3651501,3651521,3651555) AND status < 2;
QUERY PLAN
------------------------------------------------------------------------------------
Bitmap Heap Scan on dbmail_messages (cost=12.80..24.56 rows=3 width=67)
Recheck Cond: (message_idnr = ANY ('{3651501,3651521,3651555}'::bigint[]))
Filter: (status < 2)
-> Bitmap Index Scan on dbmail_messages_pkey (cost=0.00..12.80 rows=3 width=0)
Index Cond: (message_idnr = ANY ('{3651501,3651521,3651555}'::bigint[]))
(5 Zeilen)

2)
When an IMAP command "UID STORE" arrives, could dbmail wait
to see if a 2nd "UID STORE" arrives next? Currently there are
2 updates, first setting all flags=0, later seen=1. That could
be grouped. Doesn't matter with only some users, but once
you have 500+ concurrent users, or WAL logs (which generate
a record for every single transaction), there's a big difference.


mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4

_______________________________________________
DBmail mailing list
DBmail[at]dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Subject User Time
marking a mailbox as read is slow, suggesting improvement michael.monnerie at is Mar 3, 2009, 3:47 PM
    Re: marking a mailbox as read is slow, suggesting improvement paul at nfg Mar 4, 2009, 1:48 PM
        Re: marking a mailbox as read is slow, suggesting improvement michael.monnerie at is Mar 9, 2009, 3:12 AM
            Re: marking a mailbox as read is slow, suggesting improvement paul at nfg Mar 10, 2009, 1:21 AM

  Index | Next | Previous | View Flat
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.