Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: DBMail: users

Improvement for dbmail_ccfield

 

 

DBMail users RSS feed   Index | Next | Previous | View Threaded


michael.monnerie at is

Dec 16, 2008, 5:47 PM

Post #1 of 5 (1107 views)
Permalink
Improvement for dbmail_ccfield

Just now I looked into a dump and found this:

COPY dbmail_ccfield (physmessage_id, id, ccname, ccaddr) FROM stdin;
1666579 246607 markus.zauner [at] jwt
1666579 246608 tamara [at] balag
1666579 246609 Armin Mitterbauer amitterbauer [at] gmail

There seem to be white spaces at the beginning of "ccname", shouldn't
that be removed? Because
1) it's useless information
2) the same e-mail adress "tamara [at] balag" could be stored ten times,
each with a different number of blanks in the beginning, thus wasting
disk space and access times.

Maybe this goes for other tables too?

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net Key-ID: 1C1209B4
Attachments: signature.asc (0.19 KB)


paul at nfg

Dec 17, 2008, 12:27 AM

Post #2 of 5 (1022 views)
Permalink
Re: Improvement for dbmail_ccfield [In reply to]

Michael Monnerie wrote:
> Just now I looked into a dump and found this:
>
> COPY dbmail_ccfield (physmessage_id, id, ccname, ccaddr) FROM stdin;
> 1666579 246607 markus.zauner [at] jwt
> 1666579 246608 tamara [at] balag
> 1666579 246609 Armin Mitterbauer amitterbauer [at] gmail
>
> There seem to be white spaces at the beginning of "ccname", shouldn't
> that be removed? Because
> 1) it's useless information
> 2) the same e-mail adress "tamara [at] balag" could be stored ten times,
> each with a different number of blanks in the beginning, thus wasting
> disk space and access times.

You have a point here about the whitespace, if that was the case. But it
looks to me like this is a side effect of the copy command
(tab-delimited with empty ccname fields). I don't see any whitespace in
my own ccfield table.

But the second argument won't hold, because those tables are not
single-instance. Each address for every physmessage is stored
separately, again and again. That in itself is some point of concern
because those tables (esp headervalue) grow too fast. But that is not
due to lack of whitespace stripping.

--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail [at] dbmail
https://mailman.fastxs.nl/mailman/listinfo/dbmail


michael.monnerie at is

Dec 17, 2008, 5:21 AM

Post #3 of 5 (1032 views)
Permalink
Re: Improvement for dbmail_ccfield [In reply to]

On Mittwoch 17 Dezember 2008 Paul J Stevens wrote:
> You have a point here about the whitespace, if that was the case. But
> it looks to me like this is a side effect of the copy command
> (tab-delimited with empty ccname fields). I don't see any whitespace
> in my own ccfield table.

Ah, right, I didn't see that at 2:47 in the morning ;-)

> But the second argument won't hold, because those tables are not
> single-instance. Each address for every physmessage is stored
> separately, again and again. That in itself is some point of concern
> because those tables (esp headervalue) grow too fast. But that is not
> due to lack of whitespace stripping.

I see. Will that be better in 2.3? Or is single-instance only for the
(message|mime)(part)s? Yes, those cache tables are very large, the
dbmail_headervalue contains 750.000 entries for only 35.000 messages, a
bit of overhead I'd say. Is that table cleaned up by dbmail-util, or
would I have to manually delete stuff and it's rebuilt by dbmail-util?

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net Key-ID: 1C1209B4
Attachments: signature.asc (0.19 KB)


paul at nfg

Dec 17, 2008, 7:45 AM

Post #4 of 5 (1032 views)
Permalink
Re: Improvement for dbmail_ccfield [In reply to]

Michael Monnerie wrote:
>
> I see. Will that be better in 2.3? Or is single-instance only for the
> (message|mime)(part)s?

No, and yes.

> Yes, those cache tables are very large, the
> dbmail_headervalue contains 750.000 entries for only 35.000 messages, a
> bit of overhead I'd say. Is that table cleaned up by dbmail-util, or
> would I have to manually delete stuff and it's rebuilt by dbmail-util?

I would clean out any headervalues that don't interest you: the ones
that will most likely never be used by a imap search - which is the
primary purpose of the headervalue table.

I keep a view around for just that purpose:


CREATE VIEW headername_stats AS
select h.id AS id, h.headername AS headername,count(v.id) AS frequency
from (
dbmail_headername h
join dbmail_headervalue v on h.id = v.headername_id
) group by h.id


mysql> select * from headername_stats order by frequency desc limit 10;
+------+-----------------+-----------+
| id | headername | frequency |
+------+-----------------+-----------+
| 3159 | received | 1595230 |
| 3 | Delivered-To | 429751 |
| 5 | Subject | 339542 |
| 13 | Message-Id | 338982 |
| 7 | From | 334860 |
| 8 | Date | 334493 |
| 6 | To | 334370 |
| 25 | Content-Type | 305462 |
| 1 | Return-Path | 298227 |
| 3453 | x-virus-scanned | 226004 |
+------+-----------------+-----------+
10 rows in set (6 min 59.40 sec)

mysql> delete from dbmail_headervalue where headername_id=3159;

etc.

--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail [at] dbmail
https://mailman.fastxs.nl/mailman/listinfo/dbmail


michael.monnerie at is

Jan 19, 2009, 8:13 AM

Post #5 of 5 (915 views)
Permalink
Re: Improvement for dbmail_ccfield [In reply to]

On Mittwoch 17 Dezember 2008 Paul J Stevens wrote:
> CREATE VIEW headername_stats AS
>  select h.id AS id, h.headername AS headername,count(v.id) AS
> frequency from (
>   dbmail_headername h
>   join dbmail_headervalue v on h.id = v.headername_id
>  ) group by h.id

I just tested this, it causes

ERROR: column "h.headername" must appear in the GROUP BY clause or be
used in an aggregate function

The solution (PostgreSQL) is to put min() to the headername:

CREATE VIEW headername_stats AS
select
h.id AS id,
min(h.headername) AS headername,
count(v.id) AS frequency
from dbmail_headername h
join dbmail_headervalue v on h.id = v.headername_id
group by h.id;

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4

_______________________________________________
DBmail mailing list
DBmail [at] dbmail
https://mailman.fastxs.nl/mailman/listinfo/dbmail

DBMail users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.