Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: DBMail: dev

sort-subject: wrong table, null strings

 

 

DBMail dev RSS feed   Index | Next | Previous | View Threaded


talanchor at mail

Aug 5, 2013, 3:11 AM

Post #1 of 7 (40 views)
Permalink
sort-subject: wrong table, null strings

In the latest version sorting order is changed to "sortfield", but table (view) is left the same: "subjectfield".
Probably it should be changed to "sortfield" too.

else if ( MATCH(key, "subject") ) {
    _append_join(value->table, "subjectfield"); // --> _append_join(value->table, "sortfield");
    _append_sort(value->order, "sortfield", reverse);
    (*idx)++;
}

Also according RFC 5256: "If the associated RFC-822 header for a particular
criterion is absent, it is treated as the empty string. The empty
string always collates before non-empty strings."
But with PostgreSQL as backend db dbmail puts NULL strings always at the end of the resulting set.
To fix this we may apply COALESCE function on sortfield:
    _append_sort(value->order, "COALESCE(sortfield, '')", reverse);

It helps to pass imaptest's sort-subject test a bit further.

The only issue left is that PostgreSQL puts header string "_" before all other non empty strings, whereas imaptest puts it at the end after all headers in test.
I haven't found out how to fix this yet (maybe it's imaptest's issue).


paul at nfg

Aug 5, 2013, 4:11 AM

Post #2 of 7 (39 views)
Permalink
Re: sort-subject: wrong table, null strings [In reply to]

On 08/05/2013 12:11 PM, . . wrote:
> In the latest version sorting order is changed to "sortfield", but table
> (view) is left the same: "subjectfield".
> Probably it should be changed to "sortfield" too.

No the view was not left unchanged.

If you did a full configure/make/make install, dbmail will automatically
run any necessary schema migrations.

You can tell be looking at the tables. You should have a
dbmail_upgrade_steps table, and the dbmail_subjectfield view now also
contains a sortfield field.

> Also according RFC 5256: "If the associated RFC-822
> <http://tools.ietf.org/html/rfc822> header for a particular criterion is
> absent, it is treated as the empty string. The empty string always
> collates before non-empty strings."

Correct. That is why the sort-subject tests still fail.

> But with PostgreSQL as backend db dbmail puts NULL strings always at the
> end of the resulting set.
> To fix this we may apply COALESCE function on sortfield:
> _append_sort(value->order, "COALESCE(sortfield, '')", reverse);

That is one way to do it, but DBMail should always insert an empty
string for messages that don't have a subject header.

> It helps to pass imaptest's sort-subject test a bit further.
>
> The only issue left is that PostgreSQL puts header string "_" before all
> other non empty strings, whereas imaptest puts it at the end after all
> headers in test.
> I haven't found out how to fix this yet (maybe it's imaptest's issue).

It looks like imaptest is at fault here. Doing a simple test with
sort(1) confirms postgresql's behavior.

$~> cat > /tmp/test.txt << EOF
a_a
a
_
a a
EOF
$~> cat /tmp/test.txt|sort -f

_
a
a a
a_a
$~>

But maybe such characters have to be treated differently for some reason.
--
________________________________________________________________
Paul J Stevens pjstevns @ gmail, twitter, skype, linkedin

* Premium Hosting Services and Web Application Consultancy *

www.nfg.nl/info [at] nfg/+31.85.877.99.97
________________________________________________________________
_______________________________________________
Dbmail-dev mailing list
Dbmail-dev [at] dbmail
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev


talanchor at mail

Aug 5, 2013, 5:11 AM

Post #3 of 7 (38 views)
Permalink
Re: sort-subject: wrong table, null string s [In reply to]

I've updated db schema, thanks for the tip.

It seams like dbmail doesn't insert empty string. There is no such string in dbmail_headervalue table after passing sort-subject test (all other values are there). So COALESCE workaround may be of some help here.

I guess I figured out why sorting fails for "_". It's a locale-specific issue, and when I exported LC_ALL=C sort output was like this:
 a
a a
a_a
_

Unfortunately it's not possible to change existing PostgreSQL db's locale, it can be set only while creating db (initdb --locale=C).
But whether C-locale is required or not is not clear. RFC says: "the SORT and THREAD
extensions MUST collate strings according to the i;unicode-casemap
collation described in [ UNICASEMAP ]."


Понедельник, 5 августа 2013, 13:11 +02:00 от Paul J Stevens <paul [at] nfg>:
>On 08/05/2013 12:11 PM, . . wrote:
>> In the latest version sorting order is changed to "sortfield", but table
>> (view) is left the same: "subjectfield".
>> Probably it should be changed to "sortfield" too.
>
>No the view was not left unchanged.
>
>If you did a full configure/make/make install, dbmail will automatically
>run any necessary schema migrations.
>
>You can tell be looking at the tables. You should have a
>dbmail_upgrade_steps table, and the dbmail_subjectfield view now also
>contains a sortfield field.
>
>> Also according RFC 5256: "If the associated RFC-822
>> < http://tools.ietf.org/html/rfc822 > header for a particular criterion is
>> absent, it is treated as the empty string. The empty string always
>> collates before non-empty strings."
>
>Correct. That is why the sort-subject tests still fail.
>
>> But with PostgreSQL as backend db dbmail puts NULL strings always at the
>> end of the resulting set.
>> To fix this we may apply COALESCE function on sortfield:
>> _append_sort(value->order, "COALESCE(sortfield, '')", reverse);
>
>That is one way to do it, but DBMail should always insert an empty
>string for messages that don't have a subject header.
>
>> It helps to pass imaptest's sort-subject test a bit further.
>>
>> The only issue left is that PostgreSQL puts header string "_" before all
>> other non empty strings, whereas imaptest puts it at the end after all
>> headers in test.
>> I haven't found out how to fix this yet (maybe it's imaptest's issue).
>
>It looks like imaptest is at fault here. Doing a simple test with
>sort(1) confirms postgresql's behavior.
>
>$~> cat > /tmp/test.txt << EOF
>a_a
> a
>_
>a a
>EOF
>$~> cat /tmp/test.txt|sort -f
>
>_
> a
>a a
>a_a
>$~>
>
>But maybe such characters have to be treated differently for some reason.
>--
>________________________________________________________________
>Paul J Stevens pjstevns @ gmail, twitter, skype, linkedin
>
>  * Premium Hosting Services and Web Application Consultancy *
>
>            www.nfg.nl/info [at] nfg/+31.85.877.99.97
>________________________________________________________________


--
. .


talanchor at mail

Aug 5, 2013, 8:22 AM

Post #4 of 7 (36 views)
Permalink
Re: sort-subject: wrong table, null string s [In reply to]

IMAP sort requires input string collation according to the i;unicode-casemap, which describes string conversion to "titlecased
canonicalized UTF-8".
Conversion consists of sequential conversion of string's unicode codepoints to "titlecase property" (which is normally the same as the
uppercase property according to RFC).
This conversion shouldn't be locale-dependant, so I came up with this fix (works with PostgreSQL, hasn't been tested with other dbs):

    else if ( MATCH(key, "subject") ) {
        _append_join(value->table, "subjectfield");
        _append_sort(value->order, "UPPER(COALESCE(sortfield, '')) COLLATE\"C\"", reverse);
        (*idx)++;
    }

COLLATE "C" is necessary because of locale issues (database could be created with different locales, and PostgreSQL's sorting depends on the initial locale).

It works fine except that null strings won't sort correctly in reversed order (they mess with empty strings).


talanchor at mail

Aug 5, 2013, 8:41 AM

Post #5 of 7 (36 views)
Permalink
Re: sort-subject: wrong table, null string s [In reply to]

Same code with version for MySQL:

    else if ( MATCH(key, "subject") ) {
        _append_join(value->table, _subjectfield);
        if (MATCH(db_params.driver, "postgresql")) {
            _append_sort(value->order, "UPPER(COALESCE(sortfield, '')) COLLATE\"C\"", reverse);
        } else if (MATCH(db_params.driver, "mysql")) {
            _append_sort(value->order, "UPPER(COALESCE(sortfield, '')) COLLATE latin1_general_ci", reverse);
        } else {
            _append_sort(value->order, "UPPER(COALESCE(sortfield, ''))", reverse);
        }
        (*idx)++;
    }



Понедельник, 5 августа 2013, 19:22 +04:00 от ". ." <talanchor [at] mail>:
>IMAP sort requires input string collation according to the i;unicode-casemap, which describes string conversion to "titlecased
canonicalized UTF-8".
>Conversion consists of sequential conversion of string's unicode codepoints to "titlecase property" (which is normally the same as the
uppercase property according to RFC).
>This conversion shouldn't be locale-dependant, so I came up with this fix (works with PostgreSQL, hasn't been tested with other dbs):
>
>    else if ( MATCH(key, "subject") ) {
>        _append_join(value->table, "subjectfield");
>        _append_sort(value->order, "UPPER(COALESCE(sortfield, '')) COLLATE\"C\"", reverse);
>        (*idx)++;
>    }
>
>COLLATE "C" is necessary because of locale issues (database could be created with different locales, and PostgreSQL's sorting depends on the initial locale).
>
>It works fine except that null strings won't sort correctly in reversed order (they mess with empty strings).
>_______________________________________________
>Dbmail-dev mailing list
>Dbmail-dev [at] dbmail
>http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
>


--
. .


paul at nfg

Aug 6, 2013, 9:26 AM

Post #6 of 7 (30 views)
Permalink
Re: sort-subject: wrong table, null strings [In reply to]

On 08/05/2013 05:41 PM, . . wrote:
> Same code with version for MySQL:

Thanks for that.

Can you please:

- try to provide git patches or a pull request on github?
- stop sending messages to my private address?

(I never reply to DM concerning dbmail unless it concerns sensitive
information, or you're a paying client)

About sorting collation:

I'm kind of reluctant to enforce collation on subject sort. I will
surprise users that know what collation their database is supposed to
use, only to find out we override that - and only in one particular
situation.

Personally I don't care if we 'break' RFC in this respect, especially
since I'm not sure forcing a latin1 collation won't break sorting in
subtle cases not covered by imaptest. As long as sorting is consistent
and predictable, that is.

But if you can provide a patch that covers all three main database
backends - sorry oracle - I'm happy to merge it. We can always revert it
if it causes more problems than it solves :-)


--
________________________________________________________________
Paul J Stevens pjstevns @ gmail, twitter, skype, linkedin

* Premium Hosting Services and Web Application Consultancy *

www.nfg.nl/info [at] nfg/+31.85.877.99.97
________________________________________________________________
_______________________________________________
Dbmail-dev mailing list
Dbmail-dev [at] dbmail
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev


talanchor at mail

Aug 7, 2013, 1:23 AM

Post #7 of 7 (28 views)
Permalink
Re: sort-subject: wrong table, null string s [In reply to]

>>> try to provide git patches or a pull request on github?
Unfortunately I can't provide patch for all three backends, and I doubt that even solution for PostgreSQL is ideal.
But if I find a way to fix this problem completely I'll provide a patch for sure.

>>> stop sending messages to my private address?
Oh, sorry about that, my bad. Won't happen again.

>>>I will surprise users that know what collation their database is supposed to
>>> use, only to find out we override that - and only in one particular
>>> situation.

Well, I think in this particular case sorting is not so predictable, as it depends on current locale, where it shouldn't.
But I agree that whilst there is no complete fix to this problem - RFC 'breaks' anyway.

<paul [at] nfg>:
>On 08/05/2013 05:41 PM, . . wrote:
>> Same code with version for MySQL:
>
>Thanks for that.
>
>Can you please:
>
>- try to provide git patches or a pull request on github?
>- stop sending messages to my private address?
>
>(I never reply to DM concerning dbmail unless it concerns sensitive
>information, or you're a paying client)
>
>About sorting collation:
>
>I'm kind of reluctant to enforce collation on subject sort. I will
>surprise users that know what collation their database is supposed to
>use, only to find out we override that - and only in one particular
>situation.
>
>Personally I don't care if we 'break' RFC in this respect, especially
>since I'm not sure forcing a latin1 collation won't break sorting in
>subtle cases not covered by imaptest. As long as sorting is consistent
>and predictable, that is.
>
>But if you can provide a patch that covers all three main database
>backends - sorry oracle - I'm happy to merge it. We can always revert it
>if it causes more problems than it solves :-)
>
>
>--
>________________________________________________________________
>Paul J Stevens pjstevns @ gmail, twitter, skype, linkedin
>
>  * Premium Hosting Services and Web Application Consultancy *
>
>            www.nfg.nl/info [at] nfg/+31.85.877.99.97
>________________________________________________________________
>_______________________________________________
>Dbmail-dev mailing list
>Dbmail-dev [at] dbmail
>http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev


--
. .

DBMail dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.