uhlar at fantomas
May 18, 2012, 4:45 AM
Post #7 of 7
>On 18/05/12 03:18, David F. Skoll wrote:
>> I looked at the regex and it seems that Perl treats är as having a
>> word boundary in the \b sense between the "ä" and the "r"
On 18.05.12 07:26, Jason Haar wrote:
>A bit OT, but is it because your perl is running under "C" locale
>instead of se? i.e. would the word boundary definition change under
>different localization contexts? Doesn't help solve the problem for you,
>but it certainly flags a potential issue with a tonne of the rules in SA...
SA would need to switch to the correct locale before processing each
e-mail to avoid this error. The correct locale could differ between
users and even between individual mails.
I'm not sure this is the way to go, although there may be isolated cases
where it helps.
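To illustrate how the position of \b depends on which characters count as
word characters, here is a minimal sketch in Python (not Perl, and not SA
code). It only assumes that Python's re.ASCII flag is a fair stand-in for
matching without Unicode or locale awareness; Perl's exact behaviour depends
on how the string is decoded and which modifiers are in effect.

```python
import re

word = "är"  # the Swedish word from this thread

# ASCII-only semantics: "ä" is not a word character, so there is a
# boundary between "ä" and "r" and none at the start of the string.
ascii_hits = [m.start() for m in re.finditer(r"\b", word, flags=re.ASCII)]

# Unicode semantics: both letters are word characters, so the only
# boundaries are at the start and the end of the word.
unicode_hits = [m.start() for m in re.finditer(r"\b", word)]

print(ascii_hits, unicode_hits)
```

With ASCII-only semantics a pattern like \br\b would match the "r" in "är";
with Unicode semantics it would not.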
I'm more in favor of more advanced processing: taking the language of the
message into account and/or checking matched strings against known words
in different languages. For example, FRT_SOMA misfires on the word "somar"
(donkey), FRT_PENIS1 on "penize" (money), and FUZZY_CREDIT on "kredit"
(credit).
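A rough sketch of the second idea, again in Python and purely illustrative.
The pattern below is a simplified stand-in I made up, NOT the real FRT_SOMA
rule, and the word list and the function name rule_hits are hypothetical.
The point is only that a hit is dropped when the whole word containing it
is a known legitimate word in some language.

```python
import re

# Hypothetical stand-in for an obfuscation-style rule (not the real FRT_SOMA).
RULE = re.compile(r"s[o0]ma", re.IGNORECASE)

# Hypothetical per-language lists of legitimate words that contain the target.
KNOWN_WORDS = {
    "sk": {"somar"},
}

def rule_hits(text, rule=RULE, known=KNOWN_WORDS):
    """Return the words that trigger the rule and are not known legitimate words."""
    legit = set().union(*known.values())
    hits = []
    for word in re.findall(r"\w+", text):
        # Keep the hit only if the containing word is not on any language list.
        if rule.search(word) and word.lower() not in legit:
            hits.append(word)
    return hits

print(rule_hits("Ten somar je tu"))  # legitimate word, hit suppressed
print(rule_hits("buy s0ma now"))     # obfuscated spelling, hit kept
```

Maintaining such word lists for every rule and language is the obvious cost
of this approach.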
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Remember half the people you know are below average.