
Mailing List Archive: exim: users

catching newlines with ${sg {}{}{}}

 

 



lehmann at cnm

Jun 17, 2008, 11:53 AM

Post #1 of 5 (325 views)
catching newlines with ${sg {}{}{}}

Hello,

I need to extract the value "X-purgate-ID" from $spam_report:

X-purgate: Spam
X-purgate-ID: 150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3
X-purgate-Ad: For more information about eXpurgate please visit
http://www.expurgate.net/

With real PCRE, the expression would look like this:

.*\nX-purgate-ID: (.*?)\n.*

where $1 would contain the ID. Unfortunately, the sg expansion item
does not seem to work with newlines.

It is easy to remove all lines but the first:

${sg{${sg{$spam_report}{X-purgate: }{}}}{\n.*}{}}

This returns: "Spam"
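
For readers who want to try this outside Exim, here is a rough Python sketch of the same two-step substitution. It assumes that Python's re module is a fair stand-in for PCRE on this point (in both, "." does not match a newline by default). The sample value of the report, including the line break before the URL, is an assumption based on the headers quoted above.

```python
import re

# Assumed sample value of $spam_report, copied from the headers quoted above.
spam_report = (
    "X-purgate: Spam\n"
    "X-purgate-ID: 150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3\n"
    "X-purgate-Ad: For more information about eXpurgate please visit\n"
    "http://www.expurgate.net/"
)

# Inner sg: drop the "X-purgate: " prefix (only the first line has it).
without_prefix = re.sub(r"X-purgate: ", "", spam_report)

# Outer sg: the substitution is global, so each "newline plus rest of line"
# is removed in turn, leaving only the first line.
first_line = re.sub(r"\n.*", "", without_prefix)

print(first_line)
```

The global behaviour is what makes this work: "\n.*" only ever matches one line at a time, but it is applied repeatedly until nothing is left to match.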

But all attempts to extract the ID (that is, to remove everything before
and after it) in one step failed. The only way that worked was this:

${sg{${sg{${sg{$spam_report}{X-purgate: .*\n}{}}}{X-purgate-Ad: .*}{}}}{.*X-purgate-ID: (.*)\n.*}{\$1}}

But it looks very ugly. Any ideas on how this could be done more cleanly?

Kind regards
Marten

--
## List details at http://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


jh at plonk

Jun 17, 2008, 1:06 PM

Post #2 of 5 (311 views)
Re: catching newlines with ${sg {}{}{}} [In reply to]

Marten Lehmann wrote:

> I need to extract the value "X-purgate-ID" from $spam_report:
>
> X-purgate: Spam
> X-purgate-ID: 150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3
> X-purgate-Ad: For more information about eXpurgate please visit
> http://www.expurgate.net/

Does this work?

${extract {X-purgate-ID:} {$spam_report}}




exim-users at spodhuis

Jun 17, 2008, 2:56 PM

Post #3 of 5 (309 views)
Re: catching newlines with ${sg {}{}{}} [In reply to]

On 2008-06-17 at 20:53 +0200, Marten Lehmann wrote:
> I need to extract the value "X-purgate-ID" from $spam_report:
>
> X-purgate: Spam
> X-purgate-ID: 150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3
> X-purgate-Ad: For more information about eXpurgate please visit
> http://www.expurgate.net/
>
> With real PCRE, the expression would look like this:

Exim uses "real" PCRE; Philip Hazel is the original author of both.

> .*\nX-purgate-ID: (.*?)\n.*
>
> where $1 would contain the ID. Unfortunately, the sg expansion item
> does not seem to work with newlines.

If you double-check the documentation on ${sg ...} then you'll see the
reminder:

----------------------------8< cut here >8------------------------------
Because all three arguments are expanded before use,
if any $ or \ characters are required in the regular expression or in the
substitution string, they have to be escaped. For example:

${sg{abcdef}{^(...)(...)\$}{\$2\$1}}

yields "defabc", and

${sg{1=A 4=D 3=C}{\N(\d+)=\N}{K\$1=}}

yields "K1=A K4=D K3=C". Note the use of "\N" to protect the contents of
the regular expression from string expansion.
----------------------------8< cut here >8------------------------------

Try:
${sg{$spam_report}{\N^.*\n\s*X-purgate-ID: (.*?)\n.*$\N}{\$1}}

Note the \s* to match the whitespace you showed above, the \N at each
end of the <regex> field, and the \$1, so that $1 is filled in from the
regex match rather than being expanded by Exim as one of its own
variables before the substitution runs.

That is, it's perfectly fine to use $acl_m_foo as the substitution,
because Exim expands that for you; but to use $1 as a regexp
back-reference, you pass \$1.
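
A rough way to check the regex itself, independent of Exim's escaping rules, is to run it through another Perl-style engine. The sketch below uses Python's re module as a stand-in for PCRE; the \N markers are dropped because they are Exim syntax, and both sample layouts of the report are assumptions (the original may or may not have had the URL on its own line).

```python
import re

ID = "150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3"

# Same regex as in the ${sg} item above, without Exim's \N markers.
PATTERN = r"^.*\n\s*X-purgate-ID: (.*?)\n.*$"

# Assumed layout 1: the X-purgate-Ad header is one single, final line.
three_lines = (
    "X-purgate: Spam\n"
    f"X-purgate-ID: {ID}\n"
    "X-purgate-Ad: For more information about eXpurgate please visit "
    "http://www.expurgate.net/"
)

# Assumed layout 2: the URL sits on a line of its own.
four_lines = (
    "X-purgate: Spam\n"
    f"X-purgate-ID: {ID}\n"
    "X-purgate-Ad: For more information about eXpurgate please visit\n"
    "http://www.expurgate.net/"
)

# Layout 1: the pattern spans the whole string, so only the ID remains.
extracted = re.sub(PATTERN, r"\1", three_lines)

# Layout 2: "." cannot cross the extra newline, the "$" anchor is never
# reached, nothing matches, and the input comes back unchanged.
unchanged = re.sub(PATTERN, r"\1", four_lines)

print(extracted)
print(unchanged == four_lines)
```

The second case shows why the pattern depends on the exact number of lines: without a way for "." to match newlines, each line has to be spelled out in the regex.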

Regards,
-Phil



lehmann at cnm

Jun 20, 2008, 5:48 AM

Post #4 of 5 (284 views)
Re: catching newlines with ${sg {}{}{}} [In reply to]

Hello,

> Exim uses "real" PCRE; Philip Hazel is the original author of both.
>
>> .*\nX-purgate-ID: (.*?)\n.*
>>
>> where $1 would contain the ID. Unfortunately, the sg expansion item
>> does not seem to work with newlines.
>
> If you double-check the documentation on ${sg ...} then you'll see the
> reminder:

I read this, but I don't understand what difference it makes whether \n
is expanded by Exim to a newline (without using \N, or using \\n) or
whether \N is used so that PCRE turns \n into a newline.

> Try:
> ${sg{$spam_report}{\N^.*\n\s*X-purgate-ID: (.*?)\n.*$\N}{\$1}}

Thanks, this works. But it only works because I know the exact format
of $spam_report. How can I tell ${sg{}} to include \n among the
characters matched by .*? I think in Perl this is done with the /s
modifier.

Is there any way in Exim (besides embedding Perl) to extract a value like

$id = $1 if $spam_report =~ /(?:^|\n)X-purgate-ID: (.*?)(?:\n|$)/s ?

Kind regards
Marten



exim-users at spodhuis

Jun 20, 2008, 8:15 PM

Post #5 of 5 (277 views)
Re: catching newlines with ${sg {}{}{}} [In reply to]

On 2008-06-20 at 14:48 +0200, Marten Lehmann wrote:
> I read this, but I don't understand what difference it makes whether \n
> is expanded by Exim to a newline (without using \N, or using \\n) or
> whether \N is used so that PCRE turns \n into a newline.

In this case, not much. If you leave out \N, just be sure to also use \$
instead of $ to anchor the end of the regexp, and to escape anything
else that Exim would otherwise expand. \N removes the need for one
layer of backslashes, so it is generally less error-prone.
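
As a loose analogy only (Python string literals are not Exim expansion, and nothing here involves "$"), the same "who interprets \n" question can be reproduced with ordinary and raw strings. In the first pattern the string layer has already turned the escape into a real newline; in the second the regex engine receives a backslash and an "n" and interprets them itself.

```python
import re

text = "first\nsecond"

# The pattern already contains a real newline character by the time the
# regex engine sees it (comparable to letting Exim expand \n).
early_pattern = "first\nsecond"

# The pattern contains a backslash followed by "n"; the regex engine turns
# that into a newline (comparable to protecting the regex with \N).
late_pattern = r"first\nsecond"

early_match = re.search(early_pattern, text)
late_match = re.search(late_pattern, text)

# The two patterns differ as strings, yet both match the same text.
print(len(early_pattern), len(late_pattern))
print(early_match is not None, late_match is not None)
```

Either way the engine ends up matching a newline; the difference only matters for characters that the outer layer would otherwise change or consume.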

> > Try:
> > ${sg{$spam_report}{\N^.*\n\s*X-purgate-ID: (.*?)\n.*$\N}{\$1}}
>
> Thanks, this works. But it only works because I know the exact format
> of $spam_report. How can I tell ${sg{}} to include \n among the
> characters matched by .*? I think in Perl this is done with the /s
> modifier.

Embed the modifier at the start of the regexp with (?s) -- see manual
pages perlre(1) (for Perl's documentation) or pcrepattern(3) (comes with
later versions of pcre). The latter describes this under "INTERNAL
OPTION SETTING".
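
A minimal sketch of what the embedded modifier changes, again using Python's re module as a stand-in for PCRE (it accepts the same inline (?s) syntax at the start of a pattern). The sample text is made up for illustration.

```python
import re

# Hypothetical three-line input.
text = "start\nmiddle\nend"

# Default behaviour: "." stops at a newline, so the pattern cannot span lines.
without_flag = re.search(r"start.*end", text)

# With (?s) embedded at the start, "." also matches newlines.
with_flag = re.search(r"(?s)start.*end", text)

print(without_flag)          # None
print(with_flag.group(0) == text)
```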

> Is there any way in Exim (besides embedding Perl) to extract a value like
>
> $id = $1 if $spam_report =~ /(?:^|\n)X-purgate-ID: (.*?)(?:\n|$)/s ?

Again, adding the \s* at the start so that X-purgate-ID: doesn't need to
be at the beginning of the line:

${sg{$spam_report}{\N(?s)^(?:.*\n|)\s*X-purgate-ID:\s+([^\n]+)(?:|\n.*)$\N}{\$1}}

The main reason it's longer is that sg is short for Perl's s///g: it
substitutes rather than captures, so the pattern also has to consume the
lines which _don't_ match, and you don't get the conditional assignment
that the Perl statement gives you.
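
The behaviour of the long regexp can be sketched in Python, assuming its re module treats this pattern the same way PCRE does (the \N markers are omitted because they are Exim syntax). The helper name extract_id and the sample reports are hypothetical.

```python
import re

ID = "150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3"

# The regexp from the ${sg} item above, without Exim's \N markers.
PATTERN = r"(?s)^(?:.*\n|)\s*X-purgate-ID:\s+([^\n]+)(?:|\n.*)$"


def extract_id(report: str) -> str:
    """Mimic ${sg{report}{PATTERN}{\\$1}}: substitute, do not capture."""
    return re.sub(PATTERN, r"\1", report)


# The ID header in the middle of a four-line report.
middle = (
    "X-purgate: Spam\n"
    f"X-purgate-ID: {ID}\n"
    "X-purgate-Ad: For more information about eXpurgate please visit\n"
    "http://www.expurgate.net/"
)

# The ID header as the very first line.
first = f"X-purgate-ID: {ID}\nX-purgate: Spam"

# No ID header at all.
missing = "X-purgate: Clean"

print(extract_id(middle) == ID)
print(extract_id(first) == ID)
print(extract_id(missing))
```

Note the last case: when the header is absent, a substitution returns the input unchanged rather than an empty value, which is the practical difference from the conditional Perl assignment.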

That's the one long regexp approach. The closest you'll get to a
conditional is to use map-filter on lists generated by using newline as
a separator:

${map\
{<\n ${filter {<\n $spam_report}{match{$item}{\N^X-purgate-ID:\N}}}}\
{${sg{$item}{\N^[^:]+:\s*(.*)\N}{\$1}}}\
}
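
For comparison, the same filter-then-map idea looks roughly like this in Python. This is an analogue of the approach, not a description of how Exim evaluates the expansion; the sample report and the variable names are assumptions.

```python
import re

# Assumed sample value of $spam_report.
spam_report = (
    "X-purgate: Spam\n"
    "X-purgate-ID: 150741::080616223818-6C9786C0-73CE72D8/2129941411-0/0-3\n"
    "X-purgate-Ad: For more information about eXpurgate please visit\n"
    "http://www.expurgate.net/"
)

# Treat the report as a newline-separated list.
lines = spam_report.split("\n")

# filter: keep only the items that start with the wanted header name.
matching = [line for line in lines if re.search(r"^X-purgate-ID:", line)]

# map: strip the header name and the whitespace after the first colon.
values = [re.sub(r"^[^:]+:\s*(.*)", r"\1", line) for line in matching]

print(values)
```

If no line carries the header, the result is simply an empty list, which is what makes this variant behave more like a conditional than the single long regexp.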

The straight regexp is probably faster, since you enter the regexp
engine just once.

Regards,
-Phil


