Gossamer Forum

pattern matching problem

Hi,

I'm trying to get rid of all <br> tags in between <pre> and </pre> tags...

Now why doesn't this work:
Code:
$rec{'Text'} =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

Thanks!
Re: [Lex] pattern matching problem
Hi. You need to escape things like < and >. Something like this should work;

Code:
$rec{'Text'} =~ s|\<pre\>(.*?)\<br\>(.*?)\</pre\>|<pre>$1 $2</pre>|gis;

Cheers

Andy (mod)
andy@ultranerds.co.uk


IMPORTANT: I've now moved to ultranerds.co.uk, and the .com will no longer work!
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package (plugins total "value" $3,325 & rising, for just $350)| GLinks ULTRA Package PRO (plugins total "value" $5,625 & rising, for just $500)
Support Forum | Links SQL Plugins | DMOZ Dumps | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Compare our different Plugin packages *new* Free CSS Templates
Re: [Andy] pattern matching problem
In Reply To:
Hi. You need to escape things like < and >. Something like this should work;

Code:
$rec{'Text'} =~ s|\<pre\>(.*?)\<br\>(.*?)\</pre\>|<pre>$1 $2</pre>|gis;

Thanks Andy, but it doesn't. And just above it, I have:

Code:
$rec{'Text'} =~ s|</blokrechts>|</span>|gis;
$rec{'Text'} =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

The first one works fine so I guess it's something to do with the way I try to get rid of the <br>'s...
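
For reference, here is a minimal sketch of what a single pass of that substitution does. The sample string is hypothetical, not the real record. It suggests that the pattern itself matches, but each match consumes a whole <pre>...</pre> block, so /g never revisits the block and only the first <br> in it is replaced.

```perl
use strict;
use warnings;

# Hypothetical sample text (an assumption, not the poster's data).
my $text = "<pre>a<br>b<br>c</pre>";

# One pass of the substitution from the post, applied to a copy.
# With /s, "." also matches newlines.
(my $once = $text) =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

# Count the <br> tags that survive the single pass.
my $left = () = $once =~ /<br>/gi;

print "$once\n";
print "remaining <br> tags: $left\n";
```

The match runs from <pre> to </pre>, so the second <br> ends up inside $2 and is copied back unchanged.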
Re: [Lex] pattern matching problem
Trust me... you need to escape the < and >, just as you would [ ] or . , or else they will be treated as non-literal characters: instead of matching directly, they will be taken as their regex equivalents.

What if you change the ending to:

Code:
\Q<pre>$1 $2</pre>

or

Code:
\<pre\>$1 $2\</pre\>

?

Andy (mod)
Re: [Andy] pattern matching problem
Well it still doesn't work then. I've figured out where the problem is now, but don't know how to solve it.

I'll paste a bit of HTML where the <br>'s should be taken away; the problem is the line endings etc.

Instead of:

Code:
$rec{'Text'} =~ s%<pre>(.*?)<br>(.*?)</pre>%<pre>$1 $2</pre>%gim;

I tried:

Code:
$rec{'Text'} =~ s%<pre>((.|\n)*?)<br>((.|\n)*?)</pre>%<pre>$1 $2</pre>%gim;

But that would just erase everything between <pre> and </pre> in the following example:

Code:
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
<br>Klachten aan het bewegingsapparaat 36 35 36
<br>Psychische klachten 23 26 27
<br>Overig 41 39 37
<br></pre>

hmmm... So what I actually want the script to do is the following:

look for <pre> and </pre> and erase all <br> that you find within it, no matter what you find. However: leave the rest!

But I don't know how to do it properly.

(still studying 'programming perl')
Thanks for your time anyway!
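
A likely explanation for the "erases everything" behaviour is capture-group numbering. In ((.|\n)*?) the inner parentheses also capture, so $2 is the last single character matched by the first group, and the text after the first <br> is in $3, not $2. The sketch below uses a shortened, hypothetical sample to show this, and a variant with non-capturing (?:...) groups for comparison.

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the table in the post.
my $text = "<pre>\nhead\n<br>row1\n<br>row2\n</pre>";

# Inspect what each numbered group holds for the pattern from the post.
my @g = $text =~ m%<pre>((.|\n)*?)<br>((.|\n)*?)</pre>%i;
printf "\$%d = %s\n", $_ + 1, join('', map { $_ eq "\n" ? '\n' : $_ } split //, $g[$_])
    for 0 .. $#g;

# The substitution as posted: $2 is one character, so the rows are lost.
(my $broken = $text) =~ s%<pre>((.|\n)*?)<br>((.|\n)*?)</pre>%<pre>$1 $2</pre>%gim;

# Non-capturing inner groups make $2 the text after the first <br>.
(my $kept = $text) =~ s%<pre>((?:.|\n)*?)<br>((?:.|\n)*?)</pre>%<pre>$1 $2</pre>%gi;

print "as posted:     ", length($broken), " characters left\n";
print "non-capturing: ", length($kept),   " characters left\n";
```

Even with the numbering fixed, one pass still removes only the first <br> in each block, so this explains the data loss rather than solving the original task.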
Re: [Lex] pattern matching problem
Goeiemorgen

Here is a solution using HTML::TokeParser::Simple, adapted from http://www.tek-tips.com/...m/pid/219/qid/861625

Code:
#!/usr/bin/perl
use strict;
use HTML::TokeParser::Simple;

my $in = q|
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
<br>Klachten aan het bewegingsapparaat 36 35 36
<br>Psychische klachten 23 26 27
<br>Overig 41 39 37
<br></pre>
|;

my $p = HTML::TokeParser::Simple->new(\$in);

my $pre = 0;

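while (my $token = $p->get_token) {
    $pre++ if $token->is_start_tag('pre');   # entering a <pre> block
    $pre-- if $token->is_end_tag('pre');     # leaving it again
    next if $pre and $token->is_tag('br');   # drop <br> tags inside <pre>
    print $token->as_is;                     # print everything else unchanged
}

The output is: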
Code:
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
Klachten aan het bewegingsapparaat 36 35 36
Psychische klachten 23 26 27
Overig 41 39 37
</pre>

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [yogi] pattern matching problem
Goeiemorgen Yogi,

I'll study this and try to start using a parser more often. However, in this case I managed to get the following working (with help from a newsgroup):

Code:
$rec{'Text'} =~ s{(<pre.*?>.+?</pre>)}{
    (my $rest = $1) =~ s/<br.*?>//gis;
    $rest
}egis;

It's pretty safe as there is nothing more than <br> in between the <pre> and </pre> tags.

But thanks a lot for your code; it's good to be told there are other directions to look at.

Bedankt,

Lex
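
A compact variant of the same idea, as a sketch only: it assumes Perl 5.14 or later for the /r modifier, which returns the modified copy instead of changing its target. The sample text is hypothetical, and the [^>]* form is a swapped-in alternative to .*? so that a tag match cannot run past its closing >.

```perl
use 5.014;      # /r needs Perl 5.14+; this also enables strict
use warnings;

# Hypothetical sample (an assumption, not the real record).
my $text = join "\n",
    '<br><b>Title</b>',
    '<br><pre>',
    'A B C',
    '<br>row 1',
    '<BR />row 2',
    '<br></pre>',
    '<br>after';

# Outer match: one whole <pre>...</pre> block at a time.
# Inner s///r: return a copy of that block with every <br> tag removed.
$text =~ s{(<pre\b[^>]*>.*?</pre>)}{ $1 =~ s/<br\b[^>]*>//gir }gise;

print "$text\n";
```

The <br> tags outside the <pre> block are untouched because the inner substitution only ever sees the captured block.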
Re: [Andy] pattern matching problem
Quote:
Trust me... you need to escape the < and >, just as you would [ ] or . , or else they will be treated as non-literal characters: instead of matching directly, they will be taken as their regex equivalents.

Mmm no you don't :) ....< and > are not meta-characters. You were right about escaping [ ] though.

Last edited by:

BlueBottle: Jul 17, 2004, 7:14 AM
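
This is easy to check. The short sketch below (sample strings are made up) shows that in a Perl pattern < and > match themselves with or without a backslash, while [ ] form a character class unless escaped.

```perl
use strict;
use warnings;

my $html = '<pre>x</pre>';

# Unescaped and escaped angle brackets behave the same in a Perl regex.
my $plain   = $html =~ m|<pre>x</pre>|     ? 1 : 0;
my $escaped = $html =~ m|\<pre\>x\</pre\>| ? 1 : 0;

# Square brackets are different: escaped, they match literally...
my $literal = 'a[1]' =~ /^a\[1\]$/ ? 1 : 0;
# ...unescaped, [1] is a character class matching the single character "1".
my $class   = 'a1'   =~ /^a[1]$/   ? 1 : 0;

print "plain=$plain escaped=$escaped literal=$literal class=$class\n";
```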