Gossamer Forum

pattern matching problem

Hi,

I'm trying to get rid of all <br> tags in between the <pre> and </pre> tags...

Now why doesn't this work:
Code:
$rec{'Text'} =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

Thanks!
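A likely explanation, sketched below on a hypothetical sample string: with /g, Perl resumes matching after the end of the previous match, and that match has already consumed everything up to the closing </pre>. The lazy (.*?) groups only keep each match as short as possible; they still have to reach </pre>, so each <pre> block loses just its first <br>.

```perl
use strict;
use warnings;

# Hypothetical sample: one <pre> block containing three <br> tags.
my $text = "<pre>a<br>b<br>c<br></pre>";

# The pattern from the question. One match spans the whole block
# (<pre> ... first <br> ... </pre>), so /g has nothing left to match
# and only the first <br> is replaced.
(my $once = $text) =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

print "$once\n";    # the second and third <br> are still there
```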
Re: [Lex] pattern matching problem
Hi. You need to escape characters like < and >. Something like this should work:

Code:
$rec{'Text'} =~ s|\<pre\>(.*?)\<br\>(.*?)\</pre\>|<pre>$1 $2</pre>|gis;

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] pattern matching problem
Thanks, Andy, but it doesn't work. Just above that line, I have:

Code:
$rec{'Text'} =~ s|</blokrechts>|</span>|gis;
$rec{'Text'} =~ s|<pre>(.*?)<br>(.*?)</pre>|<pre>$1 $2</pre>|gis;

The first one works fine, so I guess the problem is in the way I try to get rid of the <br> tags...
Re: [Lex] pattern matching problem
Trust me... you need to escape the < and >, just like you would with [ ] or ., or else they will be treated as special characters: instead of being matched literally, they will be taken as their regex equivalents.

What if you change the ending (the replacement part) to either of these?

Code:
\Q<pre>$1 $2</pre>

or

Code:
\<pre\>$1 $2\</pre\>
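To check what these two replacement strings actually produce, here is a small sketch on a hypothetical one-<br> sample. The replacement half of s/// is interpolated like a double-quoted string, so \Q applies quotemeta to everything after it (up to \E or the end), while \< and \> simply become < and >.

```perl
use strict;
use warnings;

# Hypothetical sample with a single <br> inside <pre>.
my $text = "<pre>a<br>b</pre>";

# Variant 1: \Q quotemetas the interpolated replacement, so every
# non-word character (< > / and the space) gets a backslash in the output.
(my $quoted = $text) =~ s|<pre>(.*?)<br>(.*?)</pre>|\Q<pre>$1 $2</pre>|is;

# Variant 2: in a double-quoted context, \< and \> are just < and >,
# so this is identical to writing the replacement without backslashes.
(my $escaped = $text) =~ s|<pre>(.*?)<br>(.*?)</pre>|\<pre\>$1 $2\</pre\>|is;

print "$quoted\n";
print "$escaped\n";
```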

Andy (mod)
Re: [Andy] pattern matching problem
Well, it still doesn't work. I've figured out where the problem is now, but I don't know how to solve it.

I'll paste a bit of the HTML where the <br> tags should be removed; the problem is the line endings and so on.

Instead of:

Code:
$rec{'Text'} =~ s%<pre>(.*?)<br>(.*?)</pre>%<pre>$1 $2</pre>%gim;

I tried:

Code:
$rec{'Text'} =~ s%<pre>((.|\n)*?)<br>((.|\n)*?)</pre>%<pre>$1 $2</pre>%gim;

But that just erases everything between <pre> and </pre> in the following example:

Code:
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
<br>Klachten aan het bewegingsapparaat 36 35 36
<br>Psychische klachten 23 26 27
<br>Overig 41 39 37
<br></pre>

Hmm... So what I actually want the script to do is this:

Look for <pre> and </pre>, and erase every <br> found between them, whatever else is in there, but leave everything else untouched!

But I don't know how to do it properly.

(I'm still studying 'Programming Perl')
Thanks for your time anyway!
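Regarding the ((.|\n)*?) attempt: one possible cause, sketched here on a hypothetical short sample, is capture-group numbering. Every pair of parentheses captures, so the inner (.|\n) groups become $2 and $4 (holding only the last character they matched), and the text after the <br> lands in $3, which the replacement never uses. Separately, /m only changes how ^ and $ behave; it is /s that lets . match a newline.

```perl
use strict;
use warnings;

# Hypothetical sample: one <br> inside a <pre> block.
my $text = "<pre>ab<br>cd</pre>";

# The pattern from the post: $1 = "ab", $2 = "b" (last char of $1),
# $3 = "cd" (never used), so "cd" silently disappears.
(my $numbered = $text) =~ s%<pre>((.|\n)*?)<br>((.|\n)*?)</pre>%<pre>$1 $2</pre>%gim;

# With /s instead of the (.|\n) workaround there are only two groups,
# so $1 and $2 hold what the replacement expects.
(my $fixed = $text) =~ s%<pre>(.*?)<br>(.*?)</pre>%<pre>$1 $2</pre>%gis;

print "$numbered\n";
print "$fixed\n";
```

Note that the /s version still removes only one <br> per <pre> block, so on its own it does not solve the multi-<br> case.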
Re: [Lex] pattern matching problem
Good morning,

Here is a solution using HTML::TokeParser::Simple, adapted from http://www.tek-tips.com/...m/pid/219/qid/861625

Code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

my $in = q|
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
<br>Klachten aan het bewegingsapparaat 36 35 36
<br>Psychische klachten 23 26 27
<br>Overig 41 39 37
<br></pre>
|;

# Parse the HTML from a string reference, one token at a time.
my $p = HTML::TokeParser::Simple->new(\$in);

# Nesting depth: greater than 0 while we are inside a <pre> block.
my $pre = 0;

while (my $token = $p->get_token) {
    $pre++ if $token->is_start_tag('pre');
    $pre-- if $token->is_end_tag('pre');
    # Drop <br> tags inside <pre>; print every other token unchanged.
    next if $pre and $token->is_tag('br');
    print $token->as_is;
}

The output is:
Code:
<br><b>Medische reden WAO-uitkering, in percentages</b>
<br><pre>
Turken Marokkanen Nederlanders
Klachten aan het bewegingsapparaat 36 35 36
Psychische klachten 23 26 27
Overig 41 39 37
</pre>

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [yogi] pattern matching problem
Good morning, Yogi,

I'll study this and try to use a parser more often. In this case, however, I managed to get the following working (with help from a newsgroup):

Code:
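$rec{'Text'} =~ s{(<pre.*?>.+?</pre>)}{
    # Copy the matched <pre> block, strip every <br...> tag from the copy,
    # and use the copy as the replacement text (the /e modifier).
    (my $rest = $1) =~ s/<br.*?>//gis;
    $rest
}egis;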

It's fairly safe here, since there is nothing other than <br> tags between the <pre> and </pre> tags.
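For reference, here is the newsgroup solution above run end to end on a shortened, hypothetical version of the sample HTML (the %rec hash and its content are stand-ins for the real record). The <br> tags outside the <pre> block survive, and every <br> inside it is removed, because the inner s///g runs on the whole captured block rather than on one match at a time.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the record from the thread.
my %rec = (Text => "<br><b>Title</b>\n<br><pre>\nHeader\n<br>Row 1\n<br>Row 2\n<br></pre>\n");

# For each <pre>...</pre> block: copy it, strip all <br...> tags from
# the copy, and return the copy as the replacement (/e evaluates the block).
$rec{'Text'} =~ s{(<pre.*?>.+?</pre>)}{
    (my $rest = $1) =~ s/<br.*?>//gis;
    $rest
}egis;

print $rec{'Text'};
```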

But thanks a lot for your code; it's good to be shown other directions to look in.

Thanks,

Lex
Re: [Andy] pattern matching problem
Quote:
Trust me... you need to escape the < and >, just like you would with [ ] or ., or else they will be treated as special characters: instead of being matched literally, they will be taken as their regex equivalents.

Mmm, no, you don't :) < and > are not metacharacters in Perl regular expressions. You were right about escaping [ ] though.
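A quick sketch to illustrate, using hypothetical sample strings: the escaped and unescaped patterns capture exactly the same text, whereas [ and ] really do change meaning when left unescaped (they start a character class).

```perl
use strict;
use warnings;

my $html = "<pre>x</pre>";

# < and > are not regex metacharacters: both patterns behave the same.
my ($plain)   = $html =~ m|<pre>(.*?)</pre>|;
my ($escaped) = $html =~ m|\<pre\>(.*?)\</pre\>|;

# [ and ] ARE metacharacters: unescaped, [pre] is a character class that
# matches a single 'p', 'r' or 'e', so it matches three times here.
my $class_hits   = () = "[pre]" =~ /[pre]/g;
my $literal_hits = () = "[pre]" =~ /\[pre\]/g;

print "plain=$plain escaped=$escaped class=$class_hits literal=$literal_hits\n";
```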

Last edited by BlueBottle: Jul 17, 2004, 7:14 AM