Gossamer Forum

regex help please

Hello,

New to regex and my head is spinning more than the girl in the "Exorcist".

I'm trying to delete a bunch of html code. For example, this seems to work for deleting all the stuff between the first beginning and ending center tags:

Code:
$remove_center =~ s/<center>(.*?)<\/center>//g;

After those center tags, there are other center tags further down the HTML page, with a bunch of other code and text in between, all of which I'd like to delete. So any ideas how to remove everything between the first <center> tag and, say, the third </center> tag?

In the same vein, is there a simple regex for removing everything in an HTML page between the first instance of a beginning string and the first instance of another string, regardless of the amount of text, lines or code in between?

That is, say the first and last strings are:

Code:
<form method="post" action="something.cgi" blah blah blah>
which is the first string, and there is a whole bunch of code and text ending with the last string of:
</form>

I've tried more regex expressions than I can remember, and none work (or produce unexpected results), although it's certainly a learning experience.

Any help would be appreciated.

Thanks,
ronzo

Re: [ronzo] regex help please
I'm not sure I quite understand what you are asking... but something like this?

Code:
#!/usr/bin/perl

use strict;

my $string = q|

<p>just a test</p>
<center>something here
and some more
or here</center><BR><BR>
<center>something here
and some more
or here</center>

|;

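# Non-greedy match; /s lets "." cross newlines, /g removes every <center> block.
# The "/" in </center> must be escaped because "/" is the regex delimiter.
$string =~ s/<center>(.*?)<\/center>//sg;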

print "Content-type: text/html \n\n";
print $string;

That should return something like:

Quote:
<p>just a test</p>
<BR><BR>

This is untested, but it should work :)

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!

Last edited by: Andy, Jan 26, 2004, 8:41 AM

Re: [Andy] regex help please
Hi Andy,

I was wrong... the regex I gave doesn't remove what I wanted. Here's an example of what I'm trying to do:

Code:
<center>begin html removal</center>


<table border="0" width="75%">
<tr>
<td width="100%">table #1</td>
</tr>
</table>


<center>a lot more text and html tags here<br>
and here</center>

<table border="1" width="100%">
<tr>
<td width="100%">table #2</td>
</tr>
<tr>
<td width="100%">&nbsp;</td>
</tr>
</table>

<center>end html removal</center>

<p>and then from here on is stuff that will stay</p>

So basically I want to delete everything on the page from the first <center> tag (the "begin html removal" line) through the final </center> tag (the "end html removal" line). I could do this line by line, but I'd rather just grab everything all at once and get rid of it if that's possible.

Hope that explains it better.
Thanks,
ronzo

Re: [ronzo] regex help please
What about the modified version? Otherwise I'm afraid I don't have any more ideas :(

Cheers

Andy (mod)
andy@ultranerds.co.uk

Re: [ronzo] regex help please
When you take out the "?" in the "match anything" expression (so ".*?" becomes ".*"), you tell Perl to be greedy and match as much as it can rather than as little as it can, so the match runs from the first <center> to the last </center>. The "s" regular expression modifier is required so that "." also matches newlines and the match can span multiple lines. The "g" modifier is not required, since we will only match once.

Try this:
Code:
my $html = qq|
<center>begin html removal</center>
<table border="0" width="75%">
<tr>
<td width="100%">table #1</td>
</tr>
</table>
<center>a lot more text and html tags here<br> and here</center>
<table border="1" width="100%">
<tr>
<td width="100%">table #2</td>
</tr>
<tr>
<td width="100%">&nbsp;</td>
</tr>
</table>
<center>end html removal</center>
<p>and then from here on is stuff that will stay</p>
|;

$html =~ s,(<center>.*</center>),,s;
print $html;
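
To make the greedy versus non-greedy difference concrete, here is a minimal sketch on a made-up page (the `$page` contents and the variable names are assumptions for illustration, not taken from ronzo's real HTML). It also shows one possible way to stop at the third </center>, which was asked in the first post. Note that the greedy form runs through the last </center> anywhere in the string, so it only does what ronzo wants when no <center> block appears in the part that should stay.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page: three <center> blocks to remove,
# then content to keep, then one more <center> block.
my $page = join "\n",
    '<center>one</center>',
    '<table><tr><td>t1</td></tr></table>',
    '<center>two</center>',
    '<center>three</center>',
    '<p>keep me</p>',
    '<center>four</center>',
    '';

# Using s{...}{} as delimiters avoids having to escape the "/" in </center>.

# Greedy: removes from the first <center> through the LAST </center>.
(my $greedy = $page) =~ s{<center>.*</center>}{}s;

# Non-greedy with /g: removes each <center>...</center> pair on its own,
# leaving whatever sits between the pairs.
(my $lazy = $page) =~ s{<center>.*?</center>}{}sg;

# Counted: removes from the first <center> through the third </center>.
(my $third = $page) =~ s{<center>(?:.*?</center>){3}}{}s;

print "--- greedy ---\n$greedy";
print "--- non-greedy, /g ---\n$lazy";
print "--- through third </center> ---\n$third";
```

On this sample the greedy version also swallows the "keep me" paragraph, because a fourth </center> follows it. The counted version stops after the third closing tag and leaves the rest alone.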

Philip
------------------
Limecat is not pleased.

Last edited by: fuzzy logic, Jan 26, 2004, 5:28 PM

Re: [fuzzy logic] regex help please
Hi Philip,

Thanks, that works great!

Just picked up the O'Reilly regex book by Friedl, and found a few good web tutorials. Looks like a potentially long learning curve, but it's certainly going to be worth it, especially after seeing how powerful regex can be.

Thanks again, and thanks also to Andy.

ronzo
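
The second part of the original question, removing everything from a known opening string such as the <form ...> tag through the first </form> after it, was not answered in the thread. One possible approach is sketched below. `strip_between` is a hypothetical helper name, and the sample page is invented for illustration. It treats both strings as literal text via \Q...\E and uses a non-greedy match so that it stops at the first closing string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove everything from the
# first occurrence of $start through the first occurrence of $end after it.
sub strip_between {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes regex metacharacters so both strings match literally.
    # .*? is non-greedy, so the match stops at the first $end.
    # /s lets "." match newlines; no /g, so only the first span is removed.
    $text =~ s/\Q$start\E.*?\Q$end\E//s;
    return $text;
}

# Invented sample page for illustration.
my $page = <<'HTML';
<p>intro</p>
<form method="post" action="something.cgi">
<input type="text" name="q">
</form>
<p>outro</p>
HTML

print strip_between(
    $page,
    '<form method="post" action="something.cgi">',
    '</form>',
);
```

If the attributes of the opening tag vary from page to page, a pattern such as `s/<form\b[^>]*>.*?<\/form>//si` may be a better fit than a literal start string. For anything beyond simple, predictable pages, a real HTML parser is usually more robust than a regex.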