Gossamer Forum
Regex...but why?

I have a bit of a problem here. In a file I'm reading, I have:

<article send-id="Cwebwire-cnet_mainU2031" itemname="wed/cz/Cwebwire-cnet_main.2031.html">

I'm trying to parse out the last little bit (the itemname value). I have tried:

$_ =~ m,<article send-id="(.+?)" itemname="(.+?)">,sig and $string = $2;
$_ =~ m,article send-id=(.+?) itemname=(.+?),sig and $string = $2;
$_ =~ m,<article send-id=\"(.+?)\" itemname=\"(.+?)\">,sig and $string = $2;

...and several other combinations, but none of them gives me the right result. Does anyone know why this would not be working? The variable $_ contains several newlines, so I'm wondering if it has something to do with the data being spread over several lines, so that the match isn't found.

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Regex...but why?
Code:
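use strict;
use warnings;
use HTML::TokeParser;

# Parse the file named on the command line
my $p = HTML::TokeParser->new(shift) or die "Can't open file: $!";

# get_tag returns [$tag, \%attr, \@attrseq, $text] for each <article> start tag
while (my $token = $p->get_tag("article")) {
    my $itemname = $token->[1]{itemname};
    print "$itemname\n";
}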

- wil

Last edited by Wil: May 2, 2003, 2:19 AM
Re: [Andy] Regex...but why?
You don't need $_ =~ when you are matching against $_; you can leave it out.

You also don't need "g" in your regex: you are only matching one string, so "g" is redundant.

Thirdly, I'm not sure why you are parenthesizing the send-id field and then doing nothing with it.

Fourthly, I think I've mentioned before that you only need to escape meta-characters. Quotes are not meta-characters, so you don't need \"

Fifthly, you are using (.+?), which could be sped up by using ([^"]+): a negated character class simply runs up to the next quote, so the engine does not have to test the rest of the pattern after every character.
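On the "g" point, it can be more than redundant. A minimal sketch, assuming the match is run more than once against the same string (the loop and the variable names are made up for illustration, they are not taken from Andy's script): in scalar context, which is what "and" gives you, m//g resumes from pos() of the string, so a repeated attempt starts where the previous match ended and fails.

```perl
use strict;
use warnings;

# Sample line from the original post
my $line = '<article send-id="Cwebwire-cnet_mainU2031" itemname="wed/cz/Cwebwire-cnet_main.2031.html">';

# Separate copies, so that each string keeps its own pos()
my ($with, $without) = ($line, $line);
my (@with_g, @without_g);

for my $attempt (1 .. 3) {
    # The ?: condition is scalar context, like the left side of "and"
    push @with_g,    ($with    =~ m{itemname="([^"]+)"}g ? $1 : 'no match');
    push @without_g, ($without =~ m{itemname="([^"]+)"}  ? $1 : 'no match');
}

print "with /g:    $_\n" for @with_g;
print "without /g: $_\n" for @without_g;
```

With "g" the second attempt reports no match, because it starts searching after the end of the first match; the failure resets pos(), so the third attempt succeeds again. Without "g" every attempt starts from the beginning of the string.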

Having said all that though, the regex looks like it should work. But I'd write it as:

Code:
m|<article send-id="[^"]+" itemname="([^"]+)">|si and $string = $1;

If that doesn't work you need to look closer at what is contained in $_ to check it's what you expect.
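One way to take that closer look, as a rough sketch: $input below is a made-up stand-in for the real contents of $_ (the surrounding lines are invented), and the first two patterns from the original post are tried against it with the "g" dropped. Data::Dumper with Useqq switched on prints newlines and other control characters as visible escapes, which helps to spot anything unexpected in the data.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for $_: the sample tag with other lines around it
my $input = "<doc>\n"
    . qq{<article send-id="Cwebwire-cnet_mainU2031" itemname="wed/cz/Cwebwire-cnet_main.2031.html">\n}
    . "some text\n</doc>\n";

# Useqq prints newlines, tabs and other control characters as visible escapes
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;
print Dumper($input);

my %captured;
# First attempt from the original post, minus "g"
$captured{quoted} = $2 if $input =~ m,<article send-id="(.+?)" itemname="(.+?)">,si;
# Second attempt: nothing follows the final (.+?), so it settles for one character
$captured{unquoted} = $2 if $input =~ m,article send-id=(.+?) itemname=(.+?),si;

print "$_: $captured{$_}\n" for sort keys %captured;
```

On this input the quoted pattern captures the full itemname value even though the string contains newlines, since the tag itself sits on one line. The unquoted pattern does match, but its last lazy group has nothing after it to force it to grow, so it captures only the opening quote character.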
Re: [Paul] Regex...but why?
Thanks Wil/Paul.

>>>You also don't need "g" in your regex and you are only matching one string and so "g" is redundant. <<<

I just put it in there, in case that was the reason why it wasn't working.

>>>Fourthly, I think I've mentioned before that you only need to quote meta-characters - quotes are not meta characters so you don't need \"
<<<

Again, I was just putting it in there, in case it was causing the regex not to work.

Thanks for the regex, it worked a charm :) Still not really sure why my regex didn't work :(

Cheers

Andy (mod)
Re: [Andy] Regex...but why?
Actually, to make it really slick you may want to get the regex to check whether the quote is preceded by an escape, and if so ignore it. For example, if the value of "itemname" was:

itemname="Paul says \"Hello!\""

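...then the [^"]+ in my regex would stop at the first embedded quote, having captured only:

Paul says \

...and the rest would be chopped off. Since the pattern then expects "> at that point, the match as a whole would fail and $string would not be set.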

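You could use a zero-width negative look-behind assertion for this, e.g.:

/(?<!\\)"/

...that will match a quote in a string that is not preceded by a \ (the backslash has to be doubled inside the assertion, otherwise it escapes the closing parenthesis).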
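A minimal sketch of how the look-behind could be put to work, assuming a made-up tag (the send-id value "x1" and the variable names are invented for illustration). It captures lazily up to the first quote that is not preceded by a backslash, then turns \" back into a plain quote.

```perl
use strict;
use warnings;

# Hypothetical tag whose itemname value contains backslash-escaped quotes
my $tag = '<article send-id="x1" itemname="Paul says \"Hello!\"">';

# Plain negated class: stops at the first quote, escaped or not
my ($naive) = $tag =~ m{itemname="([^"]+)"};

# Lazy match up to the first quote that is not preceded by a backslash
my ($value) = $tag =~ m{itemname="(.*?)(?<!\\)"};

# Turn \" back into " in the captured value
(my $unescaped = $value) =~ s/\\"/"/g;

print "naive:      $naive\n";
print "lookbehind: $value\n";
print "unescaped:  $unescaped\n";
```

One limitation to keep in mind: a value that ends in an escaped backslash (written \\ right before the closing quote) would fool the look-behind, so this is only good enough if backslashes are never escaped themselves.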

Last edited by Paul: May 2, 2003, 3:30 AM