
In this thread, I announce the first annual regex challenge

DADA!

The first regex challenge is as such:

There is a variable, $html, that holds an HTML page. The page contains lots of images (for example, <img alt="A Fantastic Image" border=0>). The challenge is to remove all of the images, extracting each alt text and placing it in an <h3> tag, like so: "<h3>Alt Text</h3>\n".

I would hasten to add that I'm not launching this challenge because I am unable to do it and would like someone else to do it for me; I'm launching it because I would like the world of Perl to progress and move on.

Prize: The satisfaction of knowing that you are a Perl guru.

Last edited by:

TLA: Jul 26, 2002, 1:15 AM
Re: [TLA] In this thread, I announce the first annual regex challenge In reply to
Well, I assume this isn't really a challenge and is some code you are looking for for one of your scripts, but nevertheless...

Code:
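#!/perl/bin/perl
# Paul's Demo
#====================================

use strict;
use warnings;
use LWP::Simple;
main();

#====================================

sub main {
#----------------------------------------------------------
# Ewww

    # get() returns the whole page as one string, or undef on failure.
    my $page = get('http://www.physics.ohio-state.edu/~swlee/Pictures/');
    die "Could not fetch the page\n" unless defined $page;

    print "Content-type: text/html\n\n";

    # /e evaluates the replacement as Perl code, so print runs once per match.
    # Only the alt texts are printed; the rest of the page is not output.
    $page =~ s/alt=['"]*([^"']+)/print "<h3>$1<\/h3><br>\n"/segi;
}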

It's not perfect, but it should work fine in general.

Last edited by:

Paul: Jul 26, 2002, 8:37 AM
Re: [Paul] In this thread, I announce the first annual regex challenge In reply to
Paul, why do you use s///e and put a print statement in the replace expression?

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [TLA] In this thread, I announce the first annual regex challenge In reply to
I wouldn't use a regex for this purpose. I'd recommend looking into modules such as HTML::TokeParser, which have been written specifically for this kind of task: extracting data out of HTML tags.

- wil
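
As a rough illustration of the module route Wil suggests, the sketch below pulls alt text out of img tags with HTML::TokeParser (from the HTML-Parser distribution on CPAN, so it must be installed). The sample $html string is made up for illustration, and this only extracts the alt text; it does not rebuild the page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page standing in for $html.
my $html = qq{<p>Intro</p><img src="a.png" alt="A Fantastic Image" border=0><img src="b.png">};

# Passing a reference to a string makes the parser read from that string.
my $parser = HTML::TokeParser->new(\$html)
    or die "Could not create parser\n";

my @alts;
# get_tag("img") skips ahead to the next <img> start tag and returns
# [$tag, \%attr, \@attrseq, $text], or undef at the end of the document.
while (my $tag = $parser->get_tag("img")) {
    my $alt = $tag->[1]{alt};
    push @alts, $alt if defined $alt;   # images without alt are skipped
}

print "<h3>$_</h3>\n" for @alts;
```

Because the parser handles quoting and attribute order itself, unquoted attributes such as border=0 need no special treatment.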
Re: [yogi] In this thread, I announce the first annual regex challenge In reply to
It's just quicker to write. Otherwise you'd have to push everything into an array and print it afterwards, whereas putting the print inside the substitution prints just what we want... it may be a bit slower to run, though.
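
For comparison, the collect-then-print alternative mentioned here might look like the sketch below. It reuses the same pattern; the sample $html string is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the page from the demo is not reproduced here.
my $html = qq{<p>Intro</p><img src="a.png" alt="First Image" border=0>}
         . qq{<img alt='Second Image'>};

# In list context, m//g returns the captures from every match,
# so no explicit push loop is needed.
my @alts = $html =~ /alt=['"]*([^"']+)/gi;

# Print afterwards, once all matches have been collected.
print "<h3>$_</h3><br>\n" for @alts;
```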

Last edited by:

Paul: Jul 26, 2002, 2:58 AM
Re: [Paul] In this thread, I announce the first annual regex challenge In reply to
But then the rest of the page is lost. I assumed the whole page should be displayed, with the replacements...

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [yogi] In this thread, I announce the first annual regex challenge In reply to
>>
But then the rest of the page is lost. I assumed the whole page should be displayed, with the replacements...
<<

I read it the other way, that everything other than the alt text should be stripped, but now that I re-read it, you are right.
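
Under Ivan's reading, where the whole page is kept and each image is swapped for its alt text, a substitution could look roughly like this. It is only a sketch: the sample page is made up, and it assumes the alt value is quoted and that no attribute value contains a ">" character.

```perl
use strict;
use warnings;

# Hypothetical sample page standing in for $html.
my $html = qq{<p>Before</p>\n<img src="a.png" alt="A Fantastic Image" border=0>\n<p>After</p>\n};

# Replace each whole <img> tag with its alt text wrapped in <h3>.
# $1 is the opening quote, $2 the alt text, \1 the matching closing quote.
$html =~ s{<img\b[^>]*?\salt=(["'])(.*?)\1[^>]*>}{<h3>$2</h3>\n}gis;

# Remove any remaining images that have no (quoted) alt attribute.
$html =~ s{<img\b[^>]*>}{}gi;

print $html;
```

The rest of the page passes through untouched, because the substitution modifies $html in place instead of printing from inside the replacement.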

Last edited by:

Paul: Jul 26, 2002, 2:59 AM
Re: [Paul] In this thread, I announce the first annual regex challenge In reply to
Anyway, you saved him some work....

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [Wil] In this thread, I announce the first annual regex challenge In reply to
>>
I wouldn't use a regex for this purpose. I'd recommend looking into modules such as HTML::TokeParser, which have been written specifically for this kind of task: extracting data out of HTML tags.
<<

Did someone at perlmonks tell you that?

Anyway, take a look at the thread title: "regex", "challenge"... If it was "what module can do this", then maybe I'd have recommended one.
Re: [Paul] In this thread, I announce the first annual regex challenge In reply to
I did read the thread title. But more often than not, people do not realize that there is an existing module on CPAN that will do exactly what they're looking for, better. I was simply flagging this up to the OP.

Edit: Corrected typo.

- wil

Last edited by:

Wil: Jul 26, 2002, 3:18 AM
Re: [Wil] In this thread, I announce the first annual regex challenge In reply to
Hello,

I have just convened with the judge and I would like to say that he is very impressed with what you have all done in this first annual regex challenge. Well done to you all!

First Prize: Perl Guru status goes to Paul! Well done, Paul. I'm sure that a slew of Perl TV programs and book deals will now follow. Time Magazine contacted the panel last night, but we thought that we would hold out for you until you got a real media contact interested, like Newsnight or CNN Talkback. Edit: We have them both on the phone now. They are interested in knowing your per-show fee.

Second Prize: "Modules Master" status goes to Wil! Well done to you, Wil. I am sure that your family are very proud. You can now look forward to some new lucrative contracts!
Re: [TLA] In this thread, I announce the first annual regex challenge In reply to
The judge would like to know what the /e modifier does to a regular expression. He says that he was unable to find it on perldoc or elsewhere.
Re: [TLA] In this thread, I announce the first annual regex challenge In reply to
Well, the judge is not worthy to judge then.....

http://www.perldoc.com/...Quote-Like-Operators

Tell him to read the docs and come back only after he has studied a bit...

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [TLA] In this thread, I announce the first annual regex challenge In reply to
e = evaluate

Basically, it means you can put Perl code on the right-hand side of the substitution using s///e (I needed it as I was "print"ing).
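
A small sketch of the difference /e makes. The sample strings are invented for illustration. The last case shows why a print in the replacement loses the matched text: the match is replaced by print's return value.

```perl
use strict;
use warnings;

my $text = "3 apples and 4 pears";

# Without /e the replacement is just an interpolated string.
(my $plain = $text) =~ s/(\d+)/$1*2/g;

# With /e the replacement is evaluated as Perl code and its result is used.
(my $evaluated = $text) =~ s/(\d+)/$1*2/ge;

# With a print in the replacement, the heading goes to STDOUT and the
# matched text is replaced by whatever print returns (1 on success).
(my $printed = 'alt="Fantastic"') =~ s{alt="(\w+)"}{print "<h3>$1</h3>\n"}e;

print "$plain\n";
print "$evaluated\n";
print "$printed\n";
```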
Re: [Paul] In this thread, I announce the first annual regex challenge In reply to
What if some other tag happens to have an alt= attribute?

print "<h3>$2</h3>\n" while ($html =~ m#<img\b[^>]*?\salt=(["'])(.*?)\1[^>]*>#isg);


not tested.. but that's my 1 min regex.
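
To illustrate the concern about alt= on other tags, the sketch below runs an unanchored pattern and an img-anchored pattern over the same made-up input. The sample string and the variable names are assumptions, and the anchored pattern assumes the alt value is quoted.

```perl
use strict;
use warnings;

# Hypothetical input: an <area> tag also carries an alt attribute.
my $html = qq{<area href="x.html" alt="Map Region">}
         . qq{<img src="a.png" alt="A Fantastic Image">};

# Unanchored pattern: picks up alt= on any tag.
my @any_tag = $html =~ /alt=['"]*([^"']+)/gi;

# Pattern anchored to <img ...>: only image alt text is captured.
my @img_only = $html =~ /<img\b[^>]*?\salt=["']([^"']*)["']/gi;

print "any tag:  @any_tag\n";
print "img only: @img_only\n";
```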

[edit]boo.. forgot case insensitive.

Last edited by:

widgetz: Jul 29, 2002, 7:04 PM