Extracting title...

Eugh, can anyone tell me why this is printing the WHOLE html page for the loop?

if (m,<title>.+?</title>,i) {
chomp;
$title = $_;
$title =~ s,<title>,,;
$title =~ s,</title>,,;
print $title . "<BR>";
}

I've been playing with different options, but none of them seem to work :(

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Extracting title... In reply to
Can you show the subroutine? Otherwise it is hard to know.

You should just be parenthesizing what you want to match, though, and doing $title = $1. That's just something I spotted in that snippet.

Last edited by: Paul, Oct 5, 2002, 4:41 AM
Re: [Paul] Extracting title... In reply to
Hi, thanks for the reply. Here is the sub:

Code:
foreach (@urls) {

    my @html = get($_);

    foreach (@html) {
        if (m,<title>.+?</title>,i) {
            chomp;
            $title = $_;
            $title =~ s,<title>,,;
            $title =~ s,</title>,,;
        }
    }

}

Andy (mod)
Re: [Andy] Extracting title... In reply to
Code:
foreach my $url (@urls) {
my @html = get($url);
foreach (@html) {
m,<title>(.+?)</title>,i and $title = $1;
}
}
The problem was that your $_ held the whole string, so $title = $_ copied all of it and the two substitutions only removed the tags themselves.

Does the get function really return an array, not a string?
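
To make the difference concrete, here is a minimal sketch. It assumes get is LWP::Simple's get (the thread doesn't actually say), which hands back the whole page as one string rather than a list of lines. The $page value is a made-up stand-in for a fetched page, and $copied and $captured are names chosen just for this demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical one-line page, standing in for whatever get() returned.
my $page = '<html><head><title>My Page</title></head><body>Hello</body></html>';

# Original approach: copy the whole string, then delete only the tags.
my $copied = $page;
$copied =~ s,<title>,,i;
$copied =~ s,</title>,,i;

# Capture approach: keep only what sits between the tags.
my $captured;
$captured = $1 if $page =~ m,<title>(.+?)</title>,i;

print "copied:   $copied\n";
print "captured: $captured\n";
```

The first line printed is still the full page minus the two tags, which matches the "printing the WHOLE html page" symptom; the second is just the title text.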

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [yogi] Extracting title... In reply to
Hi Ivan..thanks for the reply, but it still isn't working :( Still getting a blank for the $title variable :-/

Any ideas?

Cheers

Andy (mod)
Re: [Andy] Extracting title... In reply to
Then maybe it is blank!!!

The following works:
Code:
#!/usr/bin/perl
use strict;
my @html = ('<title>this is the title</title>','some other text');
my $title;
foreach (@html) {
m,<title>(.+?)</title>,i and $title = $1;
}
print "$title\n";

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [yogi] Extracting title... In reply to
I think you can do:

Code:
my $page = get($url);

$page =~ m|<title>(.+?)</title>|i and $title = $1;

Instead of looping twice.

If you were to loop you'd want a last; in there after you found the title.
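
A rough sketch of both options, using made-up data instead of a real fetch. The first_title helper and the sample lines are hypothetical, not from anyone's actual script. The /s flag on the single-string version is an addition of mine: it lets . match newlines, so a title that wraps over two lines is still found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first <title> text found in a list
# of lines, or undef if no line contains one.
sub first_title {
    my @lines = @_;
    my $title;
    foreach my $line (@lines) {
        if ($line =~ m|<title>(.+?)</title>|i) {
            $title = $1;
            last;    # stop at the first match instead of scanning the rest
        }
    }
    return $title;
}

# Made-up page, split into lines.
my @html = (
    '<html><head>',
    '<title>First</title>',
    '<p>Some docs mention <title>Second</title> as text</p>',
);
my $by_lines = first_title(@html);

# Single string: no loop needed, and /s copes with a wrapped title.
my $page = "<head><title>Split\nTitle</title></head>";
my $by_string;
$by_string = $1 if $page =~ m|<title>(.+?)</title>|is;

print "by lines:  $by_lines\n";
print "by string: $by_string\n";
```

Without the last, the loop would keep going and end up with the later match from the third line instead of the real title.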

Last edited by: Paul, Oct 5, 2002, 5:34 AM
Re: [yogi] Extracting title... In reply to
Eugh, how stupid of me! A few lines further down, before the displaying of the data, I had:

my $title = "None";

Doh! I must have put it in there earlier, before I had the $title variable defined (to stop any errors being raised by strict).
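
For anyone hitting the same thing: if both declarations sit in the same scope (an assumption, since the full script isn't shown here), use warnings will flag the second my. A small sketch follows; the string eval and the $SIG{__WARN__} handler are only there so the demo can collect the compile-time warning and print it itself, and $shown is a name made up for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# String eval, so the compile-time warning is raised while the handler is active.
my $shown = eval q{
    my $title = "Real title";   # set by the extraction code
    my $title = "None";         # leftover declaration further down
    $title;
};

print "title is now: $shown\n";
print "perl said: $_" for @warnings;
```

The extracted value is silently replaced by "None", and perl reports that the second "my" variable masks an earlier declaration in the same scope.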

Thanks for the help guys :)

Andy (mod)
Re: [Paul] Extracting title... In reply to
Ok...now I'm trying to extract an email address from the HTML page...how would I go about that?

foreach (@html) {
m#(.\@.)#i and $email = $1;
}

That is what I have at the moment, but it's grabbing stuff like r@d.

Anyone got a near enough foolproof email-grabbing regex?

Thanks in advance :)

Andy (mod)
Re: [Andy] Extracting title... In reply to
Email regexes have been posted quite a few times. The simplest one is:

\S+@\S+\.\S+
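
One thing to watch when running that over raw HTML: \S matches anything that isn't whitespace, so quotes and angle brackets get swept up too. A quick sketch on a made-up line (the markup and address are hypothetical; \@ is just @ written so it can never be mistaken for an array):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line of HTML, not taken from any real page.
my $line = '<p>Contact <a href="mailto:bob@example.com">Bob</a></p>';

my $email;
$email = $1 if $line =~ m,(\S+\@\S+\.\S+),;

print "$email\n";
```

Here the capture is the whole run of non-space characters around the address, attribute and closing tags included, so on HTML input a narrower character class than \S is probably needed.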
Re: [Paul] Extracting title... In reply to
I'm now using:

m,([a-z_]+@\S+\.[a-z]+),i and $email = $1;

That seems to work a treat :)

Cheers

Andy (mod)
Re: [Andy] Extracting title... In reply to
I don't think so :)

What if my address is:

paul.wilson@aol.co.uk

or

paul.wilson@something.mp3

:P
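
To show what goes wrong with those two, here is a small comparison. $narrow is the pattern Andy posted (with @ escaped, which doesn't change what it matches); $wider is only an assumed alternative that allows dots, plus signs and hyphens before the @ and digits in the domain. It is a sketch, not a complete address validator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @addresses = ('paul.wilson@aol.co.uk', 'paul.wilson@something.mp3');

# The pattern from the post above (\@ is just @ made safe from interpolation).
my $narrow = qr/([a-z_]+\@\S+\.[a-z]+)/i;

# An assumed, more permissive pattern; still not a full validator.
my $wider = qr/([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/;

my (@narrow_hits, @wider_hits);
for my $addr (@addresses) {
    push @narrow_hits, $addr =~ $narrow ? $1 : undef;
    push @wider_hits,  $addr =~ $wider  ? $1 : undef;
}

printf "%-28s narrow: %-22s wider: %s\n",
    $addresses[$_], $narrow_hits[$_], $wider_hits[$_]
    for 0 .. $#addresses;
```

The narrow pattern drops "paul." from both addresses and cuts the second one off at ".mp", because [a-z_] has no dot and [a-z] has no digits. For anything beyond scraping, a dedicated module such as Email::Valid from CPAN is likely a safer bet than any single regex.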

Last edited by: Paul, Oct 7, 2002, 8:06 AM