Gossamer Forum

copy and paste script

Hi,

this is something totally different, but I need the following:

Let's assume there is an HTML page (with no frames, tables, etc.) on the web that updates one paragraph every 5 minutes, while the rest stays the same.
I want to copy this paragraph, and only this paragraph, and paste it into a new HTML file.

I looked around but didn't find a script that can do this or can be modified to do so.
Any hints?

Bye Diemo
Re: copy and paste script
Hi,

Something like this should work:

Code:
my $begin = '<p>'
my $end = '</p>';

use LWP::Simple;
my $html = get ('http://www.yahoo.com/');
if ($html =~ /$begin(.+?)$end/o) {
print "Content is: $1";
}
else {
print "Couldn't find the info!";
}

What this does is go to www.yahoo.com, retrieve the whole HTML page, and store it in $html. It then looks for the begin and end markers of the section you want; if they are found, the text between them will be in $1.
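
A minimal sketch of the same idea, run on a made-up string instead of a live page so it can be tried offline. The helper name extract_between and the sample markup are assumptions for illustration, not part of the script above. \Q...\E and the /s flag are additions, on the assumption that the markers should be treated as literal text and that the wanted section may span several lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first $begin marker
# and the nearest following $end marker, or undef if nothing matches.
sub extract_between {
    my ($html, $begin, $end) = @_;
    return unless defined $html;    # LWP::Simple's get() returns undef on failure

    # \Q...\E quotes regex metacharacters inside the markers;
    # /s lets . match newlines, so multi-line sections are found too.
    if ($html =~ /\Q$begin\E(.+?)\Q$end\E/s) {
        return $1;
    }
    return;
}

# Made-up sample page (assumption, stands in for the fetched HTML).
my $sample = "<html><body>\n<p>First\nparagraph</p>\n<p>Second</p>\n</body></html>";

my $found = extract_between($sample, '<p>', '</p>');
print defined $found ? "Content is: $found\n" : "Couldn't find the info!\n";
```

With a real page, $sample would be replaced by the result of get(), which is why the helper checks for undef first.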

Hope this helps,

Alex
Re: copy and paste script
Wow, LWP is much easier than using sockets. I should look into using it :)
Re: copy and paste script
Alex,
Would there be a way to modify this so that it takes the links from a Yahoo category and puts them into the Links database, formatted the way Links needs them?
Michael
Re: copy and paste script
Hi Alex,

thanks for your help. Please don't call me an idiot, but I am very new to this scripting language. I tried to use your sample, but it won't work: internal server error. Here is what I have done:

#!/usr/bin/perl
my $begin = '<html>'
my $end = '</html>';
use LWP::Simple;
my $html = get ('http://www.yahoo.com/');
if ($html =~ /$begin(.+?)$end/o) {
print "Content is: $1";
}
else {
print "Couldn't find the info!";
}

To keep it simple, I once again use Yahoo as the example.
This should normally capture the whole site, right? It would be very helpful if you could write down the whole script.


Thanks so much ...

Diemo
Re: copy and paste script
Quick fix:

1. Add the following line before the if...else control block:

print "Content-type: text/html\n\n";

2. Change:

my $begin = '<html>'

to

my $begin = '<html>';

And make sure you saved and uploaded it in ASCII mode, and CHMODed the script (set its file permissions) to 0755, i.e. rwxr-xr-x.

Dan :)
Re: copy and paste script
Hi Dan and Alex,

OK, now the server error is gone. Nevertheless, the script does not seem to find the begin/end markers and writes "Couldn't find the info!". Any ideas?

Here it is:
#!/usr/bin/perl
my $begin = '<html>';
my $end = '</html>';
use LWP::Simple;
my $html = get ('http://www.yahoo.com/index.html');
print "Content-type: text/html\n\n";
if ($html =~ /$begin(.+?)$end/o) {
print "Content is: $1";
}
else {
print "Couldn't find the info!";
}


Bye Diemo

Re: copy and paste script
Me again,

I saw another solution like:

#!/usr/local/bin/perl
use LWP::Simple;
#retrieve the page which should be modified
my $page = get("add URL");
$page =~ s/\n//g;
#everything above this will be cut off
$page =~ s/^.*<\/head>//is;
#everything below this will be cut off
$page =~ s/<\/body>.*$//is;
print "Content-type: text/html\n\n";
#print result
print "$page";


but since I don't know anything about the syntax, this script only works for me with HTML formatting tags like <head>, <body>, etc. as the limits.
I would like to use any phrase in the HTML as the upper and lower limit. I know the solution lies in the following part, e.g. <\/head>//

but ...

PLEASE HELP ...

Bye Diemo
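
One possible way to use arbitrary phrases as the upper and lower limits in the script above is sketched here. The helper name trim_to_phrases, the sample page, and the two phrases are made up for illustration. \Q...\E is used so the phrases are matched literally, and the first substitution is non-greedy so it stops at the first occurrence of the upper phrase; that is a design choice and differs from the greedy .* in the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the text between two arbitrary phrases.
# If a phrase does not occur, the corresponding cut simply does nothing.
sub trim_to_phrases {
    my ($page, $upper, $lower) = @_;

    # Cut everything up to and including the first occurrence of $upper.
    $page =~ s/^.*?\Q$upper\E//s;

    # Cut the first remaining occurrence of $lower and everything after it.
    $page =~ s/\Q$lower\E.*$//s;

    return $page;
}

# Made-up sample page (assumption, stands in for the result of get()).
my $page = "<html><body>\nNews of the hour:\nRain is expected.\nEnd of news.\n</body></html>\n";

print trim_to_phrases($page, 'News of the hour:', 'End of news.');
```

In a CGI script the print would still need to come after the Content-type header line, as in the script above.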
Re: copy and paste script
[pet peeve]You should learn perl before learning CGI. Re: the 500 server error question -- a 500 server error does not mean the script didn't work, it means that it didn't output the proper headers to the web server. Not all perl scripts are CGI scripts.[/pet peeve]

As for the problem, to help debugging, change the couldn't find line to:

print "Couldn't find info. Found this instead: $html";

then you can see what's happening. I suspect it's a case sensitivity issue, where they use <HTML> and you use <html>. You can either change your tags, or add a /i to the regular expression.
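
A small sketch of how the modifiers change the outcome, using a made-up page (an assumption; the real Yahoo page is not shown in this thread). Besides case, a second possible cause is that . does not match a newline unless /s is given, so a section spanning several lines is not found even with /i.

```perl
use strict;
use warnings;

# Made-up page: upper-case tags, content spread over several lines.
my $html  = "<HTML>\n<BODY>\nHello\n</BODY>\n</HTML>\n";
my $begin = '<html>';
my $end   = '</html>';

# No modifiers: fails, the tags differ in case.
my $plain = $html =~ /$begin(.+?)$end/ ? 1 : 0;

# /i only: still fails, because . cannot cross the newlines.
my $nocase = $html =~ /$begin(.+?)$end/i ? 1 : 0;

# /i and /s together: matches.
my $multiline = $html =~ /$begin(.+?)$end/is ? 1 : 0;

print "plain: $plain, /i: $nocase, /is: $multiline\n";
# prints: plain: 0, /i: 0, /is: 1
```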

Hope this helps,

Alex

Re: copy and paste script
To ALEX,

Hi,

the server error was caused by one little mistake you made, which was already corrected by Dan.
Nevertheless, the $html variable returns exactly the same content as the original page. Therefore I thought that the problem with your script was caused by the if-else block?! I checked for case sensitivity, but this was OK.

Bye Diemo
Re: copy and paste script
mellinger,

This would be "stealing".
I don't think that Yahoo will let anyone do it without taking legal action.

Regards,

Pasha

------------------
webmaster@find.virtualave.net
http://find.virtualave.net
Re: copy and paste script
Pasha,
Sorry, I seriously thought you could do this. Sorry again.
Michael
Re: copy and paste script
Hi mellinger,

I don't know Links, but I think what you want to do is here:
http://www.gossamer-threads.com/scripts/misc/altavista.cgi

Bye Diemo
Re: copy and paste script
Mellinger, another option would be to pull links from the Open Directory Project. Granted, it's not Yahoo!, but it seems to be growing faster than Yahoo! thanks to its volunteer editors.
You can find a couple of scripts to do it at cgi-resources.com. Lycos and, I believe, Excite currently pull links from it to supplement their spidered indexes.
Re: copy and paste script
Hi, All.

As I understand it, I should copy Simple.pm to my Perl directory. Am I right? Or can I just copy it to cgi-bin?

Instead of using LWP/Simple.pm, how would I use sockets?

Sincerely yours,


------------------
=========================
Seleznev Gregory
http://come.to/sgdesign
devil@quake.ru