Gossamer Forum

Can Spider.cgi display description for all pages?

Hello,
What should I edit to make spider.cgi (from the Monster-Submit mod) display a description for all of the pages spidered?
For your reference, the following is the spider.cgi code:

#!/usr/bin/perl -w

# Script:
# Virtual Solutions Links Spider

# Copyright:
# Copyright 1999 by Virtual Solutions. Links Spider is a modification (with permission of Fluid Dynamics) of the Fluid
# Dynamics Search Engine Version 2.0 script. The Links Spider modification is freeware and is made available at no
# cost for both personal and commercial use. However, use does not constitute legal rights for resale or
# redistribution without the expressed written permission of both Virtual Solutions and Fluid Dynamics.

# Note:
# For further details including installation instructions please go to http://www.monster-submit.com/mods02.html.
# The original comment lines have been edited out but can be found in the original script at
# ftp://ftp.xav.com/search.txt.

# Fluid Dynamics Copyright Header:
# Fluid Dynamics Search Engine, Version 2.0
# Copyright 1997, 1998 by Fluid Dynamics. Please adhere to the copyright
# notice and conditions of use, described in the attached help file and
# hosted at the URL below. For the latest version and help files, visit:
# http://www.xav.com/scripts/search/

#Edit to point to domains allowed to use this script
@referers = ('www.monster-submit.com','monster-submit.com');

use Socket;

$Rules{'Hits Per Page'} = 10;
$Rules{'Multiplier: URL'} = 4;
$Rules{'Multiplier: Title'} = 10;
$Rules{'Multiplier: Keyword'} = 10;
$Rules{'Multiplier: Description'} = 4;
$Rules{'Max Characters: URL'} = 128;
$Rules{'Max Characters: Title'} = 96;
$Rules{'Max Characters: Description'} = 384;
$Rules{'Max Characters: Auto Description'} = 150;
$Rules{'Max Characters: Keywords'} = 256;
$Rules{'Max Characters: File'} = 64000;
$Rules{'Forbid All Cap Titles'} = 1;
$Rules{'Forbid All Cap Descriptions'} = 1;
$Rules{'Crawler: Minimum WhiteSpace'} = 0.01;
$Rules{'Crawler: Max Pages Per Batch'} = 12;
$Rules{'Crawler: Max Redirects'} = 6;
$Rules{'Crawler: Days Til Refresh'} = 30;
$Rules{'Crawler: User Agent'} = 'Mozilla/4.0 (compatible: FDSE robot)';
$Rules{'Crawler: Follow Query Strings'} = 0;
$Rules{'Crawler: Rogue'} = 0;

@PromoteSites = ();
$Rules{'Promote Value'} = 20;

@IgnoredWords = ('a','about','all','an','and','any','are','as','at',
'be','been','by','can','do','find','for','from','get','have','he',
'how','htm','html','http','i','if','in','is','it','me','most','new',
'no','not','of','on','one','or','other','page','s','site',
'that','the','this','to','two','use','w','web','what','when','where',
'which','who','why','will','with','you','your');

@MonthNames = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');

my %FORM = &ReadInput;
$|=1;

print <<'EOM';
Content-Type: text/html

<html>
<head>
<title>Links Spider</title>
<meta name="Robots" content="noindex">
<meta name="Robots" content="nofollow">
</head>

<body bgcolor="#FFFFFF" text="#000080" link="#3399FF" vlink="#3399FF" alink="#0000FF">

<table border="1" bgcolor="#FFFFFF">
<tr>
<td bgcolor="#C0C0C0"><font color="#FFFFFF">
<b>Attention Webmasters:</b> ALL listings are available for spidering. If you do not wish your
site to be spidered, you will need to write a
<a href="http://www.altavista.com/av/content/addurl_exclude.htm" target="new">robots.txt</a> file for the
site. Or you can include meta tags in each page between the &lt;head&gt; ... &lt;/head&gt; lines:<p>

<b>&lt;meta name="Robots" content="noindex"&gt;</b> <i>to exclude the page from being spidered</i>

<b>&lt;meta name="Robots" content="nofollow"&gt;</b> <i>to exclude embedded links from being
spidered</i><p>
</font></td>
</tr>
</table><p>

EOM

local($check_referer) = 0;
if ($ENV{'HTTP_REFERER'}) {
foreach $referer (@referers) {
if ($ENV{'HTTP_REFERER'} =~ m|https?://([^/]*)$referer|i) {
$check_referer = 1;
last;
}
}
}
else {
$check_referer = 0;
}
if ($check_referer != 1) {
$E = "Error: Bad Referer. The form attempting to use this script resides at $ENV{'HTTP_REFERER'} which is not allowed to access this script.";
&error;
}

$|=0;
my @HITS;

if ((defined $FORM{'URL'}) && ($FORM{'URL'} ne "")) {
$FORM{'AddLink0'} = $FORM{'URL'};
foreach (keys %FORM) {
next unless (m!^AddLink!);
if ($FORM{$_} =~ m!^http://([^\/]+)/!) {
push(@AddressesToIndex,$FORM{$_});
}
else {
push(@AddressesToIndex,"$FORM{$_}/");
}
}
&AddURL(2,@AddressesToIndex);
}
exit;

sub ReadInput {

my $InputString = '';
my ($Name,$Value);
my %FORM = ('Mode','','Terms','','Password','','SetPassword','','CL',0,'maxhits',0);

$InputString = $ENV{'QUERY_STRING'};

foreach ((split(m!\&!,$InputString)),@ARGV) {
next unless (m!^(.*?)=(.*)$!);
($Name,$Value) = ($1,$2);
$Name =~ s!\%([a-fA-F0-9][a-fA-F0-9])!pack('C',hex($1))!eg;
$Name =~ tr!+! !;
$Value =~ tr!+! !;
$Value =~ s!\%([a-fA-F0-9][a-fA-F0-9])!pack('C',hex($1))!eg;
$FORM{$Name} = $Value;
}
return %FORM;
}

sub GetAbsoluteAddress {

my ($Link,$URL) = @_;

if (($Link =~ m!^\/!) && ($URL =~ m!^http\:\/\/([^\/]+)!i)) {
$Link = "http://$1$Link";
}
elsif (($Link =~ m!^(\w+)\:!) && ($1 !~ m!^http$!i)) {
return '';
}
elsif (($Link !~ m!^http\:\/\/!i) && ($URL =~ m!^(.*)\/!)) {
$Link = $1.'/'.$Link;
}

if ($Link =~ m!^(.*?)\#!) {
$Link = $1;
}
$Link =~ s!^HTTP\:\/\/!http\:\/\/!i;

if ($Link =~ m!^http://([^\/]+)\:80$!) {
$Link = 'http://'.$1.'/';
}
elsif ($Link =~ m!^http://([^\/]+)\:80/(.*)$!) {
$Link = 'http://'.$1.'/'.$2;
}

if ($Link =~ m!^http://([^\/]+)$!) {
$Link .= '/';
}
$Link =~ s!/\./!/!g;

while ($Link =~ m!^([^\?]+)\/([^\/\.]+)\/\.\.\/(.*)$!) {
$Link = $1.'/'.$3;
}
return $Link;
}

sub OpenSocket {

my ($THEM,$PORT) = @_;

unless (socket(HTTP, PF_INET, SOCK_STREAM, getprotobyname('tcp'))) {
$E = "Error: Low-level socket() function failed with system error \"$!\"";
&error;
}
if ($HashIP{$THEM}) {
$HexIP = $HashIP{$THEM};
}
else {
$HexIP = inet_aton($THEM);
$HashIP{$THEM} = $HexIP;
}

if ((!($HexIP)) || ($HexIP eq 'fail')) {
$HexIP = 'fail';
$E = "Error: Hostname $THEM does not have a DNS entry (no corresponding IP address could be found for this machine). The address may have been mistyped, the site may no longer be online, it's domain may have expired or network errors could have prevented resolution.";
&error;
}
unless (connect(HTTP, sockaddr_in($PORT,$HexIP))) {
$E = "Error: Connect() failed with system error \"$!.\" Typically connect errors involve unreachable or non-functional servers, incorrect port numbers, local DNS problems or a corrupt TCP environment";
&error;
}
select(HTTP);
$|=1;
select(STDOUT);
return 1;
}

sub GetRobotFile {

($THEM,$PORT,$RobotForbidden) = @_;
if ($RobotForbidden) {
$RobotForbidden .= '|';
}
else {
$RobotForbidden = '';
}
$RobotForbidden .= '(';
$RobotForbidden .= quotemeta("$THEM.robot");
$RobotForbidden .= ')';
unless (&OpenSocket($THEM,$PORT)) {
print "\n";
}
}

# NOTE: the opening lines of sub GetStringByURL (which fetch the page
# over the socket and set $HTMLText) are missing from the code as
# posted; the lines below are the tail end of that sub.
unless (length($HTMLText) > 24) {
$E = "Error: Less than 24 bytes of HTML text";
&error;
}

$NumSpaces = ($HTMLText =~ s! ! !g);
if (($NumSpaces/length($HTMLText)) < $Rules{'Crawler: Minimum WhiteSpace'}) {
$E = "Error: Suspicious content - only $NumSpaces blank spaces in " . length($HTMLText) . " characters. \n";
$E .= "This is forbidden by the 'WhiteSpace Ratio' set up in the \$Rules{} array";
&error;
}
else {
return ($URL,&RawTranslate($HTMLText));
}
}

sub RawTranslate {

$_ = shift;

tr!\n\r\t! !;
s/\&nbsp\;/ /g;

s/(\&Agrave\;|\&\#192\;)/À/g;
s/(\&Aacute\;|\&\#193\;)/Á/g;
s/(\&Acirc\;|\&\#194\;)/Â/g;
s/(\&Atilde\;|\&\#195\;)/Ã/g;
s/(\&Auml\;|\&\#196\;)/Ä/g;
s/(\&Aring\;|\&\#197\;)/Å/g;
s/(\&AElig\;|\&\#198\;)/Æ/g;
s/(\&Ccedil\;|\&\#199\;)/Ç/g;
s/(\&Egrave\;|\&\#200\;)/È/g;
s/(\&Eacute\;|\&\#201\;)/É/g;
s/(\&Ecirc\;|\&\#202\;)/Ê/g;
s/(\&Euml\;|\&\#203\;)/Ë/g;
s/(\&Igrave\;|\&\#204\;)/Ì/g;
s/(\&Iacute\;|\&\#205\;)/Í/g;
s/(\&Icirc\;|\&\#206\;)/Î/g;
s/(\&Iuml\;|\&\#207\;)/Ï/g;
s/(\&Ograve\;|\&\#210\;)/Ò/g;
s/(\&Oacute\;|\&\#211\;)/Ó/g;
s/(\&Ocirc\;|\&\#212\;)/Ô/g;
s/(\&Otilde\;|\&\#213\;)/Õ/g;
s/(\&Ouml\;|\&\#214\;)/Ö/g;
s/(\&times\;|\&\#215\;)/×/g;
s/(\&Oslash\;|\&\#216\;)/Ø/g;
s/(\&Ugrave\;|\&\#217\;)/Ù/g;
s/(\&Uacute\;|\&\#218\;)/Ú/g;
s/(\&Ucirc\;|\&\#219\;)/Û/g;
s/(\&Uuml\;|\&\#220\;)/Ü/g;
s/(\&Yacute\;|\&\#221\;)/Ý/g;
s/(\&THORN\;|\&\#222\;)/Þ/g;
s/(\&szlig\;|\&\#223\;)/ß/g;
s/(\&agrave\;|\&\#224\;)/à/g;
s/(\&aacute\;|\&\#225\;)/á/g;
s/(\&acirc\;|\&\#226\;)/â/g;
s/(\&atilde\;|\&\#227\;)/ã/g;
s/(\&auml\;|\&\#228\;)/ä/g;
s/(\&aring\;|\&\#229\;)/å/g;
s/(\&aelig\;|\&\#230\;)/æ/g;
s/(\&ccedil\;|\&\#231\;)/ç/g;
s/(\&egrave\;|\&\#232\;)/è/g;
s/(\&eacute\;|\&\#233\;)/é/g;
s/(\&ecirc\;|\&\#234\;)/ê/g;
s/(\&euml\;|\&\#235\;)/ë/g;
s/(\&igrave\;|\&\#236\;)/ì/g;
s/(\&iacute\;|\&\#237\;)/í/g;
s/(\&icirc\;|\&\#238\;)/î/g;
s/(\&iuml\;|\&\#239\;)/ï/g;
s/(\&eth\;|\&\#240\;)/ð/g;
s/(\&ntilde\;|\&\#241\;)/ñ/g;
s/(\&ograve\;|\&\#242\;)/ò/g;
s/(\&oacute\;|\&\#243\;)/ó/g;
s/(\&ocirc\;|\&\#244\;)/ô/g;
s/(\&otilde\;|\&\#245\;)/õ/g;
s/(\&ouml\;|\&\#246\;)/ö/g;
s/(\&divide\;|\&\#247\;)/÷/g;
s/(\&oslash\;|\&\#248\;)/ø/g;
s/(\&ugrave\;|\&\#249\;)/ù/g;
s/(\&uacute\;|\&\#250\;)/ú/g;
s/(\&ucirc\;|\&\#251\;)/û/g;
s/(\&uuml\;|\&\#252\;)/ü/g;
s/(\&yacute\;|\&\#253\;)/ý/g;
s/(\&thorn\;|\&\#254\;)/þ/g;
s/(\&yuml\;|\&\#255\;)/ÿ/g;

return $_;
}

sub CompressStrip {

$_ = shift;
$_ = ' '.$_.' ';
s!\W! !g;
s!\_! !g;
$_ = StripIgnoreWords($_);
s!\s+! !g;
return $_;
}

sub AddURL {

my ($tag,@AddressesToIndex) = (@_);
$|=1;

$SetSaveLinks = 1;
$NumRank = 0;

$NumRedirectsFollowed = 0;
$MaxAddresses = scalar @AddressesToIndex;
ADDRESS: for ($AddressIndex = 0; $AddressIndex < $MaxAddresses; $AddressIndex++) {
if ($Rules{'Crawler: Max Pages Per Batch'} <= $AddressIndex) {
push(@IndexedAddresses,'DONE');
last ADDRESS;
}
$URL = $AddressesToIndex[$AddressIndex];
if ($URL !~ m!^http://!i) {
$NumRank++;
$SpiderResults{$URL} = -1;
push(@IndexedAddresses,$URL);
next ADDRESS;
}
$OldURL = $URL;
($URL,$Text) = &GetStringByURL($URL);
if (($Text eq '302') && ($NumRedirectsFollowed < $Rules{'Crawler: Max Redirects'})) {
$NumRank++;
$SpiderResults{$OldURL} = -1;
if ($URL =~ m!http://([^\/]+)$!) {
$URL .= '/';
}
@AddressesToIndex = ('',@AddressesToIndex);
$AddressesToIndex[$AddressIndex+1] = $URL;
$MaxAddresses++;
$NumRedirectsFollowed++;
push(@IndexedAddresses,$OldURL);
next ADDRESS;
}
elsif ($Text eq '302') {
$NumRank++;
$SpiderResults{$OldURL} = -1;
push(@IndexedAddresses,$OldURL);
next ADDRESS;
}
unless ($Text) {
$NumRank++;
$SpiderResults{$URL} = -1;
push(@IndexedAddresses,$URL);
next ADDRESS;
}
$RecordLine = &MakeRecord($URL,'',$Text);
$SpiderResults{$URL} = $RecordLine;
$ByteSize = length($Text);
($DD,$MM,$YYYY) = unpack('A2A2A4',substr($RecordLine,2,8));
$NumRank++;
$Month = $MonthNames[$MM];
$UserResults{$URL} = &AdminVersion($NumRank, $URL, $Title, $Description, $ByteSize, $DD, $Month, $YYYY);
push(@IndexedAddresses,$URL);
next ADDRESS;
}

ADDRESS: foreach $URL (@IndexedAddresses) {
if ($UserResults{$URL}) {
print $UserResults{$URL};
}
}

&CompileLinks;
$LinkCount = scalar (keys %SaveLinks);

if ($LinkCount) {
$QueryString = "";
$QueryString =~ tr! !+!;

$LinkCount = 1;

$PastTime = time - (86400 * $Rules{'Crawler: Days Til Refresh'});
($UnSearched,$OutDated,$Searched,$Failed,$Checked) = (0,0,0,0,1);
foreach (reverse (sort {$SaveLinks{$b} <=> $SaveLinks{$a} || $a cmp $b} keys %SaveLinks)) {
if ($SaveLinks{$_} == 1) {
if ($UnSearched == 0) {
$UnSearched = 1;
$Checked = 1;
}
}
elsif ($SaveLinks{$_} == 2) {
if ($Failed == 0) {
$Failed = 1;
$Checked = 0;
}
}
elsif ($SaveLinks{$_} <= $PastTime) {
if ($OutDated == 0) {
$OutDated = 1;
$Checked = 1;
}
}
else {
if ($Searched == 0) {
$Checked = 0;
$Searched = 1;
}
}
print "<a href=\"$_$QueryString\">$_</a>\n";
$LinkCount++;
}
}
else {
print <<"EOM";
No embedded links were found during this crawl session
EOM
}
print <<"EOM";

</body>
</html>
EOM
}

sub CompileLinks {

foreach (@SavedLinks) {
$SaveLinks{$_} = 1;
# push(@Global,"$_");
}
}

sub MakeRecord {

my ($URL, $LastModT, $sText) = @_;
$FBYTES = sprintf('%08.f',length($sText));
($Title,$Description,$sText,$Links) = &Extract_Meta($sText,$URL);
$AlphaData = ' ';
$AlphaData .= "u= $URL ";
$AlphaData .= "t= $Title ";
$AlphaData .= "d= $Description ";
$AlphaData .= 'uM='.CompressStrip($URL);
$AlphaData .= 'h='.CompressStrip($sText);
$AlphaData .= 'l='.CompressStrip($Links);
$LastModT = $LastModT ? $LastModT : time;
($DD,$MM,$YYYY) = (localtime($LastModT))[3..5];
$YYYY += 1900;
$CC = 1;
foreach (@PromoteSites) {
next unless ($URL =~ m!^$_!i);
$CC = $Rules{'Promote Value'};
last;
}
for ($CC,$DD,$MM) {
$_ = sprintf('%02.f',$_);
}
return "$CC$DD$MM$YYYY$FBYTES$AlphaData\n";
}

sub Extract_Meta {

($HTML_Text,$URL) = @_;
($Title, $Description, $Links) = ('','','');

foreach (split(m!\<A !i, $HTML_Text)) {
next unless (m!^([^\>]*)HREF(\s+)?=(\s+)?\"?([^\"\s\>]+)!i);
$ThisLink = $4;
$Links .= ' '.$ThisLink;
next unless ($SetSaveLinks == 1);
next if (($Rules{'Crawler: Follow Query Strings'} == 0) && ($ThisLink =~ m!\?!));
$ThisLink = &GetAbsoluteAddress($ThisLink,$URL);
push(@SavedLinks,$ThisLink) if ($ThisLink);
}

foreach (split(m!\<I?FRAME !i, $HTML_Text)) {
next unless (m!^([^\>]*)SRC(\s+)?=(\s+)?\"?([^\"\s\>]+)!i);
$ThisLink = $4;
$Links .= ' '.$ThisLink;
next unless ($SetSaveLinks == 1);
next if (($Rules{'Crawler: Follow Query Strings'} == 0) && ($ThisLink =~ m!\?!));
$ThisLink = &GetAbsoluteAddress($ThisLink,$URL);
push(@SavedLinks,$ThisLink) if ($ThisLink);
}

$HTML_Text .= ' || ';

if ($HTML_Text =~ m!<TITLE.*?>(.*?)<!i) {
$Title = ' '.$1;
$HTML_Text =~ s!<TITLE.*?>.*?<\/TITLE>! !i;
$HTML_Text .= $Title x $Rules{'Multiplier: Title'};
}
elsif (($FILE) && ($FILE =~ m!([^\/]+)$!)) {
$Title = $1;
}

elsif ($URL =~ m!([^\/]+)$!) {
$Title = $1;
}
elsif ($FILE) {
$Title = $FILE;
}
elsif ($URL) {
$Title = $URL;
}
else {
$Title = 'Document';
}

if (($Rules{'Forbid All Cap Titles'}) && ($Title !~ m![a-z]!)) {
$NewTitle = '';
foreach (split(m!\s+!,$Title)) {
unless (length($_) > 1) {
$NewTitle .= $_.' ';
next
}
$NewTitle .= ' '.substr($_,0,1);
$_ = substr($_,1,(length($_)-1));
tr[A-Z][a-z];
$NewTitle .= $_;
}
$Title = $NewTitle;
}

if ($HTML_Text =~ m!.*?<META([^\>]*?)(NAME|HTTP-EQUIV)="keywords"([^\>]*?)(CONTENT|VALUE)="([^\"]+)"!i) {
$KeyWords = ' '.$5;
$HTML_Text .= $KeyWords x $Rules{'Multiplier: Keyword'};
}
if ($HTML_Text =~ m!.*?<META([^\>]*?)(NAME|HTTP-EQUIV)="description"([^\>]*?)(CONTENT|VALUE)="([^\"]+)"!i) {
$Description = ' '.$5;
$HTML_Text .= $Description x $Rules{'Multiplier: Description'};
}

$HTML_Text =~ s/<[^>]*\s+ALT\s*=\s*"(([^>"])*)"[^>]*>/ $1 /ig;

$NoScript = '';
foreach (split(m!(\<\/SCRIPT>|\<\/STYLE>)!i, $HTML_Text)) {
next unless $_;
if (m!^(.*)(\<SCRIPT|\<STYLE)!i) {
$NoScript .= ' '.$1;
}
else {
$NoScript .= ' '.$_;
}
}
$HTML_Text = $NoScript;

if ($HTML_Text =~ m!(.*)<NOFRAMES>(.*)</NOFRAMES>(.*?)!i) {
if (length($2) < 2000) {
$HTML_Text = $1.' '.$2;
}
}

$HTML_Text =~ s!<([^>]*?)>! !g;
$HTML_Text =~ s!\s+! !g;

unless ($Description) {
$tempDescription = substr($HTML_Text,0,$Rules{'Max Characters: Auto Description'});
if ($tempDescription =~ m!([^\|]*)\s+!) {
$Description = $1.'...';
}
else {
$Description = 'No description available.';
}
}

$HTML_Text =~ s!(\W|\_)! !g;
$Title =~ s!\s+! !g;
if ($Title =~ m!^ (.+)!) {
$Title = $1;
}

$Description =~ s!\s+! !g;
if ($Description =~ m!^ (.+)!) {
$Description = $1;
}

if (($Rules{'Forbid All Cap Descriptions'}) && ($Description !~ /[a-z]/)) {
$NewDescription = '';
foreach (split(/\s+/,$Description)) {
$NewDescription .= ' '.substr($_,0,1);
$_ = substr($_,1,(length($_)-1));
tr[A-Z][a-z];
$NewDescription .= $_;
}
$Description = $NewDescription;
}
return($Title,$Description,$HTML_Text,$Links);
}

sub AdminVersion {

my ($Rank,$URL,$Title,$Description,$Size,$Day,$Month,$Year) = @_;
$Size = ($Size<1500)?int($Size).' bytes':(int($Size/1000)).' Kilobytes';
$wURL = webEncode($URL);
return <<"EOM";
<a href="$wURL"><b>$Title</b></a> $Description

<i><a href="$wURL">$URL</a><small> - size $Size - last updated $Day $Month $Year</small></i><p>

EOM
}

sub webEncode {

$_ = shift;
return $_ unless m!(\%|\+|\&|\s)!;
s!\%!%25!g;
s!\+!%2B!g;
s!\&!%26!g;
s! !\+!g;
return $_;
}

sub StripIgnoreWords {

$_ = shift;
foreach $Ignore (@IgnoredWords) {
s! $Ignore ! !g;
}
return $_;
}

sub error {

print "<center>\n";
print "$E\n\n";

print <<"EOM";
</center>

</body>
</html>
EOM
exit;
}
exit;


-----------------------------------------------
Best Regards
JackofNone

Re: Can Spider.cgi display description for all pages?
Did anybody ever post a reply to this question? I've been looking, but I can't seem to find it.

http://www.irishsearch.net/
Re: Can Spider.cgi display description for all pages?
Hello,
In the German forum (http://www.nicky.net/foren/) there is someone called Martin
who has edited the spider to fetch the meta tags of a site. Search the German forum for Spider or MetaSpider if you're interested.

DelPierro
http://www.simball.de/links_mods/
Re: Can Spider.cgi display description for all pages?
Thanks - Danke
I'll have a look (hope I can understand it)
M

http://www.irishsearch.net/
Re: Can Spider.cgi display description for all pages?
Okay - I found the thread in the German forum - but I cannot understand what they were saying! Help!

http://www.irishsearch.net/
Re: Can Spider.cgi display description for all pages?
The mod is available for download in the resources section on the German site - http://www.nicky.net/links/mods/Mods/

It wasn't exactly what I was looking for, but I might find a use for it yet!

http://www.irishsearch.net/
Re: Can Spider.cgi display description for all pages?
What a great mod. However, my limited experience with the German language makes it hard to understand the instructions. Has anyone translated this mod into English yet? I could not believe it when I saw the mod - it will actually allow me to spider a URL and put it straight into the add.cgi procedure.

Thanks,

Joe

Re: Can Spider.cgi display description for all pages?
Has anyone managed to translate this mod into English? It seems to me that it is a great mod BUT I cannot understand the German.

Help is highly appreciated

Joe

Re: Can Spider.cgi display description for all pages?
I'll ask the author of the mod, Martin, whether it's possible to translate the instructions for English users.
I think someone in the German forum would do that, but I don't know if it has been done.
If I get English instructions I'll post them here. Wink

There is only one problem: the mod is written for the German Links version (http://www.nicky.net/) and must be modified to work with the English (standard) version.

DelPierro
http://www.simball.de/links_mods/
Re: Can Spider.cgi display description for all pages?
Many thanks,

I think that a great many Links users would appreciate this mod.

Thanks again,

Joe

Re: Can Spider.cgi display description for all pages?
But is there any difference between Martin's (German) mod and the standard English spider.cgi?
The code seems very similar - although I (STILL) haven't had a chance to play with it.

http://www.irishsearch.net/
Re: Can Spider.cgi display description for all pages?
I believe that there are a few minor differences between the two mods. I have not had the time to play around with it either, mainly because it is all in German.

If you manage to get it to work - please post a reply.

Auf Wiedersehen (hope the spelling is correct),

Joe

Re: Can Spider.cgi display description for all pages?
Any reply from the German guy yet?

jRich
Re: Can Spider.cgi display description for all pages?
Don't know, but if you've got LWP you could use the suggestion I provided here:

http://www.gossamer-threads.com/...ew=&sb=&vc=1

Just change the sub spider to:

sub spider_it {
use LWP::Simple;
my $content = get($in{'url'});
my @content = split(/\n/,$content);
LINE: foreach $line (@content) {
chomp($line);
($title = $1) if ($line =~ m|<TITLE>([^<]+)</TITLE>|i);
($description = $1) if ($line =~ m|<meta name="Description" content="([^"]+)">|i);
($name = $1) if ($line =~ m|<meta name="Author" content="([^"]+)">|i);
last LINE if ($line =~ m|</head>|i);
}
&site_html_add_form();
}

Alternatively you could use LWP with the module that just goes through the head of the document (I can't remember off the top of my head what it was called).
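The head-only module mentioned above might be HTML::HeadParser (shipped in the HTML-Parser distribution on CPAN) - that identification is a guess on my part, and the sketch below assumes that module is installed:

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: HTML::HeadParser (from the CPAN HTML-Parser dist) is the
# head-only module referred to above, and it is installed locally.
use HTML::HeadParser;

# Parse just the <head> of a page and return title/description/author.
sub head_info {
    my ($html) = @_;
    my $p = HTML::HeadParser->new;
    $p->parse($html);    # HeadParser stops on its own at the end of <head>
    return (
        title       => $p->header('Title'),
        description => $p->header('X-Meta-Description'),
        author      => $p->header('X-Meta-Author'),
    );
}

# Example with an inline document instead of an LWP fetch:
my %info = head_info(<<'HTML');
<html><head>
<title>Example Page</title>
<meta name="Description" content="A demo description">
<meta name="Author" content="Jane">
</head><body>body text is never scanned</body></html>
HTML

print "$info{title} - $info{description}\n";
```

In spider_it you would feed get($in{'url'}) to a helper like this instead of splitting on newlines, which also copes with tags that span multiple lines.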



Glenn

Links 2 Mods Site:
http://cgi-resource.co.uk/pages/links2mods.shtml
Re: Can Spider.cgi display description for all pages?
ENGLISH VERSION!!!


1.

In the Globals section:

##########################################################
## Globals ##
##########################################################

add:

require "links.cfg";

Then look for:

sub html_add_form {

and insert in the body section:

<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<div align="center">M E T A S P I D E R</div><br>
<form action="$db_cgi_url/metaspider_a.cgi" method="GET">
<b><font size="1" face="Verdana"><font color="red">URL:<br></font></font></b>
<input type="text" name="URL" size="25" VALUE="http://">

<input type="submit" VALUE="Spidern">
</form>
<HR noshade>
<A href="http://www.linkdb.de">LINK DATABASE</A></td>
</tr>
</table>


2. Edit metaspider_a.cgi

Supported field names:

Name for the title:
$Titel = "title";
Name for the description:
$Beschreibung = "description";
Name for the name:
$Name = "name";
Name for the email:
$Email = "email";
Name for the URL:
$SeitenURL = "URL";

Name for the category field:
$Kategorie = "category";
Optional --
Name for the keywords:
$Schluesselwoerter = "keywords";


If you would like to edit the HTML code (body section), it starts at line 828.

3. Upload metaspider_a.cgi into the user CGI directory, i.e. where add.cgi, search.cgi, jump.cgi etc. are.


That is everything.
If you still have questions, suggestions or ideas for improvement, email me at:
martin@linkdb.de


Good luck!
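Since the machine translation above is rough, here is an illustrative core-Perl sketch of the idea behind the mod. This is not Martin's actual metaspider_a.cgi code, and extract_meta is a name I made up; it just shows the technique the instructions describe - pull the title and the description/keywords meta tags from a fetched page, keyed by the form field names configured above:

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative sketch only -- NOT Martin's actual metaspider_a.cgi.
# extract_meta is a hypothetical helper: grab <title> plus the
# description and keywords meta tags from a page's HTML, using the
# add-form field names configured in step 2 as hash keys.
sub extract_meta {
    my ($html) = @_;
    my %field;
    $field{title} = $1 if $html =~ m!<title[^>]*>(.*?)</title>!is;
    for my $name ('description', 'keywords') {
        $field{$name} = $1
            if $html =~ m!<meta[^>]+name\s*=\s*["']?$name["']?[^>]+content\s*=\s*["']([^"']+)!is;
    }
    return %field;
}

# Example with an inline document; metaspider_a.cgi would fetch the
# URL submitted through the form instead.
my %f = extract_meta(<<'HTML');
<html><head><title>My Site</title>
<meta name="description" content="All about widgets">
<meta name="keywords" content="widgets, gadgets">
</head></html>
HTML

# The values in %f can then be dropped into add.cgi's form fields.
print map { "$_: $f{$_}\n" } sort keys %f;
```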