Gossamer Forum
Modify Andy's spider
I would like to modify this part:

(m#.*?<meta([^\>]*?)(name|http-equiv)="?description"?([^\>]*?)(content|value)="?([^\"]+)"?#i) {
$description = $5;
}
}

so that it picks up everything between <!-- news-article --> and <!-- /news-article --> instead, and puts that in the description block...

Is this possible???
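For reference, the existing code reads $5 because its pattern has five capture groups, numbered by their opening parentheses, and the fifth one, ([^\"]+), holds the value of the content/value attribute. Here is a minimal Perl sketch that checks this against a made-up meta tag. The $html string and its text are only an illustration and are not taken from the spider.

```perl
use strict;
use warnings;

# Made-up meta tag, only to show which group holds the text.
my $html = '<meta name="description" content="Daily news summary">';

my $description;
if ($html =~ m#.*?<meta([^\>]*?)(name|http-equiv)="?description"?([^\>]*?)(content|value)="?([^\"]+)"?#i) {
    # Groups, counted by opening parenthesis:
    #   $1 = text between "<meta" and name/http-equiv
    #   $2 = "name" or "http-equiv"
    #   $3 = text between "description" and content/value
    #   $4 = "content" or "value"
    #   $5 = the attribute value itself
    $description = $5;
}

print "$description\n";
```

Any replacement pattern with a different number of groups needs its capture variable renumbered to match.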

Thanks,



</not a clue>
Re: [Dinky] Modify Andy's spider
(m#\<\!-- news-article --\>(.+?)\<!-- /news-article --\>#i) {
$description = $5;
}
}

Maybe the above? Not sure.
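There are two likely reasons this pattern comes back empty. First, it has only one capture group, so the matched text is in $1 and $5 stays undefined. Second, "." does not match a newline unless the /s modifier is used, so (.+?) cannot span an article that runs over several lines. Below is a minimal sketch, assuming the page HTML is held in a variable called $page. That name is hypothetical, and the spider may call it something else.

```perl
use strict;
use warnings;

# Hypothetical page HTML; on a real news page the article
# normally spans several lines, as it does here.
my $page = <<'HTML';
<html><body>
<!-- news-article -->
<p>First paragraph of the story.</p>
<p>Second paragraph.</p>
<!-- /news-article -->
</body></html>
HTML

my $description;

# One capture group, so the text is in $1 (there is no $5 here).
# /s lets "." match newlines so (.+?) can run across lines;
# /i is kept from the original suggestion.
if ($page =~ m#<!-- news-article -->(.+?)<!-- /news-article -->#is) {
    $description = $1;
}

# Same pattern without /s, for comparison: it does not match,
# because a newline follows the opening marker.
my $without_s = $page =~ m#<!-- news-article -->(.+?)<!-- /news-article -->#i ? 1 : 0;

print "Matched without /s: ", ($without_s ? "yes" : "no"), "\n";
print "Captured:$description";
```

The captured text still contains the inner HTML tags and line breaks, so it may need tidying before it goes into a description field.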

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Modify Andy's spider
Tried that earlier; I keep getting "None Got" in the description field... still tweaking, will post later...

</not a clue>
Re: [Dinky] Modify Andy's spider
Here's a link to one of the pages I'm trying to parse: http://www.defenselink.mil/news/Dec2003/n12232003_200312237.html
I'm actually thinking of going with PHP now :(

</not a clue>
Re: [Dinky] Modify Andy's spider
>> I'm actually thinking of going with php now

Why? PHP is a "poor man's" Perl. That's how it was designed. It was set up to let people without access to a cgi-bin scripting directory run "page" scripts inside their HTML. I don't [ever] see that as secure. Your code can be displayed because of a simple misconfiguration, and it happens. Anything embedded in the page can be made public.

Also, your HTML starts to look really weird.

With Links (or almost any GT program) you have _all_ the power of PHP, and _MORE_. You get the full power of Perl, keep your code and templates *COMPLETELY* separate, and gain the speed of mod_perl with little effort.

Their template parser is a thing of beauty! Never display HTML without it! <G> There is no way you can "accidentally" display stuff you don't want seen. If the server has problems, people will get 500 errors, *NOT* your source code. You can make *MAJOR* changes to how Links works through tags in the template parser, which effectively gives you what PHP does. You can embed entire functions and routines with a simple <%..%> tag, and you can display any page through the template parser by calling page.cgi?p=pagename.

If you were thinking about using page.php with Links, well... again, why? Anything you can do with PHP you can do *BETTER* with the native GT engine and Perl.

Granted, PHP is better than it was a couple of years ago... but it's still not Perl.

Also, regexes (pattern matches) are pattern matches, and I doubt even the AI languages can do them better (or more easily) than Perl :)

Just a few of my [un]biased opinions :)


PUGDOG® Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] Modify Andy's spider
The PHP thing was a joke...

I think it's not picking it up because of the <!-- zzz --> comment markers... still can't figure it out...

</not a clue>
Re: [Dinky] Modify Andy's spider
>> The php thing was a joke...

I guess I'm a little sensitive <G> I tried PHP back at version 2/3 (right at the changeover) and was never happy with it. I kept going back to Perl, and then GT released Links SQL... and I couldn't even think about going back :)

Hmm... I'm not sure about the problem. That sort of thing makes my head spin. :)

Just a thought, though: do you have to escape the greater-than/less-than signs when they're used in a pattern match?
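On the escaping question: in a Perl regex, <, >, ! and - are ordinary characters outside a character class. Escaping them is harmless but not necessary, so it is unlikely to be why the match fails. The character that does need care is the / in the closing marker when the regex itself is delimited by slashes. The small self-contained check below uses made-up sample strings.

```perl
use strict;
use warnings;

my $line = 'text <!-- news-article --> more';

# Unescaped: outside a character class, < > ! and - are ordinary
# characters in a Perl regex, so this matches as written.
my $plain = $line =~ /<!-- news-article -->/ ? 1 : 0;

# Escaped: backslashing them is harmless but not required.
my $escaped = $line =~ /\<\!-- news-article --\>/ ? 1 : 0;

# \Q...\E (quotemeta) escapes the whole marker, handy when it is
# stored in a variable.
my $marker = '<!-- news-article -->';
my $quoted = $line =~ /\Q$marker\E/ ? 1 : 0;

# The closing marker contains "/", which clashes with a /.../ delimiter.
# Either pick another delimiter such as m#...# or escape it as \/.
my $closing    = 'x <!-- /news-article --> y';
my $hash_delim = $closing =~ m#<!-- /news-article --># ? 1 : 0;
my $slash_esc  = $closing =~ /<!-- \/news-article -->/ ? 1 : 0;

print "plain=$plain escaped=$escaped quoted=$quoted ",
      "hash_delim=$hash_delim slash_esc=$slash_esc\n";
```

All five checks match, which is why Andy's pattern can use m#...# and skip the backslashes entirely.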


PUGDOG® Enterprises, Inc.
Re: [pugdog] Modify Andy's spider
PHP O:) !! Not a bad way to go, I just don't like all the security holes it leaves...

Here's what I have tried in various forms/fashions...
\<\!-- news-article --\>\
on each side, still no luck. Going through some old scripts to see if they have anything...

</not a clue>
Re: [Dinky] Modify Andy's spider
I'll take a look tomorrow. It's probably gonna be a bitch to do, so I don't really wanna ruin my Xmas over it :P

Cheers

Andy (mod)
Re: [Andy] Modify Andy's spider
Thanks,
It's not something I needed yesterday; I'm just trying to speed up my news service...
Tried:

$start_pattern="<!-- news-article -->";
$end_pattern="<!-- /news-article -->";
$description =~/$start_pattern(.*?)$end_pattern/;

Didn't work either
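This attempt probably comes back empty for three reasons. The match runs against $description, which presumably has not been filled yet, rather than against the page HTML. The captured text is never assigned to anything. And without /s, (.*?) cannot cross line breaks. The sketch below assumes the downloaded HTML is in a variable named $page, which is a hypothetical name. It uses \Q...\E so the markers are matched literally. The tag-stripping step is an optional, crude regex cleanup that is fine for simple markup but is not a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page; in the spider this would
# be whatever variable holds the downloaded HTML.
my $page = "<p>menu</p>\n<!-- news-article -->\n"
         . "<p>First line of the story\nand the second line.</p>\n"
         . "<!-- /news-article -->\n<p>footer</p>\n";

my $start_pattern = '<!-- news-article -->';
my $end_pattern   = '<!-- /news-article -->';

my $description = '';

# Match against the PAGE (not the still-empty $description) and keep
# the capture via list assignment. \Q...\E makes the markers literal;
# /s lets "." cross newlines.
if (my ($article) = $page =~ /\Q$start_pattern\E(.*?)\Q$end_pattern\E/s) {
    $description = $article;
}

# Optional tidy-up for a description field: drop tags, squeeze whitespace.
$description =~ s/<[^>]*>//g;
$description =~ s/\s+/ /g;
$description =~ s/^\s+|\s+$//g;

print "$description\n";
```

The list assignment in the if condition is false when nothing matches, so $description simply stays empty on pages without the markers.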

</not a clue>