Gossamer Forum

Check Links

Check Links
How does LinksSQL check that links are valid? What module does it use, or does it use its own code built on sockets? I'm just wondering, as I'm getting very slow results using LWP::Parallel.

Thanks

- wil

Re: [Wil] Check Links
Hi,

It uses its own built-in parallel link checker. The code to do the checking is in GT::URI::HTTP, and the code that does it in parallel is in Links::Parallel.

Cheers,

Alex
--
Gossamer Threads Inc.

Re: [Alex] Check Links
Ah. I'm now getting much the same results using LWP::Parallel as with any of the other LWP modules.

I am, however, concerned about the apparent overhead of processing such a large number of URLs in parallel. Couldn't this really bog down a server?

- wil

Re: [Wil] Check Links
Yeah. I would run it via cron with a nice level set (i.e. so that it uses no more than about 40% of the server's resources). That way it shouldn't bog it down too much.
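
As a rough illustration of the nice idea, here is a minimal Python sketch (the thread itself is about Perl; Python is used here only for illustration). It assumes a Unix-like system, and the function name `lower_priority` is hypothetical. Note that niceness is a scheduling priority rather than a hard percentage cap: a niced process can still use an idle CPU fully, it just yields to other work.

```python
import os


def lower_priority(increment: int = 10) -> int:
    """Raise this process's niceness by `increment` and return the new value.

    Higher niceness means lower scheduling priority. This is a hint to the
    scheduler, not a limit on the percentage of CPU the process may use.
    """
    return os.nice(increment)


if __name__ == "__main__":
    # Assumption: the link checker would be started after this call.
    print("running link check at niceness", lower_priority(10))
```

Calling this at the top of the script has much the same effect as prefixing the cron command with `nice -n 10`.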

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!

Re: [AndyNewby] Check Links
Why would automatically running the program under cron make any difference? Surely the resource usage would be greater, as cron has to start up before it starts my link checker.

I was referring to the number of simultaneous parallel connections one can make before bogging down the system so much that parallel connections become worthless, and you might as well have stuck with single connections.

- wil

Re: [Wil] Check Links
Quote:
Why would automatically running the program under CRONTAB make any difference?
It would be faster, as nothing has to be downloaded to your browser/telnet window. Also, because you use nice, it would mean that no more than x% of system resources are being used to run the script.

Hope that makes sense :P

Andy (mod)
andy@ultranerds.co.uk

Re: [AndyNewby] Check Links
Nothing gets downloaded anywhere. The output would be written to a log file whether the program was executed by me or by any other means. Why would I want to run a link checker without knowing the results? That seems to me like it's defeating the object.

I understand your point about x amount of system resources being used; that's fair enough. But my problem is how to work out a good cut-off point for parallel processing of multiple HEAD requests to various web servers around the globe. At what point do the overheads of parallel processing outweigh its benefits over processing one request at a time?
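
One way to look for that cut-off point is simply to measure it: run the same batch of URLs at several concurrency levels and see where the total time stops improving. The following is a minimal Python sketch of that idea, not how LinksSQL or LWP::Parallel work. The names `head_status`, `check_all` and `time_levels`, and the levels tried, are assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10.0):
    """Send one HEAD request; return the HTTP status, or None on failure."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, e.g. with 404
    except (OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, timeout, refused connection, bad URL


def check_all(urls, workers, check=head_status):
    """Check every URL with at most `workers` requests in flight at once."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(check, urls)))


def time_levels(urls, levels=(1, 4, 16, 64), check=head_status):
    """Return {workers: seconds} for the same batch at each concurrency level."""
    urls = list(urls)
    timings = {}
    for workers in levels:
        start = time.perf_counter()
        check_all(urls, workers, check)
        timings[workers] = time.perf_counter() - start
    return timings
```

Running `time_levels` on a sample of a few hundred real links should show the timings flatten out, or get worse, past some level. That level is a reasonable cap for that particular machine and network.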

- wil

Re: [Wil] Check Links
I think you want as much parallel processing as your machine can safely handle. The slowest points are almost always the DNS lookup and requesting the page, both of which can be done in parallel.

LWP::Parallel, last I looked, doesn't do the DNS lookup in parallel, so it can be slow.

Cheers,

Alex
--
Gossamer Threads Inc.

Re: [Alex] Check Links
Yes, you're right. The DNS lookup is the slowest part. Actually putting out the HEAD or GET request seems very fast. Maybe the fastest way would be to do all the hostname lookups in parallel first, before going on to check the links, instead of looking up each hostname and checking it in one step?
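
A minimal Python sketch of that two-phase idea, offered as an illustration rather than a tested design: resolve each distinct hostname once, in parallel, then only send requests to URLs whose host resolved. The names `resolve`, `resolve_hosts` and `split_by_dns` are hypothetical. Whether the later HTTP requests benefit from the earlier lookups depends on a DNS cache being present on the system, which is an assumption here.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def resolve(host):
    """Return one IP address for `host`, or None if the lookup fails."""
    try:
        return socket.getaddrinfo(host, None)[0][4][0]
    except (OSError, UnicodeError):
        return None


def resolve_hosts(urls, workers=32, resolver=resolve):
    """Look up each distinct hostname once, `workers` lookups at a time."""
    hosts = sorted({urlsplit(u).hostname for u in urls} - {None})
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(resolver, hosts)))


def split_by_dns(urls, addresses):
    """Split URLs into those worth requesting and those already known dead."""
    live, dead = [], []
    for url in urls:
        target = live if addresses.get(urlsplit(url).hostname) else dead
        target.append(url)
    return live, dead
```

Because many links usually share a host, deduplicating hostnames first also cuts the number of lookups, and URLs whose host does not resolve can be logged as dead without opening a connection.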

- wil