How does LinksSQL check that links are valid? What module does it use, or does it use its own code built on sockets? I'm asking because I'm getting very slow results with LWP::Parallel.
It uses its own built-in parallel link checker. The code that does the checking is in GT::URI::HTTP, and the code that runs it in parallel is in Links::Parallel.
Why would automatically running the program under CRONTAB make any difference? Surely it would use more system resources, since CRONTAB has to start up before my link checker starts.
I was referring to the number of simultaneous parallel connections you can make before they bog down the system so much that parallel connections become worthless, and you might as well have stuck with single connections.
Why would automatically running the program under CRONTAB make any difference?
It would be faster, as nothing has to be downloaded to your browser or telnet window. Also, because you'd run it with nice, no more than x% of system resources would be used to run the script.
Nothing gets downloaded anywhere. The output would be written to a log file whether I ran the program myself or it was run some other way. Why would I want to run a link checker without knowing the results? That seems to defeat the object.
I understand your point about only x% of system resources being used; that's fair enough. But my problem is how to work out a good cut-off point for sending multiple HEAD requests in parallel to web servers around the globe. At what point does parallel processing stop outweighing the benefits of serial processing?
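There is no universal number, but one way to find the cut-off empirically is to time the same batch of URLs at several pool sizes and stop where adding workers no longer helps much. The Python sketch below illustrates that idea; it is not part of LinksSQL, and the function names, the pluggable `check` function and the 10% threshold are all hypothetical choices.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def time_batch(urls: List[str], workers: int, check: Callable[[str], object]) -> float:
    """Return wall-clock seconds to run `check` on every URL with `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(check, urls))  # consume results so every check finishes
    return time.perf_counter() - start


def time_levels(urls: List[str], levels: List[int],
                check: Callable[[str], object]) -> Dict[int, float]:
    """Time the same batch once per pool size."""
    return {n: time_batch(urls, n, check) for n in levels}


def pick_level(timings: Dict[int, float], min_gain: float = 0.10) -> int:
    """Return the smallest pool size whose next step up saves less than min_gain.

    min_gain=0.10 (a 10% speed-up) is an arbitrary, hypothetical threshold.
    """
    levels = sorted(timings)
    for smaller, larger in zip(levels, levels[1:]):
        gain = (timings[smaller] - timings[larger]) / timings[smaller]
        if gain < min_gain:
            return smaller
    return levels[-1]  # every step still paid off; try larger sizes too
```

`pick_level` is kept separate from the timing code so the decision rule can be changed, or checked by hand, without re-running the measurements. The result depends on the machine, the network and the remote servers, so it is worth measuring with a representative sample of real links.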
I think you want as much parallel processing as your machine can safely handle. The slowest steps are almost always the DNS lookup and requesting the page, and both can be done in parallel.
Last I looked, LWP::Parallel doesn't do the DNS lookups in parallel, so it can be slow.
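For comparison, here is a minimal Python sketch of the approach described above. It is not LinksSQL's Perl code in GT::URI::HTTP or Links::Parallel: it uses a fixed-size thread pool in which each worker resolves its own hostname and sends a HEAD request, so DNS lookups for different hosts run concurrently. The function names and the default of 20 workers are illustrative assumptions.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Union
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# A check returns an HTTP status code or an error description.
Result = Union[int, str]


def head_status(url: str, timeout: float = 10.0) -> Result:
    """Send one HEAD request and return the status code or an error string."""
    try:
        # urlopen resolves the hostname and sends the request in this
        # worker thread, so lookups for different links overlap.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError but still carry a code.
        return exc.code
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # OSError: DNS failures, refused connections, timeouts (URLError is
        # an OSError subclass); ValueError: malformed URLs;
        # HTTPException: malformed server responses.
        return f"error: {exc}"


def check_links(urls: List[str], max_workers: int = 20,
                check: Callable[[str], Result] = head_status) -> Dict[str, Result]:
    """Check every URL with at most max_workers checks running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(check, urls)))
```

`max_workers` is the knob that corresponds to "as much parallel processing as your machine can safely handle"; threads spend most of their time waiting on the network, so a pool much larger than the CPU count is usually reasonable.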
Yes, you're right. The DNS lookups are the slowest part; actually sending the HEAD or GET request seems very fast. Maybe the fastest way would be to do all the hostname lookups locally, in parallel, before moving on to checking the links, instead of looking up each hostname and checking it in one step?
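A minimal Python sketch of that two-phase idea follows, as an illustration rather than how LinksSQL or LWP::Parallel work. Phase 1 resolves each distinct hostname exactly once in a thread pool; phase 2 sends the HEAD requests straight to the cached addresses with the original name in the `Host:` header, so no second lookup happens. For brevity it assumes IPv4 and plain `http://` links only (HTTPS would need the hostname for the TLS handshake), and all function names are hypothetical.

```python
import http.client
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional
from urllib.parse import urlsplit

Resolver = Callable[[str], Optional[str]]


def system_resolve(host: str) -> Optional[str]:
    """Return one IPv4 address for host, or None if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):  # socket.gaierror is an OSError subclass
        return None
    return infos[0][4][0] if infos else None


def resolve_all(urls: List[str], resolve: Resolver = system_resolve,
                max_workers: int = 50) -> Dict[str, Optional[str]]:
    """Phase 1: look up every distinct hostname once, in parallel."""
    hosts = sorted({h for h in (urlsplit(u).hostname for u in urls) if h})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(resolve, hosts)))


def head_via_address(url: str, address: str, timeout: float = 10.0) -> object:
    """Phase 2: HEAD request to a cached IP, sending the real name as Host."""
    parts = urlsplit(url)
    conn = None
    try:
        port = parts.port or 80
        conn = http.client.HTTPConnection(address, port, timeout=timeout)
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        host = parts.hostname if parts.port is None else f"{parts.hostname}:{port}"
        conn.request("HEAD", path, headers={"Host": host})
        return conn.getresponse().status
    except (OSError, ValueError, http.client.HTTPException) as exc:
        return f"error: {exc}"
    finally:
        if conn is not None:
            conn.close()


def check_two_phase(urls: List[str], resolve: Resolver = system_resolve,
                    max_workers: int = 20) -> Dict[str, object]:
    """Resolve all hosts first, then check every link against the cache."""
    addresses = resolve_all(urls, resolve)

    def check_one(url: str) -> object:
        addr = addresses.get(urlsplit(url).hostname)
        return head_via_address(url, addr) if addr else "error: DNS lookup failed"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(check_one, urls)))
```

A side benefit is that links whose host does not resolve are reported without ever opening a connection, and a host shared by many links is looked up only once. Whether this beats letting each worker resolve and request in one go depends on how many links share a host and on the local resolver's caching, so it is worth timing both.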