Gossamer Forum
cache a page plugin
Hi All,

I am working on a new plugin called cache_a_page. It generates a cached copy of the page for every link in the Links database. When a link is broken, users can still see the cached page, much like Google's cached pages.

It's useful for sites that provide deep links to content pages. Very often, links in the database become invalid not long after they were added, for many reasons: the server is down, the page is deleted or moved, the whole site disappears... If a cached page is stored in the Links database and can be served to users, it will save a lot of time spent maintaining the database.

What it does:

1) The plugin first creates a table in the Links SQL database for storing cached pages.
2) It allows the links manager to do batch caching based on link status (you certainly don't want to cache links whose page doesn't exist (404)).
3) It allows caching an individual link, given a LinkID or URL.
4) It allows deleting the cached page for one or more links, based on URL, LinkID or status.
5) It builds the cached page from a template for viewing by users. The template adds a header to the cached page saying "This is a cached page for the URL .....", the same as we find on Google.
6) It allows the links manager to view cached pages by searching or browsing.
7) If a cached page exists for a link, users will see an option to view it next to that link; this is done by adding a line to the template links.html.
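
A minimal sketch of how items 1 to 5 could fit together, written in Python with SQLite purely for illustration (the plugin itself would be Perl against the Links SQL database). The table name `CachedPages`, its columns, the integer status codes and all function names are assumptions, not the plugin's actual design.

```python
import html
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS CachedPages (
    LinkID   INTEGER PRIMARY KEY,
    URL      TEXT NOT NULL,
    Status   INTEGER NOT NULL,
    Content  TEXT NOT NULL,
    CachedOn TEXT NOT NULL
)
"""

def open_cache(path=":memory:"):
    """Open the database and create the cache table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def store_page(conn, link_id, url, status, content):
    """Insert or refresh the cached copy for a single link."""
    cached_on = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO CachedPages VALUES (?, ?, ?, ?, ?)",
        (link_id, url, status, content, cached_on),
    )
    conn.commit()

def get_page(conn, link_id):
    """Return the cached content for a link, or None if nothing is cached."""
    row = conn.execute(
        "SELECT Content FROM CachedPages WHERE LinkID = ?", (link_id,)
    ).fetchone()
    return row[0] if row else None

def delete_by_status(conn, status):
    """Delete every cached page with the given status; return the count."""
    cur = conn.execute("DELETE FROM CachedPages WHERE Status = ?", (status,))
    conn.commit()
    return cur.rowcount

def render_cached(url, content):
    """Prepend a notice to the cached content, as the template would."""
    banner = f"<div>This is a cached page for the URL {html.escape(url)}</div>"
    return banner + content
```

Keying the table on LinkID keeps one cached copy per link, so re-caching a link simply overwrites the previous copy.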

What it doesn't do:

1) It doesn't cache PDF, DOC and other special file types.
2) Currently it won't store images for the cached pages (it seems that Google doesn't do it either). That shouldn't be a big problem: in most cases, when a page is deleted from a server, its images stay where they are unless the whole site is moved. So the image tags in the cached page can still point to the images on the remote server when the cached page is viewed.

I would like to have some input from anyone who is (or is not) interested in this plugin before I release it.

Cheers.

Long
Re: [long327] cache a page plugin In reply to
Sounds good.

Is it possible to run the batch caching with a cron job? Also, when you retrieve the pages, does your plugin check the status of the page that it retrieves?

Looking forward to the plugin.

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [long327] cache a page plugin In reply to
Very good idea. At least I like it.
I think many of us would find it interesting...

Quote:
Currently it won't store images for the cached pages (it seems that Google doesn't do it either)
Yeah, it seems the base href is the original page. And images are relative to the base URL.

Best regards,
Webmaster33


Paid Support
from Webmaster33. Expert in Perl programming & Gossamer Threads applications. (click here for prices)
Webmaster33's products (upd.2004.09.26) | Private message | Contact me | Was my post helpful? Donate my help...
Re: [long327] cache a page plugin In reply to
If you have a remote CGI script (e.g. get_cache.cgi, in the admin folder) that can be run via Telnet/SSH to do the job, I will add a setup feature to my Cron_Manager plugin that will let people set up the cron job with a couple of clicks (if that's OK with you :) )

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [yogi] cache a page plugin In reply to
Thank you for your interest.

Quote:


Is it possible to run the batch caching with a cron job? Also, when you retrieve the pages, does your plugin check the status of the page that it retrieves?


It is possible to do so, but I have not considered it because, for my purposes, I don't need to update my cached pages regularly. If people need this feature, I can add it.

The plugin doesn't check the status of links, but it can cache based on link status. Checking status and caching are actually very similar processes: the former gets the status and the latter gets the content, so they could naturally run as a single task. It is up to Alex to add this function to the verify module.
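
To illustrate why the two can share one pass, here is a rough sketch in Python (for illustration only; the plugin and the verify module are Perl). The names `fetch` and `check_and_cache`, the dict used as a cache, and the rule "cache only on status 200" are all assumptions.

```python
import urllib.error
import urllib.request

def fetch(url, timeout=10):
    """Return (status, body); body is None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return res.status, res.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, None
    except OSError:
        # No answer at all (DNS failure, refused connection, timeout).
        return None, None

def check_and_cache(links, cache, fetcher=fetch):
    """One request per link yields both its status and its cached content.

    links maps LinkID to URL; cache is any dict-like store (an assumption).
    """
    statuses = {}
    for link_id, url in links.items():
        status, body = fetcher(url)
        statuses[link_id] = status
        if status == 200 and body is not None:
            cache[link_id] = body
    return statuses
```

Passing the fetcher in as a parameter makes it easy to swap in a different HTTP client, or a stub when testing.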

Long
Re: [Andy] cache a page plugin In reply to
Quote:
If you have a remote CGI script (e.g. get_cache.cgi, in the admin folder) that can be run via Telnet/SSH to do the job, I will add a setup feature to my Cron_Manager plugin that will let people set up the cron job with a couple of clicks (if that's OK with you :) )
That's OK with me. As I said above, if caching with a cron job is wanted, I will make such a script available.

Long

Last edited by:

long327: Jan 7, 2003, 8:21 AM
Re: [webmaster33] cache a page plugin In reply to
Quote:
Yeah, it seems the base href is the original page. And images are relative to the base URL.


Good point. We have to change each image URL from a relative one to an absolute one when storing a page.
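
A small sketch of that rewrite, in Python for illustration only (in Perl, the base URL could come from the HTTP response object). The function name `absolutize_images` is hypothetical, and the regular expression is a deliberate simplification: it only handles quoted `src` attributes, and a real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Matches a quoted src attribute inside an <img ...> tag.
# Group 1: everything up to the opening quote, group 2: the quote
# character, group 3: the URL itself.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)

def absolutize_images(html_text, base_url):
    """Rewrite relative <img src> values so they resolve against base_url."""
    def fix(match):
        prefix, quote, src = match.groups()
        # urljoin leaves URLs that are already absolute unchanged.
        return f"{prefix}{quote}{urljoin(base_url, src)}{quote}"
    return IMG_SRC.sub(fix, html_text)
```

For example, with the base URL `http://example.com/dir/page.html`, the tag `<img src="pics/a.gif">` would be stored as `<img src="http://example.com/dir/pics/a.gif">`.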

Long
Re: [long327] cache a page plugin In reply to
I use $res->base with LWP for doing that :)
Re: [Paul] cache a page plugin In reply to
Quote:
I use $res->base with LWP for doing that


Do you use GT's parallel.pm?

Long
Re: [long327] cache a page plugin In reply to
Jason let me try the not-yet-released GT::WWW, but unfortunately a method to retrieve the base href was not available, so Jason said he'd add it in. I ended up using LWP::UserAgent.

Last edited by:

Paul: Jan 7, 2003, 9:10 AM
Re: [Paul] cache a page plugin In reply to
GT::WWW is released, it's in Links SQL 2.1.2...

Ivan
-----
Iyengar Yoga Resources / GT Plugins
Re: [long327] cache a page plugin In reply to
Interesting idea. I don't think I've seen a directory with cached pages. The reason Google has cached pages is that it spiders/crawls the pages and stores them in a repository. I like your idea of doing it for Links SQL. :)

Sean
Re: [long327] cache a page plugin In reply to
It would probably be good to have some robots features, such as reading and honoring robots.txt and also reading the META tags.
If you implement automatic caching with a cron job, as Yogi suggested, then the plugin should understand <meta NAME="revisit-after" CONTENT="x days"> to schedule the next caching date...
This should be optional, though.
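
As a rough sketch of that scheduling idea (in Python, for illustration only), the following reads the revisit-after META tag and computes the next caching date. The class and function names are hypothetical, and the 30-day fallback for pages without the tag is an assumption.

```python
import re
from datetime import date, timedelta
from html.parser import HTMLParser

class RevisitAfterParser(HTMLParser):
    """Collect the day count from <meta name="revisit-after" content="x days">."""

    def __init__(self):
        super().__init__()
        self.days = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "revisit-after":
            return
        match = re.match(r"\s*(\d+)\s*days?\b", attr.get("content", ""), re.IGNORECASE)
        if match:
            self.days = int(match.group(1))

def next_cache_date(html_text, cached_on: date, default_days=30):
    """Return the date on which the page should be cached again."""
    parser = RevisitAfterParser()
    parser.feed(html_text)
    days = parser.days if parser.days is not None else default_days
    return cached_on + timedelta(days=days)
```

Since the tag is an informal convention that many pages omit, the fallback interval decides the schedule for most links, which fits the suggestion to keep the feature optional.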

Best regards,
Webmaster33


Re: [webmaster33] cache a page plugin In reply to
Quote:
It would probably be good to have some robots features, such as reading and honoring robots.txt and also reading the META tags.
If you implement automatic caching with a cron job, as Yogi suggested, then the plugin should understand <meta NAME="revisit-after" CONTENT="x days"> to schedule the next caching date...


This is what I have been thinking about. Parsing robots.txt and the response headers can help decide whether a URL should be cached, based on the robots.txt rules and the last-modified date.
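
A minimal sketch of those two checks, in Python for illustration only. The user-agent string `cache_a_page`, both function names, and the policy "refresh when the date is missing or unparsable" are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, agent="cache_a_page"):
    """Check a URL against the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

def needs_refresh(last_modified_header, cached_on: datetime):
    """Decide from the Last-Modified header whether to cache the page again."""
    if not last_modified_header:
        return True  # no date from the server: refresh to be safe
    if cached_on.tzinfo is None:
        # Assumption: naive timestamps in the cache are stored as UTC.
        cached_on = cached_on.replace(tzinfo=timezone.utc)
    try:
        return parsedate_to_datetime(last_modified_header) > cached_on
    except (TypeError, ValueError):
        return True  # unparsable or incomparable date: refresh to be safe
```

A page would then be cached only when `allowed_by_robots` returns true, and re-cached only when `needs_refresh` says the server copy is newer than the stored one.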

One thing I have not decided is whether I should use the LWP module, which is not a standard module (am I right?), or just use GT's module.

Long
Re: [long327] cache a page plugin In reply to
If it is for a GT product it is always best to use their own modules, so you'll need to use GT::WWW or GT::URI
Re: [long327] cache a page plugin In reply to
Alex said both do the same thing: fetch a web page. I think you can safely use either of them.
As far as I know, LWP is PP (Pure Perl), so installing it is a breeze: just create the directories and copy the module files into them.

As for the features, I don't know which is better or which can do more, since I have not used them actively.
LWP is an old product; we may say it's a well-known, mature product.

But Paul will tell you about both products in detail, I'm sure ;)
EDIT: Hehe, Paul is fast & clever as always. He beat me to it, even though the question was not addressed to him. :D

Best regards,
Webmaster33



Last edited by:

webmaster33: Jan 9, 2003, 9:05 AM
Re: [webmaster33] cache a page plugin In reply to
Quote:
Alex said both do the same thing: fetch a web page. I think you can safely use either of them.

It is always better to use a GT module if it has the functionality needed, as it is guaranteed to be installed. Using a third-party module is not a good idea, as you then have to worry about compatibility with upgrades, and some people may not have it installed or may not be able to install it.
Re: [Paul] cache a page plugin In reply to
Quote:
Using a third-party module is not a good idea, as you then have to worry about compatibility with upgrades, and some people may not have it installed or may not be able to install it.
Well, we are talking about plugin developers. The plugin system can install/copy files, so if a plugin requires an LWP file, the plugin writer can create the directory and copy the required LWP module file from the plugin tar archive (this can be done in install.pm).
And I do not think that somebody who uses LWP will have to worry about compatibility with upgrades...
LWP is a Pure Perl module and can safely be copied to your directory, just like the GT::WWW module (of course we also need the dependencies).

Additionally, GT::WWW was released in Links SQL 2.1.2. I would not say it's more mature than LWP ;)

Anyway, I have always trusted GT modules. But I would not say that either GT::WWW or LWP is better.
If somebody decides to use GT::WWW, just do it.
If somebody decides to use LWP, just do it.
IMHO both are good.

Best regards,
Webmaster33


Re: [webmaster33] cache a page plugin In reply to
Hmm, maybe I'm not making the point clear :) ... I'm not suggesting that one is better than the other; I'm talking in terms of availability and compatibility. Sure, LWP would work, but GT::WWW and GT::URI were created by the authors of the script and so are a much better option. They are tailored to the product in question, and that also means that, as a plugin developer, you will get greater support for the module.

As a plugin developer I'd certainly want to use GT::WWW over LWP. I don't want to distribute a plugin that has a chance of failing on quite a few servers because LWP isn't installed.

Trust me :)

Last edited by:

Paul: Jan 9, 2003, 11:08 AM
Re: [Paul] cache a page plugin In reply to
As I wrote, installing LWP is a simple copy, which the plugin developer has to do in install.pm.
Anyway, we have drifted away from the subject. I think we have both given our opinions, so I'm stopping here.

Best regards,
Webmaster33



Last edited by:

webmaster33: Jan 9, 2003, 11:21 AM
Re: [webmaster33] cache a page plugin In reply to
Quote:
As I wrote, installing LWP is a simple copy, which the plugin developer has to do in install.pm.

I know you don't like taking advice from me, so maybe Yogi or Alex can explain to you :)

It's not as simple as perhaps you think. There are a lot of dependencies for the LWP modules.
Re: [webmaster33] cache a page plugin In reply to
Helping Paul here, as requested.

I would also strongly advise using GT::WWW for a Links SQL plugin, because not everyone will have LWP. And even if it is easy to install, I don't see the reason for bundling LWP with the plugin when you have a very capable GT module that can do just the same.

Ivan
-----
Iyengar Yoga Resources / GT Plugins