Gossamer Forum

thumbshots in dmoz listing?

Hi there...

Just creating a new site based on Links SQL. The site is basically a DMOZ mirror...

Is there a way to incorporate thumbnail images from http://www.thumbshots.org/ into Links easily? All our pages are statically built...

Thanks,

Paul
Re: [pauls] thumbshots in dmoz listing? In reply to
Whoops, I should have stated that you can download thumbshots' data files. That is what I want to incorporate: the non-remote version of their service, with everything stored locally.

Thanks!

Re: [pauls] thumbshots in dmoz listing? In reply to
You would need a custom script written, which would grab the image data, and update it as appropriate. I'm downloading their latest file now, to see how it works. I'll let you know what I find :)

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [pauls] thumbshots in dmoz listing? In reply to
BTW, I'm assuming you have read this bit on their site?

Quote:
Requirements

Free disk space: 22 GB
Choice of database or folder based storage.

?

Cheers

Andy (mod)
Re: [pauls] thumbshots in dmoz listing? In reply to
Just so you know, I'm working on a script/plugin to import the data from here. Here is a sample image (attached).

So far, I have it doing the following;

1) Download the .gz file from thumbshots.org.
2) Decompress it (gzip -d file.gz)
3) Run the script, e.g. perl thumb_import.cgi --file=TS14Jan04.11

I am now working on a way to 'split' the images up. Most servers won't like having more than a few hundred images in one directory, so I'm trying to find a way to put them in folders, i.e. /1/, /2/, /3/ etc.

I'm going to be offering this as a pay-for plugin. The price will be around $80 (worth it IMO). People who own the $200 package will be entitled to a free copy :)

I'll keep you updated :)

Andy (mod)
Re: [Andy] thumbshots in dmoz listing? In reply to
I now have it working so that it only puts 500 images in each directory :)

perl thumb_import.cgi --file=TS14Jan04.11 --image_folder=/home/domains/linkssql.net/www/test_thumbs

This would make the structure as follows;

/home/domains/linkssql.net/www/test_thumbs/1/ (first 500 images)
/home/domains/linkssql.net/www/test_thumbs/2/ (next 500 images)
/home/domains/linkssql.net/www/test_thumbs/3/ (next 500 images)

..etc.
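As a rough illustration of the 500-per-directory split described above, here is a minimal Python sketch. It is not the plugin's code (which is a Perl script and isn't shown in this thread); the function name `bucket_dir` and the 0-based image counter are assumptions made for the example.

```python
from pathlib import Path

IMAGES_PER_DIR = 500  # figure quoted in the post above


def bucket_dir(image_index: int, image_folder: str,
               per_dir: int = IMAGES_PER_DIR) -> Path:
    """Return the numbered subdirectory for the Nth imported image (0-based).

    Hypothetical helper: directories are numbered from 1, as in the post.
    """
    if image_index < 0:
        raise ValueError("image_index must be non-negative")
    bucket = image_index // per_dir + 1
    return Path(image_folder) / str(bucket)


if __name__ == "__main__":
    root = "/home/domains/linkssql.net/www/test_thumbs"
    for n in (0, 499, 500, 1000):
        print(n, bucket_dir(n, root))
```

Integer division keeps the mapping stable, so the same image counter always lands in the same folder on a re-run.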

I'm now working on adding a new field to lsql_Links, so that the global (which will show the final image) can work out where the image is located :)

Cheers

Andy (mod)
Re: [Andy] thumbshots in dmoz listing? In reply to
Hey Andy... this is great!! Keep me posted on the progress of the plugin... you'll have a new customer in no time! :)

BTW, I read the disk space requirements etc. No problem: the server that will be housing this has about 240 GB free right now (RAID5).

Take care,

Paul
Re: [pauls] thumbshots in dmoz listing? In reply to
Hi. The plugin has now been completed, and can be bought from here: http://www.ultranerds.com/...t=new_design&d=1 (it's at the top of the listings).

I'll be putting a demo online soon, but I need to get a suitable one set up (not too big, but not too small).

The members area on our site isn't quite working correctly yet, but if you would like to buy this plugin, please order it through the site, and I'll email it over to you :)

Cheers

Andy (mod)
Re: [pauls] thumbshots in dmoz listing? In reply to
I've added a couple more features.

It splits the file up into about 700 smaller files, and for each file, it shows you something like;

Quote:
Working on file: thumbs_2
Skipping http://www.culturanuova.net/filosofia/kierkegaard.php. Reason: No content for image...
Skipping http://www.ncbe.gwu.edu/miscpubs/ncrcdsll/rr6/index.htm. Reason: No content for image...
Skipping http://es.geocities.com/cpherrero/vindex.htm. Reason: No content for image...
Skipping http://www.essen-bei.de/supermarkt. Reason: No content for image...
Skipping http://www.indianmotorcycles.com. Reason: No content for image...
Skipping http://www.osinga.com/cairo. Reason: No content for image...
Skipping http://www.trait-personnel.com. Reason: No content for image...
Skipping http://www.ams.usda.gov/lsg/mpb/soy/SoybeanAct.pdf. Reason: No content for image...
Skipping http://www.arpnet.it/~gbruno/. Reason: No content for image...
Skipping http://www.vmm.bc.ca/home.htm. Reason: No content for image...
Skipping http://autismo.yadahost.com/. Reason: No content for image...
Skipping http://italiaans.nu. Reason: No content for image...
Skipping http://pub9.ezboard.com/bsaduk. Reason: No content for image...
Skipping http://www.conti.com.mk/. Reason: No content for image...
Skipping http://www.encode.com/skyart/. Reason: No content for image...
Skipping http://www.gentofte.skorstensfejeren.dk/. Reason: No content for image...
Skipping http://www.ju-nrw.de/minden-luebbecke/ju-milk/start.htm. Reason: No content for image...
Skipping http://www.mosaicoes.it/. Reason: No content for image...
Skipping http://www.panoramas.cl/. Reason: No content for image...
Skipping http://www.thensane.com/. Reason: No content for image...
Skipping http://www.vietgate.net/. Reason: No content for image...
Skipping http://geocities.com/webtekrocks/. Reason: No content for image...
Skipping http://iasos.com/artists/karlbang/. Reason: No content for image...
Skipping http://www.decoralia.com. Reason: No content for image...
DONE!!! Added: 59 Skipped: 441

Note that at the bottom it tells you how many of the links in that file were added. The 'added' figure is how many links were updated with a URL to the thumbnail image. It won't update all the links, because not every link in the database actually has a thumbnail in thumbshots.org's database.
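To make the added/skipped bookkeeping concrete, here is a small Python sketch of a loop that would produce output in the same shape as the quote above. It is only an illustration: the record format (URL plus raw image bytes), the `import_batch` name and the placeholder thumbnail names are assumptions, not the plugin's actual internals.

```python
from typing import Dict, Iterable, Tuple


def import_batch(records: Iterable[Tuple[str, bytes]],
                 thumb_urls: Dict[str, str]) -> Tuple[int, int]:
    """Record a thumbnail for each URL that has image data; count the rest.

    `records` yields (url, image_bytes) pairs. `thumb_urls` stands in for the
    Links table field that stores the thumbnail location (hypothetical).
    """
    added = skipped = 0
    for url, image_data in records:
        if not image_data:
            print(f"Skipping {url}. Reason: No content for image...")
            skipped += 1
            continue
        # Placeholder name; a real import would write the file and store its path.
        thumb_urls[url] = f"thumb-{added + 1}.jpg"
        added += 1
    print(f"DONE!!! Added: {added} Skipped: {skipped}")
    return added, skipped
```

With 500 records per split file, the two counters always sum to 500, which matches the "Added: 59 Skipped: 441" line in the sample output.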

I'll post a demo in a little bit. It's just running now :)

Cheers

Andy (mod)
Re: [pauls] thumbshots in dmoz listing? In reply to
Got a demo of it here now;

http://dmoz.linkssql.net/...2Findex.html&d=1

http://dmoz.linkssql.net/...2Findex.html&d=1

http://dmoz.linkssql.net/...s/index.html&d=1

Cheers

Andy (mod)
Re: [pauls] thumbshots in dmoz listing? In reply to
I just incorporated the dynamic linking of the images into
http://origami.net
and
http://bodyart.com

and I'm working on http://pugs.net today.

It uses a script they have, that caches the images, so it increases performance.

This is not the same as using Andy's plugin; it works more or less in real time.

It should work on static pages, as the <img> tag is what is used to serve most banners.

I would really like to find some software that will let me generate my own screen shots, and thus incorporate it into links directly. Not all my links have screen shots, and many targeted/niche directories have missing images.

Between andy and myself, we have a lot of code and tweaks that could make for awesome directories, but I can't find anything that will grab a screen and convert to a graphic. All solutions seem to run on windows terminals, and require integration. Nothing I've found will run without an actual screen buffer, and simply write the page as a graphic, which can then be resized via net::pbm or Image::Magick.

I realize web page rendering is very tough, but there have to be ways to use the Mozilla rendering engine to write to a .bmp buffer, rather than screen buffer, and then take the file and pipe it into Image::Magick, cut it to 600x450, then resize to 120x90, or any other size.
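The crop-then-resize arithmetic is easy to sanity-check on its own. The sketch below is plain Python with no imaging library, and the helper names are made up for illustration. It computes a top-left crop box and the resulting thumbnail size; the box uses the common (left, top, right, bottom) layout.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def crop_box(src_w: int, src_h: int, crop_w: int = 600, crop_h: int = 450) -> Box:
    """Top-left crop box, clamped so it never exceeds the source image."""
    return (0, 0, min(src_w, crop_w), min(src_h, crop_h))


def scaled_size(box: Box, thumb_w: int = 120) -> Tuple[int, int]:
    """Thumbnail size that preserves the aspect ratio of the cropped area."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    return (thumb_w, round(height * thumb_w / width))


if __name__ == "__main__":
    box = crop_box(1024, 768)
    print(box, scaled_size(box))
```

Because 600x450 and 120x90 are both 4:3, the resize step is a uniform 5:1 reduction and introduces no distortion.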


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] thumbshots in dmoz listing? In reply to
Andy and I have attacked this from both ends. Once I get my end fully working, I'll see if I can combine them.

Right now, I have a kludge of a thumbnailer program working. It only runs under windows, unfortunately, as it uses windows API calls to load MSIE and capture a screen shot from the active client area window.

If anyone can come up with a way to do the same thing under unix, using perhaps the Mozilla code, I'd be able to put together an integrated system that can mesh directly with Links SQL to maintain the thumbshots database.

The code I have runs fairly slowly, an average of only 6-10 screen shots a minute. Part of that is my connection, and part of that is windows reformatting the screen, but a big part is slow-loading ad files and pop-ups, and javascript "set your home page" boxes which have to be manually closed. If I can set up a dedicated system, running with cookies and javascript off, maybe I can speed things up.

This code just writes the screen shot out. I upload the screen shots, and make the thumbnails on the server. The file name is ID_ss.png, and I'm working on a script update now that will first check to see if there is a Thumbshots.org file, if not, see if one was created in the holding directory, and if not, display a default image or spacer.
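The lookup order described here (Thumbshots.org file first, then the holding directory, then a default image) could be sketched like this in Python. The directory arguments and the `resolve_thumbnail` name are hypothetical, and the sketch assumes both directories use the same ID_ss.png naming, which the post does not confirm for the Thumbshots.org files.

```python
from pathlib import Path
from typing import Sequence


def resolve_thumbnail(link_id: int, search_dirs: Sequence[Path],
                      default: Path) -> Path:
    """Return the first existing ID_ss.png in `search_dirs`, else `default`.

    `search_dirs` is ordered by preference, e.g. the Thumbshots.org image
    directory first and the holding directory second (assumed layout).
    """
    name = f"{link_id}_ss.png"
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return default
```

Keeping the preference order in a list means a later "Last_Good" archive directory could be added as a third entry without changing the function.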

Images will be stored/cached locally, and I'm working on a "Last_Good" type system, so if a site goes down, or disappears, you can archive the site/link with the last good screen shot.

This happens a lot with niche directories, unfortunately.

The next automation stages to go will be to do automatic dumps of the database "ID, URL" where screenshot=No. These automatic dumps can be imported, and run, hopefully, slowly filling in all the gaps in your system.

An update will also write the file name as an MD5 hash of the URL as the Thumbshots.org scripts do, so that if we have a screen shot in our database, you are missing, you can get it, even if we originally downloaded it for another directory.

The way I envision this, is if you have a niche directory which is thinly populated using the stock thumbshots.org image set, you can use our service to fill in the gaps, in real time. A script on your site will query our database, and see if the image exists. If it does, you can download it. If not, it will be added to our "fetch" queue.
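A toy, in-memory version of that "serve it if we have it, otherwise queue it" behaviour might look like the Python sketch below. Everything here is an assumption for illustration (class name, methods, storage in a dict); the real service is not described in enough detail in this thread to reproduce.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, Optional, Set


class ThumbService:
    """Hypothetical lookup-or-enqueue store keyed by the MD5 of the URL."""

    def __init__(self) -> None:
        self.images: Dict[str, bytes] = {}
        self.fetch_queue: Deque[str] = deque()
        self._queued: Set[str] = set()

    @staticmethod
    def key(url: str) -> str:
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def request(self, url: str) -> Optional[bytes]:
        """Return the image if present; otherwise queue the URL once."""
        k = self.key(url)
        if k in self.images:
            return self.images[k]
        if k not in self._queued:
            self._queued.add(k)
            self.fetch_queue.append(url)
        return None

    def store(self, url: str, image: bytes) -> None:
        """Called by the screenshot worker once a capture is ready."""
        k = self.key(url)
        self.images[k] = image
        self._queued.discard(k)
```

The `_queued` set stops the same missing URL from being added to the fetch queue every time a page is rebuilt.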

For obvious reasons this will be a subscription service. Besides the incredible drain of resources an "open" system would require, the fact the system has to run under windows, and needs occasional user intervention, means it can't be free. Windows has a habit of stopping the instant you look away. So, unless you sit there and nurse it through, you often lose entire nights/weekends of unattended operation due to one windows quirk. (Wonderful OS).

Right now, it's human-intensive, but hopefully it will be more automated. For instance, my first 400+ link run took about 4 hours, and I don't see that changing too much in the short run. 100 links/hour overall for screenshotting, editing, thumbnailing, uploading and importing is about right.

If you are interested in adding the missing shots to your directory, you need to run a "Verify Links" to get rid of all 404 and other errors. Make sure you have a screen shot field of some sort that knows whether or not you already have an image.

I have very limited capacity at the moment to do this, but I might as well use REAL data, and get real screen shots, than random ones.


SELECT ID, URL FROM prefix_Links WHERE isValidated = 'Yes' AND !screenshot

is the general SQL query you need to issue at the SQL Monitor (not via MySQLMan!). The results are properly formatted for use from the SQL Monitor page.
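For anyone who wants to try the "ID, URL" dump outside the SQL Monitor, here is a self-contained Python sketch using an in-memory SQLite table. It is only a stand-in: the real data lives in MySQL, and the `screenshot` column holding 'Yes'/'No' text is an assumption based on the "screenshot=No" wording earlier in the thread.

```python
import sqlite3
from typing import List


def demo_connection() -> sqlite3.Connection:
    """Build a tiny stand-in for prefix_Links (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prefix_Links ("
        "ID INTEGER PRIMARY KEY, URL TEXT, isValidated TEXT, screenshot TEXT)"
    )
    conn.executemany(
        "INSERT INTO prefix_Links VALUES (?, ?, ?, ?)",
        [
            (1, "http://example.com/", "Yes", "Yes"),
            (2, "http://example.org/", "Yes", "No"),
            (3, "http://example.net/", "No", "No"),
        ],
    )
    return conn


def missing_screenshots(conn: sqlite3.Connection) -> List[str]:
    """Return 'ID, URL' lines for validated links that have no screenshot."""
    query = (
        "SELECT ID, URL FROM prefix_Links "
        "WHERE isValidated = 'Yes' AND screenshot = 'No' ORDER BY ID"
    )
    return [f"{link_id}, {url}" for link_id, url in conn.execute(query)]


if __name__ == "__main__":
    for line in missing_screenshots(demo_connection()):
        print(line)
```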

Also, there are some bugs in the system, and not all pages are able to be formatted and captured automatically.


PUGDOG Enterprises, Inc.

Re: [pugdog] thumbshots in dmoz listing? In reply to
The http://origami.net and http://pugs.net directories are now fully populated with thumbshots. I should have kept before/after copies of the sites, but I didn't, so see if you can tell which were which. Origami.net was about 60% populated by Thumbshots.org, but pugs.net was only about 30%. So about 70% of the images on Pugs.net were generated by my script, and about 40% on Origami.net.

There is a lot of manual intervention here, and after importing the thumbshots there was a lot of tweaking I had to do. We are working on a more automated way of getting your database into shape for imports, and I still have a lot of sites to test on, but we can use some more. If your directory is under 1000 links, preferably around 500 or less, and you have tried to use the http://thumbshots.org site and either dynamic or static linking, and find a lot of holes, we can make a deal for you. The more sites I work with, the more "errors" I'm learning to work around. Status 302's are the worst. Most are "add a /". Some are really 404's, and only a few are redirects.

Right now the script I run keeps all the images in one directory; I'm actually using the ../admin/tmp directory to hold them. This is why I need smaller directories, not large ones. The next block of code will use a hashing algorithm to store the files, and the files will be moved out of the temp directory and into their regular home, so it will be fully upgradeable. It will also "save" the Last_Good image, and try to determine if a new image is actually a 404-type problem.
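One common way to spread files across directories with a hash is to use the first few hex characters of the digest as folder names. The Python sketch below shows that idea under stated assumptions: the two-level layout, the `.png` extension and the `hashed_path` name are all hypothetical, since the post doesn't say which scheme will be used.

```python
import hashlib
from pathlib import Path


def hashed_path(url: str, root: str, ext: str = ".png") -> Path:
    """Map a URL to root/ab/cd/<md5>.png, where 'abcd' starts the MD5 hex digest.

    Two hex characters per level gives 256 folders per level, which keeps
    each directory small even for large image sets.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / f"{digest}{ext}"


if __name__ == "__main__":
    print(hashed_path("http://sitename.com/", "/thumbs"))
```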

The way the system works is that the URL of the site is encoded as an MD5 hash. So, Link IDs and other "unique" site identifiers are not used. It's all URL based. The image is keyed to a URL, not a "link", so the image is not stored in your Links table, or database at all, but is prepped by the template parser, then actually called/loaded by an <IMG> tag in the browser.

This creates some problems which may not be immediately obvious.

http://sitename.com and http://sitename.com/ are *NOT* the same MD5 hash, so it won't find the image.

Similarly, adding or leaving out a "www." will generate a "new" url/image. What's worse, is if the site adds a redirect.
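The mismatch is easy to demonstrate. The Python sketch below hashes both spellings and shows one possible normalization step for a local image store. The `normalize` rules (lower-casing the host, dropping a leading "www.", adding the missing "/") are assumptions for illustration only; they would not change how thumbshots.org keys its own images, and dropping "www." is not safe for every site.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def url_key(url: str) -> str:
    """MD5 hex digest of the URL exactly as given."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def normalize(url: str) -> str:
    """Hypothetical clean-up applied before hashing for a local store."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"  # http://sitename.com -> http://sitename.com/
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


if __name__ == "__main__":
    a, b = "http://sitename.com", "http://sitename.com/"
    print(url_key(a) == url_key(b))                        # raw keys differ
    print(url_key(normalize(a)) == url_key(normalize(b)))  # normalized keys match
```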

For a small directory, you can go through all your links, and see which images didn't load, and jump to it to see why. Using two windows, one the links admin to Database->Table->Links->Modify to see what the URL in your database is, vs the URL that comes up when the link is clicked, you can "upgrade" many of the links in your database to the "new" URL. You'll be surprised how many have moved. I was. Adding a "/" will often cause the image to appear.

The good news is, this is working, and we can offer an affordable service for smaller sites looking to fill in images. It includes a scrub, and fix up, removal of links with a status code <=0 or >=400, and updating 302 links. If you do the scrubbing of your database, you save all the labor charges, which are the big deal.
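If you plan to do the scrub yourself, the status rules above boil down to a three-way decision per link. Here is a minimal Python sketch; the function name and the "remove"/"update"/"keep" labels are made up for the example, and the status values are assumed to be the codes stored by a "Verify Links" run.

```python
def classify_status(status: int) -> str:
    """Decide what to do with a link based on its last verification status.

    - status <= 0 or >= 400: remove the link
    - status == 302: the URL needs updating (often just a missing '/')
    - anything else: keep as-is
    """
    if status <= 0 or status >= 400:
        return "remove"
    if status == 302:
        return "update"
    return "keep"


if __name__ == "__main__":
    for code in (200, 302, 404, 0, -1):
        print(code, classify_status(code))
```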

We'll post instructions, helper scripts, and what you need to send us to fill in your directory. I'm going to move this discussion to http://ultranerds.com/forum since it's starting to get into a non GT business realm here.


PUGDOG Enterprises, Inc.

Re: [pugdog] thumbshots in dmoz listing? In reply to
My links dir has 250 links and I would be very interested to be part of your testings.
I posted a message on your forum too.
Max
The one with Mac OS X Server 10.4 :)
Re: [maxpico] thumbshots in dmoz listing? In reply to
You can get more info on using thumbshots on your LSQL site from http://thumbshots.org/

I used the info there to implement their feed on my site. www.mychristianweb.com

I put the following in my links template
Code:

<a target="_blank" class="link" href="<%db_cgi_url%>/jump.cgi?ID=<%ID%>">
<img align="top" src="http://open.thumbshots.org/image.pxf?url=<%URL%>" border="0" onload="if (this.width>50) this.border=1" alt="Preview by Thumbshots">
</a>

CCUnet
my Christian web

Last edited by ccunet: Jun 18, 2004, 3:08 PM
Re: [ccunet] thumbshots in dmoz listing? In reply to
I did this already, but a lot of sites are not in the thumbshots.org and DMOZ databases, so they don't have a screenshot.
Pugdog's solution is aimed at exactly this issue.
Max
Re: [maxpico] thumbshots in dmoz listing? In reply to
OK gotcha
my Christian web