Gossamer Forum

Spanning category pages, duplicate links?

I have been using the following for build_sort_order_category:

Priority DESC,isNew DESC,RAND()

The problem is that although the spanned pages contain the correct number of links, the random links (i.e. those that are not priority or new) can appear twice or more across the subsequent spanned pages.
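To illustrate why this happens, here is a rough Python simulation (not Gossamer Links code). It assumes each spanned page is produced by re-running the full `Priority DESC, isNew DESC, RAND()` sort and then taking only that page's slice. The link records and their priority/new flags are made up for the example.

```python
import random

# Hypothetical category: IDs 0-2 are priority links, 3-4 are new links,
# and the remaining 35 links depend only on the random tiebreak.
LINKS = [(i, 1 if i < 3 else 0, 1 if 3 <= i < 5 else 0) for i in range(40)]


def build_page(links, page, per_page, rng):
    """Re-sort the whole category (Priority DESC, isNew DESC, random) and
    return only the requested 1-based page, as if each page were built alone."""
    keyed = sorted((-prio, -new, rng.random(), link_id) for link_id, prio, new in links)
    start = (page - 1) * per_page
    return [link_id for _, _, _, link_id in keyed[start:start + per_page]]


def build_all_pages(links, per_page, rng):
    n_pages = -(-len(links) // per_page)  # ceiling division
    return [build_page(links, p, per_page, rng) for p in range(1, n_pages + 1)]


rng = random.Random(1)
pages = build_all_pages(LINKS, 20, rng)
shown = [link_id for page in pages for link_id in page]
duplicates = len(shown) - len(set(shown))
print(f"{len(shown)} slots filled, {len(set(shown))} distinct links, {duplicates} repeats")
```

The priority and new links always sort to the top of page 1 because their sort keys don't depend on the random value. Only the random tail is reshuffled for each page, which matches the symptom described above.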

Is anyone else using a similar build sort order without duplicates appearing? Out of interest, the priority and new links display perfectly.
Re: [aus_dave] Spanning category pages, duplicate links?
This is definitely a problem if you are displaying pages dynamically: each page is built separately, so the script can't know what is on the other pages. I think it should work if you are using static pages, though, as all pages are built at the same time.

Laura.
Re: [afinlr] Spanning category pages, duplicate links?
>>>I think it should work if you are using static pages though as all pages are built at the same time. <<<

Yeah, it does work OK with static pages. The structure for the build is something like:

-> grab links in order for specific category
-> go through the links and order them, etc.
-> build pages and sub-pages.
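Here is a minimal sketch of that single-pass structure. It assumes hypothetical link dicts with `id`, `priority` and `is_new` fields and is not the actual Links SQL build code. Because the category is ordered once and then cut into pages, each link can land on only one page, even with a random tiebreak.

```python
import random


def build_static_pages(links, per_page, rng):
    """Order every link in the category once, then split that one ordering
    into consecutive pages of at most per_page links."""
    ordered = sorted(
        links,
        key=lambda link: (-link["priority"], -link["is_new"], rng.random()),
    )
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


links = [{"id": i, "priority": int(i < 3), "is_new": int(3 <= i < 5)} for i in range(45)]
pages = build_static_pages(links, 20, random.Random())
page_ids = [[link["id"] for link in page] for page in pages]
```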

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Spanning category pages, duplicate links?
Thanks Laura and Andy. I do use static pages, but there is definitely a problem. I found an old post where people were having the same problems with spanned pages and RAND(); I will post the link when I find it again.
Re: [aus_dave] Spanning category pages, duplicate links?
This thread mentioned similar problems, which seemed to be related to the MySQL version (I'm running 4.0.x).

http://www.gossamer-threads.com/...orum.cgi?post=129557
Re: [aus_dave] Spanning category pages, duplicate links?
Unfortunately, from that discussion it looks like you can't do it this way without creating duplicates, whatever version of MySQL you have. Maybe you could follow the advice in that discussion, which describes a different way to do it?
Re: [afinlr] Spanning category pages, duplicate links?
Thanks Laura, I might have a go at those changes. Given the age of the thread, I thought the problem might have been worked out in more recent versions of Links SQL.

Might just be easier to sort the links alphabetically after priority/new.
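Sorting alphabetically works with spanning because the order is deterministic, so every page build sees the same sequence. Below is a Python sketch assuming link dicts with `priority`, `is_new`, `title` and `id` fields. The ID is added as a final tiebreaker (my addition) so two links with identical titles can't swap places between page builds. As a build_sort_order_category value, this would presumably be something like `Priority DESC, isNew DESC, Title ASC, ID ASC`, assuming those column names.

```python
def sort_links(links):
    """Priority DESC, isNew DESC, title A-Z, then ID as a unique tiebreaker."""
    return sorted(
        links,
        key=lambda link: (-link["priority"], -link["is_new"], link["title"].lower(), link["id"]),
    )


def build_page(links, page, per_page):
    """Build one 1-based page independently, the way a dynamic page would."""
    ordered = sort_links(links)  # same order on every call
    start = (page - 1) * per_page
    return [link["id"] for link in ordered[start:start + per_page]]


links = [
    {"id": 1, "priority": 0, "is_new": 0, "title": "Zebra Supplies"},
    {"id": 2, "priority": 1, "is_new": 0, "title": "Main Sponsor"},
    {"id": 3, "priority": 0, "is_new": 1, "title": "Brand New Site"},
    {"id": 4, "priority": 0, "is_new": 0, "title": "apple orchards"},
    {"id": 5, "priority": 0, "is_new": 0, "title": "Apple Orchards"},
]
page1 = build_page(links, 1, 3)  # [2, 3, 4]
page2 = build_page(links, 2, 3)  # [5, 1]
```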
Re: [aus_dave] Spanning category pages, duplicate links?
Hi,

Think about how RAND() works, and how the spanning page mechanism works.

The only way to do this would be to create caches of the entire search and feed pages from the cache, or to create a cache of the ID numbers from the search and then select those IDs/links based on the mh, nh and page values of the spanning system.

Temporary tables keyed to the session ID would most likely be the way to go. The other way would be to create a hash-key system in the tables for the search/session ID and the page/mh/nh data. If the search returned 322 results with a page size of 25, there would be 13 entries in the hash-key system, each with the search term/session ID/page as keys, and the list of random IDs as the value those keys point to.

search_term_session_id_page_no=>[ID1, ID2, ID3, ID4....ID25]
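Here is a rough Python sketch of that hash-key layout. It builds all the per-page ID lists in one pass from a single shuffle. The function name is hypothetical. The key is a `(search_term, session_id, page_no)` tuple rather than a joined string, so a search term containing underscores can't collide with another key.

```python
import math
import random


def build_page_cache(search_term, session_id, result_ids, page_size, rng):
    """Shuffle the full result set once and store one ID list per page."""
    shuffled = list(result_ids)
    rng.shuffle(shuffled)
    cache = {}
    for page_no in range(1, math.ceil(len(shuffled) / page_size) + 1):
        start = (page_no - 1) * page_size
        cache[(search_term, session_id, page_no)] = shuffled[start:start + page_size]
    return cache


cache = build_page_cache("widgets", "abc123", range(1, 323), 25, random.Random())
print(len(cache))  # 13: 322 results at 25 per page
```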

Lookup would be fairly fast, and you'd select the links based on their being 'IN' the returned list.
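Here is a sketch of the lookup side. It uses Python's built-in sqlite3 as a stand-in for MySQL and a simplified `Links` table with only ID and Title columns (an assumption for the example). One detail is worth handling: `WHERE ID IN (...)` does not promise to return rows in the order of the list, so the cached random order is restored in code.

```python
import sqlite3


def fetch_page_links(conn, id_list):
    """Fetch the links for one cached page and keep the cached ID order."""
    if not id_list:
        return []
    placeholders = ",".join("?" for _ in id_list)
    rows = conn.execute(
        f"SELECT ID, Title FROM Links WHERE ID IN ({placeholders})", id_list
    ).fetchall()
    # IN (...) gives no ordering guarantee, so reorder rows to match id_list.
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in id_list if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Links (ID INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany("INSERT INTO Links VALUES (?, ?)", [(i, f"Link {i}") for i in range(1, 51)])
page = fetch_page_links(conn, [42, 7, 19])
print([row[0] for row in page])  # [42, 7, 19]
```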

Or, the table could be

Search_Term, session_ID, Page_Number, ID-List, timestamp

Your index would be the first 3 fields, and you'd delete the records based on the timestamp on some schedule or trigger.
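Here is a possible shape for that table, sketched with Python's sqlite3. The table and column names are my own, and the comma-separated ID list is just one way to store it. The composite primary key covers the first three fields, and `purge_expired` is the kind of statement a scheduled job could run.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS search_page_cache (
    search_term TEXT NOT NULL,
    session_id  TEXT NOT NULL,
    page_number INTEGER NOT NULL,
    id_list     TEXT NOT NULL,   -- comma-separated link IDs
    created_at  REAL NOT NULL,   -- Unix timestamp
    PRIMARY KEY (search_term, session_id, page_number)
)
"""


def store_page(conn, term, session_id, page_no, ids, now):
    conn.execute(
        "INSERT OR REPLACE INTO search_page_cache VALUES (?, ?, ?, ?, ?)",
        (term, session_id, page_no, ",".join(map(str, ids)), now),
    )


def load_page(conn, term, session_id, page_no):
    row = conn.execute(
        "SELECT id_list FROM search_page_cache "
        "WHERE search_term = ? AND session_id = ? AND page_number = ?",
        (term, session_id, page_no),
    ).fetchone()
    if row is None:
        return None
    return [int(x) for x in row[0].split(",") if x]


def purge_expired(conn, max_age_seconds, now):
    """Delete cached pages older than max_age_seconds; return rows removed."""
    cur = conn.execute(
        "DELETE FROM search_page_cache WHERE created_at < ?", (now - max_age_seconds,)
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_page(conn, "widgets", "sess-a", 1, [42, 7, 19], now=1000.0)
store_page(conn, "widgets", "sess-a", 2, [3, 88, 12], now=5000.0)
removed = purge_expired(conn, max_age_seconds=3600, now=5000.0)
```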

Actually, the session ID would not be needed in the above:

Search_Term/Page_Size/Page_Number/ID-List/Timestamp

_ANYONE_ doing a search of the database on that search term with the same page size would hit the cache rather than the actual database until the cache is expunged after x-minutes/hours.
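The session-free variant can be sketched as an in-memory cache keyed by `(search_term, page_size, page_number)`, with entries that expire after a fixed time-to-live. The class is hypothetical. The injectable `clock` argument is only there so expiry can be tested without waiting.

```python
import time


class SharedSearchCache:
    """Page cache shared by every visitor: no session ID in the key, so the
    same search with the same page size reuses an entry until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, term, page_size, page_no):
        key = (term, page_size, page_no)
        entry = self._entries.get(key)
        if entry is None:
            return None
        ids, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expunge stale entry on access
            return None
        return ids

    def put(self, term, page_size, page_no, ids):
        self._entries[(term, page_size, page_no)] = (list(ids), self.clock())


cache = SharedSearchCache(ttl_seconds=900)
cache.put("widgets", 25, 1, [42, 7, 19])
# A different visitor with the same search and page size gets the same list.
print(cache.get("widgets", 25, 1))  # [42, 7, 19]
```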

Just some ideas.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] Spanning category pages, duplicate links?
Pugdog, my searches are sorted by Priority and Score and they are working fine. My problem is with the category builds. Is your advice above only relevant to search results?
Re: [aus_dave] Spanning category pages, duplicate links?
I think if you use static pages, the category builds are done in a single loop, or in groups of 500 or less (it's been a while since I looked at that code).

If you use the search system, each span page call, forward or back, re-calls the search routine to find the next page. This is an issue that was first addressed (pointed out) in the search logger utility. Each {next} or {back} call triggered what amounted to a new search.

If you use a sort order of RAND(), then each page will randomly sort the links, and the odds of duplicate links, and of links that don't show up at all, are greatly increased.
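Here is a rough estimate of how large "greatly increased" is. Assume every page is built from its own fresh random order and nothing is pinned to the top. Then each page is effectively an independent random pick of per_page links, so a given link misses a page with probability 1 - per_page/n. The Python sketch below uses this simplified model; it is not a measurement of Links SQL. It computes the expected number of links that never appear and checks that figure with a quick simulation, assuming n is a multiple of per_page.

```python
import random


def expected_missing(n_links, per_page):
    """Expected number of links never shown when each of the n_links/per_page
    pages is an independent random draw of per_page links."""
    n_pages = n_links // per_page
    return n_links * (1 - per_page / n_links) ** n_pages


def simulate_missing(n_links, per_page, trials, rng):
    """Average count of never-shown links when every page re-shuffles."""
    n_pages = n_links // per_page
    total = 0
    for _ in range(trials):
        seen = set()
        for page in range(n_pages):
            order = list(range(n_links))
            rng.shuffle(order)  # a fresh random order for each page request
            start = page * per_page
            seen.update(order[start:start + per_page])
        total += n_links - len(seen)
    return total / trials


print(expected_missing(100, 20))  # roughly 32.77, about a third of the links
print(simulate_missing(100, 20, 2000, random.Random(7)))
```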

The only way around this is to slurp _all_ the search results into a temporary holding area. To save space, you can slurp only the LinkIDs into an array.
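Here is a minimal Python sketch of that approach: shuffle the matching link IDs once, keep only the IDs, and answer every next/back request from the saved order. The class and method names are hypothetical. In practice the saved list would need to live somewhere that survives between requests, such as a temporary table or a cache file; this in-memory example glosses over that.

```python
import random


class RandomOrderPager:
    """Shuffle the result IDs once and serve every spanned page from that
    single saved order (hypothetical helper, not Links SQL code)."""

    def __init__(self, link_ids, per_page, rng):
        self.ids = list(link_ids)  # keep only the IDs, not full link records
        rng.shuffle(self.ids)      # one random order for the whole result set
        self.per_page = per_page

    @property
    def page_count(self):
        return -(-len(self.ids) // self.per_page)  # ceiling division

    def page(self, page_no):
        """Return the IDs for a 1-based page number."""
        start = (page_no - 1) * self.per_page
        return self.ids[start:start + self.per_page]


pager = RandomOrderPager(range(1, 96), per_page=20, rng=random.Random())
first_visit = pager.page(2)
back_again = pager.page(2)  # same IDs, unlike re-running a RAND() sort
```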


PUGDOG Enterprises, Inc.
Re: [pugdog] Spanning category pages, duplicate links?
I appreciate all your replies :)

I have run some tests using RAND() on its own for build_sort_order_category. I have my directory set to span pages every 20 links per category.

Using RAND() on its own still produces duplicate URLs on subsequent pages in categories. I am convinced that the RAND() function is next to useless for category builds!

However, if anyone is using RAND() successfully in categories, I would love to be proved wrong.
Re: [aus_dave] Spanning category pages, duplicate links?
Ok,

If that is happening, then it's building categories the same way.

In early versions, the *entire* search was slurped into memory/hash, and on very large and shared systems it was causing problems. Apparently all the logic was rewritten to do this:

Perform the search
Return the requested number of links mh, nh, page
On next/back perform the search again using mh,nh,page parameters

The way this works is by passing in a LIMIT clause: LIMIT rows OFFSET offset.

page*nh determines the offset, and mh determines the rows.
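This re-query pattern can be reproduced with Python's sqlite3, with SQLite's `RANDOM()` standing in for MySQL's `RAND()` and a one-column `Links` table as a simplification. The sketch uses a 1-based `page` and a `per_page` row count with offset = (page - 1) * per_page. Exactly how Links SQL maps mh, nh and page onto LIMIT is my assumption here. With a deterministic ORDER BY, the pages partition the table; with a random ORDER BY, each page is an unrelated random slice.

```python
import sqlite3


def fetch_page(conn, order_by, page, per_page):
    """Run a fresh SELECT for one page, as each next/back request does.
    order_by is interpolated directly, so only pass fixed, trusted strings."""
    offset = (page - 1) * per_page
    sql = f"SELECT ID FROM Links ORDER BY {order_by} LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (per_page, offset))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Links (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Links (ID) VALUES (?)", [(i,) for i in range(1, 61)])

# Deterministic order: three page requests cover IDs 1-60 exactly once.
stable = [fetch_page(conn, "ID", p, 20) for p in (1, 2, 3)]
# Random order: every request re-shuffles, so pages overlap and IDs go missing.
randomized = [fetch_page(conn, "RANDOM()", p, 20) for p in (1, 2, 3)]
```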

This prevents any query from returning 20,000 rows and clogging memory.

*BUT* the downside is that each new page view is a separate select command.

What that means is that a new RAND() call is issued in the GROUP BY / ORDER BY.

So every time you call back/next for another page, you are passing in the correct mh, nh and page parameters, but the RAND() function yanks up a random set of links. It would be the same as simply calling the first page with RAND() order each time.

This is why, if you want to traverse a random order of links, you *must* save/cache the whole query in some manner, then work from that cache. Each RAND() call will return each link only once, but every time you call it, you start over.


PUGDOG Enterprises, Inc.
Re: [pugdog] Spanning category pages, duplicate links?
Thanks pugdog, that is a very informative and in-depth post.

From my perspective it looks very difficult to use RAND() on spanned pages, and building the remaining links alphabetically will be far simpler and a more efficient use of my time.

Hopefully this long and drawn-out thread will help someone else in future ;)

Last edited by aus_dave: Dec 22, 2003, 5:53 AM