Gossamer Forum

New SQL version

New SQL version
Some years ago :-) Alex talked about a feature to build only the parts of the links that actually need it; over the last weeks there have been a lot of threads about the new features, but nothing about this one.
Will it be integrated? Because we are going to change our whole category system, we would have about 3000 categories, and this feature would help minimize the build a lot.

Another thing I have thought about:
If the new version lets users add a link to multiple categories, you need a category list with all the categories; but with about 3000 categories it is no fun to wait for such a long list. Does anyone have an idea how to manage this without showing the whole list?
I see no good way to do it. Maybe with a list of only the main-level categories and a JavaScript pop-up for the subcategories ...
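For the main-level list, maybe something like this (just a rough sketch; the Category table and the FatherID column name are assumptions) -- only the top-level categories go into the drop-down, and the subcategories would come from a second request, e.g. in the JavaScript pop-up:

#!/usr/bin/perl
# Sketch: build the add-form drop-down from top-level categories only.
# Table/column names (Category, ID, Name, FatherID) are assumptions.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

# only the main-level categories (FatherID = 0), not all 3000
my $cats = $dbh->selectall_arrayref(
    'SELECT ID, Name FROM Category WHERE FatherID = 0 ORDER BY Name');

print qq|<select name="CatID">\n|;
print qq|  <option value="$_->[0]">$_->[1]</option>\n| for @$cats;
print "</select>\n";

# the subcategories for a chosen main category would then be fetched
# by a second request (e.g. for the JavaScript pop-up window)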

Another thing is allowing adds only in certain categories. At the moment I have an extra yes/no field for every category, and only when it is set to yes does the user see the add/modify button; once a full category list is needed, I can't work that way any longer.
Maybe users could add in only one category at first, and if they want the link in another category as well, they would have to go to that category and enter an ID they received with the first entry. (I don't want to use the user system at this time.)
Or we use the user system, and the user could make the multiple-category entry as an editor does now, but only after the admin has validated it (and the admin must validate it again afterwards).

Robert

Re: New SQL version
If the category build stats step is no longer needed, and the stats are recorded to the database in real time so that the links maintain their current counts, it would be possible to have a build-changed option.

Right now, what changes?

The count on the links -- and thus the detailed page.

The category page the link is listed on.

The Top-rated, popular and any other generated pages.

So, you'd need to have jump.cgi write to a table similar to build_update that tracks which links were changed (not how many times each was changed). Then, on a partial build, it takes those links, tallies up which categories have changed, and creates a "to do" list of sub-sections to rebuild.
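For example (just a sketch -- Changed_Links is a made-up table name, and I'm assuming a CatLinks-style lookup table with LinkID and CategoryID):

#!/usr/bin/perl
# Sketch of the tracking idea: jump.cgi marks a link as changed,
# and the partial build turns that into a "to do" list of categories.
# Changed_Links is made up; CatLinks (LinkID, CategoryID) is assumed.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

# --- in jump.cgi: record the link once, no matter how many hits ---
my $link_id = 42;    # placeholder: the link being jumped to
$dbh->do('REPLACE INTO Changed_Links (LinkID) VALUES (?)', undef, $link_id);

# --- in the partial build: tally which categories are affected ---
my $to_do = $dbh->selectcol_arrayref(
    'SELECT DISTINCT cl.CategoryID
       FROM CatLinks cl, Changed_Links chg
      WHERE cl.LinkID = chg.LinkID'
);
print "Categories to rebuild: @$to_do\n";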

Logic could be added to decide whether the Top Rated or Popular pages have changed, but since each of those is a single select and write, it's probably easiest just to update them every time.

Most of the build time is spent writing out the detailed pages, even if they haven't changed.

Next come the category pages.

Most likely, on even fairly active sites, most of the category pages will change between builds, so deciding which have changed and which haven't becomes a trade-off between processing time and writing time.

_MOST_ speed gains would be from deciding which detailed pages have changed, and rebuilding only those. That would be fairly easy.
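Something along these lines (just a sketch, reusing the hypothetical Changed_Links table from above; build_detailed_page() is only a placeholder for the per-link part of nph-build):

#!/usr/bin/perl
# Sketch: rebuild detailed pages only for links flagged as changed,
# then clear the flag.  Changed_Links and build_detailed_page() are
# placeholders, not part of the shipped scripts.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

my $links = $dbh->selectall_arrayref(
    'SELECT l.* FROM Links l, Changed_Links chg WHERE l.ID = chg.LinkID',
    { Slice => {} }
);

for my $link (@$links) {
    build_detailed_page($link);                   # write Detailed/<ID>.html
    $dbh->do('DELETE FROM Changed_Links WHERE LinkID = ?',
             undef, $link->{ID});
}

sub build_detailed_page { my ($link) = @_; }      # placeholder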

Complex sites might also benefit from selectively rebuilding the category pages, but that will be a much smaller speed gain.

For instance, I could rebuild the site without detailed pages in about 15-20 minutes. With detailed pages, it took over 3 hours.

Now, some things that _could_ be done:

1) GT has threaded modules used for the spider and the link verification. That logic could be used to spawn jobs to do sequential or parallel builds.

2) Categories could be built in parts from the top down -- starting with each main category and, if you have a big site, each 2nd-level category.

3) This would have to be called from the browser, the way it was done in Links 2.0, in order to get around the time/CPU limit many ISPs impose. Each successive step would have to be a new process call, not a child process (see the sketch after this list). It could also be fired from a cron job, but that is more tricky, and would need to be done at off hours.
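A rough sketch of point 3 (the script name, the step names, and build_step() are all made up -- the point is just that every step runs as a brand-new process, like the Links 2.0 build):

#!/usr/bin/perl
# stepbuild.cgi -- hypothetical stepped build.  Each request does one
# chunk of work, then hands off to a fresh process via a meta refresh,
# so no single CGI call hits the ISP's time/CPU limit.
use strict;
use CGI qw(:standard);

my @steps = qw(home what_new top_rated categories detailed);
my $step  = param('step') || 0;

print header(-type => 'text/html');
build_step($steps[$step]);          # one self-contained piece of the build

if ($step < $#steps) {
    my $next = $step + 1;
    # the meta refresh starts a brand-new process for the next step
    print qq|<meta http-equiv="refresh" content="1;url=stepbuild.cgi?step=$next">|;
    print "Finished '$steps[$step]', starting '$steps[$next]'...\n";
}
else {
    print "Build complete.\n";
}

sub build_step {
    my ($name) = @_;
    # placeholder: call the matching nph-build routine here
}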


Anyway......

Digging into the logic, it _can_ be done. It's probably either already part of the next release, or something that could be incorporated into it fairly easily -- although it's probably complex to program.

My growing concern is that I've gotten so familiar with the parts of Links 1.11 that I'm using -- what sort of learning curve will there be to adjust to the next release?

http://www.postcards.com
FAQ: http://www.postcards.com/FAQ/LinkSQL/

Re: New SQL version
In v1.1 I made a build-changed version of nph-build. It used Mod_Date and a Last_Built field to determine whether or not a link had changed, and basically the select statement weeded out only the ones that needed building.

I did the same for categories, but I had to do the processing in Perl.
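Roughly, the idea was something like this (just a sketch with assumed table and column names, not the actual code):

#!/usr/bin/perl
# Sketch of the build-changed select: only links modified since they
# were last written out, plus the category tally done in Perl.
# Table/column names (Links, CatLinks, Mod_Date, Last_Built) are
# assumptions about the schema.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

# the links that actually need building
my $links = $dbh->selectall_arrayref(
    'SELECT * FROM Links WHERE Mod_Date > Last_Built',
    { Slice => {} }
);

# the category side: collect the category pages those links sit in
my %cats_to_build;
my $sth = $dbh->prepare(
    'SELECT cl.CategoryID
       FROM Links l, CatLinks cl
      WHERE l.ID = cl.LinkID
        AND l.Mod_Date > l.Last_Built'
);
$sth->execute;
while (my ($cat_id) = $sth->fetchrow_array) {
    $cats_to_build{$cat_id} = 1;
}

printf "%d links and %d categories to rebuild\n",
       scalar @$links, scalar keys %cats_to_build;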

The only thing I found not so good about it: if you added categories, changed the design, or had a few mods that change data like review numbers or ratings, the pages would be out of date. So I still had to run nph-build.cgi every once in a while.

There was a big difference, though:
going through nph-build.cgi would take somewhere around 50 minutes.

The first run of nph-changed.cgi didn't build any detailed or category pages, because they had all already been built.

Then I edited a few links and built again, and only those links and their category and subcategory pages were built.

It took 5 seconds.

Next I waited about 5 days, let things get modified and changed as usual without building, and then built with nph-changed.

Only 24 detailed pages were built, plus a total of 63 category and subcategory pages.

Once again, in under a minute.

After about a month of that, I noticed the detailed pages of links that hadn't been modified had stale rating information.

The only fix I could think of for that problem is adding another field to the Links table and updating it with the date every single time the link is changed at all. Mod_Date is only set when the link is updated by the user, so basically, when nph-build.cgi adds the ratings to the Links table, it would need to update the new field as well.
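Something along these lines (just a sketch -- Last_Changed is the made-up new field, and Rating/Votes are assumed column names):

#!/usr/bin/perl
# Sketch of the fix: whenever nph-build.cgi writes the collected
# ratings back into Links, it also stamps the new Last_Changed field,
# so the detailed page gets picked up on the next partial build.
# Last_Changed, Rating and Votes are assumed column names.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

my ($link_id, $new_rating, $new_votes) = (42, 8.5, 17);   # placeholders

$dbh->do(
    'UPDATE Links
        SET Rating = ?, Votes = ?, Last_Changed = NOW()
      WHERE ID = ?',
    undef, $new_rating, $new_votes, $link_id
);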



Jerry Su
Re: New SQL version
Jerry,

Wouldn't adding a TIMESTAMP field to the Links table solve that?

That way, you'd know any time a link was "newer" than any given date. "Reads" of a link don't update the timestamp, but "writes" do.

By scanning the links and tallying the categories, you'd know which categories had been modified (via a link modification) by selecting from the stats table those with a Last_changed date later than the last update.
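Something like this (just a sketch, assuming MySQL, which keeps the first TIMESTAMP column of a row current on every write; the table and column names are assumptions):

#!/usr/bin/perl
# Sketch of the TIMESTAMP idea: MySQL updates the first TIMESTAMP
# column of a row on every write, so "newer than the last build" is a
# single comparison.  Table/column names are assumptions.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

# one-time schema change:
# ALTER TABLE Links ADD COLUMN Last_Changed TIMESTAMP;

my $last_build = '2001-06-01 00:00:00';   # in practice, read from a build log

# categories containing at least one link written since the last build
my $cats = $dbh->selectcol_arrayref(
    'SELECT DISTINCT cl.CategoryID
       FROM CatLinks cl, Links l
      WHERE cl.LinkID = l.ID
        AND l.Last_Changed > ?',
    undef, $last_build
);
print "Categories modified since last build: @$cats\n";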

Logically, this isn't the real problem.

The problem is tracking what needs to be updated, and deciding how "lax" you can be with the decision to build.

On deep trees, with few changes, you could save more time with a careful "plan" before building, selecting only the proper subcategories to update, and then updating the main category page (for the sub-cat stats) up the tree (or not doing that if the # of links in each category is not shown).
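For the deep-tree case, the "plan" could be as simple as starting from the categories with changed links and climbing to the root (just a sketch; the Category table and FatherID column are assumed names):

#!/usr/bin/perl
# Sketch of the deep-tree "plan": take the categories with changed
# links and add every ancestor, so the sub-category counts shown on
# the parent pages stay correct.  Category/FatherID are assumed names.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

my @changed  = (101, 102);                # placeholder: from the earlier tally
my %to_build = map { $_ => 1 } @changed;

for my $id (@changed) {
    my $cat = $id;
    while ($cat) {                        # climb until FatherID = 0 (the root)
        ($cat) = $dbh->selectrow_array(
            'SELECT FatherID FROM Category WHERE ID = ?', undef, $cat);
        $to_build{$cat} = 1 if $cat;
    }
}
print "Category pages to rebuild: ", join(', ', sort keys %to_build), "\n";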

On wide trees, you could save more time by determining which pages of which category needed to be rebuilt, and rebuilding only those pages rather than the entire category.

On an average site, rebuilding the categories is not the main cost, and rebuilding them keeps the site "fresh" for the search engines and for visitors. You save a lot of time on the detailed pages, since the 80/20 rule probably applies -- 20% of the links get 80% of the traffic, and the rest get the rest.

It would be interesting to add a statistic to the build screen for the number of links changed since the last rebuild and the percentage changed, and get some traffic stats that way.
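That statistic would just be a couple of counts -- something like this sketch (reusing the assumed Last_Changed column and a stored last-build date):

#!/usr/bin/perl
# Sketch of the build-screen statistic: how many links (and what
# percentage) changed since the last build.  Last_Changed and the
# last-build date are assumptions carried over from the sketches above.
use strict;
use DBI;

my $dbh = DBI->connect('DBI:mysql:links_db', 'user', 'pass',
                       { RaiseError => 1 });

my $last_build = '2001-06-01 00:00:00';   # placeholder

my ($total)   = $dbh->selectrow_array('SELECT COUNT(*) FROM Links');
my ($changed) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM Links WHERE Last_Changed > ?', undef, $last_build);

printf "Links changed since last build: %d of %d (%.1f%%)\n",
       $changed, $total, $total ? 100 * $changed / $total : 0;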


http://www.postcards.com
FAQ: http://www.postcards.com/FAQ/LinkSQL/