Gossamer Forum

when to rebuild or reindex

Hi,

This is a bit of a general maintenance/newbie question, I guess, but I seem to be spending hours doing rebuilds and reindexes.

Is there a golden rule to follow? Not knowing any different, I do both every time I make a change, as I can never be sure which influences which.

I.e. do you only need to rebuild when you change the content of the database, and reindex when you change the settings?

I think I am rebuilding too much, but I am unclear whether changing search settings means I need to rebuild the structure to take the changes into account... or is rebuilding the structure of the data the reindexing bit?

My problem is that my searches do not do what I expect, and the results are quite poor even though the data exists, so I am continually doing both just to be on the safe side. Cutting one out of the equation would save me a few hours a day waiting for it to rebuild or reindex.

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
Hi,

You only need to reindex if you change something in the database externally (e.g. in mysqlman) or if you change the search driver or other specific search settings. If you change everything through the Links admin then you should never need to reindex.

You only need to rebuild if you change the contents of the database so that the changes can appear on your site.
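
As a rough aide-memoire, the rule of thumb above can be written as a lookup table. This is only a sketch in Python; the change labels are made-up names for illustration, not Links SQL settings.

```python
# Rule of thumb from this post as a lookup table.
# The keys are hypothetical labels, not Links SQL options.
ACTIONS = {
    "content_via_admin": {"rebuild"},            # edited through the Links admin
    "content_external": {"rebuild", "reindex"},  # edited outside, e.g. in mysqlman
    "search_settings": {"reindex"},              # search driver or search settings
}


def actions_for(changes):
    """Return the set of maintenance steps needed for a list of changes."""
    needed = set()
    for change in changes:
        needed |= ACTIONS[change]
    return needed


print(sorted(actions_for(["content_via_admin"])))
print(sorted(actions_for(["content_external", "search_settings"])))
```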

What sort of indexing are you using to not get good results? I assume that you must be using internal indexing if you're having to wait a long time for the indexing? If so, have you tried changing the weights of the fields?

Laura.
The UK High Street
Re: [afinlr] when to rebuild or reindex In reply to
Hello Laura,

I've changed the weights countless times, but I think the problem is more fundamental than that: it is the way Links SQL actually works.

For example, in my category essex there are at least 2 hairdressers. If I go to category essex and search for hairdresser I get 2 results (obviously hair dresser, or hair on its own, gives me different results).

BUT
if I search from the top of the tree for hairdresser essex I get no results. Clearly this is wrong and a user would expect something back. There are at least 2 results, so without knowing any different a user will think the search is a waste of time.

Which comes back to a couple of possibilities: either it is case sensitive, or it is not including the category name in the search and is just looking at the links for the string 'hairdresser essex'.

This could do with being improved to include the category names as part of the search, and being intelligent enough to tie the two together.

I am also getting stuck with using 'and' and 'or'. 'Or' is too wishy-washy and 'and' is too harsh. A third search option, 'and/or', would be beneficial: at least it would return some results if the 'and' bombed out.

I have spent hours on this trying to work it out but it eludes me, which is why I asked about the rebuild/reindex stuff, because 3 times a day for both is quite a slog.

I'll keep searching.

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
Hi,

I totally agree that the search would be vastly improved if it could return exact phrase matches first, then 'and' matches, and then 'or' matches.
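
To make that ordering concrete, here is a minimal Python sketch of a three-tier ranking over plain strings. It is not how the Links SQL search is implemented; the function name and the sample listings are invented for illustration, and it uses simple substring matching.

```python
def ranked_search(query, documents):
    """Order matches: exact phrase first, then all words, then any word."""
    phrase = " ".join(query.lower().split())
    words = phrase.split()
    if not words:
        return []

    def tier(text):
        low = " ".join(text.lower().split())
        if phrase in low:
            return 0  # exact phrase match
        if all(w in low for w in words):
            return 1  # 'and' match
        if any(w in low for w in words):
            return 2  # 'or' match
        return None   # no match at all

    scored = [(tier(doc), index, doc) for index, doc in enumerate(documents)]
    # Sort by tier, keeping the original order inside each tier.
    return [doc for _, _, doc in sorted(s for s in scored if s[0] is not None)]


listings = [
    "Hairdresser in Essex",
    "Top Hairdresser Essex Ltd",
    "Hairdresser in Kent",
    "Florist in Kent",
]
print(ranked_search("hairdresser essex", listings))
```

With this ordering a strict 'and' search that finds nothing still falls through to the 'or' matches, so the user is not left with an empty results page.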

The problem that you've outlined is exactly what you guessed - when you search for a link it only searches in the links table - it doesn't include the name of the category that the link is in. I think that the easiest way to fix this might be to write a short perl script that looks up the category names that a link is in and inserts them into a keyword field in your table. I think this should be relatively simple to write and you could have it as a cron job which updates each night.
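
A rough sketch of that idea, written in Python against an in-memory SQLite database rather than the real Links SQL tables. The table layout (Links, Category, and a CatLinks table joining them) and the CatKeywords column are assumptions made for the example, so check them against your own schema.

```python
import sqlite3


def fill_category_keywords(conn):
    """Copy each link's category names into the assumed Links.CatKeywords column."""
    rows = conn.execute(
        "SELECT cl.LinkID, c.Name FROM CatLinks cl "
        "JOIN Category c ON c.ID = cl.CategoryID "
        "ORDER BY cl.LinkID, c.Name"
    ).fetchall()
    names_by_link = {}
    for link_id, name in rows:
        names_by_link.setdefault(link_id, []).append(name)
    for link_id, names in names_by_link.items():
        conn.execute(
            "UPDATE Links SET CatKeywords = ? WHERE ID = ?",
            (", ".join(names), link_id),
        )
    conn.commit()


# Tiny made-up data set to show the effect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Links (ID INTEGER PRIMARY KEY, Title TEXT, CatKeywords TEXT);
CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CatLinks (LinkID INTEGER, CategoryID INTEGER);
INSERT INTO Links (ID, Title) VALUES (1, 'Salon One'), (2, 'Salon Two');
INSERT INTO Category VALUES (10, 'Essex'), (11, 'Hairdressers');
INSERT INTO CatLinks VALUES (1, 10), (1, 11), (2, 10);
""")
fill_category_keywords(conn)
result = conn.execute("SELECT ID, CatKeywords FROM Links ORDER BY ID").fetchall()
print(result)
```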

Unless anyone has any better ideas?

Laura.
The UK High Street
Re: [KevM] when to rebuild or reindex In reply to
You could try the attached file.

You'll need to change the path/to/admin twice and FIELDNAME to the name of the field in your Links table - this should be a new field.

Back up your Links table before using as I haven't tested it.

Laura.
The UK High Street
Re: [afinlr] when to rebuild or reindex In reply to
On the subject of "rebuild":

My site has about 3200 links, all using "detailed" pages.

After several new links have been added, I do a "build changed".

We are getting about 75 new links per day, so to keep the site continually up to date I do a "build changed" about every 20 mins...

I notice that occasionally, when checking the site, Links SQL appears to be mixing up some of my links' detailed pages.

For instance, if I check link ID 3409, the link on the category page appears OK, but the detailed page shows some other link's details.

At present, when I see this, I quickly do a "build staggered".

Any ideas?
Colin Thompson
Re: [afinlr] when to rebuild or reindex In reply to
Hello Laura,

I'm a little bit slow with the extra links bit, so I need to ask a few questions about the attached file.
Firstly, what exactly will it do? I'm guessing it will add category keywords to links?
Do I just execute this from my browser after uploading it to the Links SQL cgi-bin/search/admin directory?

I'm not very hot with code and get out of my depth pretty quickly.

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
It should add the 'Name' field from the Category table into a column in the Links table. If you want the values from a different field in the Category table just change 'Name' to the name of the field you want.

You do need to add an extra field to your Links table before you can use it - and change FIELDNAME in the file to the name of this field. And the two path/to/admin's need changing.

Then upload the file and chmod to 755. Then you should be able to run it from your browser.
Re: [afinlr] when to rebuild or reindex In reply to
Hi Laura,

I've hit some walls with this.

Internal server error, so I changed it to 777, but then got a permissions error.

Created a new links table called Name
changed catnames.cgi FIELDNAME to Name
added path from Links SQL setup menu so that is ok.

Uploaded to root and also cgi-bin/search/admin/

and called it from the browser, but it doesn't like it.

So am I right in assuming that it will make the categories part of the search table?

I'll push it a bit harder to try and figure it out, internal server errors normally have a simple explanation.

many thanks again,

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
Hi, sorry - wrote this in a bit of a rush. Can you change the line


my ($sth);

to

my ($sth,$sth2);

and remove the $links_sth in this line

$cat_tb->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});
Re: [afinlr] when to rebuild or reindex In reply to
>> It should add the 'Name' field from the Category table into a column in the Links table.
>> If you want the values from a different field in the Category table just change 'Name' to the name of the field you want.

My skin is crawling, and I'm breaking out in hives...

This violates the rules of data normalization, and also breaks the "logic" in links.

If links, categories, etc are not coming up the way you want, you edit/change the search logic, the keywords in a link, or something else *NOT* the database structure.

These sort of changes will come back to haunt you at some time, even if you can't imagine where or when.

I'm not sure what the problem is here with the "hairdressers essex" search, but if adding a city/region/locale to a link listing will fix it, then _THAT_ is the change you need to make. That still works with the rules of data normalization, and allows semi-automated re-linking.

In the search, you need to understand how the logic works, before "complaining" it's not working ;)

The target data is a "link" or "node", "nugget", "object" etc of information.

Categories are simply classification folders, for browsing, and have _no_ effect on the "link" itself, which is an independent data unit.

The suggestion above tries to link the category and link as a dependent data item. This is not how Links SQL works, and goes against the whole CatLinks concept of ONE data nugget, multiple categories/browsable means to get to it -- BUT only one data nugget to search/maintain.

When links searches, it applies the search term(s) to the categories -- which will turn up BROAD potential matches, then will apply them to the Link, which turns up more specific matches.

If you look at the search tables, you'll see how the words are ranked and associated with links.

Before mucking with table structures, database structures, and such, figure out *WHAT* data you need to track, and how it has to be related for both searching and browsing.

Then, see if adding an INDEPENDENT field to a table will solve the problem. Such as keywords (this is a quasi-violation, but is remedied by the use of the Search index), city, country, zip code, color, etc.

I hope this makes some sense, and I can list some basic links for data normalization, or you can search on it.

While it seems esoteric, it has _real_ implications and impact on growing databases that plan on becoming larger, or containing more searchable/indexable data.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [afinlr] when to rebuild or reindex In reply to
Please don't apologise, I appreciate what you are doing to help.

Made the changes and it's set off (had to run via telnet as admin panel timed out)

Almost a full loop to the start of this thread but will I need to reindex or rebuild once this has done its stuff?

I'm thinking a reindex might be on the cards ?

many thanks Laura

Kevin

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
No - the file uses links methods to update the table so you *shouldn't* need to reindex - I use a lot of files like this for updating the products on my site and they all seem to appear in the search results ok without reindexing.

Edit: Actually, you will need to add a weight to this new field so that it is included in the search results and I think this might need a reindex - sorry.

Last edited by afinlr: Apr 8, 2004, 9:54 AM
Re: [pugdog] when to rebuild or reindex In reply to
>> These sort of changes will come back to haunt you at some time, even if you can't imagine where or when

Not sure whether you're understanding what Kev is trying to do - just adding some extra keywords to the Links table. Not changing the underlying structure of the databases or anything drastic like that. He just wants links to appear when searching for the names of the categories that they are in. I can't see anything wrong with this. This is an INDEPENDENT field as you put it.

Unless I'm misunderstanding what you're saying?

Laura.
The UK High Street
Re: [afinlr] when to rebuild or reindex In reply to
Well I've done it now, read PugDogs post after it was underway so I don't know whether I have done a good thing or a bad one.

Is this reversible if it's a bad thing ?

Do I need to reindex or rebuild, this is a new change to links and/or data so I am a bit unsure.

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
I know I am a bit of a novice, but in some ways that helps, as I approach the result output as a general Joe Public. If this actually works (and I am reindexing as we speak, so I don't know yet) I think this is an excellent thing to add.

Even though you can drill down to categories, a lot of people don't; they just use the search box on the first page. That in itself is a haunting prospect: it isn't much use for building up traffic if users just get "no results found" all the time, and it is unlikely they will stop to ask why. They'll just move on to the next search facility.

This all came about while I was testing on real people. One of them tapped in hairdresser essex and it said no results, but there are results, and so my questions went from there. I wouldn't have stopped to try that side of things myself, so in some ways I'm glad I did.

As I said, I don't know if or how it will work, and there may be a case-sensitivity problem (london vs London, or essex vs Essex), but that is the next step to tackle.

Have a good Easter everyone

Cheers
KevM
Re: [afinlr] when to rebuild or reindex In reply to
Yes, I understand, but I was not commenting so much on this one particular instance as on anyone trying to generalize from it, even a little, for another project, e.g. using this logic to modify, update, or change the behaviour of Links SQL. This was a quick and dirty fix for one specific problem, not a generalized one applicable to other similar situations.

For this specific project, what would have been "better" to do, would be to add a field "Keywords" to the Links record, then "seed" them with the category name, parsed out so that the "/" were turned into ", " or simply " "

Give this keyword field a low priority so it's indexed, but not going to screw up the search results too much by skewing them.

*IF* only the last/leaf category was important, then just adding *that* as a single keyword would be _much_ better as it will not only save massive amounts of index space, but also be much more targeted, than if a long pathname was included, and was the same for 4000 links.

In this case, what you would do is use a "split on /" and seed the Keyword field with the last item in the resulting array.
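
Both options can be sketched in a few lines of Python. The sample path "Regions/England/Essex" and the function names are made up for illustration; in Links SQL the full path would come from the category's Full_Name field.

```python
def path_keywords(full_name):
    """Turn a category path into comma-separated keywords (every level)."""
    parts = [p.strip() for p in full_name.split("/") if p.strip()]
    return ", ".join(parts)


def leaf_keyword(full_name):
    """Keep only the last (leaf) level of a category path."""
    parts = [p.strip() for p in full_name.split("/") if p.strip()]
    return parts[-1] if parts else ""


print(path_keywords("Regions/England/Essex"))
print(leaf_keyword("Regions/England/Essex"))
```

The leaf-only version stores one short word per category instead of a long path that is identical for thousands of links.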

Firstly, the field would have a more useful name. Secondly, it would not violate the rules of normalization too greatly. Thirdly, it would vastly improve the accuracy of the searches by eliminating virtually useless path data from the keyword fields. Fourthly, it would consume far less disk space. And, as a minor fifth point, the data in the field would appear "useful" to a human viewing the database and trying to understand it and/or improve the ranking of a link.

Just some observations and suggestions.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] when to rebuild or reindex In reply to
Oh dear, it's well past my time to go and it's just bombed out in the reindex.

Loads of errors, but it starts with:
Can't call method "fetchrow_arrayref" on an undefined value at /home/ukbis/cgi-bin/search/admin/GT/SQL/Search/INTERNAL/Indexer.pm line 244.

Maybe I should take this mod out and leave it alone, I'm well over my head with this.

Laura, can this be removed?

I'll check in again tomorrow

rgds
KevM

Cheers
KevM
Re: [KevM] when to rebuild or reindex In reply to
If you have telnet access to your server, and assuming it's a Unix box, you can set up a cron job to automatically do a rebuild as often as you want. I have done this, to automatically rebuild every 30 minutes. And I have another job which, every night, backs up my SQL database and all Links SQL files into a single .zip file, the name of which automatically includes the date and time. Thus I have everything I need to recreate the system in the event of a crash.
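
The date-stamped archive part of that could look roughly like the Python sketch below. It is not my actual job: the "links-backup" prefix and the function names are placeholders, and the database dump and the cron scheduling themselves are left out.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def backup_name(now, prefix="links-backup"):
    """Build an archive name that includes the date and time."""
    return f"{prefix}-{now:%Y%m%d-%H%M%S}.zip"


def zip_directory(source_dir, dest_dir, now=None):
    """Zip every file under source_dir into a timestamped archive in dest_dir."""
    source = Path(source_dir)
    archive = Path(dest_dir) / backup_name(now or datetime.now())
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir inside the archive.
                zf.write(str(path), path.relative_to(source).as_posix())
    return archive


print(backup_name(datetime(2004, 4, 8, 9, 54, 0)))
```

Keep dest_dir outside source_dir, otherwise the archive would try to include itself.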

Rob
Re: [pugdog] when to rebuild or reindex In reply to
Hi Pugdog,

>> For this specific project, what would have been "better" to do, would be to add a field "Keywords" to the Links record, then "seed" them with the category name, parsed out so that the "/" were turned into ", " or simply " "
>>
>> Give this keyword field a low priority so it's indexed, but not going to screw up the search results too much by skewing them.

I think that's exactly what I was trying to do - although I suggested just trying 'Name' rather than 'Full_Name' so there wasn't a problem with the "/" but this might end up not being enough information.


>> *IF* only the last/leaf category was important, then just adding *that* as a single keyword would be _much_ better as it will not only save massive amounts of index space, but also be much more targeted, than if a long pathname was included, and was the same for 4000 links.
>>
>> In this case, what you would do is use a "split on /" and seed the Keyword field with the last item in the resulting array.

Isn't this just the 'Name' field?

>> Firstly, the field would have a more useful name.

I completely left the name of the field up to Kev as I didn't know whether he already had a keyword field.

>> Secondly, it would not violate the rules of normalization too greatly. Thirdly, it would vastly improve the accuracy of the searches by eliminating virtually useless path data from the keyword fields. Fourthly, it would consume far less disk space. And, as a minor fifth point, the data in the field would appear "useful" to a human viewing the database and trying to understand it and/or improve the ranking of a link.

Still not understanding what it is that you're suggesting that is different from what I was suggesting ;) (Although from Kev's post above, what I was intending and what I programmed might not have been the same thing!)

Laura.
The UK High Street
Re: [KevM] when to rebuild or reindex In reply to
I've sent you a private message. This mod should only add some text to a field in the database - so to undo it you could just delete the field - but I really can't see why this would affect the indexing.
Re: [afinlr] when to rebuild or reindex In reply to
Laura,

I've been wanting to do this for some time, thanks very much for posting about this.

I tried running your script, and it processed, but didn't update the new Keywords field I created in the Links table. I wonder if this line is the problem?

$cat_tb->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});

Should that be:

$table->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});

Also, would this work if a link was in three categories? For example if the names of the three categories were:

Category_One
Category_Two
Category_Three

would it loop through and overwrite the Keywords field until it just got the last value:

Category_Three

Or, is there a way that it could add all three category names separated by a comma and space like:

Category_One, Category_Two, Category_Three

--FrankM
Re: [FrankM] when to rebuild or reindex In reply to
>> $cat_tb->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});
>>
>> Should that be:
>>
>> $table->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});

Oops - thanks for that.

>> Also, would this work if a link was in three categories? For example if the names of the three categories were:
>>
>> Category_One
>> Category_Two
>> Category_Three
>>
>> would it loop through and overwrite the Keywords field until it just got the last value:
>>
>> Category_Three
>>
>> Or, is there a way that it could add all three category names separated by a comma and space like:
>>
>> Category_One, Category_Two, Category_Three

Yes, it should add all of them, although at the moment they all run together in one long string! I was just about to go out when I originally wrote this, which is why there are so many errors - next time I'll wait till I get back!

Just change

$fieldfiller.="$name";

to

$fieldfiller.="$name, ";

to separate the names with commas.
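
One thing to be aware of: appending ", " after every name leaves a trailing comma and space at the end of the string. It is probably harmless for searching, but collecting the names and joining them avoids it. A small Python illustration of the difference (the script itself is Perl, where join(", ", @names) does the same job):

```python
names = ["Category_One", "Category_Two", "Category_Three"]

# Appending "name, " each time, as in the change above.
appended = ""
for name in names:
    appended += f"{name}, "

# Collecting the names first and joining them once.
joined = ", ".join(names)

print(repr(appended))
print(repr(joined))
```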
Re: [afinlr] when to rebuild or reindex In reply to
Thanks Laura! I gave that a try and it worked perfectly. I don't know a lot about perl, but one worry that I had was about:

$table->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});

Does this mean that the Link ID number in the Links table is getting updated each time this script is run? Somehow, that makes me a little nervous. Or, am I misunderstanding (most likely), and the FIELDNAME field is getting updated where the LinkID is equal to something else?

I'm just a little worried about a case where the Link ID number could somehow get changed which would then change the path to the static detailed page.

Probably, I'm just misunderstanding though. I really appreciate that you took the time to post this.

--Frank
Re: [FrankM] when to rebuild or reindex In reply to
$table->update({FIELDNAME=>$fieldfiller},{ID=>$LinkID});

This line means: look in the $table and find the row where ID has the value $LinkID and change the entry in the column FIELDNAME to the value of $fieldfiller.

It will definitely not change the value of ID.
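
If it helps to see it in action, here is the same idea as a plain SQL UPDATE ... WHERE, run from Python against a throwaway in-memory SQLite table. This is an analogy under assumed table and column names, not the Links SQL code itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Links (ID INTEGER PRIMARY KEY, Title TEXT, Keywords TEXT)")
conn.executemany(
    "INSERT INTO Links VALUES (?, ?, ?)",
    [(3408, "First link", ""), (3409, "Second link", "")],
)


def set_keywords(conn, link_id, keywords):
    """Set Keywords on the row whose ID matches; ID is only used to find the row."""
    cur = conn.execute(
        "UPDATE Links SET Keywords = ? WHERE ID = ?", (keywords, link_id)
    )
    return cur.rowcount  # number of rows changed


changed = set_keywords(conn, 3409, "Essex, Hairdressers")
rows = conn.execute("SELECT ID, Keywords FROM Links ORDER BY ID").fetchall()
print(changed, rows)
```

The ID column only appears in the WHERE part, so it selects the row and is never written to.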