Gossamer Forum
Indexes
My Links SQL database holds HTML in a field. I only want the text indexed for search results, not the HTML markup. I could get around this by creating a new field and inserting a text-only version of the content into it. However, this would double the size of the database, which is a very large increase.

Is there a way to block certain words or <tags> from being indexed in Links SQL?
Re: [Donald Rumsfeld] Indexes
If you're willing to do a little hacking, you can look in the admin/GT/SQL/Search/... directories for the modules that handle the indexing. If you are using the internal indexing method, the function of interest is tokenize in admin/GT/SQL/Search/Base/Common.pm. Add some code there that strips out the HTML before the text is tokenized.
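For anyone trying the first option, here is a rough sketch of the idea. It is written in Python rather than the Perl used in Common.pm, so it is not a drop-in patch. It assumes the field holds ordinary HTML. The markup is parsed with Python's standard html.parser, and only the text between tags is kept, with script and style bodies skipped. The remaining words are then lowercased and split. The names strip_html and tokenize are my own. The three-character minimum is only there to mirror the indexer behaviour described in this reply.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, skipping tags,
    attributes, and the bodies of <script> and <style> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Each tag boundary becomes a space so block-level text does not run together.
    return " ".join(parser.parts)


def tokenize(text, min_length=3):
    """Strip markup, then split into lowercase words of min_length or more."""
    words = re.findall(r"\w+", strip_html(text).lower())
    return [w for w in words if len(w) >= min_length]
```

For example, tokenize('<p>Read the <a href="/docs" style="color:red">documentation</a> now</p>') gives ['read', 'the', 'documentation', 'now']. The words href and style never reach the index, and neither do the attribute values. The trade-off is that a word split by inline markup (e.g. <b>foo</b>bar) is indexed as two words.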

Otherwise, a quick but imperfect fix is to put all the tag words into the stop word list. At the top of admin/GT/SQL/Search/Base/Common.pm you'll find the $STOPWORDS array ref. Add words such as href, style, etc. to it. You don't need to worry about words shorter than 3 characters, as the indexer ignores them automatically.
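To see why the stop word route is only a partial fix, here is a similar Python sketch. Again, this is not the actual Perl code. BASE_STOPWORDS and MARKUP_WORDS are made-up examples, not the contents of the real $STOPWORDS. The raw HTML is tokenized as-is, and the tag and attribute names are simply filtered out along with the short words.

```python
import re

# Hypothetical stand-in for the existing stop list (the real one is longer).
BASE_STOPWORDS = {"the", "and", "for", "with"}

# Hypothetical tag/attribute names that appear as words in raw HTML.
# ("nbsp" comes from the &nbsp; entity.)
MARKUP_WORDS = {
    "href", "style", "class", "span", "div", "img", "src", "alt",
    "font", "table", "color", "width", "height", "nbsp",
}

STOPWORDS = BASE_STOPWORDS | MARKUP_WORDS


def tokenize_with_stopwords(text, stopwords=STOPWORDS, min_length=3):
    """Split raw text (markup included) into lowercase words, dropping
    stop words and words shorter than min_length."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in stopwords]
```

Running it on '<div class="intro"><a href="/about">About us</a></div>' gives ['intro', 'about', 'about']. The tag and attribute names are gone, and "a" is dropped for being under three characters. However, attribute values such as the class name and the URL path still get indexed.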
Re: [Aki] Indexes
Thanks for the quick and helpful response, Aki. Very useful :)
Re: [Donald Rumsfeld] Indexes
Wow... I feel so privileged, a "thank you" from the Secretary of Defense. Cool

Good luck with either option.
Aki
Re: [Aki] Indexes
lol.

BTW, I've probably just run the biggest test the indexing system is likely to get for a while (lots of text, lots of records), and it still performs really well. I wasn't expecting that, so congrats.

I was expecting to have to go and buy something else, but it looks as though Links does the job extremely well.

Last edited by Donald Rumsfeld: Jul 12, 2002, 12:05 AM