Gossamer Forum

Search not finding C# or C++

Just wondering if I've missed something. My directory has entries whose title and description fields mention programming languages such as C++ and C#. However, searching for these terms returns nothing. Can't Links SQL cope with these terms?

TIA for any guidance

Smokey
Re: [biglouis] Search not finding C# or C++
It looks as though many queries containing special characters do not work correctly, because those characters are ignored (either in the indexes or in the search code). For example, a search for microsoft.com doesn't yield the proper result: it seems to search for "microsoft" and "com" and ignore the ".". If the code (index and search) could be altered to allow special characters within search terms, yet still ignore them when they stand alone (e.g. Fishing & Boats), it would clear up a lot of problems with the search. I would consider it a BUG not to be able to search for domain names or for words with special characters such as C++.

I know that some special characters are used for boolean searching, so the search engine needs to look at the position of the character. For example, +plus would be treated as a boolean search because the "+" is the first character of the term, whereas plus+ would be treated as a literal term and searched as "plus+". I hope this makes sense.
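
The positional rule described above could be sketched roughly as follows. This is only an illustration of the idea, not code from Links SQL: the classify_term helper and its three labels ("ignored", "required", "literal") are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Links SQL): decide how one
# whitespace-separated search term should be treated.
sub classify_term {
    my ($term) = @_;

    # A term with no word characters at all (e.g. "&") is dropped.
    return ('ignored', '') unless $term =~ /\w/;

    # A "+" in first position is read as the boolean "required" operator
    # and stripped from the term.
    return ('required', $term) if $term =~ s/^\+//;

    # Anything else, including a trailing "+" or an embedded ".",
    # is kept as a literal term.
    return ('literal', $term);
}

for my $t ('+plus', 'plus+', 'c++', 'microsoft.com', '&') {
    my ($kind, $word) = classify_term($t);
    print "$t => $kind '$word'\n";
}
```

The point of the sketch is simply that the same character can be an operator or part of a word depending on where it sits in the term.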

Sean
Re: Search not finding C# or C++
Hopefully a staff member can provide a fix for this. This is messing up some of my searches.
Re: [biglouis] Search not finding C# or C++
I just received an email from Alex with a possible fix. First, by default the indexer (if running INTERNAL) only indexes words that are 3 characters or longer. To change this (as I have), so that two-character terms like C# and C+ can be indexed:

In your admin dir under GT/SQL/Search/Base/Indexer.pm, change:
Code:
'min_word_size' => 3,
to
Code:
'min_word_size' => 2,
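
To make the effect of that setting concrete, here is a minimal sketch of a length filter. The keep_words helper is hypothetical and only stands in for whatever check Indexer.pm actually performs, which may differ in detail:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the indexer's minimum-length check:
# keep only words that are at least $min_word_size characters long.
sub keep_words {
    my ($min_word_size, @words) = @_;
    return grep { length($_) >= $min_word_size } @words;
}

my @words = ('c', 'c+', 'c++', 'perl');

print "min 3: @{[ keep_words(3, @words) ]}\n";   # prints "min 3: c++ perl"
print "min 2: @{[ keep_words(2, @words) ]}\n";   # prints "min 2: c+ c++ perl"
```

A one-character word such as "c" is dropped under either setting, so the setting only helps if the special character stays attached to the word when the text is split.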


After that is done, here is Alex's email on the fix:

Code:
Hmm, you could do this by editing GT/SQL/Search/Base/Common.pm and
change (around line 45):

@words = split /[^\w\x80-\xFF\-]+/, lc $text;

to:

@words = split /[^\w\x80-\xFF\-\+]+/, lc $text;

Then do a full re-index, and you should have c++ in your word list
(however by default it takes at least 3 chars to be indexed, so c+
wouldn't get indexed).

Try out the search and if it doesn't work, you may need to edit
GT/SQL/Search/Base/Search.pm and change (around line 362):

| (\+?[\w\x80-\xFF\-\*]+),?

to:

| (\+?[\w\x80-\xFF\-\*\+]+),?

I'm not 100% sure about that part. =)

Cheers,

Alex
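
For anyone who wants to see what the two edits change before touching the modules, here is a small standalone sketch. The two split patterns and the two term patterns are copied from Alex's email; the wrapper subs (split_old, split_new), the sample text, and the use of the term pattern on its own (in Search.pm it is one branch of a larger expression) are assumptions made for the example.

```perl
use strict;
use warnings;

# Index side: the split used in Common.pm, before and after the change.
sub split_old { my ($text) = @_; return split /[^\w\x80-\xFF\-]+/,   lc $text; }
sub split_new { my ($text) = @_; return split /[^\w\x80-\xFF\-\+]+/, lc $text; }

my $text = 'Tutorials for C++ and C#';
print join('|', split_old($text)), "\n";   # prints "tutorials|for|c|and|c"
print join('|', split_new($text)), "\n";   # prints "tutorials|for|c++|and|c"

# Query side: the term fragment from Search.pm, before and after.
my $old_term = qr/(\+?[\w\x80-\xFF\-\*]+),?/;
my $new_term = qr/(\+?[\w\x80-\xFF\-\*\+]+),?/;

my ($old_match) = 'c++' =~ /^$old_term/;
my ($new_match) = 'c++' =~ /^$new_term/;
print "old: $old_match\n";                 # prints "old: c"
print "new: $new_match\n";                 # prints "new: c++"
```

Note that "#" is still treated as a separator by both split patterns, so C# is reduced to "c" either way. Presumably "#" would have to be added to the character classes in the same manner as "+", but that is an inference from the patterns, not something stated in the email.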