Gossamer Forum

Korean Language Indexed Search

I'm currently trying to get LinksSQL's indexed search to work with the Korean language. I've read multiple discussion threads regarding Asian languages, and this is what I know so far:

1) Search.cgi searches using the index tables that LinksSQL creates, whereas Search-ni.cgi does not use the LinksSQL index and relies on a full database scan
2) Chinese must use Search-ni.cgi, since Chinese words are not separated by spaces and LinksSQL therefore cannot parse them into the index table in the database

Here is my situation:
1) Unlike Chinese, Korean words are separated by spaces
2) I cannot search in Korean using Search.cgi but I can get Search-ni.cgi to work in Korean.
3) I receive a "Lacks alphnumerics" error if I search in Korean using Search.cgi. This is an example: ѱ - Lacks alphnumerics

Here are my questions:
1) What does the error "Lacks alphnumerics" really mean? Does LinksSQL only parse out Western characters separated by spaces? Again, Korean words are separated by spaces, so LinksSQL should be able to index them based on the spaces between words.
2) Which part of LinksSQL actually does the job of parsing words into the index table? I'm trying to look at the code, but I don't know where it is located.

I've gone through multiple discussion threads regarding Asian languages but I could not find the right answers.

Thank you for your help,
Michael





Re: Korean Language Indexed Search
Ok,

I think the main part of this is the "alpha numeric" as I figured it would be once you said the language has discrete words (unlike German <G>) (Sorry, guys, couldn't resist!)
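
To show how an "alphanumeric" test can trip on Korean, here is a minimal sketch. It is written in Python rather than Perl and is not taken from the Links SQL source, which I have not traced. It assumes the check is a regular expression that only recognises ASCII letters and digits; `has_alnum_ascii` and `has_alnum_unicode` are made-up names for illustration.

```python
import re

# Hypothetical checks, for illustration only; not taken from Links SQL.
ASCII_WORD = re.compile(r"[A-Za-z0-9]")
UNICODE_WORD = re.compile(r"[^\W_]")  # any letter or digit, in any script


def has_alnum_ascii(term: str) -> bool:
    """True if the term contains at least one ASCII letter or digit."""
    return ASCII_WORD.search(term) is not None


def has_alnum_unicode(term: str) -> bool:
    """True if the term contains at least one letter or digit in any script."""
    return UNICODE_WORD.search(term) is not None


korean = "한국어"
print(has_alnum_ascii(korean))    # False: an ASCII-only test sees nothing usable
print(has_alnum_unicode(korean))  # True: Hangul syllables count as letters
print(has_alnum_ascii("links"))   # True
```

If the real check behaves like the first function, a Korean term would be rejected before the index is ever consulted, no matter how the words are spaced.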

I'm not sure how the foreign language versions work, but I'm assuming you did turn the "foreign_char" flag on in the Links.pm file?

Code:
# BUILD OPTIONS
# ---------------------------------------------------
# These options determine how Links will create your HTML.

# Use foreign characters? This will change how Links SQL builds it's
# category pages, it will use the ID number instead of the Category Name as
# a directory. It also removes some restrictions on searching.
$LINKS{foreign_char} = 0;

In the 1.1b3 version, you also had to do:

Code:
# NOTE: You will also need to edit Links/DBSQL.pm and set $FOREIGN_CHAR to 1.

This is not needed in the 1.11 version, apparently.

Other than that, hopefully someone else who is actually using a foreign character set (Links SQL was written in English, so any other language is "foreign") with extended characters may have some help for you.

You'd have to dig into Search.cgi and follow the path of a request through the DBSQL.pm and Search.pm files to find out exactly where the binary codes are being screened out. The problem is that by eliminating that sort of protection, you may open other "holes" in the program. You'd need to make sure that user-passed input is _ONLY_ sent to internal routines that do very defined things with it, and that it cannot get passed out to the operating system.
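
One common way to keep user input inside "defined" routines is to validate it against a whitelist and then hand it to the database only through placeholders. The following is a rough sketch of that idea in Python with SQLite, not the way Links SQL (Perl on MySQL) does it. The `links` table, the `title` column, the allowed-character policy and the `safe_search` function are all assumptions made up for the example.

```python
import re
import sqlite3

# Assumed policy: words made of letters/digits from any script, single spaces between.
ALLOWED = re.compile(r"[^\W_]+(?: [^\W_]+)*")


def safe_search(conn: sqlite3.Connection, term: str) -> list:
    """Validate the term, then pass it to SQL only as a bound parameter."""
    term = term.strip()
    if not ALLOWED.fullmatch(term):
        raise ValueError("search term contains disallowed characters")
    # The ? placeholder keeps the term as data; it is never spliced into the SQL text.
    rows = conn.execute(
        "SELECT title FROM links WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (title TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?)", [("한국어 사전",), ("Postcards",)]
)
print(safe_search(conn, "한국어"))  # ['한국어 사전']
```

The point is that loosening the character check to admit Korean does not have to mean loosening what the input is allowed to reach: punctuation and shell metacharacters are still rejected, and the term never becomes part of a command or a query string.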

This is a complicated issue, but I'm sure there are solutions that can be applied. You might want to check out the mysql.org site and see how they suggest handling foreign characters (they might have some hints); then you can backtrack through the Links SQL program to see if that can be applied.

Since the searching is really a matter of parsing the Links records into small bite sized pieces (words) and inserting them into a searchable matrix, then accepting another bite sized piece and seeing if you get any matches, it's certainly possible to do. The question is how easily it can be done within this framework, and I really don't know.
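
The general idea can be sketched in a few lines. This is Python, not Links SQL code, and it only assumes what was said above: records are split into words on spaces, the words go into a lookup table, and a query is split the same way and matched against it. The function names and sample records are invented for the example.

```python
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split on whitespace and lowercase; enough for space-delimited languages."""
    return [word.lower() for word in text.split()]


def build_index(records: dict) -> dict:
    """Map each word to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the IDs of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


records = {
    1: "한국어 검색 엔진",
    2: "Postcards 검색",
    3: "Links SQL search",
}
index = build_index(records)
print(search(index, "검색"))         # {1, 2}
print(search(index, "한국어 검색"))  # {1}
```

Nothing in this scheme cares which script the words are in, as long as they are separated by spaces, which is why Korean should be indexable in principle once the character screening lets it through.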



http://www.postcards.com
FAQ: http://www.postcards.com/FAQ/LinkSQL/

Re: Korean Language Indexed Search
Pugdog,

I think I need to look somewhere within admin.cgi for the code that parses the words in a link/category description and puts them in the Links/Category_Word_Index table. Because I'm unfamiliar with the LinksSQL code, I get lost in admin.cgi and can't find where this parsing actually takes place. Can you help me out? I think it happens in a module referenced from admin.cgi, but for the life of me I can't work out which one.

Thank you,
Michael

Re: Korean Language Indexed Search
This would happen in the "Validate" routines, which get passed into the DBSQL.pm and DB_Utils.pm files.

All the data is checked, and the indexes updated at that point.

I would hold off on digging around, though, since that code is almost sure to change in the next release, and you might create some incompatibilities in your database that would be hard to fix. (Not likely, since you could just do a re-index, but it's possible.)


