Oct 4, 1999, 8:20 AM

ferko

Novice (5 posts)

Post #1 of 10

NON-english search problem, PLEASE HELP (little problem for you)

Hi, I have a problem with non-English searching.
When someone searches for a word that contains Á (a special character), I need the search to also find terms that contain á, Á, A or a. For example, a search for malá should find malÁ, malA, malá and mala.
Is this possible?

Right now the search distinguishes between lowercase and uppercase special characters (á and Á), but not between normal characters (A and a).

Thank you very much for your help.
PLEASE answer, thank you

------------------

[This message has been edited by ferko (edited October 04, 1999).]

Oct 5, 1999, 4:54 PM

Alex

Administrator (9387 posts)

Post #2 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Hi,

Unfortunately Perl treats a differently than á, so when it compares them it sees two different characters. What I would do is strip all accents. In search.cgi after:

%in = &parse_form();

add:

$in{'query'} =~ s/á/a/g;

and repeat it for all the accents (it's a bit ugly, I know).
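
A slightly less ugly variant is to let Perl's tr/// operator handle a whole family of accented letters per line. This is only a sketch under a few assumptions: the script is saved as UTF-8 (hence use utf8), the query has already been decoded to characters, strip_accents is a made-up helper name that is not part of Links 2.0, and the letter lists are a sample rather than a complete set:

```perl
use strict;
use warnings;
use utf8;    # the accented literals in this source are UTF-8 characters

# Hypothetical helper, not part of Links 2.0.
# When the replacement list is shorter than the search list, tr/// reuses
# its last character, so each line maps one family of letters to its base.
sub strip_accents {
    my ($text) = @_;
    $text =~ tr/áàâãä/a/;
    $text =~ tr/éèêë/e/;
    $text =~ tr/íìîï/i/;
    $text =~ tr/óòôõö/o/;
    $text =~ tr/úùûü/u/;
    $text =~ tr/çñ/cn/;
    $text =~ tr/ÁÀÂÃÄ/A/;
    $text =~ tr/ÉÈÊË/E/;
    $text =~ tr/ÍÌÎÏ/I/;
    $text =~ tr/ÓÒÔÕÖ/O/;
    $text =~ tr/ÚÙÛÜ/U/;
    $text =~ tr/ÇÑ/CN/;
    return $text;
}

# Stand-in for the hash that parse_form() returns in search.cgi.
my %in = ( query => 'malá' );
$in{'query'} = strip_accents( $in{'query'} );    # now 'mala'
```

Like the s///g lines, this only changes the query, not the text stored in the database.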

Cheers,

Alex

Oct 6, 1999, 1:03 AM

Sergio

Novice (44 posts)

Post #3 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Please, I would like to know if I understood correctly.

The explanation above says that, in the search:
á = a

If that is correct, can I also do:

a = á = à = ã?

For example:

"... eu tenho uma pá à mão..."

would, for my search, be equal to:

"... eu tenho uma pa a mao..."

Is that correct?

Thanks and regards,

Sergio

------------------

Oct 9, 1999, 7:23 PM

ferko

Novice (5 posts)

Post #4 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Hi Alex,
that only changes the character in the query, but I need the search to treat these characters as equal to each other, e.g. a=A=Á=á or e=E=É=é etc. If somebody searches for sonémá or SONEMA or sonéma, the search script must find all resources that contain sonema, sonémá, SONÉMÁ, SonémÁ, SONEMÁ etc.
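
One way to picture the a=A=Á=á requirement is a single "search key" that every spelling collapses to: lowercase first, then strip the accents, and compare keys instead of the raw words. The sketch below assumes a UTF-8 script and Perl 5.12 or later (for unicode_strings, so that lc also lowercases Á and É); search_key is a hypothetical name and the letter lists are only a sample:

```perl
use strict;
use warnings;
use utf8;                         # accented literals are UTF-8 characters
use feature 'unicode_strings';    # lc() lowercases accented letters too

# Hypothetical helper: every spelling of a word maps to the same key.
sub search_key {
    my ($text) = @_;
    $text = lc $text;             # SONÉMÁ -> sonémá
    $text =~ tr/áàâãä/a/;         # then strip the (now lowercase) accents
    $text =~ tr/éèêë/e/;
    $text =~ tr/íìîï/i/;
    $text =~ tr/óòôõö/o/;
    $text =~ tr/úùûüý/uuuuy/;
    $text =~ tr/çñ/cn/;
    return $text;
}

# All five spellings reduce to the same key, so they compare as equal.
for my $variant ( 'sonema', 'sonémá', 'SONÉMÁ', 'SonémÁ', 'SONEMÁ' ) {
    print search_key($variant), "\n";
}
```

For this to help, the key has to be computed for the query and for the database text alike.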

[This message has been edited by ferko (edited October 09, 1999).]

Oct 18, 1999, 1:38 PM

Diego

Novice (24 posts)

Post #5 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Ferko,
If you have the solution, please tell us.
We are having the same discussion in http://www.gossamer-threads.com/...um3/HTML/002958.html and we are losing hope.

Thanks in advance.

Oct 18, 1999, 8:00 PM

ferko

Novice (5 posts)

Post #6 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

No,

I don't have it. I'm still waiting for this mod.

Oct 18, 1999, 8:15 PM

ferko

Novice (5 posts)

Post #7 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

But I have an idea: automatically attach the other variants of a word if it could contain special characters. Of course, advanced search must be disabled for this function (because phrase search won't work).

If the word is

sonema

the script automatically attaches all variants of this
word:

sonema+sonémá+sónémá+sónemá etc.
What do you think?
I want advanced search, but with this MOD it is not possible :-(
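
Instead of attaching every spelling as a separate word, the same idea can be expressed as one regular expression in which each letter becomes a character class of its variants. Because spaces are kept as they are, a phrase still matches as a phrase. This is a rough sketch, not a tested Links mod: %variants and query_to_regex are hypothetical names, the table covers only a few letters, and it assumes a UTF-8 script on Perl 5.12 or later:

```perl
use strict;
use warnings;
use utf8;                         # accented literals are UTF-8 characters
use feature 'unicode_strings';    # Unicode rules for lc() and for /i

# Hypothetical table: base letter => every form that should match it.
my %variants = (
    a => 'aáàâãä',
    e => 'eéèêë',
    i => 'iíìîï',
    o => 'oóòôõö',
    u => 'uúùûü',
    c => 'cç',
    n => 'nñ',
);

# Reverse lookup, so an accented letter in the query finds its family too.
my %family;
for my $base ( keys %variants ) {
    $family{$_} = $variants{$base} for split //, $variants{$base};
}

# Turn a query into a case-insensitive pattern matching all its spellings.
sub query_to_regex {
    my ($query) = @_;
    my $pattern = join '', map {
        my $c = lc $_;
        exists $family{$c} ? "[$family{$c}]" : quotemeta $_;
    } split //, $query;
    return qr/$pattern/i;
}

my $re = query_to_regex('sonema');
print "match\n" if 'SonémÁ' =~ $re;
```

The database text stays untouched with this approach, so the results keep their accents.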

Apr 29, 2000, 3:20 AM

JAC

New User (4 posts)

Post #8 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Hi everyone,
I've been monitoring this thread and others for some time now, hoping someone would come up with a reasonable solution to the problem. No luck :-)
So this is the first time I'm posting something, in the hope that someone can solve our problem.

First a bit of language knowledge so we can all be tuned to the same frequency.

In Latin languages such as French, Spanish, and Portuguese, the only accented characters are vowels:
á à ã é è, etc.

Apart from these, we have the ñ in Spanish and the ç in Portuguese (I don't know if ç is used in Spanish).

So here is what I would try if I was a Perl guru:

Let's concentrate on the English word, CONDUCTION
conducción in ES (and FR if I'm not mistaken)
condução in PT

If someone entered this keyword in a search query, the search script could change all accented characters to normal characters with:

$in{'query'} =~ s/á/a/g;
$in{'query'} =~ s/é/e/g;
$in{'query'} =~ s/í/i/g;
$in{'query'} =~ s/ó/o/g;
$in{'query'} =~ s/ú/u/g;
etc, etc.

Then it would do the same thing to the words the script reads from the searchable database fields, so the comparison would be based on normal ASCII characters even though the database fields contain accented words.

So searching for "conducción" or "conduccion" (without the accent) would match "conducción", "conduccion", "CONDUCCION", etc., because what the script would really be looking for is lowercase "conduccion", plain and simple.

The results page, however, should present the actual accented words and phrases that exist in the database.
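
For what it's worth, newer Perl versions can do the "change accented characters to normal characters" step without listing every letter: the Unicode::Normalize module splits each accented letter into a base letter plus a combining mark, and the marks can then be deleted. A minimal sketch, assuming a UTF-8 script, Perl 5.12 or later, and a made-up helper name fold:

```perl
use strict;
use warnings;
use utf8;                          # accented literals are UTF-8 characters
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);    # ships with Perl

# Hypothetical helper: reduce a word to a lowercase, accent-free search key.
sub fold {
    my ($text) = @_;
    my $decomposed = NFD($text);    # "ó" becomes "o" + a combining accent
    $decomposed =~ s/\p{Mn}//g;     # delete the combining marks
    return lc $decomposed;
}

# The three Spanish spellings share one key; the Portuguese word gets its own.
for my $word ( 'conducción', 'conduccion', 'CONDUCCION', 'condução' ) {
    print fold($word), "\n";
}
```

Letters that have no base-plus-accent decomposition (ø, for example) pass through unchanged and would still need a manual rule.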

Any Perl guru around who is willing to follow this line of thought and give us a solution MOD?

Thanks

Jose

Apr 29, 2000, 8:23 AM

Stealth

Veteran (17240 posts)

Post #9 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

One Reply to a relevant Topic will suffice.

Thank you.

Regards,

------------------
Eliot Lee....
Former Handle: Eliot
Anthro TECH, L.L.C
anthrotech.com
* Check Resource Center
* Search Forums
* Thinking out of the box (codes) is not only fun, but effective.

Apr 30, 2000, 8:22 AM

JAC

New User (4 posts)

Post #10 of 10

Re: NON-english search problem, PLEASE HELP (little problem for you)

Sorry Eliot and other users of this Forum. You must give us newbies a break. I really thought multiple postings to relevant threads would be the best thing to do.

Apologies presented - back to the subject:

As I said I'm no Perl guru but here is a little bit of progress I got by using Alex's example and lots of trial and error:

In search.cgi, if you add just below the

%in = &parse_form();

these lines (as suggested by Alex)

$in{'query'} =~ s/ó/o/g;
$in{'query'} =~ s/ç/c/g;
$in{'query'} =~ s/ã/a/g;
$in{'query'} =~ s/ñ/n/g;
(and repeat for all accented characters you have in your language, plus the uppercase ones)

and in db_utils.pl look for
sub split_decode {
# --------------------------------------------------------
# Takes one line of the database as input and returns an
# array of all the values. It replaces special mark up that
# join_encode makes such as replacing the '``' symbol with a
# newline and the '~~' symbol with a database delimiter.

    my ($input) = shift;
    my (@array) = split (/\Q$db_delim\E/o, $input, $#db_cols+1);
    foreach (@array) {
        s/~~/$db_delim/g; # Retrieve Delimiter..
        s/``/\n/g; # Change `` back to newlines..
    }
    return @array;
}

add the following just after s/``/\n/g;:
s/ó/o/g;
s/ç/c/g;
s/ã/a/g;
s/ñ/n/g;
(and repeat for all accented characters you have in your language, plus the uppercase ones)

This does the trick whether the search word is typed with accents or without. It doesn't matter how the word is written: the matching records do come back in the search results.

The only problem is that the results also appear without accents :-(

Not a very neat result, I'm afraid...
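
The accents disappear because split_decode hands the stripped text to everything that uses the record, including the results page. A possible way around it is to leave split_decode alone and strip the accents only on a temporary copy at the moment of comparing. The sketch below is not the actual Links search code: fold and search_records are hypothetical names, the '|' delimiter and the sample records are assumptions, and it needs a UTF-8 script on Perl 5.12 or later:

```perl
use strict;
use warnings;
use utf8;                         # accented literals are UTF-8 characters
use feature 'unicode_strings';    # lc() lowercases accented letters too

my $db_delim = '|';    # assumption: the delimiter configured for the database

# Hypothetical helper: lowercase, accent-free copy used only for comparing.
sub fold {
    my ($text) = @_;
    $text = lc $text;
    $text =~ tr/áàâãä/a/;
    $text =~ tr/éèêë/e/;
    $text =~ tr/íìîï/i/;
    $text =~ tr/óòôõö/o/;
    $text =~ tr/úùûü/u/;
    $text =~ tr/çñ/cn/;
    return $text;
}

# Return the records with a field that contains the query, ignoring case
# and accents. The returned rows still hold the original, accented text.
sub search_records {
    my ( $query, @lines ) = @_;
    my $wanted = fold($query);
    my @found;
    for my $line (@lines) {
        my @fields = split /\Q$db_delim\E/, $line;    # original values
        push @found, \@fields
            if grep { index( fold($_), $wanted ) >= 0 } @fields;
    }
    return @found;
}

my @db = ( '1|Condução térmica', '2|Conducción eléctrica' );
my @found = search_records( 'conduccion', @db );
# $found[0][1] is still 'Conducción eléctrica', accents included.
```

The price is that fold runs on every field for every search, which may matter on a large database.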

[This message has been edited by JAC (edited April 30, 2000).]