Hi everyone,
I've been monitoring this thread and others for some time now, hoping someone would come up with a reasonable solution to the problem. No luck :-)
So, this is the first time I'm posting something, in the hope that someone can solve our problem.
First, a bit of language background so we can all be tuned to the same frequency.
In Latin-derived languages such as French, Spanish, and Portuguese, the only accented characters are vowels:
a á à e é, etc.
Apart from these, we have the " ñ " in Spanish and " ç " in Portuguese (don't know if ç is used in Spanish)
So here is what I would try if I were a Perl guru:
Let's concentrate on the English word, CONDUCTION
conducción in ES (and FR if I'm not mistaken)
condução in PT
If someone entered this keyword in a search query, the search script could change all accented characters to their plain equivalents with something like:
# assumes the script declares "use utf8" and the query has been decoded to characters
$in{'query'} =~ s/á/a/g;
$in{'query'} =~ s/é/e/g;
$in{'query'} =~ s/í/i/g;
$in{'query'} =~ s/ó/o/g;
$in{'query'} =~ s/ú/u/g;
etc, etc.
Then, it would do the same thing to the words it reads from the searchable database fields, so the comparison would be done on plain ASCII characters even though the database fields contain accented words.
So searching for "conducción" or "conduccion" (without the accent) would match "conducción - conduccion - CONDUCCION, etc.", because what the script would really be looking for would be lowercase "conduccion", plain and simple.
The results page, however, should present the actual accented words and phrases that exist in the database.
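To make the idea concrete, here is a rough sketch of how it might look in Perl. The names fold_accents, @records, and @hits are made up for illustration and are not part of any existing script, and the character list only covers the accents mentioned above. It also assumes the text has already been decoded into Perl characters (for example with Encode::decode), which a real CGI script would have to do first.

```perl
use strict;
use warnings;
use utf8;                        # this source file contains accented literals
use feature 'unicode_strings';   # make lc() treat accented letters as letters

# Hypothetical helper: lowercase the text and fold accented letters to ASCII.
sub fold_accents {
    my ($text) = @_;
    $text = lc $text;
    # Each accented character maps to the plain letter in the same position.
    $text =~ tr/áàâãäéèêëíìîïóòôõöúùûüñç/aaaaaeeeeiiiiooooouuuunc/;
    return $text;
}

# Hypothetical stand-in for the searchable database fields.
my @records = ('Conducción térmica', 'condução de calor', 'Convection');

# Fold the query once, then compare it against a folded copy of each record.
my $query = fold_accents('CONDUCCION');
my @hits  = grep { index(fold_accents($_), $query) >= 0 } @records;

# @hits keeps the original strings, so the results page still shows the accents.
```

Only the copies used for the comparison are folded, so the stored text is never changed.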
Any Perl guru around who is willing to follow this line of thought and give us a solution MOD?
Thanks
Jose