
Mailing List Archive: Lucene: Java-User

Can I use Ispell dictionaries for analyzers in Lucene?

 

 



aleboo at gmail

Nov 18, 2007, 10:09 AM

Post #1 of 4 (308 views)
Can I use Ispell dictionaries for analyzers in Lucene?

I was wondering about methods for analyzing various languages, and this
is what I understand so far (please correct me if I am wrong):

1. To analyze a non-English language I need to use a language-specific
analyzer. Contributions already available in the sandbox:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/

2. There are some general-purpose systems such as
Ispell - http://ficus-www.cs.ucla.edu/geoff/ispell.html
Snowball - http://snowball.tartarus.org/
Hunspell - http://sourceforge.net/projects/hunspell
(I got this from the PostgreSQL documentation chapter on full-text
search, which is part of the core distribution as of version 8.3,
currently at beta 2.)
Here is the link:
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html

Each of these engines has dictionaries for various languages.

3. Snowball is already supported by Lucene.
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball

And here goes my question :) ...

Can I use Ispell dictionaries with Lucene? And if not, why? Are there
legal issues with it, or does no implementation exist yet?
If there is no implementation, is there some reason not to support it
(maybe it is just not worth doing, or it is too complex), or has no one
tried yet?

The problem is that I need to support some languages that are not in
the list of languages supported by Snowball but do exist for Ispell, so
it is natural to try Ispell in this situation.

P.S. PostgreSQL full-text search has it, so I think Lucene probably
needs it too.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


lucenelist2007 at danielnaber

Nov 18, 2007, 11:24 AM

Post #2 of 4 (291 views)
Re: Can I use Ispell dictionaries for analyzers in Lucene? [In reply to]

On Sunday, 18 November 2007, Alebu wrote:

> 1. To analyze non English language I need to use specific analyzer.

You don't have to, but it helps improve recall.

> Can I use Ispell dictionaries with Lucene?

It depends on the dictionary. Some dictionary authors use the ispell
flagging system just to save space; others use it so that it
really expresses the linguistic relation between a base form
(e.g. "house") and its text forms (e.g. "houses"). Only in the latter case
could you expand the dictionary to a "text form -> base form" mapping and
use it.
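
To make "use it" concrete, here is a minimal sketch of applying such a
"text form -> base form" mapping to text. It is plain Java and does not
use the Lucene API; the class name BaseFormNormalizerSketch and its
methods are hypothetical, and the tokenization (lowercase, split on
non-letters) is an assumption made only for illustration. In Lucene the
same lookup would presumably sit inside a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or ispell.
class BaseFormNormalizerSketch {
    // Maps a text form (e.g. "houses") to its base form (e.g. "house").
    private final Map<String, String> baseByTextForm;

    BaseFormNormalizerSketch(Map<String, String> baseByTextForm) {
        this.baseByTextForm = new HashMap<>(baseByTextForm);
    }

    // Lowercases the text, splits it on non-letter characters and replaces
    // every known text form by its base form. Unknown tokens pass through
    // unchanged, so words missing from the dictionary are still indexed.
    List<String> normalize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token for leading punctuation
            }
            terms.add(baseByTextForm.getOrDefault(token, token));
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("houses", "house");
        BaseFormNormalizerSketch normalizer = new BaseFormNormalizerSketch(mapping);
        // prints [two, house, one, house]
        System.out.println(normalizer.normalize("Two houses, one house"));
    }
}
```

The same normalization has to be applied at index time and at query
time, otherwise a query for "houses" would not match documents indexed
under "house".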

Some dictionaries are GPL, so they cannot be part of Lucene. But you can
use them anyway. So the main reason there are no more advanced
(dictionary-based) analyzers for Lucene is that nobody has
developed and published them. Of course, an increase in recall often comes
with a decrease in precision.

Regards
Daniel

--
http://www.danielnaber.de



aleboo at gmail

Nov 18, 2007, 12:22 PM

Post #3 of 4 (292 views)
Re: Can I use Ispell dictionaries for analyzers in Lucene? [In reply to]

So what actually is an ispell dictionary? A list of rules for reducing
words (or sentences?) to a 'base form'? Or something else?
If so, then as I understand it, it should be possible to create an
analyzer that takes an ispell dictionary as a parameter,
and that way get the full power of ispell dictionaries in Lucene? Or am
I wrong somewhere?





lucenelist2007 at danielnaber

Nov 18, 2007, 12:29 PM

Post #4 of 4 (291 views)
Re: Can I use Ispell dictionaries for analyzers in Lucene? [In reply to]

On Sunday, 18 November 2007, Alebu wrote:

> So what actually is an ispell dictionary? A list of rules for reducing
> words (or sentences?) to a 'base form'? Or something else?

It's a list of terms with optional flags. For example:

walk/xy

In a different file, the flag "x" would then be defined as "append 'ed'"
and y as "append 'ing'" so that the expanded version is:

walk
walked
walking

The term in the dictionary does not have to be a real word. So in this
example, if normalization from "walking" and "walked" to "walk" is what
you want, you could use ispell. It just won't work for irregular forms
(e.g. went -> go).
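
The expansion described above can be sketched in a few lines of Java.
This is a simplified illustration, not a parser for the real ispell
file formats: it assumes each flag means "append a fixed suffix", while
real affix files also carry conditions and stripping rules. The class
name IspellExpansionSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or ispell.
class IspellExpansionSketch {

    // Expands an entry such as "walk/xy" into [walk, walked, walking],
    // given a table that maps each flag to the suffix it appends.
    // Assumption: flags only append suffixes; unknown flags are ignored.
    static List<String> expand(String entry, Map<Character, String> suffixByFlag) {
        int slash = entry.indexOf('/');
        String base = slash < 0 ? entry : entry.substring(0, slash);
        String flags = slash < 0 ? "" : entry.substring(slash + 1);
        List<String> forms = new ArrayList<>();
        forms.add(base);
        for (char flag : flags.toCharArray()) {
            String suffix = suffixByFlag.get(flag);
            if (suffix != null) {
                forms.add(base + suffix);
            }
        }
        return forms;
    }

    // Inverts the expansion into a "text form -> base form" lookup table.
    // If two entries produce the same text form, the first one wins.
    static Map<String, String> toBaseFormMap(List<String> entries,
                                             Map<Character, String> suffixByFlag) {
        Map<String, String> baseByForm = new LinkedHashMap<>();
        for (String entry : entries) {
            List<String> forms = expand(entry, suffixByFlag);
            String base = forms.get(0);
            for (String form : forms) {
                baseByForm.putIfAbsent(form, base);
            }
        }
        return baseByForm;
    }

    public static void main(String[] args) {
        Map<Character, String> suffixByFlag = new LinkedHashMap<>();
        suffixByFlag.put('x', "ed");
        suffixByFlag.put('y', "ing");
        // prints [walk, walked, walking]
        System.out.println(expand("walk/xy", suffixByFlag));
        // prints walk
        System.out.println(toBaseFormMap(Arrays.asList("walk/xy"), suffixByFlag).get("walking"));
    }
}
```

An irregular form such as "went" never appears in the resulting table
unless the dictionary lists it explicitly, which is the limitation
mentioned above.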

Regards
Daniel

--
http://www.danielnaber.de

