Mailing List Archive: Lucene: Java-User

Stemming and highlighting

celikik at gmail

Jan 4, 2008, 11:30 AM

Post #1 of 3
Stemming and highlighting

Dear all,

I am a new Lucene user and I would like to know the following. How does
Lucene bring together fuzzy queries and highlighting?

Let's say that for the query "algorithm", the word "algorith" is also a
match. How does the highlighter know that it should also highlight
occurrences of the word "algorith"? (I am not sure it does this anyway.)

Thank you!

Marjan.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucenelist2007 at danielnaber

Jan 4, 2008, 2:46 PM

Post #2 of 3
Re: Stemming and highlighting

On Friday, 4 January 2008, Marjan Celikik wrote:

> I am a new Lucene user and I would like to know the following. How does
> Lucene bring together fuzzy queries and highlighting?

You need to call rewrite() on the fuzzy query, passing it the IndexReader.
This expands the fuzzy query into the similar terms that actually occur in
the index (e.g. belies~ -> belief OR believe). Highlighting then works no
differently than for non-fuzzy queries.
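
To make the expansion step concrete, here is a minimal plain-Java sketch.
It does not use Lucene at all: the class FuzzyExpansionSketch, the
expand() and highlight() helpers, the fixed term list and the edit-distance
threshold are all assumptions made up for illustration. In Lucene itself the
expansion comes from rewrite() against the index, and the term similarity
is computed differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: hypothetical names, not Lucene classes.
class FuzzyExpansionSketch {

    // Classic Levenshtein edit distance, computed row by row.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Stand-in for rewrite(): turn one fuzzy term into the concrete
    // index terms that lie within maxEdits of it.
    static List<String> expand(String term, List<String> indexTerms, int maxEdits) {
        List<String> expanded = new ArrayList<>();
        for (String t : indexTerms) {
            if (editDistance(term, t) <= maxEdits) expanded.add(t);
        }
        return expanded;
    }

    // Once expanded, highlighting is ordinary term matching:
    // wrap whole-word occurrences of any expanded term in <B> tags.
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) return text;
        StringBuilder alternatives = new StringBuilder();
        for (String t : terms) {
            if (alternatives.length() > 0) alternatives.append('|');
            alternatives.append(Pattern.quote(t));
        }
        Matcher m = Pattern
                .compile("\\b(" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE)
                .matcher(text);
        return m.replaceAll("<B>$1</B>");
    }

    public static void main(String[] args) {
        List<String> indexTerms = new ArrayList<>();
        indexTerms.add("belief");
        indexTerms.add("believe");
        indexTerms.add("apple");
        List<String> expanded = expand("belies", indexTerms, 2);
        System.out.println(expanded);
        System.out.println(highlight("I believe in this belief.", expanded));
    }
}
```

The point of the sketch is only that the highlighter never sees the fuzzy
term itself, just the concrete terms it was expanded to.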

Regards
Daniel

--
http://www.danielnaber.de



markharw00d at yahoo

Jan 4, 2008, 4:06 PM

Post #3 of 3
Re: Stemming and highlighting

>
> Let's say that for the query "algorithm", the word "algorith" is also a
> match. How does the highlighter know that it should also highlight
> occurrences of the word "algorith"? (I am not sure it does this anyway.)

The highlighter knows to highlight stemmed words because both the query
terms and the document content are fed through (hopefully) the same
analyzer, so that "algorithmic", "algorithm", "algorithms" etc. are
stemmed to the same root form in both query and doc content. The tokens
produced by analyzers carry the start and end character offsets of the
*original* full word, not just the stemmed form, so the highlighter knows
the full extent of what to highlight in the text.
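
A minimal sketch of that idea in plain Java, without Lucene: the
StemOffsetSketch class, its Token type, and the very crude suffix-stripping
stem() are hypothetical stand-ins for a real analyzer and stemmer. Each
token keeps the stemmed term for matching and the offsets of the original
word for highlighting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: hypothetical names, not Lucene classes.
class StemOffsetSketch {

    // Stemmed term plus the character span of the original word.
    static final class Token {
        final String term;
        final int start;
        final int end;
        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Deliberately crude suffix stripper, a placeholder for a real stemmer.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ic", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // The "analyzer": split into words, stem each, remember original offsets.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        while (m.find()) {
            tokens.add(new Token(stem(m.group()), m.start(), m.end()));
        }
        return tokens;
    }

    // Match on stemmed terms, but wrap the original word's full extent.
    static String highlight(String text, String query) {
        List<String> queryTerms = new ArrayList<>();
        for (Token q : analyze(query)) queryTerms.add(q.term);

        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : analyze(text)) {
            if (!queryTerms.contains(t.term)) continue;
            out.append(text, last, t.start)
               .append("<B>").append(text, t.start, t.end).append("</B>");
            last = t.end;
        }
        return out.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        System.out.println(
            highlight("Algorithmic thinking uses algorithms.", "algorithm"));
    }
}
```

Because query and text go through the same analyze() method, "Algorithmic"
and "algorithms" both match the query "algorithm", and the stored offsets
let the whole original word be wrapped rather than just the stem.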


Cheers
Mark

