
Mailing List Archive: Lucene: Java-Dev

[jira] Updated: (LUCENE-2035) TokenSources.getTokenStream() does not assign positionIncrement

 

 


jira at apache

Nov 5, 2009, 5:43 AM


[ https://issues.apache.org/jira/browse/LUCENE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Morris updated LUCENE-2035:
---------------------------------------

Attachment: LUCENE-2305.patch

For the highlighter trunk

> TokenSources.getTokenStream() does not assign positionIncrement
> ---------------------------------------------------------------
>
> Key: LUCENE-2035
> URL: https://issues.apache.org/jira/browse/LUCENE-2035
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Affects Versions: 2.4, 2.4.1, 2.9
> Reporter: Christopher Morris
> Attachments: LUCENE-2305.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> TokenSources.StoredTokenStream does not assign positionIncrement information. This means that all tokens in the stream are considered adjacent. This has implications for the phrase highlighting in QueryScorer when using non-contiguous tokens.
> For example:
> Consider a token stream that emits both the stemmed and unstemmed form of each word at the same position: the fox (jump|jumped)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream is flattened, with every token at its own position: the fox jump jumped
> Now try a search and highlight for the phrase query "fox jumped". The search will correctly find the document; the highlighter will fail to highlight the phrase because it thinks that there is an additional word between "fox" and "jumped". If we use the original (from the analyzer) token stream then the highlighter works.
> Also, consider the converse - the fox did not jump
> "not" is a stop word and there is an option to increment the position to account for stop words - (the,0) (fox,1) (did,2) (jump,4)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream will be - (the,0) (fox,1) (did,2) (jump,3).
> So the phrase query "did jump" will incorrectly cause the "did" and "jump" terms in the text "did not jump" to be highlighted, even though the search itself does not match the phrase. If we use the original (from the analyzer) token stream then the highlighter works correctly.
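
The two cases above can be sketched in plain Java, without depending on Lucene. This is only an illustration under stated assumptions: StoredToken, incrementsFromPositions, allAdjacent and matchesPhrase are hypothetical names, not part of TokenSources or of the attached patch. It assumes that absolute positions were stored with the term vector and that a first token at position 0 carries an increment of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PositionIncrementSketch {

    /** Hypothetical stand-in for a token read back from the index. */
    static final class StoredToken {
        final String term;
        final int position; // absolute position, assumed to be stored

        StoredToken(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Builds a token list from alternating term/position arguments. */
    static List<StoredToken> tokens(Object... termsAndPositions) {
        List<StoredToken> out = new ArrayList<>();
        for (int i = 0; i < termsAndPositions.length; i += 2) {
            out.add(new StoredToken((String) termsAndPositions[i], (Integer) termsAndPositions[i + 1]));
        }
        return out;
    }

    /** "the fox (jump|jumped)": stemmed and unstemmed forms share position 2. */
    static List<StoredToken> stemmedExample() {
        return tokens("the", 0, "fox", 1, "jump", 2, "jumped", 2);
    }

    /** "the fox did not jump" with the stop word "not" leaving a gap at position 3. */
    static List<StoredToken> stopWordExample() {
        return tokens("the", 0, "fox", 1, "did", 2, "jump", 4);
    }

    /** Rebuilds increments from absolute positions (tokens sorted by position). */
    static int[] incrementsFromPositions(List<StoredToken> tokens) {
        int[] increments = new int[tokens.size()];
        int previous = -1; // so that a first token at position 0 gets increment 1
        for (int i = 0; i < increments.length; i++) {
            increments[i] = tokens.get(i).position - previous;
            previous = tokens.get(i).position;
        }
        return increments;
    }

    /** The behaviour described in the issue: every token is treated as adjacent. */
    static int[] allAdjacent(int tokenCount) {
        int[] increments = new int[tokenCount];
        Arrays.fill(increments, 1);
        return increments;
    }

    /** Exact (slop 0) phrase match using the positions implied by the increments. */
    static boolean matchesPhrase(List<StoredToken> tokens, int[] increments, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        int[] pos = new int[tokens.size()];
        int current = -1;
        for (int i = 0; i < pos.length; i++) {
            current += increments[i];
            pos[i] = current;
        }
        for (int start = 0; start < pos.length; start++) {
            if (!tokens.get(start).term.equals(phrase[0])) {
                continue;
            }
            boolean all = true;
            for (int k = 1; k < phrase.length && all; k++) {
                all = hasTermAt(tokens, pos, phrase[k], pos[start] + k);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    private static boolean hasTermAt(List<StoredToken> tokens, int[] pos, String term, int wanted) {
        for (int i = 0; i < pos.length; i++) {
            if (pos[i] == wanted && tokens.get(i).term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<StoredToken> stemmed = stemmedExample();
        List<StoredToken> stopped = stopWordExample();
        System.out.println(matchesPhrase(stemmed, incrementsFromPositions(stemmed), "fox", "jumped"));
        System.out.println(matchesPhrase(stemmed, allAdjacent(stemmed.size()), "fox", "jumped"));
        System.out.println(matchesPhrase(stopped, incrementsFromPositions(stopped), "did", "jump"));
        System.out.println(matchesPhrase(stopped, allAdjacent(stopped.size()), "did", "jump"));
    }
}
```

With increments rebuilt from the stored positions, "fox jumped" matches and "did jump" does not, which agrees with the search. With every token treated as adjacent, both outcomes flip, which is the highlighting mismatch reported here.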

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


