
Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-580) Pre-analyzed fields

 

 



jira at apache

Jan 18, 2007, 8:21 AM

[jira] Commented: (LUCENE-580) Pre-analyzed fields

[ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465797 ]

Nadav Har'El commented on LUCENE-580:
-------------------------------------

This patch will be useful for users of LUCENE-755, the payloads patch. That patch adds "payloads" to tokens, but using it to add a few tokens with payloads to some field can be ugly, because you need to split the code across two places: in one place you add the field as plain text, and in another you write a special analyzer that works only on that field, recognizes the specific tokens, and adds the payloads to them. This patch makes this easier: when you add a field, you can add it pre-analyzed, already as a list of tokens, and these tokens will already carry their special payloads.
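To illustrate the idea in the paragraph above, here is a minimal, self-contained sketch of a field built directly from tokens that already carry payloads. The classes `Token` and `PreAnalyzedField` and the method `buildTitleField` are hypothetical stand-ins invented for this illustration; they are not the Lucene classes and not the API of the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class PreAnalyzedFieldSketch {

    /** A token that carries its text plus an optional payload. */
    static final class Token {
        final String text;
        final byte[] payload; // null when the token has no payload

        Token(String text, byte[] payload) {
            this.text = text;
            this.payload = payload;
        }
    }

    /** A field whose value is an already-analyzed list of tokens. */
    static final class PreAnalyzedField {
        final String name;
        final List<Token> tokens;

        PreAnalyzedField(String name, List<Token> tokens) {
            if (name == null || tokens == null) {
                throw new NullPointerException("name and tokens must not be null");
            }
            this.name = name;
            this.tokens = new ArrayList<>(tokens); // defensive copy
        }
    }

    /** Builds the field and its payloads in one place, with no field-specific analyzer. */
    static PreAnalyzedField buildTitleField() {
        List<Token> tokens = Arrays.asList(
            new Token("lucene", new byte[] {1}), // token with a one-byte payload
            new Token("payloads", null));        // token without a payload
        return new PreAnalyzedField("title", tokens);
    }

    public static void main(String[] args) {
        PreAnalyzedField field = buildTitleField();
        for (Token t : field.tokens) {
            System.out.println(field.name + ": " + t.text
                + " payload=" + Arrays.toString(t.payload));
        }
    }
}
```

The point of the sketch is only that the tokens and their payloads are created where the field is created, instead of in a separate analyzer keyed on the field name.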

I have just a few comments on this patch:

1. The description above suggests that it might not work if the same field name is used for two Fields, one stored and the other pre-analyzed. I think it is important that this combination (as well as all other combinations) is supported. I actually use all these combinations in my code, and I don't see why it should cause problems.

2. The patch has some strange changes in the comments, changing the word "Index" to "NotificationService". I bet this wasn't intentional :-)

3. The new Field constructor still has an "Index" parameter, taking TOKENIZED, UN_TOKENIZED or NO_NORMS (only NO is forbidden). I wonder, what's the difference between TOKENIZED and UN_TOKENIZED in this case? NO_NORMS is a very useful case, because it allows you to do something not previously possible in Lucene (a tokenized field, but without norms). Perhaps this parameter should be better documented in the javadoc comment.

4. In the new Field constructor's comment, the phrase "if name or reader" should be "if name or tokenStream".

Thanks!

> Pre-analyzed fields
> -------------------
>
> Key: LUCENE-580
> URL: https://issues.apache.org/jira/browse/LUCENE-580
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 1.9
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: preanalyze.tar
>
>
> Adds the possibility to set a TokenStream at Field construction time, available as tokenStreamValue in addition to stringValue, readerValue and binaryValue.
> There might be some problems with mixing stored fields with the same name as a field with tokenStreamValue.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Jan 18, 2007, 9:04 AM

[jira] Commented: (LUCENE-580) Pre-analyzed fields [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465811 ]

Karl Wettin commented on LUCENE-580:
------------------------------------

Nadav Har'El [18/Jan/07 08:21 AM]

> The description above suggests that it might not work if the same
> field name is used for two Field's, one stored and the other preanalyzed.
> I think it is important that this combination (as well as all other
> combinations) are supported. I actually use all these combinations in my
> code, and I don't see why it should cause problems.

Actually, I can't remember why I thought there could be a problem, nor can I think of one now. This code is from my pre-tests era, and it could use some tests.

If you like this strategy, I think it would be more elegant to pass a factory rather than the actual token stream.
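The message does not say why a factory would be more elegant. One plausible reason, stated here as an assumption, is that a stream can be consumed only once, while a factory can hand out a fresh stream each time the field is read. A minimal sketch, using `java.util.function.Supplier` and a plain `Iterator<String>` as hypothetical stand-ins for a token stream factory and a token stream (neither is part of the patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class TokenStreamFactorySketch {

    /** A field that holds a factory for token streams instead of a single stream. */
    static final class FactoryField {
        final String name;
        final Supplier<Iterator<String>> streamFactory;

        FactoryField(String name, Supplier<Iterator<String>> streamFactory) {
            this.name = name;
            this.streamFactory = streamFactory;
        }

        /** Asks the factory for a fresh stream and drains it into a list. */
        List<String> consume() {
            List<String> out = new ArrayList<>();
            Iterator<String> stream = streamFactory.get();
            while (stream.hasNext()) {
                out.add(stream.next());
            }
            return out;
        }
    }

    /** Builds a sample field whose factory re-creates the same token sequence on demand. */
    static FactoryField sampleField() {
        List<String> tokens = Arrays.asList("pre", "analyzed", "field");
        return new FactoryField("body", tokens::iterator);
    }

    public static void main(String[] args) {
        FactoryField field = sampleField();
        // Each call obtains a new stream, so the field can be read more than once.
        System.out.println(field.consume());
        System.out.println(field.consume());
    }
}
```

With a one-shot stream stored in the field, the second read would see an exhausted stream; with the factory, both reads see the full token sequence.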

> The patch has some strange changes in the comments, changing the word
> "Index" to "NotificationService". I bet this wasn't intentional :-)

Hehe, that must be an old refactoring search and replace incident.



