
Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers

 

 



jira at apache

Feb 9, 2010, 7:17 AM

Post #1 of 5 (452 views)
[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers

[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831483#action_12831483 ]

Renaud Delbru commented on LUCENE-2154:
---------------------------------------

Sorry in advance, maybe what I am saying is out of scope due to my partial understanding of the problem.

I have started to look at the problem, in order to be able to use my own attributes from my own DocsAndPositionsEnum classes.
Would it not be simpler to create a MultiAttributeSource that is instantiated in the MultiDocsAndPositionsEnum? At creation time, all the AttributeSources of the subreaders (which are available) would be passed to its constructor. This MultiAttributeSource would delegate the getAttribute call to the right DocsAndPositionsEnum$AttributeSource.

There is no single AttributeSource shared by all the subreaders; instead, each subreader keeps its own AttributeSource. In this way, attributes are not overridden. The MultiAttributeSource is in fact like a wrapper.

One problem arises when there are custom attributes, e.g. BoostAttribute. If I understand correctly, if the user tries to access the BoostAttribute but one of the subreaders does not know it, an IllegalArgumentException will be thrown. Under the hood, the MultiAttributeSource can check whether the attribute exists on the current subreader and, if not, fall back to a default attribute or to a previously stored attribute (coming from a previous subreader).

I am not sure whether what I am saying makes sense. It looks too simple to me to cover all the cases. Are there cases I am not aware of? Could you give me some examples to make me aware of other problems?
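
For illustration, the wrapper idea described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: Attr, BoostAttr, SimpleAttributeSource and this MultiAttributeSource are hypothetical stand-ins written for this example and are not the real Lucene classes; the fallback-to-default behaviour is one of the two options mentioned in the post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Lucene's Attribute and AttributeSource;
// these are NOT the real Lucene classes.
interface Attr {}

final class BoostAttr implements Attr {
    final float boost;
    BoostAttr(float boost) { this.boost = boost; }
}

final class SimpleAttributeSource {
    private final Map<Class<? extends Attr>, Attr> attrs = new HashMap<>();

    <T extends Attr> void addAttribute(Class<T> type, T instance) {
        attrs.put(type, instance);
    }

    boolean hasAttribute(Class<? extends Attr> type) {
        return attrs.containsKey(type);
    }

    // Throws when the attribute is unknown, as described in the post.
    <T extends Attr> T getAttribute(Class<T> type) {
        Attr a = attrs.get(type);
        if (a == null) {
            throw new IllegalArgumentException("No attribute " + type.getName());
        }
        return type.cast(a);
    }
}

// Wrapper that forwards getAttribute to the sub-reader currently in use.
final class MultiAttributeSource {
    private final List<SimpleAttributeSource> subs;
    private int current = 0;

    MultiAttributeSource(List<SimpleAttributeSource> subs) {
        this.subs = subs;
    }

    void setCurrent(int index) { current = index; }

    // Returns the supplied default when the current sub-reader
    // does not know the attribute, instead of throwing.
    <T extends Attr> T getAttribute(Class<T> type, T fallback) {
        SimpleAttributeSource sub = subs.get(current);
        return sub.hasAttribute(type) ? sub.getAttribute(type) : fallback;
    }
}

class MultiAttributeSourceDemo {
    public static void main(String[] args) {
        SimpleAttributeSource seg0 = new SimpleAttributeSource();
        seg0.addAttribute(BoostAttr.class, new BoostAttr(2.0f));
        SimpleAttributeSource seg1 = new SimpleAttributeSource(); // no BoostAttr here

        MultiAttributeSource multi = new MultiAttributeSource(Arrays.asList(seg0, seg1));
        BoostAttr dflt = new BoostAttr(1.0f);

        System.out.println(multi.getAttribute(BoostAttr.class, dflt).boost);
        multi.setCurrent(1); // second sub-reader falls back to the default
        System.out.println(multi.getAttribute(BoostAttr.class, dflt).boost);
    }
}
```

Note that in this sketch the caller has to ask the wrapper again after every sub-reader switch in order to see the right instance.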

> Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers
> ---------------------------------------------------------------------------------------
>
> Key: LUCENE-2154
> URL: https://issues.apache.org/jira/browse/LUCENE-2154
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: Flex Branch
> Reporter: Michael McCandless
> Fix For: Flex Branch
>
>
> The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum levels, for a codec to set custom attrs.
> But, it's currently broken for Dir/MultiReader, which must somehow share attrs across all the sub-readers. Somehow we must make a single attr source, and tell each sub-reader's enum to use that instead of creating its own. Hopefully Uwe can work some magic here :)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Feb 9, 2010, 7:27 AM

Post #2 of 5 (445 views)
[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831489#action_12831489 ]

Uwe Schindler commented on LUCENE-2154:
---------------------------------------

The problem is the following:

Attributes are not meant to be retrieved on every call to next(); they are retrieved or added once, after construction. A consumer of your MultiEnum calls attributes().getAttribute exactly once, before it starts to enumerate tokens/positions/whatever. If your proposed MultiAttributeSource returned the attribute of the first sub-enum, the consumer would stay with this attribute instance forever. If the MultiEnum then switched to another sub-enum, the consumer would not see the new attribute.
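
The stale-reference problem described above can be reproduced with a small sketch. All names here (PayloadAttr, SubEnum, NaiveMultiEnum) are hypothetical stand-ins invented for this illustration, not Lucene classes; the only point is that a consumer that fetches the attribute once keeps the first sub-enum's instance.

```java
// Hypothetical stand-ins, not Lucene classes.
final class PayloadAttr {
    final String value;
    PayloadAttr(String value) { this.value = value; }
}

final class SubEnum {
    final PayloadAttr attr;
    SubEnum(String value) { this.attr = new PayloadAttr(value); }
}

// Naive multi-enum: getAttribute() returns whatever the current sub-enum owns.
final class NaiveMultiEnum {
    private final SubEnum[] subs;
    private int current = 0;

    NaiveMultiEnum(SubEnum... subs) { this.subs = subs; }

    PayloadAttr getAttribute() { return subs[current].attr; }

    void advanceToNextSub() { current++; }
}

class StaleAttributeDemo {
    public static void main(String[] args) {
        NaiveMultiEnum multi = new NaiveMultiEnum(new SubEnum("seg0"), new SubEnum("seg1"));

        PayloadAttr held = multi.getAttribute(); // consumer fetches the attribute once
        multi.advanceToNextSub();                // multi-enum moves to the next sub-enum

        // The consumer still holds the first sub-enum's instance.
        System.out.println(held.value);
        // The instance that is now current is never seen by the consumer.
        System.out.println(multi.getAttribute().value);
    }
}
```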

Because of that, the right way is not to have a MultiAttributeSource. What you need are proxy attributes. The attributes themselves must be proxies, delegating each call to the current enum's corresponding attribute. The same was done in Lucene 2.9 to emulate backwards compatibility for TokenStreams. The proxy was TokenWrapper. These ProxyAttributes would look exactly like this TokenWrapper impl class.
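
A proxy attribute of the kind described above might look roughly like the following. This is a hand-written sketch, not the TokenWrapper code from Lucene 2.9; SketchBoostAttribute, SketchBoostAttributeImpl, ProxyBoostAttribute and ProxyingMultiEnum are hypothetical names introduced only for this example.

```java
// Hypothetical attribute interface, standing in for a real Lucene attribute.
interface SketchBoostAttribute {
    float getBoost();
}

final class SketchBoostAttributeImpl implements SketchBoostAttribute {
    private final float boost;
    SketchBoostAttributeImpl(float boost) { this.boost = boost; }
    @Override public float getBoost() { return boost; }
}

// The proxy: the consumer holds this single instance for the whole enumeration,
// while the multi-enum re-points the delegate on every sub-enum change.
final class ProxyBoostAttribute implements SketchBoostAttribute {
    private SketchBoostAttribute delegate;

    void setDelegate(SketchBoostAttribute delegate) { this.delegate = delegate; }

    @Override public float getBoost() { return delegate.getBoost(); }
}

final class ProxyingMultiEnum {
    private final SketchBoostAttribute[] subAttrs;
    private final ProxyBoostAttribute proxy = new ProxyBoostAttribute();
    private int current = 0;

    ProxyingMultiEnum(SketchBoostAttribute... subAttrs) {
        this.subAttrs = subAttrs;
        proxy.setDelegate(subAttrs[0]);
    }

    // Always hands out the same proxy instance.
    SketchBoostAttribute getAttribute() { return proxy; }

    void advanceToNextSub() {
        current++;
        proxy.setDelegate(subAttrs[current]);
    }
}

class ProxyAttributeDemo {
    public static void main(String[] args) {
        ProxyingMultiEnum multi = new ProxyingMultiEnum(
                new SketchBoostAttributeImpl(2.0f), new SketchBoostAttributeImpl(0.5f));

        SketchBoostAttribute held = multi.getAttribute(); // fetched once by the consumer
        System.out.println(held.getBoost());
        multi.advanceToNextSub();
        System.out.println(held.getBoost()); // same reference, now reflects the second sub-enum
    }
}
```

The cost of this design is one proxy class per attribute interface, unless the proxies are generated.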


jira at apache

Feb 9, 2010, 8:21 AM

Post #3 of 5 (441 views)
[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831521#action_12831521 ]

Renaud Delbru commented on LUCENE-2154:
---------------------------------------

I see. The problem is to return a unique attribute reference to the consumer when attributes().getAttribute is called, and then to update that reference while iterating the enums, in order to propagate the attribute changes to the consumer.

I am trying to propose a (possible) alternative solution (if I understood the problem correctly), which can avoid reflection, but could potentially need a modification of the Attribute interface.

Suppose the MultiAttributeSource creates its own set of unique references, one per attribute class. The list of distinct attribute classes can be retrieved by calling the getAttributeClassesIterator() method of each subreader's AttributeSource. The goal is then to update these references after each enum iteration or sub-enum change, in order to propagate the changes to the consumer.

Unfortunately, I don't see any method on the Attribute interface to 'copy' a given attribute. Each AttributeImpl could implement this 'copy' method, which copies the state of a given attribute of the same class.
Then, in the MultiDocsAndPositionsEnum, after each iteration or each sub-enum change, an explicit call to MultiAttributeSource can be made to update the unique references of the different attributes. Under the hood, this update method will (1) check whether the sub-enum is aware of the attribute class, (2) get the attribute from the sub-enum, and (3) copy the attribute into the unique attribute reference kept by MultiAttributeSource.

Could this solution possibly work?
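
The three-step update described above could be sketched as follows. This assumes a hypothetical copyFrom method on a hypothetical CopyableAttr interface; neither is part of the Attribute interface discussed in the post, and the sub-enum's attributes are modelled here as a plain map for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical attribute contract carrying the proposed 'copy' method.
interface CopyableAttr {
    void copyFrom(CopyableAttr other);
}

final class TermFreqAttr implements CopyableAttr {
    int freq;
    TermFreqAttr(int freq) { this.freq = freq; }

    @Override public void copyFrom(CopyableAttr other) {
        this.freq = ((TermFreqAttr) other).freq; // same class assumed, as in the post
    }
}

// Keeps one stable instance per attribute class and refreshes it on demand.
final class CopyingMultiSource {
    private final Map<Class<? extends CopyableAttr>, CopyableAttr> unique = new HashMap<>();

    // The consumer keeps the returned reference for the whole enumeration.
    <T extends CopyableAttr> T register(Class<T> type, T instance) {
        unique.put(type, instance);
        return instance;
    }

    // To be called after each iteration or sub-enum change.
    void update(Map<Class<? extends CopyableAttr>, CopyableAttr> subAttrs) {
        for (Map.Entry<Class<? extends CopyableAttr>, CopyableAttr> e : unique.entrySet()) {
            CopyableAttr src = subAttrs.get(e.getKey()); // (1) known to the sub-enum? (2) get it
            if (src != null) {
                e.getValue().copyFrom(src);              // (3) copy into the stable instance
            }
            // If unknown, the previously stored state is simply kept.
        }
    }
}

class CopyingMultiSourceDemo {
    public static void main(String[] args) {
        CopyingMultiSource multi = new CopyingMultiSource();
        TermFreqAttr held = multi.register(TermFreqAttr.class, new TermFreqAttr(0));

        Map<Class<? extends CopyableAttr>, CopyableAttr> seg0 = new HashMap<>();
        seg0.put(TermFreqAttr.class, new TermFreqAttr(3));
        Map<Class<? extends CopyableAttr>, CopyableAttr> seg1 = new HashMap<>(); // unaware of TermFreqAttr

        multi.update(seg0);
        System.out.println(held.freq); // state copied from the first sub-enum
        multi.update(seg1);
        System.out.println(held.freq); // unchanged, the sub-enum does not know the attribute
    }
}
```

The trade-off in this sketch is one copy per attribute per update call, in exchange for not needing proxy classes.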


jira at apache

Feb 9, 2010, 12:25 PM

Post #4 of 5 (436 views)
[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831646#action_12831646 ]

Michael McCandless commented on LUCENE-2154:
--------------------------------------------

What if we require that all segments use the same codec if you want to use attributes from a Multi*Enum? (I think this limitation is fine... and if it's not, one could still operate per-segment with different attr impls per segment.)

This way, every segment would share the same attr impl for a given attr interface?

And then couldn't we somehow force each segment to use the same attr impl as the last segment(s)?


jira at apache

Feb 10, 2010, 2:47 AM

Post #5 of 5 (422 views)
[jira] Commented: (LUCENE-2154) Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831938#action_12831938 ]

Uwe Schindler commented on LUCENE-2154:
---------------------------------------

That would work. So the MultiEnum would use its own AttributeSource and pass it down to the sub-enums. For that, the ctor of the *Enums should allow passing an AttributeSource. I can provide a patch.
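
As a rough sketch of this idea, and not the actual patch, the MultiEnum could own one attribute source and hand it to each sub-enum's constructor. SharedSource, FreqAttr, SegmentEnum and SharedMultiEnum are hypothetical names made up for this example; the sketch also assumes every segment uses the same attribute implementation, which is the same-codec restriction suggested earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an attribute source shared by all sub-enums.
final class SharedSource {
    private final Map<Class<?>, Object> attrs = new HashMap<>();

    // Returns the already registered instance, or creates and registers one.
    <T> T addAttribute(Class<T> type, Supplier<T> factory) {
        return type.cast(attrs.computeIfAbsent(type, k -> factory.get()));
    }
}

final class FreqAttr {
    int freq;
}

// Sub-enum whose constructor accepts the shared source instead of creating its own.
final class SegmentEnum {
    private final FreqAttr freqAttr;
    private final int[] freqs;
    private int pos = -1;

    SegmentEnum(SharedSource source, int... freqs) {
        this.freqAttr = source.addAttribute(FreqAttr.class, FreqAttr::new);
        this.freqs = freqs;
    }

    boolean next() {
        if (++pos >= freqs.length) {
            return false;
        }
        freqAttr.freq = freqs[pos]; // writes into the shared instance
        return true;
    }
}

final class SharedMultiEnum {
    private final SharedSource source = new SharedSource();
    private final SegmentEnum[] subs;
    private int current = 0;

    SharedMultiEnum(int[]... perSegmentFreqs) {
        subs = new SegmentEnum[perSegmentFreqs.length];
        for (int i = 0; i < subs.length; i++) {
            subs[i] = new SegmentEnum(source, perSegmentFreqs[i]);
        }
    }

    SharedSource attributes() { return source; }

    boolean next() {
        while (current < subs.length) {
            if (subs[current].next()) {
                return true;
            }
            current++; // sub-enum exhausted, move on to the next segment
        }
        return false;
    }
}

class SharedSourceDemo {
    public static void main(String[] args) {
        SharedMultiEnum multi = new SharedMultiEnum(new int[] {1, 2}, new int[] {5});
        // The consumer fetches the attribute once, before enumerating.
        FreqAttr freq = multi.attributes().addAttribute(FreqAttr.class, FreqAttr::new);
        while (multi.next()) {
            System.out.println(freq.freq);
        }
    }
}
```

Because every sub-enum writes into the same instance, no proxying or copying is needed in this sketch.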
