Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-2051) Contrib Analyzer Setters should be deprecated and replace with ctor arguments

 

 



jira at apache

Nov 10, 2009, 2:15 PM

Post #1 of 12 (580 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776107#action_12776107 ]

Robert Muir commented on LUCENE-2051:
-------------------------------------

I agree. Any one analyzer should really just serve as an example of how to put tokenstreams together.
They shouldn't try to meet all users' needs, but instead be very simple and easy for the user to customize.

The complexity caused by these setters was painful when implementing reusableTokenStream: they require special handling and add code complexity.
There might even still be a bug I introduced in the process. We try our best, but these setters make life very complex.

I would like to see these setters deprecated for 3.0 so that code will be simpler in the future.


> Contrib Analyzer Setters should be deprecated and replace with ctor arguments
> -----------------------------------------------------------------------------
>
> Key: LUCENE-2051
> URL: https://issues.apache.org/jira/browse/LUCENE-2051
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9.1
> Reporter: Simon Willnauer
> Priority: Minor
> Fix For: 3.0
>
>
> Some analyzers in contrib provide setters for stopword sets, stem exclusion sets, hashtables, etc. Those setters should be deprecated because they yield unexpected behaviour: they set the reusable token stream instance to null in a thread-local cache, which only affects the token stream of the current thread. Analyzers themselves should be immutable, except for the thread-local.
> I will attach a patch soon.
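
The behaviour described in the issue can be illustrated with a small stand-alone sketch. It does not use Lucene at all: ToyAnalyzer, setStopWords, reusableStreamStopWords and SetterDemo are hypothetical names, and the cached "stream" is modeled simply as the stop set it was built with. Under those assumptions it shows a worker thread that keeps using its stale cached state after another thread calls the setter.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy stand-in for a contrib analyzer. All names here are hypothetical, not Lucene API.
class ToyAnalyzer {
    private volatile Set<String> stopWords;

    // Per-thread cache standing in for the reusable token stream. For simplicity the
    // cached "stream" is just the stop set it was built with.
    private final ThreadLocal<Set<String>> cachedStream = new ThreadLocal<>();

    ToyAnalyzer(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // The pattern under discussion: the setter nulls only the calling thread's cache entry.
    void setStopWords(Set<String> newStopWords) {
        this.stopWords = newStopWords;
        cachedStream.set(null);
    }

    // Returns the stop set used by this thread's cached stream, building it on first use.
    Set<String> reusableStreamStopWords() {
        Set<String> cached = cachedStream.get();
        if (cached == null) {
            cached = stopWords;
            cachedStream.set(cached);
        }
        return cached;
    }
}

class SetterDemo {
    // Returns what the calling thread and a worker thread each see after the setter runs.
    static String[] run() {
        final ToyAnalyzer analyzer = new ToyAnalyzer(Collections.singleton("the"));
        Callable<String> read = () -> analyzer.reusableStreamStopWords().toString();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(read).get();                          // worker caches its stream
            analyzer.setStopWords(Collections.singleton("a")); // resets only this thread's cache
            String callerSees = read.call();
            String workerSees = worker.submit(read).get();      // same worker thread, stale cache
            return new String[] { callerSees, workerSees };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("caller thread: " + seen[0]);
        System.out.println("worker thread: " + seen[1]);
    }
}
```

In this sketch the calling thread sees the new stop set while the worker keeps the old one, which is the kind of surprise that passing the set once through the constructor avoids.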

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 13, 2009, 2:50 AM

Post #2 of 12 (544 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777445#action_12777445 ]

Simon Willnauer commented on LUCENE-2051:
-----------------------------------------

This would also include deprecating some of the constructors. I will attach a patch which adds / deprecates ctors for all analyzers having those setters too.

simon

> Contrib Analyzer Setters should be deprecated and replace with ctor arguments
> -----------------------------------------------------------------------------
>
> Key: LUCENE-2051
> URL: https://issues.apache.org/jira/browse/LUCENE-2051
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9.1
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 3.0
>
>
> Some analyzers in contrib provide setters for stopword sets, stem exclusion sets, hashtables, etc. Those setters should be deprecated because they yield unexpected behaviour: they set the reusable token stream instance to null in a thread-local cache, which only affects the token stream of the current thread. Analyzers themselves should be immutable, except for the thread-local.
> I will attach a patch soon.



jira at apache

Nov 14, 2009, 2:18 PM

Post #3 of 12 (529 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778004#action_12778004 ]

Uwe Schindler commented on LUCENE-2051:
---------------------------------------

Simon, are you still working on a patch?



jira at apache

Nov 14, 2009, 3:22 PM

Post #4 of 12 (528 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778015#action_12778015 ]

Simon Willnauer commented on LUCENE-2051:
-----------------------------------------

Yes, I am. This one got a little lost in my schedule... I will try to get the patch done by tomorrow night. Would that be OK for 3.0? What's your schedule for the release?



jira at apache

Nov 14, 2009, 3:30 PM

Post #5 of 12 (528 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778017#action_12778017 ]

Robert Muir commented on LUCENE-2051:
-------------------------------------

Simon, if you are busy, you can assign it to me, or we can split the work. Let me know.



jira at apache

Nov 14, 2009, 3:30 PM

Post #6 of 12 (528 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778018#action_12778018 ]

Uwe Schindler commented on LUCENE-2051:
---------------------------------------

I will create an RC as soon as we have the remaining issues closed (hopefully soon), ideally at the beginning of the week! Some remaining issues (like changing BW requirements, and also the change to the sysreq page) are not really release critical.

I think only yours, Mike's (default read-only ctors), and maybe the N-Gram one are important.



jira at apache

Nov 15, 2009, 12:03 PM

Post #7 of 12 (517 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778163#action_12778163 ]

Robert Muir commented on LUCENE-2051:
-------------------------------------

Simon, should we expose getDefaultStopSet() as public yet, if you are planning on refactoring this stopword stuff in 3.1 anyway? (Would you then have to deprecate it in 3.1?)

Also, I'm not sure I like the copy() method in CharArraySet. I think it should return a real copy even if it is an EMPTY_SET, and if you give it a CharArraySet it should call .clone()?

Other things are minor: in Czech I think there is a spurious import added (javax.print.DocFlavor.CHAR_ARRAY), etc.

This has nothing to do with your issue, but while we are here cleaning up these ctors, maybe we should fix the fact that a lot of them never call super()?



> Contrib Analyzer Setters should be deprecated and replace with ctor arguments
> -----------------------------------------------------------------------------
>
> Key: LUCENE-2051
> URL: https://issues.apache.org/jira/browse/LUCENE-2051
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9.1
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2051.patch
>
>
> Some analyzers in contrib provide setters for stopword sets, stem exclusion sets, hashtables, etc. Those setters should be deprecated because they yield unexpected behaviour: they set the reusable token stream instance to null in a thread-local cache, which only affects the token stream of the current thread. Analyzers themselves should be immutable, except for the thread-local.
> I will attach a patch soon.



jira at apache

Nov 15, 2009, 2:09 PM

Post #8 of 12 (518 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778184#action_12778184 ]

Simon Willnauer commented on LUCENE-2051:
-----------------------------------------

bq. should we expose the getDefaultStopSet() as public yet,
This is different. StopAwareAnalyzer#getStopwords() is an instance method that returns the "current" stopword set of the instance, while the ones I introduced here are static and return the default set instead. We need to provide a replacement for the public static final String[] stuff we are deprecating, and I think they have to be there. Thoughts?

bq. also, I'm not sure I like the copy() method in CharArraySet, I think it should return a real copy even if it is an EMPTY_SET, and if you give it a CharArraySet it should call .clone() ?

The deal with this copy method is that StopFilter converts the incoming set to a CharArraySet if it is not one already. I want all sets in analyzers to be unmodifiable instances of CharArraySet. Further, they should be real copies, as otherwise they could be modified by the caller of the Analyzer ctor. That's why I introduced this helper, as such code was duplicated all over the place.
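
The idea of taking an unmodifiable real copy in the constructor can be sketched with plain java.util collections. This is not the CharArraySet helper from the patch: StopSets, unmodifiableCopy and CopyingAnalyzer are hypothetical names, and a HashSet stands in for CharArraySet.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, loosely modeled on the copy() idea above. Not Lucene API.
final class StopSets {
    private StopSets() {}

    // Returns an unmodifiable snapshot, so later changes to the caller's set cannot leak in.
    static Set<String> unmodifiableCopy(Set<String> source) {
        return Collections.unmodifiableSet(new HashSet<>(source));
    }
}

// Hypothetical analyzer-like class that is configured once, through its constructor.
final class CopyingAnalyzer {
    private final Set<String> stopWords;

    CopyingAnalyzer(Set<String> stopWords) {
        // Copy defensively instead of keeping a reference to the caller's set.
        this.stopWords = StopSets.unmodifiableCopy(stopWords);
    }

    Set<String> getStopwords() {
        return stopWords;
    }
}
```

With this shape, adding a word to the set that was passed in has no effect on the analyzer, and calling add() on the set returned by getStopwords() throws UnsupportedOperationException.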

bq. nothing to do with your issue, but maybe while we are here cleaning up these ctors we should fix the fact that a lot of these never call super() ?

Java guarantees that the default super ctor is called implicitly. I would not add all this noise (calling it explicitly) just for the sake of typing super() 20 times.
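
A minimal illustration of that language rule, using hypothetical class names (BaseAnalyzer, ImplicitSuperAnalyzer) rather than any Lucene class: when a constructor body does not start with an explicit this(...) or super(...) call, the compiler inserts a call to the superclass no-argument constructor.

```java
// Hypothetical base class with a no-argument constructor.
class BaseAnalyzer {
    final boolean baseInitialized;

    BaseAnalyzer() {
        baseInitialized = true;
    }
}

// Hypothetical subclass whose constructor never writes super().
class ImplicitSuperAnalyzer extends BaseAnalyzer {
    final String language;

    ImplicitSuperAnalyzer(String language) {
        // No explicit super() here: BaseAnalyzer() still runs before this body.
        this.language = language;
    }
}

class ImplicitSuperDemo {
    public static void main(String[] args) {
        ImplicitSuperAnalyzer analyzer = new ImplicitSuperAnalyzer("cz");
        System.out.println(analyzer.baseInitialized);
    }
}
```

This only holds when the superclass has an accessible no-argument constructor. Otherwise the subclass does not compile without an explicit super(...) call.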

Thoughts?




jira at apache

Nov 15, 2009, 3:03 PM

Post #9 of 12 (512 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778190#action_12778190 ]

Robert Muir commented on LUCENE-2051:
-------------------------------------

bq. This is different. StopAwareAnalyzer#getStopwords() is an instance method that returns the "current" stopword set of the instance, while the ones I introduced here are static and return the default set instead. We need to provide a replacement for the public static final String[] stuff we are deprecating, and I think they have to be there. Thoughts?

Right, but still, will this static method be supported after refactoring to StopAwareAnalyzer?


> Contrib Analyzer Setters should be deprecated and replace with ctor arguments
> -----------------------------------------------------------------------------
>
> Key: LUCENE-2051
> URL: https://issues.apache.org/jira/browse/LUCENE-2051
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9.1
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-2051.patch, LUCENE-2051.patch
>
>
> Some analyzers in contrib provide setters for stopword sets, stem exclusion sets, hashtables, etc. Those setters should be deprecated because they yield unexpected behaviour: they set the reusable token stream instance to null in a thread-local cache, which only affects the token stream of the current thread. Analyzers themselves should be immutable, except for the thread-local.
> I will attach a patch soon.



jira at apache

Nov 15, 2009, 3:09 PM

Post #10 of 12 (512 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778192#action_12778192 ]

Simon Willnauer commented on LUCENE-2051:
-----------------------------------------

bq. right, but still, will this static method be supported after refactoring to StopAwareAnalyzer?
Sure, they solve two different things.
static Set<?> getDefaultStopSet() -> will always return the default stopword set (replacement for the String array).
getStopwords() -> returns the stopwords currently used by the analyzer.

That is all. If we can get rid of that completely I'm fine with it. I would actually like to not expose the stopwords. We could put them all in files in 3.1 and load them from there?!
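
The split between the two accessors might look roughly like the sketch below. ExampleLanguageAnalyzer and its two-word default set are made up for illustration, and the return type is narrowed to Set<String> here for simplicity (the comment above uses Set<?>). The stop set is passed as a constructor argument, so no setter is needed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: the method names mirror the discussion, the class is hypothetical.
final class ExampleLanguageAnalyzer {
    // Made-up default set; the real per-language defaults are not reproduced here.
    private static final Set<String> DEFAULT_STOP_SET =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "the")));

    private final Set<String> stopWords;

    // Uses the default stop set.
    ExampleLanguageAnalyzer() {
        this(DEFAULT_STOP_SET);
    }

    // Uses the given stop set, fixed for the lifetime of the instance.
    ExampleLanguageAnalyzer(Set<String> stopWords) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    // Static: always returns the default stopword set, whatever any instance uses.
    static Set<String> getDefaultStopSet() {
        return DEFAULT_STOP_SET;
    }

    // Instance: returns the stopwords this analyzer was constructed with.
    Set<String> getStopwords() {
        return stopWords;
    }
}
```

A caller who wants the defaults plus a few extra words could copy getDefaultStopSet() into a new set, add to it, and pass the result to the constructor.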



jira at apache

Nov 15, 2009, 3:19 PM

Post #11 of 12 (512 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778193#action_12778193 ]

Robert Muir commented on LUCENE-2051:
-------------------------------------

{quote}
sure, they solve two different things.
static Set<?> getDefaultStopSet() -> will always return the default stopword set (replacement for the String array).
getStopwords() -> returns the stopwords currently used by the analyzer.

that is all. If we can get rid of that completely I'm fine with it. I would actually like to not expose the stopwords. We could put them all in files in 3.1 and load them from there?!
{quote}

Yeah, I too prefer they be hidden behind a method (we should store them however we damn well feel; I do like files though).
I just wanted to make sure it would not somehow present a problem for StopAwareAnalyzer in the future. If it can support getting this set, then we are OK.



jira at apache

Nov 16, 2009, 3:51 AM

Post #12 of 12 (493 views)

[ https://issues.apache.org/jira/browse/LUCENE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778301#action_12778301 ]

Uwe Schindler commented on LUCENE-2051:
---------------------------------------

I committed it for you in revision: 880715

Simon, you can close it, as you are assigned.
