Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Apr 29, 2012, 11:32 PM

Post #1 of 15 (199 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264688#comment-13264688 ]

David Smiley commented on SOLR-3424:
------------------------------------

And I also noticed that Commons-Codec's Caverphone.class was deprecated for Caverphone2.class so I made that simple change. The docs say the deprecated one simply forwards calls, so there should be no side-effects of this change.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 1:15 AM

Post #2 of 15 (200 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264720#comment-13264720 ]

Robert Muir commented on SOLR-3424:
-----------------------------------

where is the test for the bug?

why does this factory have a cache at all?!


> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 1:35 AM

Post #3 of 15 (195 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264730#comment-13264730 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

The whole design of this cache is wrong:
- it should not be static, the resolved class lookups should only be cached per instance
- the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName() lookups, so why cache it again?
- To fix the orginal issue, a ConcurrentHashMap would also be fine
- Solr/Lucene use Analyzers which are now enforced to reuse tokenstreams, so the lookup is only done once when IndexWriter starts and a new indexing thread is created

The last point tells me: "remove this cache"

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 2:01 AM

Post #4 of 15 (192 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264741#comment-13264741 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

One thing to add:

bq. the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName() lookups, so why cache it again?

You may argue, that the lookup may be more expensive, as it uses a try..catch block (first try in expected commons package, later try a full class name). I think the chain of Try...Catch with ClassNoFound should be changed to do it more smart: If the name of encoder contains a period (indexOf('.')>=0), look it up as class name, otherwise prefix it with the commons package name. This way, the JVM cache for loaded classes can be used and the cache is completely useless.


I like in your fix, that it also changed the broken uppercasing: It should only do that for the builtins, class names itsself are case sensitive.

+1 to remove the cache and only keep the static builtins (and please: as Collections.unmodifiableMap!!!), lookup non-builtins without try...catch(ClassNotFound). The error message should only list the builtins and mention that otherwise the name should be a full class name.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 8:31 AM

Post #5 of 15 (186 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264976#comment-13264976 ]

David Smiley commented on SOLR-3424:
------------------------------------

Rob: This is a (minor) thread-safety bug and consequently it's not really feasible to test it without a lot of work and without a guarantee at the end that there is no problem, and so it isn't worth it. Of course if you want to; go for it ;-)

Uwe: Thanks very much for your comments. I wasn't sure what the lifecycle of this class was or if/how it is shared; I looked at the javadocs of TokenFilterFactory just now and I think its clearer. Given that the Factory's init() method is not going to be called frequently, it seems to me that the class name based lookups need not cache at all. And I also like your suggestion of improving the fallback lookup by looking for a '.'. BTW I considered wrapping with unmodifiableMap but didn't bother because it's not exposed outside of this class, which is fairly small too, but I will since it seems to be a big deal to you (three exclamation points). I'll propose another patch later

~ David

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 8:39 AM

Post #6 of 15 (187 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264986#comment-13264986 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Thanks! Thaks for taking care, as I have no time to provide a patch at the moment :(

bq. but I will since it seems to be a big deal to you (three exclamation points)

:-) It's not so important, if it's private to the class! I just don't want public code to expose Collections not intended to be modified in a modifiable way, so i am fine with or without unmodifiable. There are also places in Lucene violating this (or use public arrays, which cannot be protected - like CompositeReader.getSequentialSubReaders()).

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 1, 2012, 12:29 AM

Post #7 of 15 (187 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265696#comment-13265696 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,

looks good, but there is a bug:
The refactoring of the REGISTRY static map was an anonymous inner class before (new HashMap() {{ hashmap ctor code }}; (please note double parens). Now it is assigned to a static field, but the initializer code is an inline ctor of the factory, means the initialization runs on every instantiation.

- add *static* before the anonymous ctor:
{code:java}
private static final Map<String, Class<? extends Encoder>> registry = new HashMap<String, Class<? extends Encoder>>(6);
static {
registry.put(...
{code}

- or leave the initializer as it was before (anonymous HashMap subclass with overridden ctor).

in resolve encoder maybe remove the clazz variable and directly return the result of forName(). I do't like non-final variables initialized with null because that prevents compilation failures on missing initialization and null could be returned - especially with lots of try-catch blocks.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 1, 2012, 8:29 AM

Post #8 of 15 (185 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265862#comment-13265862 ]

David Smiley commented on SOLR-3424:
------------------------------------

Uwe: Excellent catch RE my "static" bug, and I agree on your improvement RE clazz & null.

I tried to ascertain the thread-safety status of Commons-Codec and unfortunately I don't think we can assume it's thread-safe. I sent an email to their list about this just now: http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html

So I agree that we'll have to insatiate one of these in our create() method instead of caching it.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 2, 2012, 8:02 AM

Post #9 of 15 (183 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266617#comment-13266617 ]

David Smiley commented on SOLR-3424:
------------------------------------

I'll commit it in ~24 hours.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 2, 2012, 11:05 PM

Post #10 of 15 (188 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267229#comment-13267229 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

+1

I wanted to review the stock Lucene Analyzer about thread safety, too.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 2, 2012, 11:09 PM

Post #11 of 15 (186 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267233#comment-13267233 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

There seems to be no Analyzer in Lucene that uses this class. The phonetic package only contains the plain TokenFilters and those are single-thread only.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 3, 2012, 12:05 AM

Post #12 of 15 (182 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267264#comment-13267264 ]

David Smiley commented on SOLR-3424:
------------------------------------

So it appears the Commons-Codec Encoders are all thread-safe after all:
http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 3, 2012, 12:27 AM

Post #13 of 15 (189 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267272#comment-13267272 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also improved the reflective method call (which has the side effect of being able to print a useful message when the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user provides and other classes implementing the abstract Encoder interface may not be thread safe.

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 3, 2012, 7:46 PM

Post #14 of 15 (181 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268053#comment-13268053 ]

David Smiley commented on SOLR-3424:
------------------------------------

Looks good Uwe.
One minor nit is that the URL in the class Javadoc to Commons-Codec's Language package should be this URL:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/package-summary.html which is the one to 1.6; the existing link is older with fewer classes. We've got this link in both the FilterFactory & Filter.

Feel free to commit if you want or just leave it to me.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 5, 2012, 2:29 PM

Post #15 of 15 (189 views)
Permalink
[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269064#comment-13269064 ]

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
just commit it, I am about to leave for business trips (including Lucene Rev) so have not much time.
Uwe

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
> Key: SOLR-3424
> URL: https://issues.apache.org/jira/browse/SOLR-3424
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 3.6, 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads, which isn't just the common path but also the error messages which dump the registry into the error message.
> I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.