Mailing List Archive: Lucene: Java-Dev

[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads

 

 



jira at apache

Nov 17, 2009, 5:52 PM

Post #1 of 15 (851 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2075:
--------------------------------

Attachment: ConcurrentLRUCache.java

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
> Key: LUCENE-2075
> URL: https://issues.apache.org/jira/browse/LUCENE-2075
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: ConcurrentLRUCache.java
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage. You're
> also cutting way back on likelihood of a cache hit (except the known
> multiple times we lookup a term within-query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap. One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary). You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary. Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
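
The double-barrel idea described above can be sketched roughly as follows. This is only an illustration of the description, not the code from any attached patch: the class name DoubleBarrelSketch, the countdown field and the method names are all assumptions, and "full" is approximated by counting puts rather than distinct keys.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a double-barrel LRU cache; names are illustrative only. */
final class DoubleBarrelSketch<K, V> {
    private final int maxSize;
    // Number of puts still allowed before primary is treated as full.
    private final AtomicInteger countdown;
    private volatile Map<K, V> primary = new ConcurrentHashMap<>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<>();

    DoubleBarrelSketch(int maxSize) {
        this.maxSize = maxSize;
        this.countdown = new AtomicInteger(maxSize);
    }

    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            // Miss in primary: fall back to secondary and promote on a hit.
            value = secondary.get(key);
            if (value != null) {
                put(key, value);
            }
        }
        return value;
    }

    void put(K key, V value) {
        primary.put(key, value);
        // Only the thread that moves the counter to exactly 0 performs the swap.
        if (countdown.decrementAndGet() == 0) {
            Map<K, V> oldPrimary = primary;
            Map<K, V> oldSecondary = secondary;
            oldSecondary.clear();      // drop the least recently used generation
            primary = oldSecondary;    // the emptied map becomes the new primary
            secondary = oldPrimary;    // the full map becomes the new secondary
            countdown.set(maxSize);
        }
    }
}
```

The swap is not atomic with respect to concurrent readers, so a lookup racing with it may see a spurious miss; for a cache that only costs a recomputation, which is presumably why such a simple scheme is attractive here.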

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 18, 2009, 9:54 AM

Post #2 of 15 (808 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2075:
---------------------------------

Attachment: LUCENE-2075.patch

Here's a simplified version of Solr's ConcurrentLRUCache.



jira at apache

Nov 18, 2009, 10:38 AM

Post #3 of 15 (802 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2075:
---------------------------------

Attachment: LUCENE-2075.patch

Here's a new version extending Cache<K,V>



jira at apache

Nov 18, 2009, 11:28 AM

Post #4 of 15 (802 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2075:
----------------------------------

Attachment: LUCENE-2075.patch

As PriorityQueue has been generified since Lucene 3.0, I added the missing generics. The class now compiles without unchecked warnings. I also removed lots of casts and parameterized the missing parts, and added the K type for the inner map.

Nice work, even if I do not understand it completely :-)



jira at apache

Nov 18, 2009, 11:40 AM

Post #5 of 15 (806 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2075:
----------------------------------

Attachment: (was: LUCENE-2075.patch)



jira at apache

Nov 18, 2009, 11:44 AM

Post #6 of 15 (801 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2075:
---------------------------------

Attachment: LUCENE-2075.patch

New patch attached - while refreshing my memory on the exact algorithm, I noticed a bug :-)
Things won't work well after 2B accesses since Integer.MAX_VALUE is used instead of Long.MAX_VALUE.
Need to go fix Solr now too :-)
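
To illustrate the kind of bug described above: the message does not say where the constant is used, so the following is a guess at the general shape rather than the actual code. It assumes access stamps are stored as long values and that the constant seeds a search for the oldest stamp; SentinelDemo and oldest are made-up names.

```java
/** Hypothetical illustration of an int-sized sentinel used with long access stamps. */
final class SentinelDemo {
    /** Returns the smallest stamp, starting the search from the given sentinel. */
    static long oldest(long[] lastAccessed, long sentinel) {
        long min = sentinel;
        for (long stamp : lastAccessed) {
            min = Math.min(min, stamp);
        }
        return min;
    }

    public static void main(String[] args) {
        // Stamps taken after more than 2^31 - 1 accesses.
        long[] stamps = {3_000_000_000L, 3_000_000_005L, 3_000_000_002L};
        // Sentinel too small: the result is the sentinel, not a real stamp.
        System.out.println(oldest(stamps, Integer.MAX_VALUE)); // prints 2147483647
        // Sentinel large enough: the true oldest stamp is found.
        System.out.println(oldest(stamps, Long.MAX_VALUE));    // prints 3000000000
    }
}
```

Below roughly 2 billion accesses both sentinels give the same answer, which would explain why such a bug can go unnoticed for a long time.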



jira at apache

Nov 18, 2009, 12:26 PM

Post #7 of 15 (802 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2075:
----------------------------------

Attachment: LUCENE-2075.patch

Patch that works around the bug in javac with typed arrays (because of that bug, javac does not allow typed arrays without unchecked casts...).

I fixed the PQueue by returning a List<CacheEntry<K,V>> values() and also made the private maxSize in the PriorityQueue protected, so it does not need to implement its own insertWithOverflow. As this class moves to Lucene Core, we should not make such bad hacks.

We need a good testcase for the whole cache class. It was hard for me to find a good test that hits the PQueue at all (it's only used in special cases). Hard stuff :(



jira at apache

Nov 19, 2009, 12:53 AM

Post #8 of 15 (790 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2075:
----------------------------------

Attachment: LUCENE-2075.patch

Updated patch: adds the missing @Override annotations that we added in 3.0, and also makes the private PQ implement Iterable; the markAndSweep code is now syntactic sugar :-)



jira at apache

Nov 20, 2009, 6:39 AM

Post #9 of 15 (758 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

First cut at a benchmark. First, download
http://concurrentlinkedhashmap.googlecode.com/files/clhm-production.jar
and put it into your lib subdir, then run "ant -lib
lib/clhm-production.jar compile-core", then run it with something like
this:

{code}
java -server -Xmx1g -Xms1g -cp build/classes/java:lib/clhm-production.jar org.apache.lucene.util.cache.LRUBench 4 5.0 0.0 1024 1024
{code}

The args are:

* numThreads

* runSec

* sharePct -- what percentage of the terms should be shared between the threads

* cacheSize

* termCountPerThread -- how many terms each thread will cycle through

The benchmark first sets up arrays of strings, per thread, based on
termCountPerThread & sharePct. Then each thread steps through the
array, and for each entry, tries to get the string, and if it's not
present, puts it. It records the hit & miss count, and prints summary
stats in the end, doing 3 rounds.

To mimic Lucene, each entry is tested twice in a row, ie, the 2nd time
we test the entry, it should be a hit. Ie we expect a hit rate of 50%
if sharePct is 0.
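
The 50% expectation can be reproduced with a small single-threaded stand-in. This is not the LRUBench code from the patch; HitRateSketch, lru and hitRate are made-up names, and cycling one thread through 4096 terms is only an assumed approximation of four threads with 1024 disjoint terms each sharing a 1024-entry cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single-threaded stand-in for the get-then-put benchmark loop. */
final class HitRateSketch {
    /** Access-ordered LinkedHashMap that evicts the eldest entry beyond maxSize. */
    static <K, V> Map<K, V> lru(final int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Cycles through termCount strings, testing each twice in a row; returns hit rate in percent. */
    static double hitRate(int cacheSize, int termCount, int passes) {
        Map<String, String> cache = lru(cacheSize);
        long hits = 0;
        long lookups = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int i = 0; i < termCount; i++) {
                String term = "term" + i;
                for (int repeat = 0; repeat < 2; repeat++) {
                    lookups++;
                    if (cache.get(term) != null) {
                        hits++;                 // second lookup of the pair
                    } else {
                        cache.put(term, term);  // first lookup misses, so store it
                    }
                }
            }
        }
        return 100.0 * hits / lookups;
    }

    public static void main(String[] args) {
        // 4096 distinct terms against 1024 slots: only the repeat lookup hits.
        System.out.println(hitRate(1024, 4096, 3)); // prints 50.0
    }
}
```

With a strict LRU, every term has been evicted by the time the cycle returns to it, so only the immediate repeat hits. A cache reporting a much higher rate under this load must be holding more entries than its nominal size.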

Here's my output from the above command line, using Java 1.6.0_14 (64
bit) on OpenSolaris:

{code}
numThreads=4 runSec=5.0 sharePct=0.0 cacheSize=1024 termCountPerThread=1024

LRU cache size is 1024; each thread steps through 1024 strings; 0 of which are common

round 0
sync(LinkedHashMap): Mops/sec=2.472 hitRate=50.734
DoubleBarreLRU: Mops/sec=20.502 hitRate=50
ConcurrentLRU: Mops/sec=17.936 hitRate=84.409
ConcurrentLinkedHashMap: Mops/sec=1.248 hitRate=50.033

round 1
sync(LinkedHashMap): Mops/sec=2.766 hitRate=50.031
DoubleBarreLRU: Mops/sec=17.66 hitRate=50
ConcurrentLRU: Mops/sec=17.82 hitRate=83.726
ConcurrentLinkedHashMap: Mops/sec=1.266 hitRate=50.331

round 2
sync(LinkedHashMap): Mops/sec=2.714 hitRate=50.168
DoubleBarreLRU: Mops/sec=17.912 hitRate=50
ConcurrentLRU: Mops/sec=17.866 hitRate=84.156
ConcurrentLinkedHashMap: Mops/sec=1.26 hitRate=50.254
{code}

NOTE: I'm not sure about the correctness of DoubleBarrelLRU -- I just
quickly wrote it.

Also, the results for ConcurrentLRUCache are invalid (its hit rate is
way too high) -- I think this is because its eviction process can take
a longish amount of time, which temporarily allows the map to hold way
too many entries, and means it's using up a lot more transient RAM than
it should.

In theory DoubleBarrelLRU should be vulnerable to the same issue, but
in practice it seems to affect it much less (I guess because
CHM.clear() must be very fast).

I'm not sure how to fix the benchmark to work around that... maybe we
bring back the cleaning thread (from Solr's version), and give it a
high priority?

Another idea: I wonder whether a simple cache-line like cache would be
sufficient. Ie, we hash to a fixed slot and we evict whatever is
there.
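
A minimal sketch of that cache-line-like idea, assuming one entry per slot and an AtomicReferenceArray for safe publication between threads. SlotCacheSketch and its members are hypothetical names; nothing like this is part of the attached patches.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical direct-mapped cache: each key hashes to one slot, overwriting its occupant. */
final class SlotCacheSketch<K, V> {
    /** Immutable key/value pair so a slot is always read as a consistent unit. */
    private static final class Entry<K, V> {
        final K key;
        final V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Entry<K, V>> slots;

    SlotCacheSketch(int slotCount) {
        slots = new AtomicReferenceArray<>(slotCount);
    }

    private int slotFor(Object key) {
        // Mask the sign bit so the index is never negative.
        return (key.hashCode() & 0x7fffffff) % slots.length();
    }

    V get(K key) {
        Entry<K, V> e = slots.get(slotFor(key));
        // A different key may occupy the slot; treat that as a miss.
        return (e != null && e.key.equals(key)) ? e.value : null;
    }

    void put(K key, V value) {
        // Evict whatever is in the slot, with no LRU bookkeeping at all.
        slots.set(slotFor(key), new Entry<>(key, value));
    }
}
```

The trade-off is that two hot keys landing in the same slot keep evicting each other, whereas an LRU would keep both; in exchange there is no eviction pass and no size accounting.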




jira at apache

Nov 23, 2009, 3:45 AM

Post #10 of 15 (682 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

Attached patch; all tests pass:

* Switches the terms dict cache away from per-thread cache to shared
(DoubleBarrelLRU) cache

* Still uses the cache when seeking the term enum

However, I'm baffled: I re-ran the BenchWildcard test and saw no
measurable improvement in ????NNN query (yet, I confirmed it's now
storing into and then hitting on the cache), but I did see a gain in
the *N query (from ~4300 msec before to ~3500 msec) which I can't
explain because that query doesn't use the cache at all (just the
linear scan). I'm confused....

Robert maybe you can try this patch plus automaton patch and see if
you see this same odd behavior?


> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
> Key: LUCENE-2075
> URL: https://issues.apache.org/jira/browse/LUCENE-2075
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch
>
>


jira at apache

Nov 23, 2009, 5:22 AM

Post #11 of 15 (681 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2075:
----------------------------------

Attachment: LUCENE-2075.patch

I updated the patch to add overrides. I also had to add one SuppressWarnings, because the get() method does an unchecked cast: it modifies the map, which is not in the contract of get(). The cast is safe, because get() only adds the key to the second map if the first map already contains it, and therefore the key has the correct type.
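
For readers unfamiliar with why a get() that takes Object needs the annotation, here is a rough sketch. It is not the patch's code: PromotingLookup, putSecondary and inPrimary are hypothetical names, and the primary/secondary naming follows the issue description rather than the actual class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a Map-style get(Object) that promotes entries between two maps. */
final class PromotingLookup<K, V> {
    private final Map<K, V> primary = new ConcurrentHashMap<>();
    private final Map<K, V> secondary = new ConcurrentHashMap<>();

    void putSecondary(K key, V value) {
        secondary.put(key, value);
    }

    boolean inPrimary(Object key) {
        return primary.containsKey(key);
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                // Unchecked cast: the compiler only knows key is an Object.
                // It is treated as safe because the key was just found in a Map<K, V>.
                primary.put((K) key, value);
            }
        }
        return value;
    }
}
```

Declaring get(K key) instead would avoid the cast, but it would no longer match a get(Object) signature inherited from a Map-like base class.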

I will now start my tests with NRQ.



jira at apache

Nov 23, 2009, 5:48 AM

Post #12 of 15 (682 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

Thanks Uwe!

I attached another one: made DBLRU final, tweaked javadocs, fixed spelling in the saturation comment, added you guys to the CHANGES entry.



jira at apache

Nov 23, 2009, 10:15 AM

Post #13 of 15 (679 views)
Permalink
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

New patch, folding in Yonik's suggestions, adding a unit test (carried
over from TestSimpleLRUCache -- I'll "svn mv" when I commit it), and
deprecating SimpleLRUCache.


> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
> Key: LUCENE-2075
> URL: https://issues.apache.org/jira/browse/LUCENE-2075
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage. You're
> also cutting way back on the likelihood of a cache hit (except the known
> multiple times we look up a term within a query, which uses one thread).
> In NRT search we often open new SegmentReaders (on tiny segments), and
> each thread must then spend CPU/RAM creating & populating its own cache.
> Now that we are on Java 1.5 we can use java.util.concurrent.*, e.g.
> ConcurrentHashMap. One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary). You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary. Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
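
A minimal sketch of the double-barrel idea described above may help make the
promote and swap steps concrete. This is not the code from the attached
LUCENE-2075.patch or ConcurrentLRUCache.java; the class name
DoubleBarrelCacheSketch, the maxSize parameter, and the choice to guard the
swap with a synchronized method are all assumptions made for illustration.
Being a cache, it tolerates the occasional lost entry under concurrent use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; names and locking strategy are assumptions. */
class DoubleBarrelCacheSketch<K, V> {
    private final int maxSize;
    // Both maps are thread-safe; the references are volatile so that a swap
    // made by one thread becomes visible to the others.
    private volatile Map<K, V> primary = new ConcurrentHashMap<K, V>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<K, V>();

    DoubleBarrelCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Check primary first; on a miss check secondary and promote a hit. */
    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                put(key, value); // promote to primary
            }
        }
        return value;
    }

    /** Insert into primary; once primary is full, clear secondary and swap. */
    void put(K key, V value) {
        Map<K, V> current = primary;
        current.put(key, value);
        if (current.size() >= maxSize) {
            swap(current);
        }
    }

    private synchronized void swap(Map<K, V> full) {
        if (primary != full) {
            return; // another thread already swapped this map out
        }
        Map<K, V> old = secondary;
        old.clear();      // drop the oldest generation of entries
        secondary = full; // the full primary becomes the new secondary
        primary = old;    // the emptied map becomes the new primary
    }
}
```

With maxSize set to 2, putting "a" and "b" fills primary and triggers a swap;
a later get("a") finds it in secondary and promotes it, while "b" is dropped
at the next swap unless it has been looked up in the meantime. Lookups on the
hot path take no lock; only the swap is serialized.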


jira at apache

Nov 24, 2009, 11:11 AM

Post #14 of 15 (654 views)
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

New patch attached -- restores (deprecated) TestSimpleLRUCache. I think this one is ready to commit?

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
> Key: LUCENE-2075
> URL: https://issues.apache.org/jira/browse/LUCENE-2075
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch



jira at apache

Nov 24, 2009, 11:35 AM

Post #15 of 15 (648 views)
[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2075:
---------------------------------------

Attachment: LUCENE-2075.patch

Also deprecates SimpleMapCache.

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
> Key: LUCENE-2075
> URL: https://issues.apache.org/jira/browse/LUCENE-2075
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch

