Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Apr 22, 2012, 4:27 PM

Post #1 of 7 (168 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259301#comment-13259301 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

After all the bugs are worked out of the implementation covered by this issue, I can see a future scenario where the existing (slow) LFUCache is scrapped, FastLFUCache is renamed LFUCache, and a new FastLFUCache implementation that uses a Concurrent class is created. If the committers would prefer that I go ahead and scrap LFUCache now and just use that name, let me know.

I have no plans to work on SOLR-2889, so it may be best to just close that issue. I don't know if SOLR-2906 should be changed from a subtask to a full task.


> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing FastLFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 23, 2012, 6:59 AM

Post #2 of 7 (156 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259609#comment-13259609 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

I've been working on this. I've come to realize that I don't completely understand how CacheRegenerator works. I suspect that it is geared around LRU caches and that the new cache won't have any of the frequency information from the old one, it will just put the entries into the cache as if they were new. Can anyone confirm this? If I am right, I think my SOLR-2906 implementation is incorrect at warm time.

After the new cache is regenerated, should I go through the new cache, grab the frequency information from the old cache with each key, and fix the new cache up?

Yonik, you were the one that came up with the timeDecay option for SOLR-2906. It was added to the markAndSweep routine (which evicts old entries). Should it also be in the warm routine, or possibly only exist in the warm routine?


> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 24, 2012, 3:36 PM

Post #3 of 7 (155 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261098#comment-13261098 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

Further work, but no patch yet because it requires a bunch of test tweaks: I have split timeDecay into evictDecay and warmDecay. The timeDecay option still works, and sets both to true. In the absence of the timeDecay option, I've made evictDecay default to false and warmDecay default to true.

In order to cause elements to decay over time, every entry in the cache must be removed from one LinkedHashSet and added to another. This is probably pretty fast for this implementation, but it does still require time. My use case scenario will have commits relatively often, up to once a minute. For me, doing the decay at warm time is enough, and results in fewer items to touch. To have it happen at eviction time is overhead I don't need, but it would be correct for some use cases. What are some other people's opinions about the default settings?


> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 24, 2012, 6:10 PM

Post #4 of 7 (161 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261197#comment-13261197 ]

Hoss Man commented on SOLR-3393:
--------------------------------

bq. I will attempt to make a new O(1) cache called FastLFUCache

{{#OhDearGodPleaseNotAnotherClassWithFastInTheName}}

Please, please, please lets end the madness of subjective adjectives in class names ... if it's an LFU cache wrapped around a "hawtdb" why don't we just call it "HawtDbLFUCache" ?

bq. I've been working on this. I've come to realize that I don't completely understand how CacheRegenerator works. I suspect that it is geared around LRU caches and that the new cache won't have any of the frequency information from the old one, it will just put the entries into the cache as if they were new. Can anyone confirm this?

The idea behind the CacheRegenerator API is to be as simple as possible and agnostic to:
* the Cache Impl (ie: LRUCache vs LFUCache vs HawtDbLFUCache)
* the cache usage (ie: Query->DocSets vs Query->DocList vs String->MyCustomClass)
* the means of generating values from keys (ie: how do you know which MyCustomClass should be cached for which String)

... so you can have a custom (named) cache instance declared in your solrconfig.xml with your own MySpecialCacheRegenerator that knows about your usecase and might do something special with the keys/values (like: short-circut part of the generation if it can see the data hasn't changed, or read from authoritative data files outside of solr, etc...) and then use *any* Cache impl class that you're heart desires, and things will still work right.

bq. After the new cache is regenerated, should I go through the new cache, grab the frequency information from the old cache with each key, and fix the new cache up?

you certainly could -- when {{(new HawtDbLFUCache(...)).warm(...)}} is called, it needs to delegate to the regenerator for pulling values from the "old" cache, but that doesn't mean it can't also directly ask the "old" cache instance for stats about each of the old keys as it loops over them -- remember: the "new" cache is the one inspecting the "old" cache and deciding what things to ask the regenerator to generate.

But i question whether you really want any sort of stats from the "old" cache copied over to the "new" cache. it is, after all, a completely new cache -- with new usage. should the stats really be preserved forever? regardless of how popular an object was in the "old" cache instance, should we automatically assume it's equally popular in the "new" cache instance?

> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 24, 2012, 6:32 PM

Post #5 of 7 (154 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261203#comment-13261203 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

Hoss, thanks for your comments.

Since I was the one who wrote the previous version of this, I just opted to completely replace LFUCache rather than go with a "Fast" appendage. I hadn't considered the naming problem in quite the same light as "yet another fast*" name, but it did seem like a bad idea.

Yonik had the same concern about stats being preserved forever on SOLR-2906, and he helped with a decay option to deal with that. I think the decay is a good idea. There was only one kind of decay before, applied to all elements anytime there were evictions, defaulted to on.

In a new version of the patch for this issue (which I have not yet uploaded) I have now included two kinds of decay. There is the kind applied at eviction, now defaulting to off, and one applied at warming, defaulting to on. I will expand the documentation on the Wiki, making it clear that turning off the decay option will probably lead to an undesirable cache state. Currently the decay is implemented with a bit shift (>>> 1), I may make another option available that just subtracts one, and we can bikeshed about which option should be default.


> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 27, 2012, 12:38 AM

Post #6 of 7 (152 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263445#comment-13263445 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

Is this too big a change to be included in 3.6.1? I will likely be trying this out with 3.6, once I clear my plate a little bit.

> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 27, 2012, 11:19 PM

Post #7 of 7 (152 views)
Permalink
[jira] [Commented] (SOLR-3393) Implement an optimized LFUCache [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264231#comment-13264231 ]

Shawn Heisey commented on SOLR-3393:
------------------------------------

The commit for SOLR-1893 means that my patch needs more work. I have a business trip next week that takes priority, though.

> Implement an optimized LFUCache
> -------------------------------
>
> Key: SOLR-3393
> URL: https://issues.apache.org/jira/browse/SOLR-3393
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 3.6, 4.0
> Reporter: Shawn Heisey
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.