
tsturge at metaweb
Jul 25, 2007, 11:41 AM
java gc with a frequently changing index?
Hi,

I am indexing a set of constantly changing documents. The change rate is moderate (about 10 docs/sec over a 10M document collection with a 6G total size), but I want the index to be right up to date (ideally within a second, though within 5 seconds is acceptable).

Right now I have code that adds new documents to the index and deletes old ones using updateDocument() in the 2.1 IndexWriter. In order to see the changes, I need to recreate the IndexReader/IndexSearcher every second or so. I am not calling optimize() on the index in the writer, and the mergeFactor is 10.

The problem I am facing is that java gc is terrible at collecting the IndexSearchers I am discarding. I usually have a 3 msec query time, but I get gc pauses of 300 msec to 3 sec (I assume it is collecting the "tenured" generation in these pauses, which is where my old IndexSearcher lives).

I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC", and calling System.gc() right after I close the old index, without much luck: I get the pauses down to 1 sec, but get 3x as many. I want < 25 msec pauses.

So my questions are:

Should I be avoiding reloading my index in this way?

Should I keep a separate IndexReader (which only deletes old documents) and one for new documents?

Is there a standard technique for a quickly changing index?

Thanks,
Tim
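For reference, the reopen-and-swap pattern described above might look roughly like the sketch below. It uses only the JDK: RefreshingHolder is a hypothetical helper, not a Lucene class, and the opener stands in for whatever code builds a new IndexReader/IndexSearcher. It also assumes no query is still running on the old instance when it is closed, which a real version would have to guarantee (for example with reference counting).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Lucene): holds the current
 * searcher-like resource and swaps in a freshly opened one on demand.
 */
final class RefreshingHolder<T extends AutoCloseable> {
    private final Supplier<T> opener;          // assumed: opens a new reader/searcher
    private final AtomicReference<T> current;  // the instance queries should use

    RefreshingHolder(Supplier<T> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Returns the instance that new queries should run against. */
    T get() {
        return current.get();
    }

    /**
     * Opens a new instance, publishes it, then closes the old one.
     * Assumption: no in-flight query still uses the old instance.
     */
    void refresh() {
        T fresh = opener.get();
        T old = current.getAndSet(fresh);
        try {
            old.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close old instance", e);
        }
    }
}
```

Calling refresh() once a second (say, from a ScheduledExecutorService) reproduces the situation in the post: each call leaves one large, long-lived object graph for the collector to reclaim.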