
ken.mccracken at gmail
Mar 20, 2012, 3:20 PM
Post #3 of 3
Hi Mike,

Thanks for the response. We will do some more investigation. We will look to see if there is a clean way to suppress at least the extra 3 array allocations.

Cheers,
-Ken

On Mar 19, 2012, at 5:32 PM, Michael McCandless <lucene [at] mikemccandless> wrote:

> Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY.
>
> We shouldn't have to allocate/use docFreqs, lastDocCodes,
> lastPositions arrays (3 of the 7); the others are still needed, I
> think.
>
> But, that said, you shouldn't hit OOME, as long as your max heap size
> is large enough (and, your IndexWriterConfig's RAMBufferSizeMB is
> small enough); Lucene should simply flush a new segment once the
> buffered documents are using too much RAM.
>
> Hmm, and you don't index massive documents. How many UUIDs per
> document?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken <ken.mccracken [at] gmail> wrote:
>> Hi,
>>
>> I am using lucene-3.5 and getting an OutOfMemoryError on a large
>> indexing task of 100M documents. I am creating an index with 3 UUIDs
>> as separate field values. I am using Store.YES on 1 of them and
>> Store.NO on the others; I am using Index.NOT_ANALYZED_NO_NORMS on all
>> three; explicitly setting
>> field.setIndexOptions(IndexOptions.DOCS_ONLY); and
>> indexWriterConfig.setTermIndexInterval(termIndexInterval); to 1024.
>> I am trying to index 100M records into my index.
>>
>> Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray
>> needs to be constructed even though I have the positions etc
>> suppressed? It seems that the reason I get an OutOfMemoryError is
>> that 7 int[] of size proportional to number of unique fields are
>> being constructed; however, at least some of them are probably
>> wasteful given my indexing configurations.
>>
>> Any help is appreciated.
>>
>> Thanks,
>> -Ken
>>
>> [junit] Error:
>> [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
>> [junit]     at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
>> [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
>> [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
>> [junit]     at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
>> [junit]     at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
>> [junit]     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
>> [junit]     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
>> [junit]     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
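
For a rough sense of what dropping 3 of the 7 int[] arrays would save, the arithmetic from this thread can be sketched in plain Java. This is only a back-of-the-envelope estimate, not Lucene code: the class name, method names, and the slot count of 1,000,000 are hypothetical, and the figures ignore array object headers and any over-allocation when the arrays grow.

```java
public class PostingsArrayEstimate {
    // A Java int always occupies 4 bytes.
    static final int BYTES_PER_INT = 4;

    /** Bytes consumed per term slot by `arrays` parallel int[] arrays. */
    public static long bytesPerTerm(int arrays) {
        return (long) arrays * BYTES_PER_INT;
    }

    /** Total bytes when each of the `arrays` parallel arrays holds `slots` entries. */
    public static long totalBytes(int arrays, long slots) {
        return bytesPerTerm(arrays) * slots;
    }

    public static void main(String[] args) {
        long slots = 1_000_000L;            // hypothetical number of buffered unique terms
        long all7 = totalBytes(7, slots);   // all seven arrays allocated, as reported above
        long only4 = totalBytes(4, slots);  // if the 3 arrays named in the reply were skipped
        System.out.println("7 arrays: " + all7 + " bytes");            // 28000000
        System.out.println("4 arrays: " + only4 + " bytes");           // 16000000
        System.out.println("saved:    " + (all7 - only4) + " bytes");  // 12000000
    }
}
```

Under these assumptions the saving is 12 bytes per term slot, a bit over 40% of the parallel-array footprint. As the reply notes, this would reduce RAM use per buffered term but is not by itself the reason for an OutOfMemoryError if the RAM buffer is flushed in time.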