
lucene at mikemccandless
Mar 17, 2009, 5:11 AM
Post #5 of 5
(879 views)
Permalink
|
|
Re: problems with large Lucene index (reason found)
[In reply to]
|
|
I've opened: https://issues.apache.org/jira/browse/LUCENE-1566 for this. Cameron, could you attach your patch to that issue? Thanks. Mike lucene [at] digiatlas wrote: > > Yes, I overrode the read() method in > FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb > chunks and do an arraycopy() into the array created by Lucene. It > now works with any heap size and doesn't get OOM. > > There may be other areas this could happen in the Lucene code > (although at present it seems to be working fine for me on our > largest, 17Gb, index but I haven't tried accessing data yet - only > getting the result size - so perhaps there are other calls to read() > with large buffer sizes). > > As this bug does not look like it will be fixed in the near future, > it might be an idea to put in place a fix in the Lucene code. I > think it would be safe to read in chunks of up to 100Mb without a > problem and I don't think it will affect performance to any great > degree. > > It's pleasing to see that Lucene can easily handle such huge > indexes, although this bug is obviously quite an impediment to doing > so. > > regards, > Cameron Newham > > Quoting Michael McCandless <lucene [at] mikemccandless>: > >> >> Gak, what a horrible bug! >> >> It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs). And it's >> been open for almost 2.5 years. I just added a comment & voted for >> the >> bug. >> >> Does it also occur on a 64 bit JRE? >> >> If you still allocate the full array, but read several smaller chunks >> into it, do you still hit the bug? >> >> Mike >> >> lucene [at] digiatlas wrote: >> >>> >>> I now know the cause of the problem. Increasing heap space >>> actually breaks Lucene when reading large indexes. >>> >>> Details on why can be found here: >>> >>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 >>> >>> >>> Lucene is trying to read a huge block (about 2580Mb) at the point >>> of failure. While it allocates the required bytes in method >>> MultiSegmentReader.norms(), line 334, just fine, it is only when >>> it attempts to use this array in a call to >>> RandomAccessFile.readBytes() that it gets OOM. This is caused by >>> a bug in the native code for the Java IO. >>> >>> As observed in the bug report, large heap space actually causes >>> the bug to appear. When I reduced my heap from 1200M to 1000M >>> the exception was never generated and the code completed >>> correctly and it reported the correct number of search hits in >>> the Hibernate Search version of my program. >>> >>> This isn't good - I need as much memory as possible because I >>> intend to run my search as a web service. >>> >>> The work-around would be to read the file in small chunks, but I >>> am not familiar with the Lucene code so I am unsure how that >>> would be done in a global sense (i.e.: does it really need to >>> allocate a buffer of that size in MultiSegmentReader?) >>> >>> The obvious solution (which I haven't tried yet) would be to >>> patch the point in FSDirectory where the java IO read occurs - >>> looping with a small buffer for the read and then concatenating >>> the result back into Lucene's byte array. >>> >>> Thanks for the comments on this problem from people on this list. >>> >>> >>> >>> Quoting Ted Dunning <ted.dunning [at] gmail>: >>> >>>> try running with verbose gc. That will give you more details >>>> about what is >>>> happening. >>>> >>>> Even better, run with jconsole on the side so that you get really >>>> detailed >>>> information on memory pools. >>>> >>>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene [at] digiatlas> wrote: >>>> >>>>> Thanks Mike and Jokin for your comments on the memory problem. I >>>>> have >>>>> submitted the query to the Hibernate Search list although I >>>>> haven't seen a >>>>> response yet. >>>>> >>>>> In the meantime I did my own investigating in the code (I'd >>>>> rather have >>>>> avoided this!). I'm seeing results that don't make any sense and >>>>> maybe >>>>> someone here with more experience of Lucene and the way memory >>>>> is allocated >>>>> by the JVM may shed light on, what to me, are quite illogical >>>>> observations. >>>>> >>>>> As you may recall I had a stand-alone Lucene search and a >>>>> Hibernate Search >>>>> version. Looking in the HS code did not shed any light on the >>>>> issue. I took >>>>> my stand-alone Lucene code and put it in a method and replaced >>>>> the search in >>>>> the HS class (the constructor of QueryHits.java) with the call >>>>> to my method. >>>>> Bear in mind this method is the same code as posted in my >>>>> earlier message - >>>>> it sets up the Lucene search from scratch (i.e.: no data >>>>> structures created >>>>> by HS were used). So, effectively I was calling my stand-alone >>>>> code after >>>>> any setup done by Hibernate and any memory it may have allocated >>>>> (which >>>>> turned out to be a few Mb). >>>>> >>>>> I get OOM! Printing the free memory at this point shows bags of >>>>> memory >>>>> left. Indeed, the same free memory (+/- a few Mb) as the stand- >>>>> alone Lucene >>>>> version! >>>>> >>>>> I then instrumented the Lucene method where the OOM is occuring >>>>> (FSDirectory.readInternal()). I cannot understand the results >>>>> I am seeing. >>>>> Below is a snippet of the output of each with the code around >>>>> FSDirectory >>>>> line 598 as follows: >>>>> >>>>> ... >>>>> do { >>>>> long tot = Runtime.getRuntime().totalMemory(); >>>>> long free =Runtime.getRuntime().freeMemory(); >>>>> System.out.println("LUCENE: offset="+offset+" total="+total+" >>>>> len-total="+(len-total)+" free mem="+free+" used ="+(tot- >>>>> free)); >>>>> int i = file.read(b, offset+total, len-total); >>>>> ... >>>>> >>>>> >>>>> >>>>> The stand-alone version: >>>>> >>>>> ... >>>>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 >>>>> used >>>>> =330080544 >>>>> LUCENE: offset=0 total=0 len-total=209000000 free >>>>> mem=631122912 used >>>>> =617534496 >>>>> LUCENE: offset=209000000 total=0 len-total=20900000 free >>>>> mem=631122912 >>>>> used =617534496 >>>>> LUCENE: offset=229900000 total=0 len-total=20900000 free >>>>> mem=631122912 >>>>> used =617534496 >>>>> LUCENE: offset=250800000 total=0 len-total=20900000 free >>>>> mem=631122912 >>>>> used =617534496 >>>>> ... >>>>> completes successfully! >>>>> >>>>> >>>>> The method called via Hibernate Search: >>>>> >>>>> ... >>>>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 >>>>> used >>>>> =334892152 >>>>> LUCENE: offset=0 total=0 len-total=209000000 free >>>>> mem=636731528 used >>>>> =622346104 >>>>> Exception in thread "main" java.lang.OutOfMemoryError >>>>> at java.io.RandomAccessFile.readBytes(Native Method) >>>>> at java.io.RandomAccessFile.read(Unknown Source) >>>>> at >>>>> org.apache.lucene.store.FSDirectory >>>>> $FSIndexInput.readInternal(FSDirectory.java:599) >>>>> ... fails with exception! >>>>> >>>>> >>>>> Note that the HS version has slightly more free memory because I >>>>> ran it >>>>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to >>>>> offset any >>>>> memory used by HS when it starts up. >>>>> >>>>> As you can see, these are identical for all practical purposes. >>>>> So what >>>>> gives? >>>>> >>>>> I'm stumped, so any suggestions appreciated. >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> Quoting Michael McCandless <lucene [at] mikemccandless>: >>>>> >>>>> >>>>>> Unfortunately, I'm not familiar with exactly what Hibernate >>>>>> search does >>>>>> with the Lucene APIs. >>>>>> >>>>>> It must be doing something beyond what your standalone Lucene >>>>>> test case >>>>>> does. >>>>>> >>>>>> Maybe ask this question on the Hibernate list? >>>>>> >>>>>> Mike >>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Ted Dunning, CTO >>>> DeepDyve >>>> >>>> 111 West Evelyn Ave. Ste. 202 >>>> Sunnyvale, CA 94086 >>>> www.deepdyve.com >>>> 408-773-0110 ext. 738 >>>> 858-414-0013 (m) >>>> 408-773-0220 (fax) >>>> >>> >>> >>> > > >
|