Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: General

Re: problems with large Lucene index (reason found)

 

 

Lucene general RSS feed   Index | Next | Previous | View Threaded


lucene at digiatlas

Mar 13, 2009, 2:38 AM

Post #1 of 5 (973 views)
Permalink
Re: problems with large Lucene index (reason found)

I now know the cause of the problem. Increasing heap space actually
breaks Lucene when reading large indexes.

Details on why can be found here:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546


Lucene is trying to read a huge block (about 2580Mb) at the point of
failure. While it allocates the required bytes in method
MultiSegmentReader.norms(), line 334, just fine, it is only when it
attempts to use this array in a call to RandomAccessFile.readBytes()
that it gets OOM. This is caused by a bug in the native code for the
Java IO.

As observed in the bug report, large heap space actually causes the
bug to appear. When I reduced my heap from 1200M to 1000M the
exception was never generated and the code completed correctly and it
reported the correct number of search hits in the Hibernate Search
version of my program.

This isn't good - I need as much memory as possible because I intend
to run my search as a web service.

The work-around would be to read the file in small chunks, but I am
not familiar with the Lucene code so I am unsure how that would be
done in a global sense (i.e.: does it really need to allocate a buffer
of that size in MultiSegmentReader?)

The obvious solution (which I haven't tried yet) would be to patch the
point in FSDirectory where the java IO read occurs - looping with a
small buffer for the read and then concatenating the result back into
Lucene's byte array.

Thanks for the comments on this problem from people on this list.



Quoting Ted Dunning <ted.dunning [at] gmail>:

> try running with verbose gc. That will give you more details about what is
> happening.
>
> Even better, run with jconsole on the side so that you get really detailed
> information on memory pools.
>
> On Thu, Mar 12, 2009 at 7:30 AM, <lucene [at] digiatlas> wrote:
>
>> Thanks Mike and Jokin for your comments on the memory problem. I have
>> submitted the query to the Hibernate Search list although I haven't seen a
>> response yet.
>>
>> In the meantime I did my own investigating in the code (I'd rather have
>> avoided this!). I'm seeing results that don't make any sense and maybe
>> someone here with more experience of Lucene and the way memory is allocated
>> by the JVM may shed light on, what to me, are quite illogical observations.
>>
>> As you may recall I had a stand-alone Lucene search and a Hibernate Search
>> version. Looking in the HS code did not shed any light on the issue. I took
>> my stand-alone Lucene code and put it in a method and replaced the search in
>> the HS class (the constructor of QueryHits.java) with the call to my method.
>> Bear in mind this method is the same code as posted in my earlier message -
>> it sets up the Lucene search from scratch (i.e.: no data structures created
>> by HS were used). So, effectively I was calling my stand-alone code after
>> any setup done by Hibernate and any memory it may have allocated (which
>> turned out to be a few Mb).
>>
>> I get OOM! Printing the free memory at this point shows bags of memory
>> left. Indeed, the same free memory (+/- a few Mb) as the stand-alone Lucene
>> version!
>>
>> I then instrumented the Lucene method where the OOM is occuring
>> (FSDirectory.readInternal()). I cannot understand the results I am seeing.
>> Below is a snippet of the output of each with the code around FSDirectory
>> line 598 as follows:
>>
>> ...
>> do {
>> long tot = Runtime.getRuntime().totalMemory();
>> long free =Runtime.getRuntime().freeMemory();
>> System.out.println("LUCENE: offset="+offset+" total="+total+"
>> len-total="+(len-total)+" free mem="+free+" used ="+(tot-free));
>> int i = file.read(b, offset+total, len-total);
>> ...
>>
>>
>>
>> The stand-alone version:
>>
>> ...
>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>> =330080544
>> LUCENE: offset=0 total=0 len-total=209000000 free mem=631122912 used
>> =617534496
>> LUCENE: offset=209000000 total=0 len-total=20900000 free mem=631122912
>> used =617534496
>> LUCENE: offset=229900000 total=0 len-total=20900000 free mem=631122912
>> used =617534496
>> LUCENE: offset=250800000 total=0 len-total=20900000 free mem=631122912
>> used =617534496
>> ...
>> completes successfully!
>>
>>
>> The method called via Hibernate Search:
>>
>> ...
>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>> =334892152
>> LUCENE: offset=0 total=0 len-total=209000000 free mem=636731528 used
>> =622346104
>> Exception in thread "main" java.lang.OutOfMemoryError
>> at java.io.RandomAccessFile.readBytes(Native Method)
>> at java.io.RandomAccessFile.read(Unknown Source)
>> at
>> org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
>> ... fails with exception!
>>
>>
>> Note that the HS version has slightly more free memory because I ran it
>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to offset any
>> memory used by HS when it starts up.
>>
>> As you can see, these are identical for all practical purposes. So what
>> gives?
>>
>> I'm stumped, so any suggestions appreciated.
>>
>> Thanks.
>>
>>
>> Quoting Michael McCandless <lucene [at] mikemccandless>:
>>
>>
>>> Unfortunately, I'm not familiar with exactly what Hibernate search does
>>> with the Lucene APIs.
>>>
>>> It must be doing something beyond what your standalone Lucene test case
>>> does.
>>>
>>> Maybe ask this question on the Hibernate list?
>>>
>>> Mike
>>>
>>
>>
>>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> www.deepdyve.com
> 408-773-0110 ext. 738
> 858-414-0013 (m)
> 408-773-0220 (fax)
>


lucene at mikemccandless

Mar 13, 2009, 2:55 AM

Post #2 of 5 (907 views)
Permalink
Re: problems with large Lucene index (reason found) [In reply to]

Gak, what a horrible bug!

It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs). And it's
been open for almost 2.5 years. I just added a comment & voted for
the bug.

Does it also occur on a 64 bit JRE?

If you still allocate the full array, but read several smaller chunks
into it, do you still hit the bug?

Mike

lucene [at] digiatlas wrote:

>
> I now know the cause of the problem. Increasing heap space actually
> breaks Lucene when reading large indexes.
>
> Details on why can be found here:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>
>
> Lucene is trying to read a huge block (about 2580Mb) at the point of
> failure. While it allocates the required bytes in method
> MultiSegmentReader.norms(), line 334, just fine, it is only when it
> attempts to use this array in a call to RandomAccessFile.readBytes()
> that it gets OOM. This is caused by a bug in the native code for the
> Java IO.
>
> As observed in the bug report, large heap space actually causes the
> bug to appear. When I reduced my heap from 1200M to 1000M the
> exception was never generated and the code completed correctly and
> it reported the correct number of search hits in the Hibernate
> Search version of my program.
>
> This isn't good - I need as much memory as possible because I intend
> to run my search as a web service.
>
> The work-around would be to read the file in small chunks, but I am
> not familiar with the Lucene code so I am unsure how that would be
> done in a global sense (i.e.: does it really need to allocate a
> buffer of that size in MultiSegmentReader?)
>
> The obvious solution (which I haven't tried yet) would be to patch
> the point in FSDirectory where the java IO read occurs - looping
> with a small buffer for the read and then concatenating the result
> back into Lucene's byte array.
>
> Thanks for the comments on this problem from people on this list.
>
>
>
> Quoting Ted Dunning <ted.dunning [at] gmail>:
>
>> try running with verbose gc. That will give you more details about
>> what is
>> happening.
>>
>> Even better, run with jconsole on the side so that you get really
>> detailed
>> information on memory pools.
>>
>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene [at] digiatlas> wrote:
>>
>>> Thanks Mike and Jokin for your comments on the memory problem. I
>>> have
>>> submitted the query to the Hibernate Search list although I
>>> haven't seen a
>>> response yet.
>>>
>>> In the meantime I did my own investigating in the code (I'd rather
>>> have
>>> avoided this!). I'm seeing results that don't make any sense and
>>> maybe
>>> someone here with more experience of Lucene and the way memory is
>>> allocated
>>> by the JVM may shed light on, what to me, are quite illogical
>>> observations.
>>>
>>> As you may recall I had a stand-alone Lucene search and a
>>> Hibernate Search
>>> version. Looking in the HS code did not shed any light on the
>>> issue. I took
>>> my stand-alone Lucene code and put it in a method and replaced the
>>> search in
>>> the HS class (the constructor of QueryHits.java) with the call to
>>> my method.
>>> Bear in mind this method is the same code as posted in my earlier
>>> message -
>>> it sets up the Lucene search from scratch (i.e.: no data
>>> structures created
>>> by HS were used). So, effectively I was calling my stand-alone
>>> code after
>>> any setup done by Hibernate and any memory it may have allocated
>>> (which
>>> turned out to be a few Mb).
>>>
>>> I get OOM! Printing the free memory at this point shows bags of
>>> memory
>>> left. Indeed, the same free memory (+/- a few Mb) as the stand-
>>> alone Lucene
>>> version!
>>>
>>> I then instrumented the Lucene method where the OOM is occuring
>>> (FSDirectory.readInternal()). I cannot understand the results I
>>> am seeing.
>>> Below is a snippet of the output of each with the code around
>>> FSDirectory
>>> line 598 as follows:
>>>
>>> ...
>>> do {
>>> long tot = Runtime.getRuntime().totalMemory();
>>> long free =Runtime.getRuntime().freeMemory();
>>> System.out.println("LUCENE: offset="+offset+" total="+total+"
>>> len-total="+(len-total)+" free mem="+free+" used ="+(tot-free));
>>> int i = file.read(b, offset+total, len-total);
>>> ...
>>>
>>>
>>>
>>> The stand-alone version:
>>>
>>> ...
>>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>> =330080544
>>> LUCENE: offset=0 total=0 len-total=209000000 free
>>> mem=631122912 used
>>> =617534496
>>> LUCENE: offset=209000000 total=0 len-total=20900000 free
>>> mem=631122912
>>> used =617534496
>>> LUCENE: offset=229900000 total=0 len-total=20900000 free
>>> mem=631122912
>>> used =617534496
>>> LUCENE: offset=250800000 total=0 len-total=20900000 free
>>> mem=631122912
>>> used =617534496
>>> ...
>>> completes successfully!
>>>
>>>
>>> The method called via Hibernate Search:
>>>
>>> ...
>>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>> =334892152
>>> LUCENE: offset=0 total=0 len-total=209000000 free
>>> mem=636731528 used
>>> =622346104
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>> at java.io.RandomAccessFile.readBytes(Native Method)
>>> at java.io.RandomAccessFile.read(Unknown Source)
>>> at
>>> org.apache.lucene.store.FSDirectory
>>> $FSIndexInput.readInternal(FSDirectory.java:599)
>>> ... fails with exception!
>>>
>>>
>>> Note that the HS version has slightly more free memory because I
>>> ran it
>>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to
>>> offset any
>>> memory used by HS when it starts up.
>>>
>>> As you can see, these are identical for all practical purposes.
>>> So what
>>> gives?
>>>
>>> I'm stumped, so any suggestions appreciated.
>>>
>>> Thanks.
>>>
>>>
>>> Quoting Michael McCandless <lucene [at] mikemccandless>:
>>>
>>>
>>>> Unfortunately, I'm not familiar with exactly what Hibernate
>>>> search does
>>>> with the Lucene APIs.
>>>>
>>>> It must be doing something beyond what your standalone Lucene
>>>> test case
>>>> does.
>>>>
>>>> Maybe ask this question on the Hibernate list?
>>>>
>>>> Mike
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>> 111 West Evelyn Ave. Ste. 202
>> Sunnyvale, CA 94086
>> www.deepdyve.com
>> 408-773-0110 ext. 738
>> 858-414-0013 (m)
>> 408-773-0220 (fax)
>>
>
>
>


lucene at digiatlas

Mar 13, 2009, 7:26 AM

Post #3 of 5 (906 views)
Permalink
Re: problems with large Lucene index (reason found) [In reply to]

Yes, I overrode the read() method in
FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb
chunks and do an arraycopy() into the array created by Lucene. It now
works with any heap size and doesn't get OOM.

There may be other areas this could happen in the Lucene code
(although at present it seems to be working fine for me on our
largest, 17Gb, index but I haven't tried accessing data yet - only
getting the result size - so perhaps there are other calls to read()
with large buffer sizes).

As this bug does not look like it will be fixed in the near future, it
might be an idea to put in place a fix in the Lucene code. I think it
would be safe to read in chunks of up to 100Mb without a problem and I
don't think it will affect performance to any great degree.

It's pleasing to see that Lucene can easily handle such huge indexes,
although this bug is obviously quite an impediment to doing so.

regards,
Cameron Newham

Quoting Michael McCandless <lucene [at] mikemccandless>:

>
> Gak, what a horrible bug!
>
> It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs). And it's
> been open for almost 2.5 years. I just added a comment & voted for the
> bug.
>
> Does it also occur on a 64 bit JRE?
>
> If you still allocate the full array, but read several smaller chunks
> into it, do you still hit the bug?
>
> Mike
>
> lucene [at] digiatlas wrote:
>
>>
>> I now know the cause of the problem. Increasing heap space actually
>> breaks Lucene when reading large indexes.
>>
>> Details on why can be found here:
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>>
>>
>> Lucene is trying to read a huge block (about 2580Mb) at the point
>> of failure. While it allocates the required bytes in method
>> MultiSegmentReader.norms(), line 334, just fine, it is only when it
>> attempts to use this array in a call to
>> RandomAccessFile.readBytes() that it gets OOM. This is caused by a
>> bug in the native code for the Java IO.
>>
>> As observed in the bug report, large heap space actually causes the
>> bug to appear. When I reduced my heap from 1200M to 1000M the
>> exception was never generated and the code completed correctly and
>> it reported the correct number of search hits in the Hibernate
>> Search version of my program.
>>
>> This isn't good - I need as much memory as possible because I
>> intend to run my search as a web service.
>>
>> The work-around would be to read the file in small chunks, but I am
>> not familiar with the Lucene code so I am unsure how that would be
>> done in a global sense (i.e.: does it really need to allocate a
>> buffer of that size in MultiSegmentReader?)
>>
>> The obvious solution (which I haven't tried yet) would be to patch
>> the point in FSDirectory where the java IO read occurs - looping
>> with a small buffer for the read and then concatenating the result
>> back into Lucene's byte array.
>>
>> Thanks for the comments on this problem from people on this list.
>>
>>
>>
>> Quoting Ted Dunning <ted.dunning [at] gmail>:
>>
>>> try running with verbose gc. That will give you more details about what is
>>> happening.
>>>
>>> Even better, run with jconsole on the side so that you get really detailed
>>> information on memory pools.
>>>
>>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene [at] digiatlas> wrote:
>>>
>>>> Thanks Mike and Jokin for your comments on the memory problem. I have
>>>> submitted the query to the Hibernate Search list although I haven't seen a
>>>> response yet.
>>>>
>>>> In the meantime I did my own investigating in the code (I'd rather have
>>>> avoided this!). I'm seeing results that don't make any sense and maybe
>>>> someone here with more experience of Lucene and the way memory is
>>>> allocated
>>>> by the JVM may shed light on, what to me, are quite illogical
>>>> observations.
>>>>
>>>> As you may recall I had a stand-alone Lucene search and a Hibernate Search
>>>> version. Looking in the HS code did not shed any light on the
>>>> issue. I took
>>>> my stand-alone Lucene code and put it in a method and replaced
>>>> the search in
>>>> the HS class (the constructor of QueryHits.java) with the call to
>>>> my method.
>>>> Bear in mind this method is the same code as posted in my earlier
>>>> message -
>>>> it sets up the Lucene search from scratch (i.e.: no data
>>>> structures created
>>>> by HS were used). So, effectively I was calling my stand-alone code after
>>>> any setup done by Hibernate and any memory it may have allocated (which
>>>> turned out to be a few Mb).
>>>>
>>>> I get OOM! Printing the free memory at this point shows bags of memory
>>>> left. Indeed, the same free memory (+/- a few Mb) as the
>>>> stand-alone Lucene
>>>> version!
>>>>
>>>> I then instrumented the Lucene method where the OOM is occuring
>>>> (FSDirectory.readInternal()). I cannot understand the results I
>>>> am seeing.
>>>> Below is a snippet of the output of each with the code around FSDirectory
>>>> line 598 as follows:
>>>>
>>>> ...
>>>> do {
>>>> long tot = Runtime.getRuntime().totalMemory();
>>>> long free =Runtime.getRuntime().freeMemory();
>>>> System.out.println("LUCENE: offset="+offset+" total="+total+"
>>>> len-total="+(len-total)+" free mem="+free+" used ="+(tot-free));
>>>> int i = file.read(b, offset+total, len-total);
>>>> ...
>>>>
>>>>
>>>>
>>>> The stand-alone version:
>>>>
>>>> ...
>>>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used
>>>> =330080544
>>>> LUCENE: offset=0 total=0 len-total=209000000 free mem=631122912 used
>>>> =617534496
>>>> LUCENE: offset=209000000 total=0 len-total=20900000 free mem=631122912
>>>> used =617534496
>>>> LUCENE: offset=229900000 total=0 len-total=20900000 free mem=631122912
>>>> used =617534496
>>>> LUCENE: offset=250800000 total=0 len-total=20900000 free mem=631122912
>>>> used =617534496
>>>> ...
>>>> completes successfully!
>>>>
>>>>
>>>> The method called via Hibernate Search:
>>>>
>>>> ...
>>>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used
>>>> =334892152
>>>> LUCENE: offset=0 total=0 len-total=209000000 free mem=636731528 used
>>>> =622346104
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>> at java.io.RandomAccessFile.readBytes(Native Method)
>>>> at java.io.RandomAccessFile.read(Unknown Source)
>>>> at
>>>> org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
>>>> ... fails with exception!
>>>>
>>>>
>>>> Note that the HS version has slightly more free memory because I ran it
>>>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to offset any
>>>> memory used by HS when it starts up.
>>>>
>>>> As you can see, these are identical for all practical purposes. So what
>>>> gives?
>>>>
>>>> I'm stumped, so any suggestions appreciated.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Quoting Michael McCandless <lucene [at] mikemccandless>:
>>>>
>>>>
>>>>> Unfortunately, I'm not familiar with exactly what Hibernate search does
>>>>> with the Lucene APIs.
>>>>>
>>>>> It must be doing something beyond what your standalone Lucene test case
>>>>> does.
>>>>>
>>>>> Maybe ask this question on the Hibernate list?
>>>>>
>>>>> Mike
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Ted Dunning, CTO
>>> DeepDyve
>>>
>>> 111 West Evelyn Ave. Ste. 202
>>> Sunnyvale, CA 94086
>>> www.deepdyve.com
>>> 408-773-0110 ext. 738
>>> 858-414-0013 (m)
>>> 408-773-0220 (fax)
>>>
>>
>>
>>


lucene at mikemccandless

Mar 13, 2009, 8:16 AM

Post #4 of 5 (910 views)
Permalink
Re: problems with large Lucene index (reason found) [In reply to]

lucene [at] digiatlas wrote:

> Yes, I overrode the read() method in
> FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb
> chunks and do an arraycopy() into the array created by Lucene. It
> now works with any heap size and doesn't get OOM.

You shouldn't need to do the extra arraycopy? RandomAccessFile can
read into a particular offset/len inside the array. Does that not work?

> There may be other areas this could happen in the Lucene code
> (although at present it seems to be working fine for me on our
> largest, 17Gb, index but I haven't tried accessing data yet - only
> getting the result size - so perhaps there are other calls to read()
> with large buffer sizes).
>
> As this bug does not look like it will be fixed in the near future,
> it might be an idea to put in place a fix in the Lucene code. I
> think it would be safe to read in chunks of up to 100Mb without a
> problem and I don't think it will affect performance to any great
> degree.

I agree. Can you open a Jira issue and post a patch?

> It's pleasing to see that Lucene can easily handle such huge
> indexes, although this bug is obviously quite an impediment to doing
> so.

Yes indeed. This is one crazy bug.

Mike


lucene at mikemccandless

Mar 17, 2009, 5:11 AM

Post #5 of 5 (879 views)
Permalink
Re: problems with large Lucene index (reason found) [In reply to]

I've opened:

https://issues.apache.org/jira/browse/LUCENE-1566

for this. Cameron, could you attach your patch to that issue? Thanks.

Mike

lucene [at] digiatlas wrote:

>
> Yes, I overrode the read() method in
> FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb
> chunks and do an arraycopy() into the array created by Lucene. It
> now works with any heap size and doesn't get OOM.
>
> There may be other areas this could happen in the Lucene code
> (although at present it seems to be working fine for me on our
> largest, 17Gb, index but I haven't tried accessing data yet - only
> getting the result size - so perhaps there are other calls to read()
> with large buffer sizes).
>
> As this bug does not look like it will be fixed in the near future,
> it might be an idea to put in place a fix in the Lucene code. I
> think it would be safe to read in chunks of up to 100Mb without a
> problem and I don't think it will affect performance to any great
> degree.
>
> It's pleasing to see that Lucene can easily handle such huge
> indexes, although this bug is obviously quite an impediment to doing
> so.
>
> regards,
> Cameron Newham
>
> Quoting Michael McCandless <lucene [at] mikemccandless>:
>
>>
>> Gak, what a horrible bug!
>>
>> It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs). And it's
>> been open for almost 2.5 years. I just added a comment & voted for
>> the
>> bug.
>>
>> Does it also occur on a 64 bit JRE?
>>
>> If you still allocate the full array, but read several smaller chunks
>> into it, do you still hit the bug?
>>
>> Mike
>>
>> lucene [at] digiatlas wrote:
>>
>>>
>>> I now know the cause of the problem. Increasing heap space
>>> actually breaks Lucene when reading large indexes.
>>>
>>> Details on why can be found here:
>>>
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>>>
>>>
>>> Lucene is trying to read a huge block (about 2580Mb) at the point
>>> of failure. While it allocates the required bytes in method
>>> MultiSegmentReader.norms(), line 334, just fine, it is only when
>>> it attempts to use this array in a call to
>>> RandomAccessFile.readBytes() that it gets OOM. This is caused by
>>> a bug in the native code for the Java IO.
>>>
>>> As observed in the bug report, large heap space actually causes
>>> the bug to appear. When I reduced my heap from 1200M to 1000M
>>> the exception was never generated and the code completed
>>> correctly and it reported the correct number of search hits in
>>> the Hibernate Search version of my program.
>>>
>>> This isn't good - I need as much memory as possible because I
>>> intend to run my search as a web service.
>>>
>>> The work-around would be to read the file in small chunks, but I
>>> am not familiar with the Lucene code so I am unsure how that
>>> would be done in a global sense (i.e.: does it really need to
>>> allocate a buffer of that size in MultiSegmentReader?)
>>>
>>> The obvious solution (which I haven't tried yet) would be to
>>> patch the point in FSDirectory where the java IO read occurs -
>>> looping with a small buffer for the read and then concatenating
>>> the result back into Lucene's byte array.
>>>
>>> Thanks for the comments on this problem from people on this list.
>>>
>>>
>>>
>>> Quoting Ted Dunning <ted.dunning [at] gmail>:
>>>
>>>> try running with verbose gc. That will give you more details
>>>> about what is
>>>> happening.
>>>>
>>>> Even better, run with jconsole on the side so that you get really
>>>> detailed
>>>> information on memory pools.
>>>>
>>>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene [at] digiatlas> wrote:
>>>>
>>>>> Thanks Mike and Jokin for your comments on the memory problem. I
>>>>> have
>>>>> submitted the query to the Hibernate Search list although I
>>>>> haven't seen a
>>>>> response yet.
>>>>>
>>>>> In the meantime I did my own investigating in the code (I'd
>>>>> rather have
>>>>> avoided this!). I'm seeing results that don't make any sense and
>>>>> maybe
>>>>> someone here with more experience of Lucene and the way memory
>>>>> is allocated
>>>>> by the JVM may shed light on, what to me, are quite illogical
>>>>> observations.
>>>>>
>>>>> As you may recall I had a stand-alone Lucene search and a
>>>>> Hibernate Search
>>>>> version. Looking in the HS code did not shed any light on the
>>>>> issue. I took
>>>>> my stand-alone Lucene code and put it in a method and replaced
>>>>> the search in
>>>>> the HS class (the constructor of QueryHits.java) with the call
>>>>> to my method.
>>>>> Bear in mind this method is the same code as posted in my
>>>>> earlier message -
>>>>> it sets up the Lucene search from scratch (i.e.: no data
>>>>> structures created
>>>>> by HS were used). So, effectively I was calling my stand-alone
>>>>> code after
>>>>> any setup done by Hibernate and any memory it may have allocated
>>>>> (which
>>>>> turned out to be a few Mb).
>>>>>
>>>>> I get OOM! Printing the free memory at this point shows bags of
>>>>> memory
>>>>> left. Indeed, the same free memory (+/- a few Mb) as the stand-
>>>>> alone Lucene
>>>>> version!
>>>>>
>>>>> I then instrumented the Lucene method where the OOM is occuring
>>>>> (FSDirectory.readInternal()). I cannot understand the results
>>>>> I am seeing.
>>>>> Below is a snippet of the output of each with the code around
>>>>> FSDirectory
>>>>> line 598 as follows:
>>>>>
>>>>> ...
>>>>> do {
>>>>> long tot = Runtime.getRuntime().totalMemory();
>>>>> long free =Runtime.getRuntime().freeMemory();
>>>>> System.out.println("LUCENE: offset="+offset+" total="+total+"
>>>>> len-total="+(len-total)+" free mem="+free+" used ="+(tot-
>>>>> free));
>>>>> int i = file.read(b, offset+total, len-total);
>>>>> ...
>>>>>
>>>>>
>>>>>
>>>>> The stand-alone version:
>>>>>
>>>>> ...
>>>>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864
>>>>> used
>>>>> =330080544
>>>>> LUCENE: offset=0 total=0 len-total=209000000 free
>>>>> mem=631122912 used
>>>>> =617534496
>>>>> LUCENE: offset=209000000 total=0 len-total=20900000 free
>>>>> mem=631122912
>>>>> used =617534496
>>>>> LUCENE: offset=229900000 total=0 len-total=20900000 free
>>>>> mem=631122912
>>>>> used =617534496
>>>>> LUCENE: offset=250800000 total=0 len-total=20900000 free
>>>>> mem=631122912
>>>>> used =617534496
>>>>> ...
>>>>> completes successfully!
>>>>>
>>>>>
>>>>> The method called via Hibernate Search:
>>>>>
>>>>> ...
>>>>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480
>>>>> used
>>>>> =334892152
>>>>> LUCENE: offset=0 total=0 len-total=209000000 free
>>>>> mem=636731528 used
>>>>> =622346104
>>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>> at java.io.RandomAccessFile.readBytes(Native Method)
>>>>> at java.io.RandomAccessFile.read(Unknown Source)
>>>>> at
>>>>> org.apache.lucene.store.FSDirectory
>>>>> $FSIndexInput.readInternal(FSDirectory.java:599)
>>>>> ... fails with exception!
>>>>>
>>>>>
>>>>> Note that the HS version has slightly more free memory because I
>>>>> ran it
>>>>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to
>>>>> offset any
>>>>> memory used by HS when it starts up.
>>>>>
>>>>> As you can see, these are identical for all practical purposes.
>>>>> So what
>>>>> gives?
>>>>>
>>>>> I'm stumped, so any suggestions appreciated.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> Quoting Michael McCandless <lucene [at] mikemccandless>:
>>>>>
>>>>>
>>>>>> Unfortunately, I'm not familiar with exactly what Hibernate
>>>>>> search does
>>>>>> with the Lucene APIs.
>>>>>>
>>>>>> It must be doing something beyond what your standalone Lucene
>>>>>> test case
>>>>>> does.
>>>>>>
>>>>>> Maybe ask this question on the Hibernate list?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ted Dunning, CTO
>>>> DeepDyve
>>>>
>>>> 111 West Evelyn Ave. Ste. 202
>>>> Sunnyvale, CA 94086
>>>> www.deepdyve.com
>>>> 408-773-0110 ext. 738
>>>> 858-414-0013 (m)
>>>> 408-773-0220 (fax)
>>>>
>>>
>>>
>>>
>
>
>

Lucene general RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.