
Mailing List Archive: Lucene: Java-User

More frustration with Lucene/Java file i/o on Windows

 

 



MModrall at glgroup

Aug 18, 2006, 7:33 AM

Post #1 of 5
More frustration with Lucene/Java file i/o on Windows

Hi...



It was a little comforting to know that other people have
seen Windows Explorer refreshes crash java Lucene on Windows. We seem
to be running into a long list of file system issues with Lucene, and I
was wondering if other people had noticed these sorts of things (and
hopefully any tips and tricks for working around them).



We've got a process running as a Windows service that's
trying to keep a set of Lucene indexes up-to-date from a database. The
corpus is pretty small, so we copy the last index build to a temp
directory and then try to do an incremental index of the changes on the
working copy. Since our software is evolving, we've put a version
number in the metadata of the index files, which we check when we're
starting up. If the version numbers don't match, we scrap the whole
thing and start over. The problem is that java Lucene on Windows
doesn't do "scrap" very well.



The current hypothesis is that File.renameTo, File.delete,
and other JVM operations on Windows fail if there is any other handle
open on the file, and that Lucene objects aren't closing/finalizing their
file handles cleanly/reliably, so other things blow up later.
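
One way to test this hypothesis is to make the failure say why it happened. File.delete() and File.renameTo() only return false on failure. On a Java 7 or later JVM (an assumption; that is newer than anything this thread was written against), java.nio.file.Files.deleteIfExists throws an IOException that names the path and usually carries the OS-level reason. A minimal sketch; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: delete a path and report why it failed.
class DeleteDiagnostics {

    /** Returns null if the path is gone afterwards, otherwise a description of the failure. */
    static String tryDelete(Path path) {
        try {
            Files.deleteIfExists(path);   // throws instead of returning false
            return null;
        } catch (IOException e) {
            // The exception type and message identify the path and the reason.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("delete-demo", ".cfs");
        String failure = tryDelete(tmp);
        System.out.println(failure == null ? "deleted " + tmp : "could not delete: " + failure);
    }
}
```

Logging the returned string next to the "partial index deleted" message would show whether the delete failed because the file was in use or for some other reason.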



Here's the chain of events we had in our service process
last night:

20060817T165611.682,EDT [Indexer.java 813]: Exception occurred deleting document 183971: Lock obtain timed out: Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
20060817T165612.682,EDT [Indexer.java 813]: Exception occurred deleting document 257265: Lock obtain timed out: Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
20060817T165613.744,EDT [Indexer.java 1184]: Indexing failure, db changes will be rolled back and partial index deleted.
java.io.IOException: Lock obtain timed out: Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:56)
    at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:489)
    at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:514)
    at org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:541)
    at ...Indexer.buildIncrementalIndex(Indexer.java:915)
    ...



Why it couldn't get that temporary lock file, I don't know. The same
process runs continuously, and I don't know if Lucene reuses the same
tmpnames from run to run. If the files were left around because of
these JVM File system errors, maybe that would explain it.



The "partial index deleted" part of our message means that we did a
recursive (java) delete of all the index directories we were working
with. Our attempt to clean the slate got everything but
contact_index\_3zhe.cfs.
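
For illustration, a recursive delete can report exactly which paths it failed to remove instead of failing silently. This is a minimal sketch using only java.io.File; the class and method names are hypothetical and this is not the code from our Indexer:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: recursive delete that records every path it could not remove.
class RecursiveDelete {

    /** Deletes root and everything under it; returns the paths that survived. */
    static List<File> deleteTree(File root) {
        List<File> leftovers = new ArrayList<File>();
        delete(root, leftovers);
        return leftovers;
    }

    private static void delete(File f, List<File> leftovers) {
        File[] children = f.listFiles();   // null for plain files
        if (children != null) {
            for (File child : children) {
                delete(child, leftovers);
            }
        }
        // File.delete() only returns false on failure, so record the survivor.
        if (!f.delete() && f.exists()) {
            leftovers.add(f);
        }
    }
}
```

A caller can log the returned list and refuse to start the next build while it is non-empty. Note that a directory containing an undeletable file also ends up in the list, because a non-empty directory cannot be deleted.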



An hour later, we come back and try to start another index build. We
find the directory still exists, so we try to validate the index version
number using

    searcher = new IndexSearcher(FSDirectory.getDirectory(indexFile, false));
    TermQuery tq = new TermQuery(new Term(METADATA_DOCUMENT_FIELD, METADATA_DOCUMENT_FIELD_VALUE));
    Hits h = searcher.search(tq);
    if (h.length() == 1)
        ...
    finally
    {
        if (searcher != null)
        {
            try { searcher.close(); } catch (Exception e) { /* ignore */ }
        }
    }



Obviously with only the _3zhe.cfs file left, it's not a valid index, so
the attempt to get the metadata fails. No matter what happens, we do a
searcher.close(). My suspicion is that IndexSearcher.close() isn't really
closing all the files it's using, so you have to wait
until the searcher is garbage collected and its file objects finalized
before things will work, because immediately after this check, Lucene
fails a full build with

20060817T175614.323,EDT [Indexer.java 843]: Error building full index
java.io.IOException: Cannot delete \\xx.xx.xx.xx\indexbuild2\contact_index\_3zhe.cfs
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:198)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:144)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:224)
    ...



Doing a full build, Lucene makes its own attempt to clear out the
leftovers, which fails because it can't delete the file. And we're stuck
in this loop all night.



The guy who wrote the Indexing code says the version check is the only
place in the code where we have an IndexSearcher created using a file
path string - that all others have a pre-existing IndexReader. He wants
me to try it that way instead, so that we can explicitly close the
reader and hopefully clear that loose file handle.



Sorry for the long-winded vent, but does anyone have any advice for
getting java lucene working on windows? Any idea why it would seize up
on the lock files? This service process is the only lucene process on
the system, and the finished indexes are copied off to another server to
serve the search requests, so it's puzzling that the daemon process
would block itself... Anyone know if an IndexReader.close() would do a
better job of cleaning up the file handles than IndexSearcher.close()?



Thanks

-Mark


This e-mail message, and any attachments, is intended only for the use of the individual or entity identified in the alias address of this message and may contain information that is confidential, privileged and subject to legal restrictions and penalties regarding its unauthorized disclosure and use. Any unauthorized review, copying, disclosure, use or distribution is strictly prohibited. If you have received this e-mail message in error, please notify the sender immediately by reply e-mail and delete this message, and any attachments, from your system. Thank you.


lucene at mikemccandless

Aug 18, 2006, 12:10 PM

Post #2 of 5
Re: More frustration with Lucene/Java file i/o on Windows

> It was a little comforting to know that other people have
> seen Windows Explorer refreshes crash java Lucene on Windows. We seem
> to be running into a long list of file system issues with Lucene, and I
> was wondering if other people had noticed these sort of things (and
> hopefully any tips and tricks for working around them).

Sorry you're having so many troubles. Keep these emails, questions &
issues coming because this is how we [gradually] fix Lucene to be more
robust!

OK a few quick possibilities / suggestions:

* Make sure in your Indexer.java that when you delete docs, you
close any open IndexWriters before you try to call
deleteDocuments from your IndexReader. Only one writer
(IndexWriter adding docs or IndexReader deleting docs) can be open
at once and if you fail to do this you'll get exactly that "lock
obtain timed out" error. You could also use IndexModifier which
under the hood is doing this open-close logic for you. But: try
to buffer up adds and deletes together if possible to minimize
cost of open/closes.

* That one file really seems to have an open file handle on it. Are
you sure you called close on all IndexReaders (IndexSearchers)?
That file is a "compound file format" segment, and IndexReaders
hold an open file handle to these files (IndexWriters do as well,
but they quickly close the file handles after writing to them).

* There was a thread recently, similar to this issue, where
File.renameTo was failing, and there was a suggestion that this is
a bug in some JVMs and to get the JVM to GC (System.gc()) to see
if that then closes the underlying file.

* IndexSearcher.close() will only close the underlying IndexReader
if you created it with a String. If you create it with just an
IndexReader it will not close that reader. You have to separately
call IndexReader.close to close the reader.

* If the JVM exited un-gracefully then the lock files will be left
on disk and Lucene will incorrectly think the lock is held by
another process (and then hit that "lock obtain timed out"
error). You can just remove the lock files (from
c:\windows\temp\...) if you are certain no Lucene processes are
running.

We are working towards using native locks in Lucene (for a future
release) so that even un-graceful exits of the JVM will properly
free the lock.

* Perhaps, change your "build a new index" logic so that it does so
in an entirely fresh directory? Just to avoid any hazards at all
of anything holding files open in the old directory ...
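
The ownership rule in the IndexSearcher.close() point above can be illustrated without Lucene. In this sketch FakeReader and FakeSearcher are hypothetical stand-ins, not Lucene classes; they only model who is responsible for closing the reader:

```java
import java.io.Closeable;

// Hypothetical stand-ins, NOT Lucene classes: they only model who owns the reader.
class FakeReader implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

class FakeSearcher implements Closeable {
    private final FakeReader reader;
    private final boolean ownsReader;

    // Opened "from a path": the searcher creates the reader, so it owns it.
    FakeSearcher(String path) {
        this.reader = new FakeReader();
        this.ownsReader = true;
    }

    // Wrapped around a caller-supplied reader: the caller keeps ownership.
    FakeSearcher(FakeReader reader) {
        this.reader = reader;
        this.ownsReader = false;
    }

    FakeReader reader() {
        return reader;
    }

    @Override
    public void close() {
        if (ownsReader) {
            reader.close();
        }
    }
}

class OwnershipDemo {
    public static void main(String[] args) {
        FakeReader shared = new FakeReader();
        FakeSearcher searcher = new FakeSearcher(shared);
        try {
            // ... run queries ...
        } finally {
            searcher.close();   // does not close 'shared'
            shared.close();     // the caller has to do this itself
        }
        System.out.println("shared reader closed: " + shared.closed);
    }
}
```

With the real classes the same shape applies: whoever opened the reader closes it, ideally in a finally block.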

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


chris.lu at gmail

Aug 18, 2006, 12:17 PM

Post #3 of 5
Re: More frustration with Lucene/Java file i/o on Windows

Hi, Mark,

I had the same issue when maintaining a Lucene index on
Windows. It's mostly because Windows cannot delete files
reliably. While some versioning can alleviate the problem somewhat, my
advice, based on my experience, is to move to Linux.

Regarding the lock, you need to delete all locks before the indexing starts.

Regarding the leftover index, you can create a new index in a new
directory and copy the new index to your target directory.
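
A minimal sketch of that "new directory, then copy" idea, assuming a Java 7 or later JVM for java.nio.file. The class and method names are hypothetical, and the index build itself is left out:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: build in a brand-new directory, then publish by copying.
class FreshBuildDir {

    /** Creates a uniquely named, empty build directory under parent. */
    static Path newBuildDir(Path parent) throws IOException {
        Files.createDirectories(parent);
        return Files.createTempDirectory(parent, "build-");
    }

    /** Copies every regular file from buildDir into targetDir, replacing same-named files. */
    static void publish(Path buildDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(buildDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, targetDir.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Because every build starts in an empty directory, a file that is still held open from an earlier run cannot block the new build. The copy step does not remove stale files already in the target, so that would still need separate handling.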

Chris Lu
------------------------------------
Lucene Search Server for Any Databases/Applications
http://www.dbsight.net



MModrall at glgroup

Aug 18, 2006, 2:53 PM

Post #4 of 5
RE: More frustration with Lucene/Java file i/o on Windows

Hi Mike,

I do appreciate the thoroughness and graciousness of your
responses, and I hope there's nothing in my frustration that you would
take personally. Googling around, I've found other references to the
sun jvm handling of the Windows file system to be, well, quixotic at
best.

In our current system, we have two modes of operation, full
index recreation and incremental indexing. Which to use is determined
by a quick validation check: see if the path exists and is a directory.
If it is, make an IndexSearcher to check the metadata as below. If the
reader passes the test, build incrementally; otherwise delete the
directory and start fresh.

    searcher = new IndexSearcher(FSDirectory.getDirectory(indexFile, false));
    TermQuery tq = new TermQuery(new Term(METADATA_DOCUMENT_FIELD, METADATA_DOCUMENT_FIELD_VALUE));
    Hits h = searcher.search(tq);

The validation IndexSearcher gets closed in a finally block, so
there shouldn't be anything left over from that.

If it's a full rebuild, we just have an IndexWriter (no reader).
If it's incremental, there's an IndexReader to delete old documents,
which is closed, followed by an IndexWriter that is also closed (when
things go well).

I haven't gone looking in the source to figure out what goes
into the middle of the lucene-<xxx>-write.lock naming convention, but as
you say they could have been left over from some abnormal termination.

Our indexing scheme bats back and forth between 2 build dirs;
one's supposed to be the last successful build, the other is the one you
can work on. When a successful build is finished, all the files are
copied over into the scratch dir and the next build goes in the scratch
dir. If part of the glorp in the lock file name is a hash of the
directory path, we could run for a while and not hit the locking issue
for a couple of builds.

I still can't figure out how the .cfs file delete would fail,
though, unless the IndexSearcher.close() hadn't really let go of the
file. What would happen with an IndexSearcher on a malformed directory?
I.e. if there was only a .cfs file there? Would .close() know to
release the one handle it had?

Anyway, I'll implement something at the root to delete the lock
files before starting to do anything to make sure the slate is clean and
cross my fingers.
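
Roughly what I have in mind, as a sketch only. The class and method names are hypothetical, the "lucene-" prefix and ".lock" suffix are taken from the file name in our log, and this is only safe when no other Lucene process can be running:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup helper. Only safe when no other Lucene process can be running.
class StaleLockCleaner {

    /** Deletes lucene-*.lock files in lockDir and returns the ones that could not be removed. */
    static List<File> clearLocks(File lockDir) {
        File[] locks = lockDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.startsWith("lucene-") && name.endsWith(".lock");
            }
        });
        List<File> stuck = new ArrayList<File>();
        if (locks != null) {   // null if lockDir is missing or unreadable
            for (File lock : locks) {
                if (!lock.delete()) {
                    stuck.add(lock);
                }
            }
        }
        return stuck;
    }
}
```

Our log shows the lock files under C:\WINDOWS\TEMP, so the service would call something like clearLocks(new File(System.getProperty("java.io.tmpdir"))) before the first build; that the temp directory is the right place in every configuration is an assumption.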

Thanks
-Mark







lucene at mikemccandless

Aug 18, 2006, 3:34 PM

Post #5 of 5
Re: More frustration with Lucene/Java file i/o on Windows

> I do appreciate the thoroughness and graciousness of your
> responses, and I hope there's nothing in my frustration that you would
> take personally. Googling around, I've found other references to the
> sun jvm handling of the Windows file system to be, well, quixotic at
> best.

No problem!

And I suspect Sun doesn't like Microsoft :)

> In our current system, we have two modes of operation, full
> index recreation and incremental indexing. Which to use is determined
> by a quick validate check (check to see if the path exists, see if it is
> a directory. If it is, make an IndexSearcher to check the meta data as
> below. If the reader passes the test, build incremental; otherwise
> delete the directory and start fresh
> searcher = new IndexSearcher(FSDirectory.getDirectory(indexFile,
> false));
> TermQuery tq = new TermQuery(new Term(METADATA_DOCUMENT_FIELD,
> METADATA_DOCUMENT_FIELD_VALUE));
> Hits h = searcher.search(tq);
> ).
>
> The validation IndexSearcher gets closed in a finally block, so
> there shouldn't be anything left over from that.

OK, this sounds fine.

> If it's a full rebuild, we just have an IndexWriter (no reader).
> If it's incremental, there's an IndexReader to delete old documents,
> which is closed, followed by an IndexWriter that is also closed (when
> things go well).

OK but be real careful on the incremental case: you can only have
exactly one of IndexReader or IndexWriter open at a time. In other
words, you have to close one in order to open the other, and vice versa.
It sounds like you do all deletes with an IndexReader, then close it,
then open an IndexWriter, do all your adds, then close it? In which
case that should be fine... the closes are also in finally blocks?

> I haven't gone looking in the source to figure out what goes
> into the middle of the lucene-<xxx>-write.lock naming convention, but as
> you say they could have been left over from some abnormal termination.

The Lucene classes have finalizers that try to release these locks so
"in theory" (cross fingers) it should only be a hard KILL or C-level
exception in the JVM that would cause these lock files to be left behind.
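
For comparison, here is a sketch of what an OS-level ("native") lock looks like in plain Java, using java.nio.channels.FileLock. This is not Lucene's lock implementation, and the class name is hypothetical. The point is that the operating system drops such a lock when the process dies, so a hard kill cannot leave it "held" the way a leftover lock file does:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch of an OS-level write lock; this is not Lucene's lock implementation.
class NativeWriteLock implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileLock lock;

    private NativeWriteLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Returns a held lock, or null if some other holder already has it. */
    static NativeWriteLock tryAcquire(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock = null;
        try {
            lock = raf.getChannel().tryLock();   // null if another process holds it
        } catch (OverlappingFileLockException e) {
            // Already held somewhere inside this JVM.
        } finally {
            if (lock == null) {
                raf.close();
            }
        }
        return lock == null ? null : new NativeWriteLock(raf, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        file.close();
    }
}
```

One caveat from the FileLock documentation: on some platforms, closing any channel on the file releases all of the JVM's locks on it, so a real implementation would keep a single channel per lock file per JVM.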

> Our indexing schema bats back and forth between 2 build dirs;
> one's supposed to be the last successful build, the other is the one you
> can work on. When a successful build is finished, all the files are
> copied over into the scratch dir and the next build goes in the scratch
> dir. If part of the glorp in the lock file name is a hash of the
> directory path, we could run for a while and not hit the locking issue
> for a couple of builds.

OK I see. Yes indeed the glorp is a "digest" from the directory name ...
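
A sketch of how a name of that shape could be derived. The 32 hex characters in your log are consistent with a 128-bit digest such as MD5, but the algorithm, the exact string that gets hashed, and its encoding are all assumptions here, and the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical reconstruction of the naming scheme; the exact input Lucene hashes is an assumption.
class LockNameSketch {

    /** Builds "lucene-<hex digest of the directory path>-write.lock". */
    static String writeLockName(String directoryPath) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] digest = md5.digest(directoryPath.getBytes(StandardCharsets.UTF_8));
        StringBuilder name = new StringBuilder("lucene-");
        for (byte b : digest) {
            name.append(String.format("%02x", b));   // two lowercase hex digits per byte
        }
        return name.append("-write.lock").toString();
    }
}
```

Whatever the exact input, the consequence matches what you describe: each build directory maps to its own lock file name, so a stale lock only bites when the build rotates back to the directory it belongs to.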

> I still can't figure out how the .cfs file delete would fail,
> though, unless the IndexSearcher.close() hadn't really let go of the
> file. What would happen with an IndexSearcher on a malformed directory?
> I.e. if there was only a .cfs file there? Would .close() know to
> release the one handle it had?

Yeah, the fact that the OS wouldn't let Lucene or you delete the CFS
file means it was indeed still open. That, combined with write locks
stuck in the filesystem, really sorta feels like there was an
IndexSearcher that didn't get closed. Or it could indeed be the lurking
[possible] bug in the JVM that fails to really close a file even when
you call close() on it from Java.

What JVM & version of Lucene are you using?

> Anyway, I'll implement something at the root to delete the lock
> files before starting to do anything to make sure the slate is clean and
> cross my fingers.

OK good luck!

Mike
