paul.elschot at xs4all
Sep 11, 2006, 10:04 AM
Post #8 of 15
On Monday 11 September 2006 09:50, Chuck Williams wrote:
> Paul Elschot wrote on 09/10/2006 09:15 PM:
> > On Monday 11 September 2006 02:24, Chuck Williams wrote:
> >> Hi All,
> >> An application of ours under development had a memory leak that caused
> >> it to slow interminably. On linux, the application did not respond to
> >> kill -15 in a reasonable time, so kill -9 was used to forcibly terminate
> >> it. After this the segments file contained a reference to a segment
> >> whose index files were not present. I.e., the index was corrupt and
> >> Lucene could not open it.
> >> A thread dump at the time of the kill -9 shows that Lucene was merging
> >> segments inside IndexWriter.close(). Since segment merging only commits
> >> (updates the segments file) after the newly merged segment(s) are
> >> complete, I expect this is not the actual problem.
> >> Could a kill -9 prevent data from reaching disk for files that were
> >> previously closed? If so, then Lucene's index can become corrupt after
> >> kill -9. In this case, it is possible that a prior merge created new
> >> segment index files, updated the segments file, closed everything, the
> >> segments file made it to disk, but the index data files and/or their
> >> directory entries did not.
> >> If this is the case, it seems to me that flush() and
> >> FileDescriptor.sync() are required on each index file prior to close()
> >> to guarantee no corruption. Additionally a FileDescriptor.sync() is
> >> also probably required on the index directory to ensure the directory
> >> entries have been persisted.
> > Shouldn't the sync be done after closing the files? I'm using sync in a
> > (un*x) shell script after merges before backups. I'd prefer to have some
> > more of this syncing built into Lucene because the shell sync syncs all
> > disks which might be more than needed. So far I've had no problems,
> > so there was no need to investigate further.
> I believe FileDescriptor.sync() uses fsync and not sync on linux. A
> FileDescriptor is no longer valid after the stream is closed, so sync()
> could not be done on a closed stream. I think the correct protocol is
> flush() the stream, sync() its FD, then close() it.
From Sun's javadocs: flush(), sync(), close() is indeed the right order
for a single file.
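For the archives, that ordering looks roughly like the sketch below (the file
name and payload are made up for illustration; this is not Lucene code, just
the plain java.io calls in the order discussed):

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncedWrite {
    public static void main(String[] args) throws IOException {
        FileOutputStream out = new FileOutputStream("segment.tmp");
        try {
            out.write("segment data".getBytes());
            out.flush();         // push any JVM-buffered bytes to the OS
            out.getFD().sync();  // FileDescriptor.sync(): force them to the device
        } finally {
            out.close();         // close only after the sync has succeeded
        }
    }
}
```

Note that flush() alone only hands the bytes to the OS; it is the
FileDescriptor.sync() call that asks the OS to push them to the disk.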
> Paul, do you know if kill -9 can create the situation where bytes from a
> closed file never make it to disk in linux? I think Lucene needs sync()
What do you mean by "never"? The problem with not using flush() is that
the jvm simply does _not_ guarantee that data will ever end up on disk,
which is why I added the sync in the shell script after the document merging.
With flush() and sync() the guarantee only extends as far as the OS and the
disk driver; if the disk itself does not actually write the data, there is
nothing to be done about that, see the link in the other post.
> in any event to be robust with respect to OS crashes, but am wondering
> if this explains my kill -9 problem as well. It seems bogus to me that
This can explain your problem: the data will eventually be written to the
disk by the OS, but when?
> a closed file's bytes would fail to be persisted unless the OS crashed,
> but I can't find any other explanation and I can't find any definitive
> information to affirm or refute this possible side effect of kill -9.
> The issue I've got is that my index can never lose documents. So I've
> implemented journaling on top of Lucene where only the last
> maxBufferedDocs documents are journaled and the whole journal is reset
> after close(). My application has no way to know when the bytes make it
> to disk, and so cannot manage its journal properly unless Lucene ensures
> index integrity with sync()'s.
Do you also flush/sync the journal to disk? If you need to recover from the
journal, it has to be written to disk before doing "transactions" (adding
docs) in lucene.
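In other words, the journal entry itself must be durable before the
corresponding addDocument() call, or a crash can lose the document from both
places. A minimal write-ahead sketch, with the class and method names invented
for illustration, might look like:

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical write-ahead journal: each document is flushed and synced to
// disk before it is handed to Lucene, so a kill -9 between the journal write
// and the Lucene commit can be recovered by replaying the journal.
public class DocJournal {
    private final FileOutputStream out;

    public DocJournal(String path) throws IOException {
        this.out = new FileOutputStream(path, true); // append mode
    }

    // Record one serialized document and force it to disk before returning.
    public void log(byte[] serializedDoc) throws IOException {
        out.write(serializedDoc);
        out.write('\n');
        out.flush();         // JVM buffers -> OS
        out.getFD().sync();  // OS buffers -> disk
    }

    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        DocJournal journal = new DocJournal("journal.log");
        journal.log("doc-1".getBytes());
        // only after log() returns is it safe to addDocument() to Lucene
        journal.close();
    }
}
```

The journal can then be truncated after Lucene's own close() has completed,
as described above.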