lucene at mikemccandless
Jan 12, 2007, 3:02 AM
Post #4 of 9
Marvin Humphrey wrote:
> On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
>> I too am happy that we have no more commit lock :)
> Not just that. :)
> No more lock directory, since we can put write.lock in the index
> directory itself.
> No more lock file name munging, since lock files from different indexes
> no longer need to avoid collisions within a shared namespace.
> No more need to deal with any files outside of the index directory.
> Those three changes have a bigger impact on Lucy than they do on Lucene,
> and since I'm writing a lot of KS 0.20 code with the notion that it will
> be submitted to Lucy, they're having an impact on what I'm doing right
> now. C doesn't provide a number of the dependencies needed to support
> the old lock system, so we would either have had to include them, write
> them ourselves, or supply the needed functionality via PITA callbacks to
> the host language (Perl, Ruby, etc).
> Since the lock directory lived in the system's tmp directory, we needed
> code to discover where it was. Now we don't.
> The lock file name munging required a checksum string generator. We
> don't need that now.
> Lastly, a failure of imagination had left me blind to the fact that we
> didn't need sophisticated, portable filepath manipulating routines: just
> knowing a directory separator suffices. Previously, I'd wrapped Perl's
> File::Spec::Functions to make catfile() and canonpath() available from
> C. That hadn't been necessary, because we could have built up the
> lockfile paths given the location of the tmp directory and the dir_sep.
> However, as is often the case, simplifying the implementation reveals
> unnecessary cruft, and when all of a sudden everything ended up in one
> directory with a splash, it became obvious that generating filepaths
> didn't require heavy machinery.
>> But I have to say the lockless changes pale in comparison to what you
>> have done/are doing with KinoSearch, specifically the clean merge
>> model with an external sorter and other related file format changes
>> look very interesting.
Ooh, excellent points!
In fact, we haven't done this follow-through for Lucene, but I think we
now should? I think having only one directory (the index directory)
where things happen, and a simple file name for the write lock
("write.lock"), is a great simplification for our users.
Now that readers are read-only, I think it makes sense to default the
write lock into the index directory, and as you describe, no longer
generate a "unique namespace" hash lock ID since the index dir gives
us that scoping.
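
To make the difference concrete, here is a rough Java sketch of the two
naming schemes. The LockPaths class and its methods are hypothetical
names used only for illustration, not Lucene's API, and the
"lucene-<md5>-write.lock" pattern is an assumption about what a hashed,
shared-directory lock name might look like.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Lucene.
class LockPaths {

    // Shared-directory style (assumed naming): the lock lives in a
    // directory shared by all indexes (e.g. the system tmp dir), so the
    // file name carries a hash of the index path to avoid collisions.
    static File sharedDirWriteLock(File lockDir, File indexDir) {
        String id = md5Hex(indexDir.getAbsolutePath());
        return new File(lockDir, "lucene-" + id + "-write.lock");
    }

    // In-index style: the index directory itself provides the scoping,
    // so a fixed file name is enough.
    static File inIndexWriteLock(File indexDir) {
        return new File(indexDir, "write.lock");
    }

    // Hex-encoded MD5 digest of a string (UTF-8 bytes).
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 support is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        File index = new File("index");
        System.out.println(sharedDirWriteLock(tmp, index));
        System.out.println(inIndexWriteLock(index));
    }
}
```

The second method needs no tmp-directory lookup and no digest code,
which is the simplification being discussed.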
Are there any reasons not to do this? I will open a JIRA issue to
> Well, I look forward to seeing whether you can suggest improvements on
> some of the algos I'll bring up in this forum once KS 0.20_01 is out. :)
I will try, but I'm already behind just trying to understand how we
could improve Lucene based on your current KS release! Is there any
preview/general summary of what's being done for KS 0.20/Lucy? I tried
to quickly search the KS archives and look through Lucy's archives but
didn't find any solid hits.