
lucene at mikemccandless
Jan 12, 2007, 3:02 AM
Post #4 of 9
Marvin Humphrey wrote:
>
> On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote:
>
>> I too am happy that we have no more commit lock :)
>
> Not just that. :)
>
> No more lock directory, since we can put write.lock in the index
> directory itself.
>
> No more lock file name munging, since lock files from different indexes
> no longer need to avoid collisions within a shared namespace.
>
> No more need to deal with any files outside of the index directory.
>
> Those three changes have a bigger impact on Lucy than they do on Lucene,
> and since I'm writing a lot of KS 0.20 code with the notion that it will
> be submitted to Lucy, they're having an impact on what I'm doing right
> now. C doesn't provide a number of the dependencies needed to support
> the old lock system, so we would either have had to include them, write
> them ourselves, or supply the needed functionality via PITA callbacks to
> the host language (Perl, Ruby, etc).
>
> Since the lock directory lived in the system's tmp directory, we needed
> code to discover where it was. Now we don't.
>
> The lock file name munging required a checksum string generator. We
> don't need that now.
>
> Lastly, a failure of imagination had left me blind to the fact that we
> didn't need sophisticated, portable filepath manipulating routines: just
> knowing a directory separator suffices. Previously, I'd wrapped Perl's
> File::Spec::Functions to make catfile() and canonpath() available from
> C. That hadn't been necessary, because we could have built up the
> lockfile paths given the location of the tmp directory and the dir_sep.
> However, as is often the case, simplifying the implementation reveals
> unnecessary cruft, and when all of a sudden everything ended up in one
> directory with a splash, it became obvious that generating filepaths
> didn't require heavy machinery.
>
>> But I have to say the lockless changes pale in comparison to what you
>> have done/are doing with KinoSearch, specifically the clean merge
>> model with an external sorter and other related file format changes
>> look very interesting.

Ooh, excellent points! In fact, we haven't done this follow-through
for Lucene, but I think we now should.

I think having only one directory (the index directory) where things
happen, and a simple file name for the write lock ("write.lock"), is a
great simplification for our users.

Now that readers are read-only, I think it makes sense to default the
write lock into the index directory and, as you describe, no longer
generate a "unique namespace" hash lock ID, since the index dir gives us
that scoping.

Are there any reasons not to do this? I will open a JIRA issue to
track this.

> Well, I look forward to seeing whether you can suggest improvements on
> some of the algos I'll bring up in this forum once KS 0.20_01 is out. :)

I will try, but I'm already behind just trying to understand how we
could improve Lucene based on your current KS release!

Is there any preview/general summary of what's being done for KS
2.0/Lucy? I tried to quickly search the KS archives and look through
Lucy's archives but didn't find any solid hit.

Mike
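
To make the difference between the two lock-naming schemes concrete, here is a rough Java sketch. It is not the actual Lucene or KinoSearch code: the class and method names (`LockPathSketch`, `sharedLockPath`, `inIndexLockPath`), the `lucene-` prefix, and the choice of MD5 as the "checksum string generator" are all assumptions made for illustration. The first method shows why a lock kept in a shared directory needs a munged, per-index name; the second shows that, once the lock lives in the index directory, a directory separator and the fixed name "write.lock" are enough.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LockPathSketch {

    // Shared-directory scheme (illustrative): every index puts its lock in the
    // same directory, so the file name must encode which index it belongs to.
    // Hashing the index path is one way to get a collision-resistant name.
    static String sharedLockPath(String lockDir, String indexDir, String dirSep) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(indexDir.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            // The "lucene-" prefix is an assumption for this sketch.
            return lockDir + dirSep + "lucene-" + hex + "-write.lock";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    // In-index scheme: the index directory itself scopes the lock, so no
    // hashing and no lookup of a separate lock directory is needed.
    static String inIndexLockPath(String indexDir, String dirSep) {
        return indexDir + dirSep + "write.lock";
    }

    public static void main(String[] args) {
        String sep = java.io.File.separator;
        String tmp = System.getProperty("java.io.tmpdir");
        System.out.println(sharedLockPath(tmp, "/data/index", sep));
        System.out.println(inIndexLockPath("/data/index", sep));
    }
}
```

The second method needs neither a temp-directory lookup nor a digest routine, which is the reduction in dependencies described in the quoted message.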