
lucene at mikemccandless
Oct 31, 2006, 5:40 PM
Post #9 of 9
Marvin Humphrey wrote:
> On Oct 31, 2006, at 11:47 AM, Doug Cutting wrote:
>
>> I think the need for that would disappear if the lockless commit patch
>> gets committed. Then there'd be no reason not to put lock files
>> directly in the index directory, since only writers would need to lock
>> things.
>
> Unless the index is on an NFS volume. Then a Reader and a Writer can
> come into conflict because delete-on-last-close isn't supported. Some
> sort of read lock would be handy.

Right, Lucene's nice "point in time" searching feature currently relies on the filesystem semantics, and NFS doesn't give us "delete on last close". This means searchers over NFS need to expect the "stale NFS handle" IOException when searching, and re-open. This is true with or without the lock-less commits patch.

> One possibility is to extend our file-based locking system to read locks
> by appending an integer increment to the lock-file name, so that we
> could tell how many readers were live by how many read-lock files were
> present.
>
> Maybe we could have such files and compare modification dates against
> the incrementing segments.N files to identify which version of the index
> a Reader was opened against? Then, when it was time to delete files,
> the writer could discern which files were no longer needed and zap 'em.
>
> One problem is that if a reader crashes, you don't get a fatal error --
> the only effect is that the Writer just stops deleting files. Might be
> other problems, too, but I thought I'd throw the idea out there.

I think this is one of the important things that lock-less commits makes possible: implementing "point in time" searching explicitly instead of relying on [rather variable] filesystem semantics. The less we can assume about the filesystem, the more portable Lucene is!

If we do this we could perhaps have different policies, e.g., "save the past M commits", or "save any commits newer than N days", or "save any commits that are in use by readers".
That last policy is indeed tricky, because the readers somehow have to communicate to the writer that they are still using "generation N". I like your approach above. If each reader writes to its own unique file, and that file records (either by name or contents) which segments.N that reader is using, then the writers would look for these files and know what not to delete. I think these can just be normal files (i.e., not lock files)?

But the problem of a crashed reader is important to fix. Though if a given reader X always re-used the same file name once it restarted, then that should greatly reduce the problem.

An important (I think?) improvement of such an explicit approach would be that readers could be re-opened against previous "point in time" snapshots, whereas now when you open a reader you always get the most recent commit.

Also note that this approach would leave more segments files in your index. However, no additional disk space will actually be consumed, because the way it works now the disk space is still consumed too (until the readers close and the file really gets deleted).

Mike
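
A rough sketch of the "each reader writes its own file" idea discussed above, in Java. This is only an illustration and not Lucene code: the class `ReaderMarkers`, the `reader-<id>.gen` file naming, and every method name here are hypothetical, and the sketch ignores races such as the writer scanning while a marker file is half-written.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names are Lucene APIs. */
final class ReaderMarkers {
    // Assumed naming scheme for marker files: reader-<id>.gen
    private static final String PREFIX = "reader-";
    private static final String SUFFIX = ".gen";

    /**
     * A reader records the generation N of the segments.N it opened.
     * Re-using the same readerId after a restart overwrites the stale
     * marker left behind by a crashed instance.
     */
    static void markInUse(Path indexDir, String readerId, long generation)
            throws IOException {
        Path marker = indexDir.resolve(PREFIX + readerId + SUFFIX);
        Files.writeString(marker, Long.toString(generation));
    }

    /** A reader that closes cleanly removes its marker. */
    static void release(Path indexDir, String readerId) throws IOException {
        Files.deleteIfExists(indexDir.resolve(PREFIX + readerId + SUFFIX));
    }

    /** The writer scans the markers to learn which generations are referenced. */
    static Set<Long> generationsInUse(Path indexDir) throws IOException {
        Set<Long> inUse = new TreeSet<>();
        try (DirectoryStream<Path> markers =
                 Files.newDirectoryStream(indexDir, PREFIX + "*" + SUFFIX)) {
            for (Path marker : markers) {
                inUse.add(Long.parseLong(Files.readString(marker).trim()));
            }
        }
        return inUse;
    }

    /**
     * "Save any commits that are in use by readers": everything except the
     * newest commit and the generations still referenced may be deleted.
     */
    static Set<Long> deletable(Set<Long> allGenerations, Set<Long> inUse) {
        TreeSet<Long> result = new TreeSet<>(allGenerations);
        if (!result.isEmpty()) {
            result.remove(result.last()); // always keep the most recent commit
        }
        result.removeAll(inUse);
        return result;
    }
}
```

A marker left by a reader that crashes and never restarts would still pin its generation forever, so a real implementation would presumably combine this with one of the other policies (for example an age limit on markers).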