
Mailing List Archive: Lucene: Java-Dev

lock path thoughts

 

 



DORONC at il

Oct 30, 2006, 3:12 PM

Post #1 of 9 (2992 views)
Permalink
lock path thoughts

(Extracted from issue 665; it turned out to be unrelated to that issue.)

In NFS or other shared file system setups, locks are maintained in a
specified folder, but the lock file name is derived from the full path of
the index dir (actually the canonical name of this dir). So, if the same
index is accessed by two machines, the <drive> / <mount> / <fs> root of
that index dir must be named the same on all the machines on which Lucene
is invoked to access/maintain that index.
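
A minimal sketch of why the mount name matters, assuming the lock file name is a digest of the canonical index path. The class name, the method name and the exact name format are illustrative assumptions, not Lucene's actual code; the point is only that two machines that see the same index under different paths end up with different lock names, and so never block each other.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: shows how a path-derived lock name differs per mount point.
final class LockNameSketch {

    /** Derives a lock file name from the path under which one machine sees the index. */
    static String lockFileName(String canonicalIndexPath) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(canonicalIndexPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder name = new StringBuilder("lucene-");
            for (byte b : digest) {
                // Two lowercase hex digits per digest byte.
                name.append(String.format("%02x", b));
            }
            return name.append("-write.lock").toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Calling lockFileName("/mnt/search/index") on one machine and lockFileName("/net/filer1/search/index") on another (both paths are made up) yields two unrelated names in the shared lock folder, which is the misconfiguration risk described here.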

Since File.getCanonicalPath() is system dependent, and since the mount
names sometimes differ even for the same type of OS, Lucene has the
setLockPrefix() API that allows users to configure the lock prefix on
each machine.

This seems like a likely source of problems when users misconfigure their
lock prefixes. If the index path is not configured correctly, the index
is not found, and this is likely to be noticed and fixed pretty soon. But
if lock prefixes are misconfigured, chances are that the index will get
corrupted.

This would be avoided if index locks are maintained in the index folder. I
searched the lists for previous discussions on this 'design decision' -
i.e. where the index locks reside - found none. Wouldn't it simplify
matters to have the locks in the index dir? Any disadvantages of this?

Thanks,
Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


marvin at rectangular

Oct 30, 2006, 3:27 PM

Post #2 of 9 (2927 views)
Permalink
Re: lock path thoughts [In reply to]

On Oct 30, 2006, at 3:12 PM, Doron Cohen wrote:

> This would be avoided if index locks are maintained in the index
> folder. I
> searched the lists for previous discussions on this 'design
> decision' -
> i.e. where the index locks reside - found none. Wouldn't it simplify
> matters to have the locks in the index dir? Any disadvantages of this?

Doug explains the rationale here:

http://xrl.us/svsz (Link to mail-archives.apache.org)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/






hossman_lucene at fucit

Oct 30, 2006, 4:14 PM

Post #3 of 9 (2912 views)
Permalink
Re: lock path thoughts [In reply to]

: Doug explains the rationale here:
:
: http://xrl.us/svsz (Link to mail-archives.apache.org)

That rationale makes a lot of sense for FSDirectory/SimpleLockFactory to
use by default (since it already doesn't work in a distributed disk
system like NFS) but as we start getting other Directory/LockFactory
implementations which may not have these problems, we need to make sure
that those new classes aren't limited by this.

My initial thought was that this would be something the lockFactory
already had control over, but then I realized this is really driven by
Directory.getLockID, and LockFactory.setLockPrefix ... it looks like
perhaps newer LockFactories that can work on distributed drives might need
to have non-trivial setLockPrefix methods that ignore their input.

Either that or we punt on the issue and just have really good
documentation to the effect that apps on systems using shared drives need
to call lockFactory.setLockPrefix explicitly.



-Hoss




rengels at ix

Oct 30, 2006, 4:29 PM

Post #4 of 9 (2931 views)
Permalink
Re: lock path thoughts [In reply to]

I think you may be overcomplicating the lock design. Unix never had any OS
file locking at all (until Windows came around...).

If you are going to use Lucene in a high-performance multi-user/multi-server
environment, having the Lucene server process control the locks (i.e.
moving the Lucene API into a server process) will give FAR better
throughput and easier manageability. If you are not doing this, then the
simple existing file-based locks should be more than adequate.

Even if you want to span the index over several NFS volumes, you
should have a front-end controller, (or controller per volume) that
can communicate the locks if needed.

Using the filesystem/OS to manage the locks is simple, but inferior
in almost all cases, especially if you want to increase the
parallelism of the backend - you ordinarily need much finer lock
control than an OS will provide.

The nice thing is... as long as 'disable locks' is always supported,
I'll be happy :)

Just my thoughts.

Robert





DORONC at il

Oct 30, 2006, 5:59 PM

Post #5 of 9 (2910 views)
Permalink
Re: lock path thoughts [In reply to]

Not having to give index readers write permission in the index dir is a
nice feature; I didn't think of it that way.

I looked at doing it the other way around, i.e. by default locks would be
maintained in the index dir, and only when that is inadequate (as in the
readers/writers scenario) could this be overridden with an equivalent of
setLockPath. But (1) readers would now write in the index dir; (2) there
might be more use cases like the latter; and (3) a setLockPath method
would be required anyhow. So it seems best to me not to change this.

- Doron



cutting at apache

Oct 31, 2006, 11:47 AM

Post #6 of 9 (2937 views)
Permalink
Re: lock path thoughts [In reply to]

Doron Cohen wrote:
> Not having to assign index readers a write permission in the index dir is a
> nice feature, I didn't think of it that way.

I think the need for that would disappear if the lockless commit patch
gets committed. Then there'd be no reason not to put lock files
directly in the index directory, since only writers would need to lock
things.

Doug



DORONC at il

Oct 31, 2006, 2:24 PM

Post #7 of 9 (2905 views)
Permalink
Re: lock path thoughts [In reply to]

Doug Cutting wrote:
> I think the need for that would disappear if the lockless commit patch
> gets committed. Then there'd be no reason not to put lock files
> directly in the index directory, since only writers would need to lock
> things.

Great! So we also get rid of this risk:

> .. source for possible problems, when users mis configure
> their lock prefixes. - if the index path was not configured
> correctly, the index would not be found, and this is likely
> to be found and fixed pretty soon. But if lock path prefixes
> are misconfigured, chances are that the index would get corrupted.

So this is an additional advantage of lock-less commits (patch 701).




marvin at rectangular

Oct 31, 2006, 2:59 PM

Post #8 of 9 (2907 views)
Permalink
Re: lock path thoughts [In reply to]

On Oct 31, 2006, at 11:47 AM, Doug Cutting wrote:

> I think the need for that would disappear if the lockless commit
> patch gets committed. Then there'd be no reason not to put lock
> files directly in the index directory, since only writers would
> need to lock things.

Unless the index is on an NFS volume. Then a Reader and a Writer can
come into conflict because delete-on-last-close isn't supported.
Some sort of read lock would be handy.

One possibility is to extend our file-based locking system to read
locks by appending an integer increment to the lock-file name, so
that we could tell how many readers were live by how many read-lock
files were present.

Maybe we could have such files and compare modification dates against
the incrementing segments.N files to identify which version of the
index a Reader was opened against? Then, when it was time to delete
files, the writer could discern which files were no longer needed and
zap 'em.

One problem is that if a reader crashes, you don't get a fatal error
-- the only effect is that the Writer just stops deleting files.
Might be other problems, too, but I thought I'd throw the idea out
there.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





lucene at mikemccandless

Oct 31, 2006, 5:40 PM

Post #9 of 9 (2909 views)
Permalink
Re: lock path thoughts [In reply to]

Marvin Humphrey wrote:
> On Oct 31, 2006, at 11:47 AM, Doug Cutting wrote:
>
>> I think the need for that would disappear if the lockless commit patch
>> gets committed. Then there'd be no reason not to put lock files
>> directly in the index directory, since only writers would need to lock
>> things.
>
> Unless the index is on an NFS volume. Then a Reader and a Writer can
> come into conflict because delete-on-last-close isn't supported. Some
> sort of read lock would be handy.

Right, Lucene's nice "point in time" searching feature currently
relies on the filesystem semantics, and NFS doesn't give us "delete on
last close". This means searchers over NFS need to expect the "stale
NFS handle" IOException when searching, and re-open when it happens.
This is true with or without the lock-less commits patch.

> One possibility is to extend our file-based locking system to read locks
> by appending an integer increment to the lock-file name, so that we
> could tell how many readers were live by how many read-lock files were
> present.
>
> Maybe we could have such files and compare modification dates against
> the incrementing segments.N files to identify which version of the index
> a Reader was opened against? Then, when it was time to delete files,
> the writer could discern which files were no longer needed and zap 'em.
>
> One problem is that if a reader crashes, you don't get a fatal error --
> the only effect is that the Writer just stops deleting files. Might be
> other problems, too, but I thought I'd throw the idea out there.

I think this is one of the important things that lock-less commits
makes possible: implementing "point in time" searching explicitly
instead of relying on [rather variable] filesystem semantics. The less
we can assume about the filesystem, the more portable Lucene is!

If we do this we could have different policies perhaps, ie, "save the
past M commits", or, "save any commits newer than N days", or "save
any commits that are in use by readers". That last policy is indeed
tricky as to how the readers actually communicate to the writer that
they are still using "generation N".
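
The first and last of these policies can be sketched together as follows. This is only an illustration under stated assumptions: commits are identified by the N of their segments.N file, and the class, method and parameter names are hypothetical, not part of Lucene or of the lock-less commits patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical policy: keep the newest keepLast commits plus any commit a reader still uses.
final class CommitRetentionSketch {

    /** Returns the commit generations a writer could delete, oldest first. */
    static List<Long> deletable(List<Long> generations, int keepLast, Set<Long> inUse) {
        if (keepLast < 0) {
            throw new IllegalArgumentException("keepLast must be >= 0");
        }
        List<Long> sorted = new ArrayList<>(generations);
        Collections.sort(sorted);
        // Everything before this index is older than the newest keepLast commits.
        int cutoff = Math.max(0, sorted.size() - keepLast);
        List<Long> result = new ArrayList<>();
        for (Long generation : sorted.subList(0, cutoff)) {
            if (!inUse.contains(generation)) {
                result.add(generation);
            }
        }
        return result;
    }
}
```

With generations 1 through 5, keepLast of 2 and a reader still on generation 2, only generations 1 and 3 come back as deletable. How the writer learns the in-use set is the tricky part and is left open here.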

I like your approach above. If each reader writes to its own unique
file, and that file records (either by name or contents) which
segments.N that reader is using, then the writers would look for these
files and know what not to delete. I think these can just be normal
files (ie not lock files)? But the problem of a crashed reader is
important to fix. Though if a given reader X always re-used the same
file name once it restarted, stale files left behind by crashes should
become much rarer.
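
A rough sketch of the per-reader file idea, assuming each reader has a stable id and records the N of its segments.N as the contents of a plain file in the index directory. The "reader-<id>.gen" naming, the class and the method names are all made up for illustration; nothing here is an existing Lucene API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical marker files: one plain file per reader, holding the generation it has open.
final class ReaderMarkerSketch {

    /** Called by a reader; re-using the same readerId after a restart overwrites its old marker. */
    static Path markReaderGeneration(Path indexDir, String readerId, long generation)
            throws IOException {
        Path marker = indexDir.resolve("reader-" + readerId + ".gen");
        byte[] contents = Long.toString(generation).getBytes(StandardCharsets.UTF_8);
        // Default options create the file or truncate an existing one.
        return Files.write(marker, contents);
    }

    /** Called by a writer before deleting old files: which generations must be kept? */
    static Set<Long> generationsInUse(Path indexDir) throws IOException {
        Set<Long> inUse = new TreeSet<>();
        try (DirectoryStream<Path> markers = Files.newDirectoryStream(indexDir, "reader-*.gen")) {
            for (Path marker : markers) {
                String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8);
                inUse.add(Long.parseLong(text.trim()));
            }
        }
        return inUse;
    }
}
```

The sketch ignores several real concerns: the write is not atomic, so a writer could see a half-written marker; a reader that crashes and never restarts still pins its generation; and NFS client caching may delay when a marker becomes visible.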

An important (I think?) improvement of such an explicit approach would
be that readers could be re-opened against previous "point in time"
snapshots. Whereas now when you open a reader you always get the most
recent commit.

Also note that this approach would leave more segments files in your
index. However, no additional disk space will actually be consumed,
because the way it works now that disk space stays in use as well
(until the readers close and the files really get deleted).

Mike
