
Mailing List Archive: Lucene: Java-Dev

Back Compatibility

 

 



gsingers at apache

Jan 23, 2008, 6:42 PM

Post #51 of 70 (27285 views)
Re: Back Compatibility [In reply to]

Yes, I agree these are what this is about (despite the divergence into
locking).

As I see it, the question is about whether we should try to do major
releases on the order of a year, rather than the current 2+ year
schedule, and also how to best handle bad behavior when producing
tokens that previous applications rely on.

On the first case, we said in the past that we would try to do minor
releases more frequently (on the order of once a quarter), but so far
this hasn't happened. However, it has only been one release, and it
did have a lot of big changes that warranted longer testing. I do
agree with Michael M. that we have done a good job of keeping back
compatibility. I still don't know if trying to clean out deprecations
once a year puts some onerous task on people when it comes to
upgrading as opposed to doing every two years. Do people really have
code that they never compile or work on in over a year? If they do,
do they care about upgrading? It clearly means they are happy w/
Lucene and don't need any bug fixes. I can understand this being a
bigger issue if it were on the order of every 6 months or less, but
that isn't what I am proposing. I guess my suggestion would be that
we try to get back onto the once a quarter release goal, which will
more than likely lead to a major release in the 1-1.5 year time
frame. That being said, I am fine with maintaining the status quo
concerning back compatibility as I think those arguments are
compelling. On the interface thing, I wish there was a @introducing
annotation that could announce the presence of a new method and would
give a warning up until the version specified is met, at which point
it would break the compile, but I realize the semantics of that are
pretty weird, so...
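
To make the @introducing idea concrete, here is a rough sketch of how such an annotation might look. Everything in it is hypothetical: the `Introducing` annotation, its `since` and `mandatoryIn` elements, the `SketchTokenStream` stand-in class and the `IntroducingCheck` helper are invented for illustration and exist in neither Lucene nor the JDK. A real version would need an annotation processor to produce compiler warnings; this sketch only performs a reflective check to show the intended semantics.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation: announces a method that implementors will be
// required to provide once the library reaches version `mandatoryIn`.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Introducing {
    String since();        // version in which the method was announced
    String mandatoryIn();  // version at which a missing override becomes an error
}

// Stand-in for an abstract API class that gains a method in a minor release.
abstract class SketchTokenStream {
    abstract String next();

    @Introducing(since = "2.3", mandatoryIn = "3.0")
    String next(String reusable) {
        return next(); // default body keeps old subclasses working for now
    }
}

// An "old" client class written before the new method was announced.
class OldStyleStream extends SketchTokenStream {
    @Override
    String next() {
        return "token";
    }
}

final class IntroducingCheck {
    // Lists announced methods that `impl` itself does not declare. A build
    // tool could print these as warnings before `mandatoryIn` is reached and
    // fail the build afterwards (version comparison is left out here).
    static List<String> missingOverrides(Class<?> impl) {
        List<String> missing = new ArrayList<>();
        for (Class<?> c = impl.getSuperclass(); c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                Introducing tag = m.getAnnotation(Introducing.class);
                if (tag == null) {
                    continue;
                }
                try {
                    impl.getDeclaredMethod(m.getName(), m.getParameterTypes());
                } catch (NoSuchMethodException e) {
                    missing.add(m.getName() + " (mandatory in " + tag.mandatoryIn() + ")");
                }
            }
        }
        return missing;
    }
}
```

Calling `IntroducingCheck.missingOverrides(OldStyleStream.class)` reports the one-argument `next` method, which is the point at which a tool would warn or, later, fail.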

As for the other issue concerning things like token issues, I think it
is reasonable to fix the bug and just let people know it will change
indexing, but try to allow for the old way if it is not too onerous.
Chances are most people aren't even aware of it, and thus telling them
about it may actually cause them to consider it. For things like
maxFieldLength, etc., back compat. is a reasonable thing to
preserve.

Cheers,
Grant


On Jan 23, 2008, at 6:24 PM, DM Smith wrote:

> Top posting because this is a response to the thread as a whole.
>
> It appears that this thread has identified some different reasons
> for "needing" to break compatibility:
> 1) A current behavior is now deemed bad or wrong. Examples: the
> silent truncation of large documents or an analyzer that works
> incorrectly.
> 2) Performance tuning such as seen in Token, allowing reuse.
> 3) Support of a new language feature, e.g. generics, that make the
> code "better".
> 4) A new feature requires a change to the existing API.
>
> Perhaps there were others? Maybe specifics are in Jira.
>
> It seems to me that the Lucene developers have done an excellent job
> at figuring out how to maintain compatibility. This is a testament
> to how well grounded the design of the API actually is, from early
> on and even now. And changes seem to be well thought out, well
> designed and carefully implemented.
>
> I think that when it really gets down to it, the Lucene API will
> stay very stable because of this.
>
> On a side note, the cLucene project seems to be languishing (still
> trying to get to 2.0) and any stability of the API is a good thing
> for it. And perhaps for the other "ports" as well.
>
> Again many thanks for all your hard work,
> DM Smith, a thankful "parasite" :)
>
> On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:
>
>>
>> chris Hostetter wrote:
>>
>>>
>>> : I do like the idea of a static/system property to match legacy
>>> : behavior. For example, the bugs around how StandardTokenizer
>>> : mislabels tokens (eg LUCENE-1100), this would be the perfect
>>> : solution. Clearly those are silly bugs that should be fixed,
>>> : quickly, with this back-compatible mode to keep the bug in place.
>>> :
>>> : We might want to, instead, have ctors for many classes take a
>>> : required arg which states the version of Lucene you are using? So
>>> : if you are writing a new app you would pass in the current
>>> : version. Then, on dropping in a future Lucene JAR, we could use
>>> : that arg to enforce the right backwards compatibility. This would
>>> : save users from having to realize they are hitting one of these
>>> : situations and then know to go set the right static/property to
>>> : retain the buggy behavior.
>>>
>>> I'm not sure that this would be better though ... when i write my
>>> code, i pass "2.3" to all these constructors (or factory methods) and
>>> then later i want to upgrade to 2.4 to get all the new performance
>>> goodness ... i shouldn't have to change all those constructor calls
>>> to get all the 2.4 goodness, i should be able to leave my code as is
>>> -- but if i do that, then i might not get all the 2.4 goodness, (like
>>> improved tokenization, or more precise segment merging) because some
>>> of that goodness violates previous assumptions that some code might
>>> have had ... my code doesn't have those assumptions, i know nothing
>>> about them, i'll take whatever behavior the Lucene Developers
>>> recommend (unless i see evidence that it breaks something, in which
>>> case i'll happily set a system property or something that the release
>>> notes say will force the old behavior).
>>>
>>> The basic principle being: by default, give users the behavior that
>>> is generally viewed as "correct" -- but give them the option to force
>>> "uncorrect" legacy behavior.
>>
>> OK, I agree: the vast majority of users upgrading would in fact
>> want all of the changes in the new release. And then the rare user
>> who is affected by that bug fix to StandardTokenizer would have to
>> set the compatibility mode. So it makes sense for you to get all
>> changes on upgrading (and NOT specify the legacy version in all
>> ctors).
>>
>>> : Also, backporting is extremely costly over time. I'd much rather
>>> : keep compatibility for longer on our forward releases, than spend
>>> : our scarce resources moving changes back.
>>>
>>> +1
>>>
>>> : So to summarize ... I think we should have (keep) a high tolerance
>>> : for cruft to maintain API compatibility. I think our current
>>> : approach (try hard to keep compatibility during "minor" releases,
>>> : then deprecate, then remove APIs on a major release; do major
>>> : releases only when truly required) is a good one.
>>>
>>> i'm with you for the most part, it's just the definition of "when
>>> truly required" that tends to hang people up ... there's a chicken
>>> vs egg problem of deciding whether the code should drive what the
>>> next release number is: "i've added a bitch'n feature but it requires
>>> adding a method to an interface, therefore the next release must be
>>> called 4.0" ... vs the mindset that "we just had a 3.0 release, it's
>>> too soon for another major release, the next release should be called
>>> 3.1, so we need to hold off on committing non backwards compatible
>>> changes for a while."
>>>
>>> I'm in the first camp: version numbers should be descriptive,
>>> information carrying, labels for releases -- but the version number
>>> of a release should be dictated by the code contained in that
>>> release. (if that means the next version after 3.0.0 is 4.0.0, then
>>> so be it.)
>>
>> Well, I am wary of doing major releases too often. Though I do
>> agree that the version number should be a "fastmatch" for reading
>> through CHANGES.txt.
>>
>> Say we do this, and zoom forward 2 years when we're up to 6.0, then
>> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>>
>> One of the amazing things about Lucene, to me, is how many really
>> major changes we have been able to make while not in fact breaking
>> backwards compatibility (too much). Being very careful not to make
>> things public, intentionally not committing to things like exactly
>> when does a flush or commit or merge actually happen, marking new
>> APIs as experimental and freely subject to change, using abstract
>> classes not interfaces, are all wonderful tools that Lucene employs
>> (and should continue to do so), to enable sizable changes in the
>> future while keeping backwards compatibility.
>>
>> Allowing for future backwards compatibility is one of the most
>> important things we all do when we make changes to Lucene!
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> For additional commands, e-mail: java-dev-help [at] lucene
>>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







cdoronc at gmail

Jan 24, 2008, 12:13 AM

Post #52 of 70 (27286 views)
Re: Back Compatibility [In reply to]

On Jan 24, 2008 12:31 AM, robert engels <rengels [at] ix> wrote:

> You must get the write lock before opening the reader if you want
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise.
>
> A opens reader.
> B opens reader.
> A performs query decides an update is needed based on results
> B performs query decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock


Lucene actually protects from this - 'A' would fail to acquire the write
lock, with a stale-index-exception (this is tested in TestIndexReader -
testDeleteReaderReaderConflict).


> A performs update - ERROR. A is performing an update based on stale data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if system can determine that B's
> query results in the first case would not change based on A's updates.
>


lucene at mikemccandless

Jan 24, 2008, 1:16 AM

Post #53 of 70 (27269 views)
Re: Back Compatibility [In reply to]

Doron Cohen wrote:

> On Jan 24, 2008 12:31 AM, robert engels <rengels [at] ix> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the
> write lock, with a stale-index-exception (this is tested in
> TestIndexReader - testDeleteReaderReaderConflict).

Aha, you are right Doron! Indeed Lucene effectively serializes this
case, using the write.lock.

>
>> A performs update - ERROR. A is performing an update based on
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's
>> updates.

And, in this case, B will fail when it tries to get the lock. B's
reader must be re-opened so that it first sees the changes committed
by A.

So, Lucene is transactional, but forces clients to serialize their
write operations (ie, one cannot have multiple transactions open at
once).

Mike



lucene at mikemccandless

Jan 24, 2008, 1:27 AM

Post #54 of 70 (27284 views)
Re: Back Compatibility [In reply to]

Grant Ingersoll wrote:

> Yes, I agree these are what this is about (despite the divergence into
> locking).
>
> As I see it, the question is about whether we should try to do
> major releases on the order of a year, rather than the current 2+
> year schedule, and also how to best handle bad behavior when
> producing tokens that previous applications rely on.
>
> On the first case, we said we would try to do minor releases more
> frequently (on the order of once a quarter) in the past, but this,
> so far hasn't happened. However, it has only been one release,
> and it did have a lot of big changes that warranted longer
> testing. I do agree with Michael M. that we have done a good job
> of keeping back compatibility. I still don't know if trying to
> clean out deprecations once a year puts some onerous task on people
> when it comes to upgrading as opposed to doing every two years. Do
> people really have code that they never compile or work on in over
> a year? If they do, do they care about upgrading? It clearly
> means they are happy w/ Lucene and don't need any bug fixes. I can
> understand this being a bigger issue if it were on the order of
> every 6 months or less, but that isn't what I am proposing. I
> guess my suggestion would be that we try to get back onto the once
> a quarter release goal, which will more than likely lead to a major
> release in the 1-1.5 year time frame. That being said, I am fine
> with maintaining the status quo concerning back compatibility as I
> think those arguments are compelling. On the interface thing, I
> wish there was a @introducing annotation that could announce the
> presence of a new method and would give a warning up until the
> version specified is met, at which point it would break the
> compile, but I realize the semantics of that are pretty weird, so...

I do think we should try for minor releases more frequently,
independent of the backwards compatibility question (how often to do
major releases) :)

I think major releases should be done only when a major feature truly
"forces" us to (which Java 1.5 has) and not because we want to clean
out the accumulated cruft we are carrying forward to preserve
backwards compatibility.

> As for the other issue concerning things like token issues, I think
> it is reasonable to fix the bug and just let people know it will
> change indexing, but try to allow for the old way if it is not too
> onerous. Chances are most people aren't even aware of it, and thus
> telling them about it may actually cause them to consider it. For
> things like maxFieldLength, etc. then back compat. is a reasonable
> thing to preserve.

So, in hindsight, the acronym/host setting for StandardAnalyzer
really should have defaulted to "true", meaning the bug is fixed, but
users who somehow depend on the bug (which should be a tiny minority)
have an avenue (setReplaceInvalidAcronym) to keep back compatibility
if needed even on a minor release, right? I agree. (And so in 2.4
we should fix the default to true?).

I think for such issues where it's a very minor break in backwards
compatibility, we should make the break, and very carefully document
this in the "Changes in runtime behavior" section, even within a
minor release. I don't think such changes should drive us to a major
release.
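
As an illustration of that policy (fixed behavior by default, old behavior on request), here is a minimal sketch. It is not Lucene's tokenizer: the `AcronymLabeler` class, its labeling rule and the property name `sketch.legacyAcronymLabels` are all made up for this example, and the "bug" is only a simplified stand-in for the kind of token mislabeling discussed above.

```java
// Hypothetical class: the corrected behavior is the default, and a system
// property (name invented for this sketch) switches the old behavior back on.
final class AcronymLabeler {
    static final String LEGACY_PROPERTY = "sketch.legacyAcronymLabels";

    private final boolean legacy;

    AcronymLabeler() {
        // Boolean.getBoolean returns false when the property is unset, so
        // callers who do nothing get the fixed behavior after upgrading.
        this(Boolean.getBoolean(LEGACY_PROPERTY));
    }

    AcronymLabeler(boolean legacy) {
        this.legacy = legacy;
    }

    // Labels a token. The modeled bug: legacy mode calls every dotted
    // token an ACRONYM, including host names.
    String label(String token) {
        if (!token.contains(".")) {
            return "ALPHANUM";
        }
        if (legacy) {
            return "ACRONYM";
        }
        return token.endsWith(".") ? "ACRONYM" : "HOST";
    }
}
```

A user who depends on the old labels would start the JVM with `-Dsketch.legacyAcronymLabels=true`; everyone else picks up the fix without touching their code, and the change is documented in the runtime-behavior notes.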

Mike



gsingers at apache

Jan 24, 2008, 5:42 AM

Post #55 of 70 (27268 views)
Re: Back Compatibility [In reply to]

On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:

>
> Grant Ingersoll wrote:
>
>> Yes, I agree these are what this is about (despite the divergence into
>> locking).
>>
>> As I see it, the question is about whether we should try to do
>> major releases on the order of a year, rather than the current 2+
>> year schedule, and also how to best handle bad behavior when
>> producing tokens that previous applications rely on.
>>
>> On the first case, we said we would try to do minor releases more
>> frequently (on the order of once a quarter) in the past, but this,
>> so far hasn't happened. However, it has only been one release,
>> and it did have a lot of big changes that warranted longer
>> testing. I do agree with Michael M. that we have done a good job
>> of keeping back compatibility. I still don't know if trying to
>> clean out deprecations once a year puts some onerous task on people
>> when it comes to upgrading as opposed to doing every two years. Do
>> people really have code that they never compile or work on in over
>> a year? If they do, do they care about upgrading? It clearly
>> means they are happy w/ Lucene and don't need any bug fixes. I can
>> understand this being a bigger issue if it were on the order of
>> every 6 months or less, but that isn't what I am proposing. I
>> guess my suggestion would be that we try to get back onto the once
>> a quarter release goal, which will more than likely lead to a major
>> release in the 1-1.5 year time frame. That being said, I am fine
>> with maintaining the status quo concerning back compatibility as I
>> think those arguments are compelling. On the interface thing, I
>> wish there was a @introducing annotation that could announce the
>> presence of a new method and would give a warning up until the
>> version specified is met, at which point it would break the
>> compile, but I realize the semantics of that are pretty weird, so...
>
> I do think we should try for minor releases more frequently,
> independent of the backwards compatibility question (how often to do
> major releases) :)
>

+1

The question then becomes what can we do to improve our development
process?

> I think major releases should be done only when a major feature
> truly "forces" us to (which Java 1.5 has) and not because we want to
> clean out the accumulated cruft we are carrying forward to preserve
> backwards compatibility.
>
>> As for the other issue concerning things like token issues, I think
>> it is reasonable to fix the bug and just let people know it will
>> change indexing, but try to allow for the old way if it is not too
>> onerous. Chances are most people aren't even aware of it, and thus
>> telling them about it may actually cause them to consider it. For
>> things like maxFieldLength, etc. then back compat. is a reasonable
>> thing to preserve.
>
> So, in hindsight, the acronym/host setting for StandardAnalyzer
> really should have defaulted to "true", meaning the bug is fixed,
> but users who somehow depend on the bug (which should be a tiny
> minority) have an avenue (setReplaceInvalidAcronym) to keep back
> compatibility if needed even on a minor release, right? I agree.
> (And so in 2.4 we should fix the default to true?).


>
>
> I think for such issues where it's a very minor break in backwards
> compatibility, we should make the break, and very carefully document
> this in the "Changes in runtime behavior" section, even within a
> minor release. I don't think such changes should drive us to a
> major release.


+1

-Grant



rengels at ix

Jan 24, 2008, 8:15 AM

Post #56 of 70 (27265 views)
Re: Back Compatibility [In reply to]

Sorry, I am using "gets lock" to mean 'opening the index'. I was
simplifying the procedure.

I think your comment is not correct in this context.

On Jan 24, 2008, at 3:16 AM, Michael McCandless wrote:

> Doron Cohen wrote:
>
>> On Jan 24, 2008 12:31 AM, robert engels <rengels [at] ix>
>> wrote:
>>
>>> You must get the write lock before opening the reader if you want
>>> transactional consistency and are performing updates.
>>>
>>> No other way to do it.
>>>
>>> Otherwise.
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides an update is needed based on results
>>> B performs query decides an update is needed based on results
>>> B gets write lock
>>> B updates
>>> B releases
>>> A gets write lock
>>
>>
>> Lucene actually protects from this - 'A' would fail to acquire the
>> write lock, with a stale-index-exception (this is tested in
>> TestIndexReader - testDeleteReaderReaderConflict).
>
> Aha, you are right Doron! Indeed Lucene effectively serializes
> this case, using the write.lock.
>
>>
>>> A performs update - ERROR. A is performing an update based on
>>> stale data
>>>
>>> If A & B want to update an index, it must work as:
>>>
>>> A gets lock
>>> A opens reader
>>> A updates
>>> A releases lock
>>> B gets lock
>>> B opens reader
>>> B updates
>>> B releases lock
>>>
>>> The only way you can avoid this is if system can determine that B's
>>> query results in the first case would not change based on A's
>>> updates.
>
> And, in this case, B will fail when it tries to get the lock. It
> must be re-opened so it first sees the changes committed by A.
>
> So, Lucene is transactional, but forces clients to serialize their
> write operations (ie, one cannot have multiple transactions open at
> once).
>
> Mike
>


rengels at ix

Jan 24, 2008, 8:55 AM

Post #57 of 70 (27263 views)
Re: Back Compatibility [In reply to]

Thanks, you are correct, but I am not sure it covers the complete case.

Change it a bit to be:

A opens reader.
B opens reader.
A performs query decides a new document is needed
B performs query decides a new document is needed
B gets writer, adds document, closes
A gets writer, adds document, closes

There needs to be a way to manually serialize these operations. I
assume I should just do this:

A gets writer
B gets writer - can't so blocked
A opens reader
A performs query decides a new document is needed
A adds document
A closes reader
A closes writer
B now gets writer
B opens reader
B performs query sees a new document is not needed
B closes reader
B closes writer

Previously, with the read locks, I did not think you could open the
reader after you had the write lock.

Am I correct here?
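
The second sequence above (take the writer first, then query, then add) can be sketched with a plain lock. This is a toy in-memory model under stated assumptions, not Lucene code: `SerializedIndex` and `addIfAbsent` are invented names, a `Set` stands in for the index, and a `ReentrantLock` plays the role of the single writer.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Toy in-memory "index"; the lock plays the role of the single writer.
final class SerializedIndex {
    private final ReentrantLock writerLock = new ReentrantLock();
    private final Set<String> docs = new HashSet<>();

    // Query and add happen under the same lock, so a second client only
    // runs its query after the first client's document is visible.
    boolean addIfAbsent(String docId) {
        writerLock.lock();               // "gets writer" (blocks while held)
        try {
            if (docs.contains(docId)) {  // "opens reader, performs query"
                return false;            // a new document is not needed
            }
            docs.add(docId);             // "adds document"
            return true;
        } finally {
            writerLock.unlock();         // "closes writer"
        }
    }

    int size() {
        writerLock.lock();
        try {
            return docs.size();
        } finally {
            writerLock.unlock();
        }
    }
}
```

If A and B both call `addIfAbsent` with the same id, whichever gets the lock second sees the first one's document and returns `false`, which is the "B performs query sees a new document is not needed" step.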

On Jan 24, 2008, at 2:13 AM, Doron Cohen wrote:

> On Jan 24, 2008 12:31 AM, robert engels <rengels [at] ix> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the
> write lock, with a stale-index-exception (this is tested in
> TestIndexReader - testDeleteReaderReaderConflict).
>
>
>> A performs update - ERROR. A is performing an update based on
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's
>> updates.
>>




cdoronc at gmail

Jan 24, 2008, 10:35 AM

Post #58 of 70 (27277 views)
Re: Back Compatibility [In reply to]

On Jan 24, 2008 6:55 PM, robert engels <rengels [at] ix> wrote:

> Thanks, you are correct, but I am not sure it covers the complete case.
>
> Change it a bit to be:
>
> A opens reader.
> B opens reader.
> A performs query decides a new document is needed
> B performs query decides a new document is needed
> B gets writer, adds document, closes
> A gets writer, adds document, closes
>
> There needs to be a way to manually serialize these operations. I
> assume I should just do this:
>
> A gets writer
> B gets writer - can't so blocked
> A opens reader
> A performs query decides a new document is needed
> A adds document
> A closes reader
> A closes writer
> B now gets writer
> B opens reader
> B performs query sees a new document is not needed
> B closes reader
> B closes writer
>
> Previously, with the read locks, I did not think you could open the
> reader after you had the write lock.
>
> Am I correct here?


If I understand you correctly then yes and no :-)

"Yes" in the sense that this would work and achieve the
required serialization, and "no" in that you could always open
readers whether there was an open writer or not.

The current locking logic with readers is that opening a reader does
not require acquiring any lock. Only when the reader is used for a
write operation (e.g. delete) does it become a writer, and for that it
(1) acquires the write lock and (2) verifies that the index was not
modified by any writer since the reader was first opened (or else it
throws the stale-index exception mentioned earlier).
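
The two-step rule for readers can be modeled roughly as follows. This is a simplified sketch, not Lucene's implementation: `ToyIndex`, `ToyReader`, the version counter and the use of `IllegalStateException` are all assumptions made for illustration, and the lock is released after each write here purely to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of an index with a single write lock and a change counter.
final class ToyIndex {
    final ReentrantLock writeLock = new ReentrantLock();
    private final Set<String> deleted = new HashSet<>();
    private long version = 0;            // bumped on every committed change

    synchronized long version() {
        return version;
    }

    synchronized void applyDelete(String docId) {
        deleted.add(docId);
        version++;
    }

    // Opening a reader takes no lock; it only remembers the current version.
    ToyReader openReader() {
        return new ToyReader(this, version());
    }
}

final class ToyReader {
    private final ToyIndex index;
    private long versionAtOpen;

    ToyReader(ToyIndex index, long versionAtOpen) {
        this.index = index;
        this.versionAtOpen = versionAtOpen;
    }

    // A write through the reader: (1) take the write lock, (2) refuse to
    // proceed if the index changed after this reader was opened.
    void delete(String docId) {
        if (!index.writeLock.tryLock()) {
            throw new IllegalStateException("write lock is held by another writer");
        }
        try {
            if (index.version() != versionAtOpen) {
                throw new IllegalStateException("stale reader: index changed since it was opened");
            }
            index.applyDelete(docId);
            versionAtOpen = index.version(); // this reader now matches the index
        } finally {
            index.writeLock.unlock();
        }
    }
}
```

With two readers opened at the same version, the first `delete` succeeds and the second fails the staleness check until that reader is re-opened.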

Prior to lockless commits there were two lock types: the write lock and
the commit lock. The commit lock was held only briefly, while a reader
was opening its files, to guarantee that no writer modified the files
the reader was reading (especially the segments file). Lockless commits
got rid of the commit lock, mainly by never modifying a file once it
has been written. Write locks are still in use, but only for writers,
as described above.
(Mike feel free to correct me here...)
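
The write-once idea behind lockless commits can be illustrated with a small in-memory model. This is only a sketch under assumptions: `WriteOnceCommits` is an invented class, a sorted map stands in for the directory, and the numbered "generation" per commit is a simplification rather than a description of Lucene's file format.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy "directory" in which a commit is never modified once written.
final class WriteOnceCommits {
    // generation number -> segment list recorded by that commit
    private final ConcurrentSkipListMap<Long, String> commits =
            new ConcurrentSkipListMap<>();

    // A writer publishes each commit as a brand-new entry with the next
    // generation number; earlier entries are left untouched.
    synchronized long commit(String segmentList) {
        long generation = commits.isEmpty() ? 1L : commits.lastKey() + 1L;
        commits.put(generation, segmentList);
        return generation;
    }

    // A reader takes no lock: it picks the newest commit present. The
    // returned entry is a snapshot, so later commits cannot change it.
    // Returns null if nothing has been committed yet.
    Map.Entry<Long, String> openLatest() {
        return commits.lastEntry();
    }
}
```

Because existing entries never change, a reader that opened generation 1 keeps a consistent view while a writer publishes generation 2, which is why the brief commit lock is no longer needed in this model.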


dmsmith555 at gmail

Jan 24, 2008, 10:44 AM

Post #59 of 70 (27279 views)
Re: Back Compatibility [In reply to]

This is now a hijacked thread. It is very interesting, but it may be
hard to find again. Wouldn't it be better to record this thread
differently, perhaps opening a Jira issue to add XA to Lucene?

-- DM

Doron Cohen wrote:
> On Jan 24, 2008 6:55 PM, robert engels <rengels [at] ix> wrote:
>
>
>> Thanks, you are correct, but I am not sure it covers the complete case.
>>
>> Change it a bit to be:
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides a new document is needed
>> B performs query decides a new document is needed
>> B gets writer, adds document, closes
>> A gets writer, adds document, closes
>>
>> There needs to be a way to manually serialize these operations. I
>> assume I should just do this:
>>
>> A gets writer
>> B gets writer - can't so blocked
>> A opens reader
>> A performs query decides a new document is needed
>> A adds document
>> A closes reader
>> A closes writer
>> B now gets writer
>> B opens reader
>> B performs query sees a new document is not needed
>> B closes reader
>> B closes writer
>>
>> Previously, with the read locks, I did not think you could open the
>> reader after you had the write lock.
>>
>> Am I correct here?
>>
>
>
> If I understand you correctly then yes and no :-)
>
> "Yes" in the sense that this would work and achieve the
> required serialization, and "no" in that you could always open
> readers whether there was an open writer or not.
>
> The current locking logic with readers is that opening a reader does
> not require acquiring any lock. Only when attempting to use the reader
> for a write operation (e.g. delete) the reader becomes a writer, and
> for that it (1) acquires a write lock and (2) verifies that the
> index was not modified by any writer since the reader was
> first opened (or else it throws that stale exception).
>
> Prior to lockless-commit there were two lock types - write-lock and
> commit-lock. The commit-lock was used only briefly - during file opening
> during reader-opening, to guarantee that no writer modifies the files
> that the reader is reading (especially the segments file).
> Lockless-commits got rid of the commit lock (mainly by changing to
> never modify a file once it was written.) Write locks are still in use,
> but only for writers, as described above.
> (Mike feel free to correct me here...)
>
>




rengels at ix

Jan 24, 2008, 11:43 AM

Post #60 of 70 (27285 views)
Re: Back Compatibility [In reply to]

I will do so.

On Jan 24, 2008, at 12:44 PM, DM Smith wrote:

> This is now a hijacked thread. It is very interesting, but it may
> be hard to find again. Wouldn't it be better to record this thread
> differently, perhaps opening a Jira issue to add XA to Lucene?
>
> -- DM
>
> Doron Cohen wrote:
>> On Jan 24, 2008 6:55 PM, robert engels <rengels [at] ix> wrote:
>>
>>
>>> Thanks, you are correct, but I am not sure it covers the complete
>>> case.
>>>
>>> Change it a bit to be:
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides a new document is needed
>>> B performs query decides a new document is needed
>>> B gets writer, adds document, closes
>>> A gets writer, adds document, closes
>>>
>>> There needs to be a way to manually serialize these operations. I
>>> assume I should just do this:
>>>
>>> A gets writer
>>> B gets writer - can't so blocked
>>> A opens reader
>>> A performs query decides a new document is needed
>>> A adds document
>>> A closes reader
>>> A closes writer
>>> B now gets writer
>>> B opens reader
>>> B performs query sees a new document is not needed
>>> B closes reader
>>> B closes writer
>>>
>>> Previously, with the read locks, I did not think you could open the
>>> reader after you had the write lock.
>>>
>>> Am I correct here?
>>>
>>
>>
>> If I understand you correctly then yes and no :-)
>>
>> "Yes" in the sense that this would work and achieve the
>> required serialization, and "no" in that you could always open
>> readers whether there was an open writer or not.
>>
>> The current locking logic with readers is that opening a reader does
>> not require acquiring any lock. Only when attempting to use the
>> reader
>> for a write operation (e.g. delete) the reader becomes a writer, and
>> for that it (1) acquires a write lock and (2) verifies that the
>> index was not modified by any writer since the reader was
>> first opened (or else it throws that stale exception).
>>
>> Prior to lockless-commit there were two lock types - write-lock and
>> commit-lock. The commit-lock was used only briefly - during file
>> opening during reader-opening, to guarantee that no writer modifies
>> the files that the reader is reading (especially the segments file).
>> Lockless-commits got rid of the commit lock (mainly by changing to
>> never modify a file once it was written.) Write locks are still in
>> use, but only for writers, as described above.
>> (Mike feel free to correct me here...)
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
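
For illustration, the serialized sequence quoted above (take the writer
first, then query, and add the document only if it is still missing) might
be sketched as below. This is a minimal stand-in, not the Lucene API: the
class SerializedAddSketch, its in-memory set, and the ReentrantLock playing
the role of the write lock are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for an index guarded by a single write lock; not the Lucene API. */
final class SerializedAddSketch {
    private final ReentrantLock writeLock = new ReentrantLock();
    private final Set<String> documents = new HashSet<String>();

    /** Returns true if this caller added the document, false if it was already present. */
    boolean addIfMissing(String key) {
        writeLock.lock();                  // "gets writer": a second caller blocks here
        try {
            if (documents.contains(key)) { // "opens reader, performs query"
                return false;              // "sees a new document is not needed"
            }
            documents.add(key);            // "adds document"
            return true;
        } finally {
            writeLock.unlock();            // "closes writer"
        }
    }

    /** Runs A and B concurrently and returns how many of them actually added the document. */
    static int demo() {
        final SerializedAddSketch index = new SerializedAddSketch();
        final AtomicInteger added = new AtomicInteger();
        Runnable task = new Runnable() {
            public void run() {
                if (index.addIfMissing("doc-1")) {
                    added.incrementAndGet();
                }
            }
        };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return added.get();
    }

    public static void main(String[] args) {
        System.out.println("documents added: " + demo());
    }
}
```

Because the check and the add happen under the same lock, whichever of A
or B arrives second finds the document already there, so only one add
takes place regardless of thread scheduling.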


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


gsingers at apache

Jan 25, 2008, 10:04 AM

Post #61 of 70 (27279 views)
Permalink
Re: Back Compatibility [In reply to]

One more thought on back compatibility:

Do we have the same requirements for any and all contrib modules? I
am especially thinking about the benchmark contrib, but it probably
applies to others as well.

-Grant


On Jan 24, 2008, at 8:42 AM, Grant Ingersoll wrote:

>
> On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:
>
>>
>> Grant Ingersoll wrote:
>>
>>> Yes, I agree these are what is about (despite the divergence into
>>> locking).
>>>
>>> As I see it, the question is about whether we should try to do
>>> major releases on the order of a year, rather than the current 2+
>>> year schedule and also how to best handle bad behavior when
>>> producing tokens that previous applications rely on.
>>>
>>> On the first case, we said we would try to do minor releases more
>>> frequently (on the order of once a quarter) in the past, but this,
>>> so far hasn't happened. However, it has only been one release,
>>> and it did have a lot of big changes that warranted longer
>>> testing. I do agree with Michael M. that we have done a good job
>>> of keeping back compatibility. I still don't know if trying to
>>> clean out deprecations once a year puts some onerous task on
>>> people when it comes to upgrading as opposed to doing every two
>>> years. Do people really have code that they never compile or work
>>> on in over a year? If they do, do they care about upgrading? It
>>> clearly means they are happy w/ Lucene and don't need any bug
>>> fixes. I can understand this being a bigger issue if it were on
>>> the order of every 6 months or less, but that isn't what I am
>>> proposing. I guess my suggestion would be that we try to get back
>>> onto the once a quarter release goal, which will more than likely
>>> lead to a major release in the 1-1.5 year time frame. That being
>>> said, I am fine with maintaining the status quo concerning back
>>> compatibility, as I think those arguments are compelling. On the
>>> interface thing, I wish there was a @introducing annotation that
>>> could announce the presence of a new method and would give a
>>> warning up until the version specified is met, at which point it
>>> would break the compile, but I realize the semantics of that are
>>> pretty weird, so...
>>
>> I do think we should try for minor releases more frequently,
>> independent of the backwards compatibility question (how often to
>> do major releases) :)
>>
>
> +1
>
> The question then becomes what can we do to improve our development
> process?
>
>> I think major releases should be done only when a major feature
>> truly "forces" us to (which Java 1.5 has) and not because we want
>> to clean out the accumulated cruft we are carrying forward to
>> preserve backwards compatibility.
>>
>>> As for the other issue concerning things like token issues, I
>>> think it is reasonable to fix the bug and just let people know it
>>> will change indexing, but try to allow for the old way if it is
>>> not too onerous. Chances are most people aren't even aware of it,
>>> and thus telling them about it may actually cause them to consider
>>> it. For things like maxFieldLength, etc. then back compat. is a
>>> reasonable thing to preserve.
>>
>> So, in hindsight, the acronym/host setting for StandardAnalyzer
>> really should have defaulted to "true", meaning the bug is fixed,
>> but users who somehow depend on the bug (which should be a tiny
>> minority) have an avenue (setReplaceInvalidAcronym) to keep back
>> compatibility if needed even on a minor release, right? I agree.
>> (And so in 2.4 we should fix the default to true?).
>
>
>>
>>
>> I think for such issues where it's a very minor break in backwards
>> compatibility, we should make the break, and very carefully
>> document this in the "Changes in runtime behavior" section, even
>> within a minor release. I don't think such changes should drive us
>> to a major release.
>
>
> +1
>
> -Grant
>
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







cdoronc at gmail

Jan 25, 2008, 1:01 PM

Post #62 of 70 (27273 views)
Permalink
Re: Back Compatibility [In reply to]

On Jan 25, 2008 8:04 PM, Grant Ingersoll <gsingers [at] apache> wrote:

> One more thought on back compatibility:
>
> Do we have the same requirements for any and all contrib modules? I
> am especially thinking about the benchmark contrib, but it probably
> applies to others as well.
>
> -Grant
>

In general I think that contrib should have the same requirements, because
there may be applications out there depending on it - e.g. highlighting,
spell-correction - and here too, unstable packages can be marked with
a temporary warning such as those we currently have for search.function.

benchmark is different in that - I think - there are no applications that
depend on it, so perhaps we can have more flexibility in it?

Doron


gsingers at apache

Jan 25, 2008, 1:16 PM

Post #63 of 70 (27280 views)
Permalink
Re: Back Compatibility [In reply to]

Well, contrib/Wikipedia has a dependency on it, but at least it is
self contained. I would love to see the Wikipedia stuff extracted out
of benchmark and be in contrib/wikipedia (thus flipping the
dependency), but the effort isn't particularly high on my list.

But I do agree, benchmark doesn't have the same litmus test.

-Grant

On Jan 25, 2008, at 4:01 PM, Doron Cohen wrote:

> On Jan 25, 2008 8:04 PM, Grant Ingersoll <gsingers [at] apache> wrote:
>
>> One more thought on back compatibility:
>>
>> Do we have the same requirements for any and all contrib modules? I
>> am especially thinking about the benchmark contrib, but it probably
>> applies to others as well.
>>
>> -Grant
>>
>
> In general I think that contrib should have the same requirements, because
> there may be applications out there depending on it - e.g. highlighting,
> spell-correction - and here too, unstable packages can be marked with
> a temporary warning such as those we currently have for search.function.
>
> benchmark is different in that - I think - there are no applications that
> depend on it, so perhaps we can have more flexibility in it?
>
> Doron





hossman_lucene at fucit

Jan 27, 2008, 5:05 PM

Post #64 of 70 (27276 views)
Permalink
Re: Back Compatibility [In reply to]

: I would guess the number of people/organizations using Lucene vs. contributing
: to Lucene is much greater.
:
: The contributers work in head (should IMO). The users can select a particular
: version of Lucene and code their apps accordingly. They can also back-port
: features from a later to an earlier release. If they have limited development
: resources, they are probably not working on Lucene (they are working on their
: apps), but they can update their own code to work with later versions - which
: they would probably rather do than learning the internals and contributing to
: Lucene.

i think we have a semantic disconnect on the definition of "community"

I am including any and all people/projects that use Lucene in any way --
whether or not they contribute back. If there are 1000 projects
using lucene as a library, and each project requires 5 man hours of work
to upgrade from version X to version Y because of a non-backwards
compatible change, but it would only take 2 man hours of work for those
projects to backport / rip out the one or two features of version Y they
really want and cram them into their code base, then the community as a
whole is paying a really heavy cost for version Y ... regardless of whether
each of those 1000 projects invests the 5 hours or the 2 hours ... in the
first extreme we're all spending a cumulative total of 5000 man hours. in
the second case we're spending 2000 man hours, and now we've got 1000 apps
that are running hacked up unofficial offshoots of version X that will
never be able to upgrade to version Z when it comes out -- the community
not only becomes very fractured but lucene as a whole gets a bad rap,
because everybody talks about how they still run version X with local
patches instead of using version Y -- it makes new users wonder "what's
wrong with version Y?" ... "if upgrading is so hard that no one does it do
i really want to use this library?"
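
The arithmetic behind those totals, as a small sketch. The figures (1000
projects, 5 hours to upgrade, 2 hours to backport) are the illustrative
numbers from the paragraph above, not measurements; the class and method
names are made up for the example.

```java
/** Hypothetical helper that reproduces the community-wide cost estimate above. */
final class UpgradeCostSketch {
    /** Total hours spent across all projects for a given per-project cost. */
    static long totalHours(int projects, int hoursPerProject) {
        return (long) projects * hoursPerProject;
    }

    public static void main(String[] args) {
        int projects = 1000; // illustrative figure from the thread
        System.out.println("everyone upgrades:  " + totalHours(projects, 5) + " man hours");
        System.out.println("everyone backports: " + totalHours(projects, 2) + " man hours");
    }
}
```

The second total is smaller, but it leaves every project on its own
patched copy of version X, and that cost does not show up in the hours.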

It may seem like a socialist or a communist or a free love hippy attitude,
but if contributors and committers take extra time to develop more
incremental releases and backwards compatible API transitions it may cost
them more time upfront, but it saves the community as a whole a *lot* of
time in the long run.

By all means: we should move forward anytime really great improvements can
be made through new APIs and new features -- but we need to keep in mind
that if those new APIs and features are hard for our current user base to
adapt to, then we aren't doing the community as a whole any favors by
throwing the baby out with the bath water and prematurely throwing away
an old API in order to support the new one.

Trade offs must be made. Sometimes that may mean sacrificing committer
man hours; or performance; or API cleanliness; in order to reap the
benefit of a strong, happy, healthy, community.



-Hoss




hossman_lucene at fucit

Jan 27, 2008, 5:31 PM

Post #65 of 70 (27275 views)
Permalink
Re: Back Compatibility [In reply to]

: > So, in hindsight, the acronym/host setting for StandardAnalyzer really
: > should have defaulted to "true", meaning the bug is fixed, but users who
: > somehow depend on the bug (which should be a tiny minority) have an avenue
: > (setReplaceInvalidAcronym) to keep back compatibility if needed even on a
: > minor release, right? I agree. (And so in 2.4 we should fix the default to
: > true?).

: > I think for such issues where it's a very minor break in backwards
: > compatibility, we should make the break, and very carefully document this in
: > the "Changes in runtime behavior" section, even within a minor release. I
: > don't think such changes should drive us to a major release.

: +1

I've made some verbiage changes to BackwardsCompatibility to document that
we may in fact make runtime behavior changes which are not strictly
"backwards compatible" and what commitments we have to letting users
force the old behavior if we make a change like this in a minor release.

most of this verbiage is just me making stuff up based on this thread ...
it is absolutely open for discussion (and editing by people with more
grammar sense than me)...

http://wiki.apache.org/lucene-java/BackwardsCompatibility
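
A minimal sketch of the pattern being discussed: the fixed behavior is the
default, and a setter lets a user force the old behavior. Only the method
name setReplaceInvalidAcronym comes from this thread; the class
CompatFlagSketch and its toy token rule are assumptions for illustration
and are not the StandardAnalyzer implementation.

```java
/** Hypothetical token classifier with an opt-out for a fixed bug; not Lucene code. */
final class CompatFlagSketch {
    // Default is the fixed behavior; users who depend on the bug can switch it off.
    private boolean replaceInvalidAcronym = true;

    public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym) {
        this.replaceInvalidAcronym = replaceInvalidAcronym;
    }

    /** Toy rule, assumed for the example: single letters with dots are acronyms. */
    public String tokenType(String token) {
        if (token.matches("([A-Za-z]\\.)+")) {
            return "ACRONYM";                  // e.g. "U.S.A."
        }
        if (token.contains(".")) {
            // The old behavior mislabels dotted host names as acronyms.
            return replaceInvalidAcronym ? "HOST" : "ACRONYM";
        }
        return "WORD";
    }

    public static void main(String[] args) {
        CompatFlagSketch fixed = new CompatFlagSketch();
        System.out.println(fixed.tokenType("www.example.com"));  // fixed default

        CompatFlagSketch legacy = new CompatFlagSketch();
        legacy.setReplaceInvalidAcronym(false);                  // force old behavior
        System.out.println(legacy.tokenType("www.example.com"));
    }
}
```

With this shape, a minor release can change what new users get by default,
while an existing application keeps its old results with a one-line change.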



-Hoss




hossman_lucene at fucit

Jan 27, 2008, 5:34 PM

Post #66 of 70 (27274 views)
Permalink
Re: Back Compatibility [In reply to]

: But I do agree, benchmark doesn't have the same litmus test.

the generalization of that statement probably being "all contribs are not
created equal."

I propose adding a note to the BackwardsCompatibility wiki page saying that
the compatibility commitments of contribs depend largely on their maturity
and intended usage, and that the README.txt file for each contrib will
identify its approach to compatibility.

we can put some boilerplate in the README for most of the contribs, and
special verbiage in the README for the special contribs.


-Hoss




rengels at ix

Jan 27, 2008, 6:54 PM

Post #67 of 70 (27247 views)
Permalink
Re: Back Compatibility [In reply to]

And then you can end up like the Soviet Union...

The basic problems of communism - those that don't contribute their
fair share, but suck out the minimum resources (but maximum in
totality), and those that want to lead (their contribution) and suck
the minimum, and then those that contribute the most to make up for
everyone else, and quickly say this SUCKS....




gsingers at apache

Jan 27, 2008, 7:14 PM

Post #68 of 70 (27252 views)
Permalink
Re: Back Compatibility [In reply to]

+1
On Jan 27, 2008, at 8:34 PM, Chris Hostetter wrote:

>
> : But I do agree, benchmark doesn't have the same litmus test.
>
> the generalization of that statement probably being "all contribs are
> not created equal."
>
> I propose adding a note to the BackwardsCompatibility wiki page saying
> that the compatibility commitments of contribs depend largely on their
> maturity and intended usage, and that the README.txt file for each
> contrib will identify its approach to compatibility.
>
> we can put some boilerplate in the README for most of the contribs, and
> special verbiage in the README for the special contribs.
>
>
> -Hoss
>


gsingers at apache

Jan 27, 2008, 7:19 PM

Post #69 of 70 (27261 views)
Permalink
Re: Back Compatibility [In reply to]

+1. And, we always have the major version release at our disposal if
need be.

At any rate, I think we have beaten this one to death. I think it is
useful to look back every now and then on the major things that
guide us and make sure we all still agree, at least for the most
part. For now, I think our plan is pretty straightforward: 2.4
pretty quickly (3 months?) and then 2.9, all of which will be
back-compatible. Then on to 3.0, which will be a full upgrade to
Java 1.5, thus dropping support for Java 1.4.

-Grant










Endre at stolsvik

Jan 28, 2008, 9:19 AM

Post #70 of 70 (27292 views)
Permalink
Re: Back Compatibility [In reply to]

> It may seem like a socialist or a communist or a free love hippy attitude,

It sounds like a perfect attitude.

(In particular the "free love hippie" part - does it come with LSD and
tie-dyed/batik clothes too?)

Kind regards,
Endre.

