
Mailing List Archive: Lucene: Java-Dev

Back Compatibility

 

 



rengels at ix

Jan 22, 2008, 5:27 PM

Post #26 of 70
Re: Back Compatibility

I think there are a lot of applications using Lucene where "whether
it's lost a bit of data or not" is not acceptable.

However, it is probably fine for a web search, or intranet search.

As to your first point, that is why the really great open-source
projects (Eclipse, OpenOffice) have a financial backer that provides
significant direction and contributions. They wouldn't waste their
resources developing esoteric features with little appeal; they direct
those resources to broader features that others can then build
finer features on top of.

I don't question the abilities of Michael whatsoever - I just wish
they were directed at broader features. The review by the voters (and
this list) allows development to stay focused.

Frequently, perfectly correct patches are rejected by the voters. Why?
Because SOMEONE needs to keep the development focused - if not, there
will be chaos.

On Jan 22, 2008, at 4:19 PM, Mark Miller wrote:

> I humbly disagree about NFS. Arguing about where free time was
> invested, or wasted, or inefficient, in an open source project just
> seems silly. One of the great benefits is esoteric work that would
> normally not be allowed for. NFS is easy. A lot of Lucene users
> don't care about Lucene. They just want something easy to set up. It
> especially doesn't make sense when talking about Michael. He seems
> to spit out Lucene code in his sleep. I doubt the NFS stuff did
> anything but make him more brilliant at manipulating Lucene. It
> certainly hasn't made him any less prolific.
>
> I am very much in favor of your talk about transactional support.
> Man, do I want Lucene to have that. But the fact that we are getting
> to where the index cannot be corrupted is still a great step forward.
> Knowing that my indexes will not be corrupted while running at a
> place that needs access 24/7 is just wonderful. I can get something
> working for them quick, whether it's lost a bit of data or not. Now,
> full support to guarantee that my Lucene index is consistent with
> my database? Even better. I wish. But I am still very thankful for
> the first step of a guaranteed consistent index.
>
> Your glass is always half full ;) I aspire to your crankiness when
> I get older.
>
> - Mark
>
>
> robert engels wrote:
>> One more example on this. A lot of work was done on transaction
>> support. I would argue that this falls way short of what is
>> needed, since there is no XA transaction support. Since the lucene
>> index (unless stored in an XA db) is a separate resource, it
>> really needs XA support in order to be consistent with the other
>> resources.
>>
>> All of the transaction work that has been performed only
>> guarantees that barring a physical hardware failure the lucene
>> index can be opened and used at a known state. This index though
>> is probably not consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA
>> support, but I think this would be much easier if the concept of a
>> transaction was made first class through the API (and then XA
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't
>>> see how Lucene is at all useful if the users are encountering any
>>> bug - so they either don't use that feature, or they have already
>>> developed a work-around (or they have patched the code in a way
>>> that avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite
>>> substantial. I think the actual number of people trying to use
>>> NFS is probably very low - as the initial implementation had so
>>> many problems (and IMO is not a very good solution for
>>> distributed indexes anyway). So all the work in trying to make
>>> NFS work "correctly" behind the scenes may have been inefficient,
>>> since a more direct, yet major fix may have solved the problem
>>> better (like distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through
>>> major releases is a bad idea. It leads to bloat and complex code,
>>> both internal and external. Achieving great gains in usability
>>> and/or performance in a mature product like Lucene almost
>>> certainly requires massive changes to the processes, algorithms
>>> and structures, and the API should change as well to reflect
>>> this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are "no longer actively developing the portion of the code
>>>> : that's broken, aren't seeking the new feature, etc", and they stay
>>>> : back on old versions... isn't that exactly what we want? They can
>>>> : stay on the old version, and new application development uses the
>>>> : newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community, evangelized by a really smart dude named chromatic ...
>>>> "why are we worrying about the effect of upgrades on users who
>>>> don't upgrade?"
>>>>
>>>> The problem is that not all users are created equal, and not all
>>>> users upgrade for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still
>>>> running lucene1.4.3 because they are afraid that if they upgrade
>>>> their app will break and they don't want to deal with it; they
>>>> don't care about known bugs in lucene1.4.3, as long as those bugs
>>>> haven't impacted them yet -- these people aren't going to care
>>>> whether we add a bunch of new methods to interfaces, or remove a
>>>> bunch of public methods from arbitrary releases, because they are
>>>> never going to see them. They might do a total rewrite of their
>>>> project later, and they'll worry about it then (when they have
>>>> lots of time and QA resources).
>>>>
>>>> Group B: At the other extreme are the "free-spirited" developers
>>>> (god i hate that the word "agile" has been co-opted) who are
>>>> always eager to upgrade to get the latest bells and whistles, and
>>>> don't mind making changes to code and recompiling every time they
>>>> upgrade -- just as long as there are some decent docs on what to
>>>> change.
>>>>
>>>> Group C: In the middle is a large group of people who are
>>>> interested in upgrading, who want bug fixes, are willing to write
>>>> new code to take advantage of new features, in some cases are even
>>>> willing to make small or medium changes to their code to get
>>>> really good performance improvements ... but they don't have a lot
>>>> of time or energy to constantly rewrite big chunks of their app.
>>>> For these people, knowing that they can "drop in" the new version
>>>> and it will work is a big reason why they are willing to upgrade,
>>>> and why they are willing to spend some time tweaking code to take
>>>> advantage of the new features and the new performance-enhanced
>>>> APIs -- because they don't have to spend a lot of time just to get
>>>> the app working as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter
>>>> how easy the path is. Once in a great while they will decide to
>>>> march forward dozens of miles in one big push, but only once they
>>>> feel they have adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps backward
>>>> and then three steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and
>>>> occasionally even take a step back before moving forward, but only
>>>> if the path is clear and not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a
>>>> : majority of them still only run 1.4.3, or maybe 1.9. Even with
>>>> : 2.0, 2.3, or 3.0, that is still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking
>>>> questions about 1.4.3, we can give them advice like "upgrading to
>>>> 2.0.0 solves your problem" and that advice isn't a death sentence
>>>> -- the steps to move forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>>> that fractured the community for a long time (as far as i can tell
>>>> it's still pretty fractured) because the path from v1 to v2 was so
>>>> steep and involved so much backtracking, and i worry that if we
>>>> make changes to our "compatibility pledge" that don't allow for an
>>>> even forward walk, we'll wind up with a heavily fractured
>>>> community.
>>>>
>>>>
>>>>
>>>> -Hoss
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>>>> For additional commands, e-mail: java-dev-help [at] lucene
>>>>


markrmiller at gmail

Jan 22, 2008, 6:35 PM

Post #27 of 70
Re: Back Compatibility

robert engels wrote:
> I think there are a lot of applications using Lucene where "whether
> it's lost a bit of data or not" is not acceptable.
Yeah, and I have one of them. Which is why I would love the support you're
talking about. But it's not there yet, and I am just grateful that I can
get my customers back up and searching as quickly as possible rather than
experience an index corruption. Access to the data is more important
than complete access to the data for my customers (though they'd say they
certainly want both). After such an experience I have to run through the
database and check if anything from the index is missing, and if it is,
reindex. Not ideal, but what can you do? I find it odd that you don't
think non-corruption is better than nothing. It's a big feature for me.
If the server reboots at night and causes a corruption, I have customers
that will be SOL for some time... I'd prefer that when the server
reboots, my index - whatever is left - is searchable. My customers need
to work. Can't get behind on a daily product :)

I'd prefer what you're talking about, but there are tons of other things
I'd love to see in Lucene as well. It just seems odd to complain about
them. I'd think that instead, I might spearhead the development. I'm just
not experienced enough myself to do a lot of the deeper work. You don't
appear so limited. How about helping out with some transactional support? :)




rengels at ix

Jan 22, 2008, 7:18 PM

Post #28 of 70
Re: Back Compatibility

A specific example:

You have a criminal justice system that indexes past court cases.

You do a search for cases involving Joe Smith because you are a judge
and you want to review priors before sentencing. Similar issues with
related cases, case history, etc.

Is it better to return something that may not be correct, or return
an error saying the index is offline and is being rebuilt - please
perform your search later? In this case old false positives are just
as bad as missing new records. I hope that demonstrates the position
clearly.

As I stated, there are several classes of applications where "any
data", whether or not it is current or valid, is acceptable, but I would
argue that in MOST cases it is not, and if the interested
subjects fully reviewed their requirements they would not accept that
solution. It is easily summarized with the old adage "garbage in,
garbage out".

The only reason that corruption is ok is that you need to reindex
anyway, and rebuilding from scratch is often faster than determining
the affected documents and updating (especially if corruption is a
possibility).

It was in fact me who raised the issue that none of the
"lockless commits" code fixed anything related to corruption. The
only way to ensure non-corruption is to sync all data files, then
write and sync the segments file. I think this change could have
been accomplished in about 10 lines of code, and is completely
independent of lockless commits, and in most cases makes lockless
commits obsolete. But to be honest, I am not really certain how
lockless commits can actually work in an environment that allows
updates to the documents (and/or related resources), so I am sure
there are aspects I am just ignorant of.
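
A minimal sketch of the ordering described above (sync every data file first, then write and sync the file that names them), using only java.nio. The class name, the file names, and the one-name-per-line manifest format are made up for illustration; this is not Lucene's actual commit code, and it does not sync the directory entry itself:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

final class DurableCommitSketch {

    /** Forces a file's contents and metadata to stable storage. */
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /**
     * Step 1: sync every data file. Step 2: write a NEW manifest file that
     * names them, and sync it too. A reader that only trusts files listed
     * in a manifest never sees a half-written data file.
     */
    static void commit(List<Path> dataFiles, Path manifest) throws IOException {
        for (Path f : dataFiles) {
            sync(f);
        }
        StringBuilder names = new StringBuilder();
        for (Path f : dataFiles) {
            names.append(f.getFileName()).append('\n');
        }
        // CREATE_NEW: each commit writes a fresh manifest, never overwrites one.
        try (FileChannel ch = FileChannel.open(manifest,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(
                    names.toString().getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path data = dir.resolve("_0.dat");
        Files.write(data, "postings".getBytes(StandardCharsets.UTF_8));
        Path manifest = dir.resolve("segments_1");
        commit(Arrays.asList(data), manifest);
        System.out.println(Files.readAllLines(manifest));
    }
}
```

The point of the ordering is that a crash before the manifest is synced leaves the previous manifest as the last valid commit.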

As an aside, we engineered our software years ago to work around
these issues, which is why we still use a 1.9 derivative, and monitor
the trunk for important fixes and enhancements.





lucene at mikemccandless

Jan 23, 2008, 4:25 AM

Post #29 of 70
Re: Back Compatibility

Catching up here...

Re the fracturing when Maven went from v1 -> v2: I think Lucene is a
totally different animal. Maven is an immense framework; Lucene is a
fairly small "core" set of APIs. I think for these "core" type
packages it's very important to keep drop-in compatibility as long as
possible.

I think we _really_ want our users to upgrade. Yes, there are a lot of
class A people who will forever be stuck in the past, but let's not put
up barriers for them to switch to class C, or for class C to upgrade.
When someone is running old versions of Lucene it only hurts their (and
their friends' and their users') perception of Lucene.

I think we've done a good job keeping backwards compatibility despite
some rather major recent changes:

* We now do segment merging in a BG thread

* We now flush by RAM (16 MB default) not at 10 buffered docs

* Merge selection is based on size of segment in bytes not doc count

* We will (in 2.4) "autoCommit" far less often (LUCENE-1044)

Now, we could have forced these into a major release instead, but, I
don't think we should have. As much as possible I think we should
keep on minor releases (keep backwards compatibility) so people can
always more easily upgrade.

As far as I know, the only solid reason for 3.0 is the
non-backwards-compatible switch to Java 1.5?

I do like the idea of a static/system property to match legacy
behavior. For example, for the bugs around how StandardTokenizer
mislabels tokens (e.g. LUCENE-1100), this would be the perfect solution.
Clearly those are silly bugs that should be fixed, quickly, with this
back-compatible mode available to keep the buggy behavior in place.
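
A rough sketch of what such a switch could look like. The property name, the class, and the token labels are all hypothetical, and the "bug" here is a made-up mislabeling chosen for illustration, not the actual LUCENE-1100 behavior:

```java
final class LegacyBehaviorSwitch {

    /** Hypothetical property name; not a real Lucene setting. */
    static final String PROPERTY = "example.tokenizer.legacyTypes";

    /** Reads the switch on every call so it can be flipped at runtime. */
    static boolean legacyMode() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    /**
     * Labels a token. The fixed behavior is the default; setting the
     * property to "true" restores the old (wrong) label for callers
     * that depend on it.
     */
    static String tokenType(String token) {
        boolean looksLikeAcronym = token.matches("([A-Za-z]\\.)+");
        if (looksLikeAcronym) {
            return legacyMode() ? "<HOST>" : "<ACRONYM>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        System.out.println(tokenType("U.S.A."));
        System.setProperty(PROPERTY, "true");
        System.out.println(tokenType("U.S.A."));
    }
}
```

The drawback, as the next paragraph points out, is that users have to notice they depend on the old behavior before they know to set the property.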

We might want to, instead, have ctors for many classes take a required
arg which states the version of Lucene you are using? So if you are
writing a new app you would pass in the current version. Then, on
dropping in a future Lucene JAR, we could use that arg to enforce the
right backwards compatibility. This would save users from having to
realize they are hitting one of these situations and then know to go
set the right static/property to retain the buggy behavior.
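
One way the required-version idea could look, as a sketch only. The enum, its constants, and the labeler class are invented for this example (they are not taken from the Lucene code base), and the mislabeling is the same made-up one as above:

```java
import java.util.Objects;

/** Hypothetical list of releases whose behavior a caller can pin to. */
enum CompatVersion { V2_3, V2_4 }

final class VersionedTokenLabeler {

    private final CompatVersion matchVersion;

    /** The version is a required constructor argument, so nobody gets a default by accident. */
    VersionedTokenLabeler(CompatVersion matchVersion) {
        this.matchVersion = Objects.requireNonNull(matchVersion, "matchVersion");
    }

    String tokenType(String token) {
        boolean looksLikeAcronym = token.matches("([A-Za-z]\\.)+");
        if (!looksLikeAcronym) {
            return "<ALPHANUM>";
        }
        // Pretend the fix shipped in V2_4: apps written against an older
        // version keep the old label even after dropping in a newer JAR.
        boolean fixed = matchVersion.compareTo(CompatVersion.V2_4) >= 0;
        return fixed ? "<ACRONYM>" : "<HOST>";
    }

    public static void main(String[] args) {
        System.out.println(new VersionedTokenLabeler(CompatVersion.V2_3).tokenType("U.S.A."));
        System.out.println(new VersionedTokenLabeler(CompatVersion.V2_4).tokenType("U.S.A."));
    }
}
```

Compared with a global property, the setting travels with each object, so one application can mix old and new behavior.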

Also, backporting is extremely costly over time. I'd much rather keep
compatibility for longer on our forward releases, than spend our
scarce resources moving changes back.

So to summarize ... I think we should have (keep) a high tolerance for
cruft to maintain API compatibility. I think our current approach
(try hard to keep compatibility during "minor" releases, then
deprecate, then remove APIs on a major release; do major releases only
when truly required) is a good one.

Mike

Chris Hostetter wrote:

>
> : To paraphrase a dead English guy: A rose by any other name is still
> : the same, right?
> :
> : Basically, all the version number tick saves them from is having to
> : read the CHANGES file, right?
>
> Correct: i'm not disagreeing with your basic premise, just pointing
> out that it can be done with the current model, and that predictable
> "version identifiers" are a good idea when dealing with backwards
> compatibility.
>
> : Thus, the version numbers become meaningless; the question is what
> : do we see as best for Lucene? We could just as easily call it
> : Lucene Summer '08 and Lucene Winter '08. Heck, we could pull the
> : old MS Word 2.0 to MS Word 6.0 and
>
> well .. i would argue that with what you hypothesized *then* version
> numbers would become meaningless ... having 3.0, 3.1, 3.2, 4.0 would
> be no different than having 3, 4, 5, 6 -- our version numbers would
> be identifiers with no other context ... i'm just saying we should
> keep the context in so that you know whether or not version X is
> backwards compatible with version Y.
>
> Which is not to say that we shouldn't change our version number
> format...
>
> Ie: we could start using quad-tuple version numbers: 3.2.5.0 instead
> of 3.5.0
>
> 3: major version #
>    identifies file format back compatibility (as today)
> 2: api compat version #
>    classes/methods may be removed when this changes
> 5: minor version #
>    new methods may be added when this changes (as today)
> 0: patch version #
>    changes only when there are serious bug fixes
>
> ...that might mean that our version numbers go...
>
> 3.0.0.0
> 3.0.1.0
> 3.1.0.0
> 3.1.1.0
> 3.1.2.0
> 3.2.0.0
>
> ...where most numbers never get above "2" but at least the version
> number conveys useful compatibility information (at no added
> developer "cost")
>
>
>
> -Hoss
>
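
To make the quad-tuple scheme quoted above concrete, here is a small sketch that parses such a version string and answers the two compatibility questions the scheme is meant to encode. The class and method names are invented, and the rule used for "drop-in" is only one reading of the proposal (same major and api-compat numbers, and not older):

```java
final class QuadVersion {

    final int major;      // file format compatibility
    final int apiCompat;  // classes/methods may be removed when this changes
    final int minor;      // new methods may be added when this changes
    final int patch;      // serious bug fixes only

    QuadVersion(int major, int apiCompat, int minor, int patch) {
        this.major = major;
        this.apiCompat = apiCompat;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses "major.api.minor.patch", e.g. "3.2.5.0". */
    static QuadVersion parse(String s) {
        String[] p = s.split("\\.");
        if (p.length != 4) {
            throw new IllegalArgumentException("expected four numbers: " + s);
        }
        return new QuadVersion(Integer.parseInt(p[0]), Integer.parseInt(p[1]),
                Integer.parseInt(p[2]), Integer.parseInt(p[3]));
    }

    /** Same major number: the index file format is compatible. */
    boolean sameFileFormatAs(QuadVersion other) {
        return major == other.major;
    }

    /** True if code written against 'older' should run unchanged on this version. */
    boolean isDropInUpgradeFrom(QuadVersion older) {
        boolean notOlder = minor > older.minor
                || (minor == older.minor && patch >= older.patch);
        return major == older.major && apiCompat == older.apiCompat && notOlder;
    }

    public static void main(String[] args) {
        QuadVersion from = parse("3.1.0.0");
        System.out.println(parse("3.1.2.0").isDropInUpgradeFrom(from));
        System.out.println(parse("3.2.0.0").isDropInUpgradeFrom(from));
    }
}
```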




lucene at mikemccandless

Jan 23, 2008, 5:57 AM

Post #30 of 70
Re: Back Compatibility

Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
core missing in order for you (or, someone) to build XA compliance on
top of it?

Ie, you can open a writer with autoCommit=false and no changes are
committed until you close it. You can abort the session by calling
writer.abort(). What's still missing, besides LUCENE-1044?

Mike
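
The session semantics described above (nothing is visible until close, and abort discards the session) can be modeled with a toy class. This is a stand-in written for illustration, not Lucene's IndexWriter; every name in it is invented:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a writer session whose changes are published only on close. */
final class BufferedSession implements AutoCloseable {

    private final List<String> committed;                    // what readers see
    private final List<String> pending = new ArrayList<>();  // this session's changes
    private boolean open = true;

    BufferedSession(List<String> committed) {
        this.committed = committed;
    }

    /** Buffers a change; readers do not see it yet. */
    void add(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    /** Discards everything buffered since the session was opened. */
    void abort() {
        ensureOpen();
        pending.clear();
        open = false;
    }

    /** Publishes all buffered changes at once; does nothing after abort(). */
    @Override
    public void close() {
        if (!open) {
            return;
        }
        committed.addAll(pending);
        pending.clear();
        open = false;
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("session is closed");
        }
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        try (BufferedSession s = new BufferedSession(index)) {
            s.add("doc1");
            System.out.println("visible before close: " + index);
        }
        System.out.println("visible after close: " + index);
    }
}
```

What this model leaves out is exactly what the question is about: a prepare step and a way to report the outcome to an external transaction manager.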





markrmiller at gmail

Jan 23, 2008, 6:53 AM

Post #31 of 70
Re: Back Compatibility

That's where Robert is confusing me as well. To have XA support you just
need to be able to define a transaction, then atomically commit or roll
back. You also need a consistent state after any of these operations.
LUCENE-1044 seems to guarantee that, so isn't it more like finishing
up needed work than going down the wrong path? It seems to me (and
obviously I know a lot less about this than either of you) that you have
just gotten Lucene ready to add XA support. Lucene now fulfills all of
the requirements. No? Someone just needs to write a boatload of JTA code :)

It would seem the next step would be, as Robert suggests, to make a
transaction a first class citizen. The XA protocol will require Lucene
to communicate with the TM about what transactions it has completed to
help in failure recovery and transaction management. I can certainly see
the need for a better transaction abstraction to help with this.
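
As a rough illustration of why a first-class transaction helps, here is a toy two-phase commit in plain Java. The interface, the in-memory participant, and the coordinator are all invented for this sketch; it does not use or imitate the real JTA/XA interfaces, and it leaves out failure recovery entirely:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical participant contract: vote first, then commit or roll back. */
interface TwoPhaseParticipant {
    boolean prepare(String txId);   // true = ready to commit
    void commit(String txId);
    void rollback(String txId);
}

/** In-memory participant: staged changes become visible only on commit. */
final class InMemoryParticipant implements TwoPhaseParticipant {

    private final Map<String, List<String>> staged = new HashMap<>();
    private final List<String> visible = new ArrayList<>();
    private final boolean voteYes;

    InMemoryParticipant(boolean voteYes) {
        this.voteYes = voteYes;
    }

    void stage(String txId, String change) {
        staged.computeIfAbsent(txId, k -> new ArrayList<>()).add(change);
    }

    public boolean prepare(String txId) {
        return voteYes;
    }

    public void commit(String txId) {
        List<String> changes = staged.remove(txId);
        if (changes != null) {
            visible.addAll(changes);
        }
    }

    public void rollback(String txId) {
        staged.remove(txId);
    }

    List<String> visible() {
        return visible;
    }
}

final class TwoPhaseCoordinator {

    /** Commits on every participant only if all of them vote yes in prepare. */
    static boolean run(String txId, List<? extends TwoPhaseParticipant> participants) {
        boolean allReady = true;
        for (TwoPhaseParticipant p : participants) {
            if (!p.prepare(txId)) {
                allReady = false;
                break;
            }
        }
        for (TwoPhaseParticipant p : participants) {
            if (allReady) {
                p.commit(txId);
            } else {
                p.rollback(txId);
            }
        }
        return allReady;
    }
}
```

With an index and a database both behind such a contract, either both apply a transaction or neither does, which is the cross-resource consistency being asked for in this thread.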

A little enlightenment on this would be great, Robert. I am very
interested in it for future projects.

And I have to point out... it just seems logical that we would make
things so that the index was consistent at some point before taking the
next step of making it consistent with other resources... no? I am just
still confused about Robert's objections to what is going on here. I
think that it would be a real leap forward to get it done though.

Also, as he mentioned, we really need a good distributed system that
allows for index partitioning. That's the ticket to more enterprise
adoption. Could be Solr's work though...

Michael McCandless wrote:
>
> Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
> core missing in order for you (or, someone) to build XA compliance on
> top of it?
>
> Ie, you can open a writer with autoCommit=false and no changes are
> committed until you close it. You can abort the session by calling
> writer.abort(). What's still missing, besides LUCENE-1044?
>
> Mike
>
> robert engels wrote:
>
>> One more example on this. A lot of work was done on transaction
>> support. I would argue that this falls way short of what is needed,
>> since there is no XA transaction support. Since the lucene index
>> (unless stored in an XA db) is a separate resource, it really needs
>> XA support in order to be consistent with the other resources.
>>
>> All of the transaction work that has been performed only guarantees
>> that barring a physical hardware failure the lucene index can be
>> opened and used at a known state. This index though is probably not
>> consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA
>> support, but I think this would be much easier if the concept of a
>> transaction was made first class through the API (and then XA
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't see
>>> how Lucene is at all useful if the users are encountering any bug -
>>> so they either don't use that feature, or they have already
>>> developed a work-around (or they have patched the code in a way that
>>> avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite
>>> substantial. I think the actual number of people trying to use NFS
>>> is probably very low - as the initial implementation had so many
>>> problems (and IMO is not a very good solution for distributed
>>> indexes anyway). So all the work in trying to make NFS work
>>> "correctly" behind the scenes may have been inefficient, since a
>>> more direct, yet major fix may have solved the problem better (like
>>> distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through major
>>> releases is a bad idea. Leads to bloat, and complex code - both
>>> internal and external. In order to achieve great gains in usability
>>> and/or performance in a mature product like Lucene almost certainly
>>> requires massive changes to the processes, algorithms and
>>> structures, and the API should change as well to reflect this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are " no longer actively developing the portion of the
>>>> code that's
>>>> : broken, aren't seeking the new feature, etc", and they stay back
>>>> on old
>>>> : versions... isn't that exactly what we want? They can stay on the
>>>> old version,
>>>> : and new application development uses the newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community (evangelized by a really smart dude named chromatic) ...
>>>> "why are we worrying about the effect of upgrades on users who don't
>>>> upgrade?"
>>>>
>>>> The problem is not all users are created equal and not all users
>>>> upgrade
>>>> for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still running
>>>> lucene1.4.3 because they are afraid if they upgrade their app will
>>>> break
>>>> and they don't want to deal with it; they don't care about known
>>>> bugs in
>>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>>> people aren't going to care whether we add a bunch of new methods to
>>>> interfaces, or remove a bunch of public methods from arbitrary
>>>> releases,
>>>> because they are never going to see them. They might do a total
>>>> rewrite
>>>> of their project later, and they'll worry about it then (when they
>>>> have
>>>> lots of time and QA resources)
>>>>
>>>> Group B: At the other extreme are the "free-spirited" developers
>>>> (god i hate that the word "agile" has been co-opted) who are always
>>>> eager to upgrade to get the latest bells and whistles, and don't
>>>> mind making changes to code and recompiling every time they upgrade
>>>> -- just as long as there are some decent docs on what to change.
>>>>
>>>> Group C: In the middle is a large group of people who are
>>>> interested in upgrading, who want bug fixes, are willing to write
>>>> new code to take advantage of new features, in some cases are even
>>>> willing to make small or medium changes to their code to get really
>>>> good performance improvements ... but they don't have a lot of time
>>>> or energy to constantly rewrite big chunks of their app. For these
>>>> people, knowing that they can "drop in" the new version and it will
>>>> work is a big reason why they are willing to upgrade, and why they
>>>> are willing to spend some time tweaking code to take advantage of
>>>> the new features and the new performance-enhanced APIs -- because
>>>> they don't have to spend a lot of time just to get the app working
>>>> as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter
>>>> how easy
>>>> the path is. Once in a great while they will decide to march forward
>>>> dozens of miles in one big push, but only once they feel they have
>>>> adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps backward and
>>>> then 3 steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and
>>>> occasionally even
>>>> take a step back before moving forward, but only if the path is
>>>> clear and
>>>> not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a
>>>> majority of
>>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or
>>>> 3.0, that is
>>>> : still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking questions
>>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves
>>>> your problem" and that advice isn't a death sentence -- the steps to
>>>> move forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>>> that fractured the community for a long time (as far as i can tell
>>>> it's still pretty fractured) because the path from v1 to v2 was so
>>>> steep and involved backtracking so much, and i worry that if we make
>>>> changes to our "compatibility pledge" that don't allow for an even
>>>> forward walk, we'll wind up with a heavily fractured community.
>>>>
>>>>
>>>>
>>>> -Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


yonik at apache

Jan 23, 2008, 7:24 AM

Post #32 of 70 (14297 views)
Permalink
Re: Back Compatibility [In reply to]

On Jan 23, 2008 9:53 AM, Mark Miller <markrmiller [at] gmail> wrote:
> Also, as he mentioned, we really need a good distributed system that
> allows for index partitioning. Thats the ticket to more enterprise
> adoption. Could be Solr's work though...

Yes, we're working on that :-)

-Yonik


rengels at ix

Jan 23, 2008, 8:54 AM

Post #33 of 70 (14280 views)
Permalink
Re: Back Compatibility [In reply to]

Maybe I don't understand lockless commits then.

I just don't think you can enforce transactional consistency without
either 1) locking, or 2) optimistic collision detection. I could be
wrong here, but this has been my experience.

By effectively removing the locking requirement, I think you are
going to have users developing code without thought as to what is
going to happen when locking is added. This is going to break the
backwards compatibility that people are striving for.

The lucene "writer" structure needs to be something like:

start tx for update
do work
commit

where commit is composed of (prepare and commit phases), but commit
may fail.
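
The prepare/commit split described above can be sketched in plain Java. This is only an illustration under stated assumptions: TxParticipant, ToyResource and TwoPhaseCommitSketch are hypothetical names invented here, not Lucene or JTA types, and a real XA resource manager also has to handle recovery after a crash, which this toy ignores.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical participant in a two-phase commit; not a Lucene or JTA interface. */
interface TxParticipant {
    boolean prepare();   // vote: true if this resource is able to commit
    void commit();       // make the prepared changes visible
    void rollback();     // discard the pending changes
}

/** Toy resource that only records which way the transaction ended. */
class ToyResource implements TxParticipant {
    private final boolean canPrepare;
    String state = "pending";

    ToyResource(boolean canPrepare) {
        this.canPrepare = canPrepare;
    }

    public boolean prepare() { return canPrepare; }
    public void commit() { state = "committed"; }
    public void rollback() { state = "rolled back"; }
}

class TwoPhaseCommitSketch {
    /** Returns true if every participant committed, false if all were rolled back. */
    static boolean commitAll(List<? extends TxParticipant> participants) {
        // Phase 1: ask every participant to vote.
        for (TxParticipant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the transaction for everyone.
                for (TxParticipant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: all votes were "yes", so commit everywhere.
        for (TxParticipant p : participants) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        ToyResource index = new ToyResource(true);
        ToyResource database = new ToyResource(false); // simulates a failed prepare
        boolean ok = commitAll(Arrays.asList(index, database));
        System.out.println("committed=" + ok
                + ", index=" + index.state
                + ", database=" + database.state);
    }
}
```

The point of the sketch is that the index cannot end up committed while the other resource rolls back, which is the consistency across resources being asked for here.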

It is unknown if this can actually happen though, since there is no
unique ID that could cause collisions, but there is the internal id
(which would need to remain constant throughout the tx in order for
queries and delete operations to work).

I am sure it is that I don't understand lockless commits, so I will
give a scenario.

client A issues query looking for documents with OID (a field) =
"some field";
client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"

client A commits and client B commits

Unless B is blocked once A issues the query, the index is going to
end up with 2 different copies of the document.

I understand that Lucene is not a database, and has no concept of
unique constraints. It is my understanding that this has been overcome
using locks and sequential access to the index when writing.

In a simple XA implementation, client A would open a SERIALIZABLE
transaction, which would block B from even reading the index. Most
simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
and NONE.

There are other ways of offering finer grained locking (based on
internal id and timestamps), but most are going to need a "server
based" implementation of lucene to pull off.

To summarize, I think the "shared filestore (NFS)" and "lockless
commits" make implementing transactions very difficult. I am sure I
am missing something here, I just don't see what.

On Jan 23, 2008, at 8:53 AM, Mark Miller wrote:

> That's where Robert is confusing me as well. To have XA support you
> just need to be able to define a transaction, atomically commit, or
> rollback. You also need a consistent state after any of these
> operations. LUCENE-1044 seems to guarantee that, and so isn't it
> more like finishing up needed work than going down the wrong path?
> It seems more to me (and obviously I know a lot less about this
> than either of you) that you have just gotten Lucene ready to add
> XA support. Lucene now fulfills all of the requirements. No?
> Someone just needs to write a boatload of JTA code :)
>
> It would seem the next step would be, as Robert suggests, to make a
> transaction a first class citizen. The XA protocol will require
> Lucene to communicate with the TM about what transactions it has
> completed to help in failure recovery and transaction management. I
> can certainly see the need for a better transaction abstraction to
> help with this.
>
> A little enlightenment on this would be great, Robert. I am very
> interested in it for future projects.
>
> And I have to point out...it just seems logical that we would make
> things so that the index was consistent at some point before taking
> the next step of making it consistent with other resources...no? I
> am just still confused about Robert's objections to what is going on
> here. I think that it would be a real leap forward to get it done
> though.
>
> Also, as he mentioned, we really need a good distributed system
> that allows for index partitioning. That's the ticket to more
> enterprise adoption. Could be Solr's work though...
>



lucene at mikemccandless

Jan 23, 2008, 9:55 AM

Post #34 of 70 (14283 views)
Permalink
Re: Back Compatibility [In reply to]

robert engels wrote:

> Maybe I don't understand lockless commits then.
>
> I just don't think you can enforce transactional consistency
> without either 1) locking, or 2) optimistic collision detection. I
> could be wrong here, but this has been my experience.
> By effectively removing the locking requirement, I think you are
> going to have users developing code without thought as to what is
> going to happen when locking is added. This is going to break the
> backwards compatibility that people are striving for.

Lucene still has locking (write.lock), to only allow one writer at a
time to make changes to the index (ie, it serializes writer
sessions). Lock-less commits just replaced the old "commit.lock".

> The lucene "writer" structure needs to be something like:
>
> start tx for update
> do work
> commit
>
> where commit is composed of (prepare and commit phases), but commit
> may fail.

Right, this is what IndexWriter does now. It's just that with
autoCommit=false you have total control on when that commit takes
place (only on closing the writer).
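
A toy model of that session behavior, for anyone following along. It assumes nothing about Lucene internals: ToySession is a hypothetical class made up for this sketch (it is not IndexWriter), and it only mimics the visible contract described above, where nothing is published until close() and abort() throws the session's changes away.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a writer session; not the real IndexWriter. */
class ToySession {
    private final List<String> committed;              // what readers can see
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    ToySession(List<String> committed) {
        this.committed = committed;
    }

    /** Buffers a document; readers cannot see it yet. */
    void addDocument(String doc) {
        if (!open) {
            throw new IllegalStateException("session is closed");
        }
        pending.add(doc);
    }

    /** Discards everything added during this session. */
    void abort() {
        pending.clear();
        open = false;
    }

    /** Publishes all buffered documents in one step, then ends the session. */
    void close() {
        if (!open) {
            return;
        }
        committed.addAll(pending);
        pending.clear();
        open = false;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();

        ToySession aborted = new ToySession(index);
        aborted.addDocument("doc-1");
        aborted.abort();
        System.out.println("after abort: " + index.size() + " docs");

        ToySession closed = new ToySession(index);
        closed.addDocument("doc-2");
        closed.close();
        System.out.println("after close: " + index.size() + " docs");
    }
}
```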

> It is unknown if this can actually happen though, since there is no
> unique ID that could cause collisions, but there is the internal id
> (which would need to remain constant throughout the tx in order for
> queries and delete operations to work).

Yes, but there are other errors that Lucene may hit, like disk full,
which must (and do) roll back the commit to the start of the
transaction (ie, index state when writer was first opened).

> I am sure it is that I don't understand lockless commits, so I will
> give a scenario.
>
> client A issues query looking for documents with OID (a field) =
> "some field";
> client B issues same query
> both queries return nothing found
> client A inserts document with OID = "some field"
> client B inserts document with OID = "some field"
>
> client A commits and client B commits
>
> unless B is blocked, once A issues the query, the index is going to
> end up with 2 different copies of the document.
>
> I understand that Lucene is not a database, and has no concept of
> unique constraints. It is my understanding that this has been overcome
> using locks and sequential access to the index when writing.
>
> In a simple XA implementation, client A would open a SERIALIZABLE
> transaction, which would block B from even reading the index. Most
> simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
> and NONE.
>
> There are other ways of offering finer grained locking (based on
> internal id and timestamps), but most are going to need a "server
> based" implementation of lucene to pull off.
>
> To summarize, I think the "shared filestore (NFS)" and "lockless
> commits" make implementing transactions very difficult. I am sure I
> am missing something here, I just don't see what.

Lucene hasn't ever supported that case above: it never blocks a
reader from opening the index. But, you could easily build that on
top of Lucene, right?
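
As a rough sketch of what building that on top could look like: the application wraps the "query, then insert" pair in its own lock, so client B's check cannot run between client A's check and insert. ToyIndex and SerializedInsertSketch are hypothetical names, the list is an in-memory stand-in for the index, and a ReentrantLock only helps inside one JVM; clients in separate processes would need some external lock instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** In-memory stand-in for an index; hypothetical, not a Lucene class. */
class ToyIndex {
    private final List<String> oids = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the check and the insert under one lock; returns false for a duplicate. */
    boolean insertIfAbsent(String oid) {
        lock.lock();
        try {
            if (oids.contains(oid)) {   // the "query" step
                return false;
            }
            oids.add(oid);              // the "insert" step
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Counts stored documents carrying the given OID. */
    int count(String oid) {
        lock.lock();
        try {
            int n = 0;
            for (String stored : oids) {
                if (stored.equals(oid)) {
                    n++;
                }
            }
            return n;
        } finally {
            lock.unlock();
        }
    }
}

class SerializedInsertSketch {
    /** Two clients try to insert the same OID; returns how many copies were stored. */
    static int raceTwoClients(String oid) {
        ToyIndex index = new ToyIndex();
        Thread clientA = new Thread(() -> index.insertIfAbsent(oid));
        Thread clientB = new Thread(() -> index.insertIfAbsent(oid));
        clientA.start();
        clientB.start();
        try {
            clientA.join();
            clientB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index.count(oid);
    }

    public static void main(String[] args) {
        System.out.println("copies stored: " + raceTwoClients("some field"));
    }
}
```

Whichever client gets the lock first inserts the document; the other one sees it during its own check and backs off, so only one copy is stored.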

I'm still trying to understand what you feel is missing in the core
that prevents you from building XA (or, your own transactions
handling that involves another resource like a DB) on top of Lucene...

Mike


rengels at ix

Jan 23, 2008, 10:58 AM

Post #35 of 70 (14288 views)
Permalink
Re: Back Compatibility [In reply to]

I guess I don't understand what a commit lock is, or what its
purpose is. It seems the write lock is all that is needed.

If you still need a write lock, then what is the purpose of
"lockless" commits?

You can get consistency if all writers get the write lock before
performing any read. It would seem this should be the requirement???

Is there a Wiki or some such thing that discusses the "lockless
commits", their purpose and their implementation? I find the email
thread a bit cumbersome to review.





lucene at mikemccandless

Jan 23, 2008, 11:40 AM

Post #36 of 70 (14286 views)
Permalink
Re: Back Compatibility [In reply to]

robert engels wrote:

> I guess I don't understand what a commit lock is, or what its
> purpose is. It seems the write lock is all that is needed.

The commit.lock was used to guard access to the "segments" file. A
reader would acquire the lock (blocking out other readers and
writers) when reading the file. And a writer would acquire the lock
when writing it.

> If you still need a write lock, then what is the purpose of
> "lockless" commits?

Lockless commits got rid of one lock (commit.lock), not write.lock.

> You can get consistency if all writers get the write lock before
> performing any read. It would seem this should be the requirement???

In Lucene, you use an IndexReader to do reads (not a writer), which
does not block other readers.

> Is there a Wiki or some such thing that discusses the "lockless
> commits", their purpose and their implementation? I find the email
> thread a bit cumbersome to review.

No, but really the concept is very simple: instead of writing to
segments, we write to segments_1, then segments_2, etc.
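
Roughly, the idea can be sketched with plain java.nio like this. It is a simplification under stated assumptions, not the actual Lucene code: GenerationFilesSketch and its methods are hypothetical names, the suffix is written as a plain decimal number, and the retry logic a real reader needs when a file disappears underneath it is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of write-once generation files; not Lucene code. */
class GenerationFilesSketch {
    static final String PREFIX = "segments_";

    /** Highest N for which segments_N exists in dir, or 0 if there is none. */
    static long latestGeneration(Path dir) {
        long max = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path file : files) {
                String suffix = file.getFileName().toString().substring(PREFIX.length());
                try {
                    max = Math.max(max, Long.parseLong(suffix));
                } catch (NumberFormatException ignored) {
                    // Not one of our generation files; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return max;
    }

    /** Writes a brand-new segments_N file; existing generations are never modified. */
    static long commit(Path dir, String contents) {
        long next = latestGeneration(dir) + 1;
        try {
            Files.write(dir.resolve(PREFIX + next),
                    contents.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return next;
    }

    /** Reads the newest generation without taking any lock; null if none exists. */
    static String readLatest(Path dir) {
        long generation = latestGeneration(dir);
        if (generation == 0) {
            return null;
        }
        try {
            byte[] bytes = Files.readAllBytes(dir.resolve(PREFIX + generation));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch directory for the demo. */
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("lockless-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        commit(dir, "first commit");
        commit(dir, "second commit");
        System.out.println(latestGeneration(dir) + ": " + readLatest(dir));
    }
}
```

Because a reader only ever opens a file that is already completely written, and the writer never rewrites an existing one, the two no longer need commit.lock to keep out of each other's way. The single writer is still serialized by write.lock.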

Mike


hossman_lucene at fucit

Jan 23, 2008, 12:00 PM

Post #37 of 70 (14286 views)
Permalink
Re: Back Compatibility [In reply to]

: I do like the idea of a static/system property to match legacy
: behavior. For example, the bugs around how StandardTokenizer
: mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
: Clearly those are silly bugs that should be fixed, quickly, with this
: back-compatible mode to keep the bug in place.
:
: We might want to, instead, have ctors for many classes take a required
: arg which states the version of Lucene you are using? So if you are
: writing a new app you would pass in the current version. Then, on
: dropping in a future Lucene JAR, we could use that arg to enforce the
: right backwards compatibility. This would save users from having to
: realize they are hitting one of these situations and then know to go
: set the right static/property to retain the buggy behavior.

I'm not sure that this would be better though ... when i write my code, i
pass "2.3" to all these constructors (or factory methods) and then later i
want to upgrade to 2.4 to get all the new performance goodness ... i
shouldn't have to change all those constructor calls to get all the 2.4
goodness, i should be able to leave my code as is -- but if i do that,
then i might not get all the 2.4 goodness (like improved
tokenization, or more precise segment merging) because some of that
goodness violates previous assumptions that some code might have had ...
my code doesn't have those assumptions, i know nothing about them, i'll
take whatever behavior the Lucene Developers recommend (unless i see
evidence that it breaks something, in which case i'll happily set a
system property or something that the release notes say will force the
old behavior).

The basic principle being: by default, give users the behavior that is
generally viewed as "correct" -- but give them the option to force
"uncorrect" legacy behavior.
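
A minimal sketch of that principle, assuming a made-up property name and a made-up toy bug: LegacySwitchSketch and "example.legacyTokenTypes" are hypothetical, and the labelling rule below is invented for illustration and is not what LUCENE-1100 actually fixes.

```java
/** Hypothetical example of "correct by default, legacy on request". */
class LegacySwitchSketch {
    /** Property name invented for this sketch; not a real Lucene setting. */
    static final String LEGACY_PROPERTY = "example.legacyTokenTypes";

    /** Labels a token; the legacy path keeps a deliberately wrong toy behavior. */
    static String tokenType(String token) {
        // Boolean.getBoolean is true only when the system property is set to "true".
        if (Boolean.getBoolean(LEGACY_PROPERTY)) {
            // Toy "bug" kept for back compatibility: everything is ALPHANUM.
            return "<ALPHANUM>";
        }
        boolean allDigits = !token.isEmpty()
                && token.chars().allMatch(Character::isDigit);
        return allDigits ? "<NUM>" : "<ALPHANUM>";
    }

    public static void main(String[] args) {
        // Run with -Dexample.legacyTokenTypes=true to get the old labels back.
        System.out.println("42 -> " + tokenType("42"));
        System.out.println("lucene -> " + tokenType("lucene"));
    }
}
```

Code that knows nothing about the old assumption gets the fixed behavior just by dropping in the new JAR; only an app that depends on the old labels has to set the property.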

: Also, backporting is extremely costly over time. I'd much rather keep
: compatibility for longer on our forward releases, than spend our
: scarce resources moving changes back.

+1

: So to summarize ... I think we should have (keep) a high tolerance for
: cruft to maintain API compatibility. I think our current approach
: (try hard to keep compatibility during "minor" releases, then
: deprecate, then remove APIs on a major release; do major releases only
: when truly required) is a good one.

i'm with you for the most part, it's just the definition of "when truly
required" that tends to hang people up ... there's a chicken vs egg
problem of deciding whether the code should drive what the next release
number is: "i've added a bitch'n feature but it requires adding a method
to an interface, therefore the next release must be called 4.0" ... vs the
mindset that "we just had a 3.0 release, it's too soon for another major
release, the next release should be called 3.1, so we need to hold off on
committing non backwards compatible changes for a while."

I'm in the first camp: version numbers should be descriptive, information
carrying, labels for releases -- but the version number of a release
should be dictated by the code contained in that release. (if that means
the next version after 3.0.0 is 4.0.0, then so be it.)


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


rengels at ix

Jan 23, 2008, 12:08 PM

Post #38 of 70 (14277 views)
Permalink
Re: Back Compatibility [In reply to]

Thanks.

So all writers still need to get the write lock, before opening the
reader in order to maintain transactional consistency.

Was there performance testing done on the lockless commits with heavy
contention? I would think that reading the directory to find the
latest segments file would be slower. Is there a 'latest segments'
file to avoid this? If not, there probably should be. As long as the
data fits in a single disk block (which it should), I don't think you
will have a consistency problem.

On Jan 23, 2008, at 1:40 PM, Michael McCandless wrote:

> robert engels wrote:
>
>> I guess I don't understand what a commit lock is, or what its
>> purpose is. It seems the write lock is all that is needed.
>
> The commit.lock was used to guard access to the "segments" file. A
> reader would acquire the lock (blocking out other readers and
> writers) when reading the file. And a writer would acquire the
> lock when writing it.
>
>> If you still need a write lock, then what is the purpose of
>> "lockless" commits.
>
> Lockless commits got rid of one lock (commit.lock), not write.lock.
>
>> You can get consistency if all writers get the write lock before
>> performing any read. It would seem this should be the requirement???
>
> In Lucene, you use an IndexReader to do reads (not a writer), which
> does not block other readers.
>
>> Is there a Wiki or some such thing that discusses the "lockless
>> commits", their purpose and their implementation? I find the email
>> thread a bit cumbersome to review.
>
> No, but really the concept is very simple: instead of writing to
> segments, we write to segments_1, then segments_2, etc.
>
> Mike




rengels at ix

Jan 23, 2008, 12:09 PM

Post #39 of 70 (14291 views)
Permalink
Re: Back Compatibility [In reply to]

I guess I don't see the back-porting as an issue. Only those who want
to back-port need to do so. Head moves on...






hossman_lucene at fucit

Jan 23, 2008, 1:40 PM

Post #40 of 70 (14314 views)
Permalink
Re: Back Compatibility [In reply to]

: I guess I don't see the back-porting as an issue. Only those who want to
: back-port need to do so. Head moves on...

I view it as a potential risk to the overall productivity of the community.

If upgrading from A to B is easy, people (in general) won't spend a lot of
time/effort backporting features from B to A -- this time/effort savings
benefits the community because (depending on the person):
1) that time/effort saved can be spent contributing even more features
to Lucene
2) that time/effort saved improves the impressions people have of Lucene.

If on the other hand upgrading from X to Y is "hard", that encourages
people to backport features ... in some cases this backporting may be done
"in the open" with people contributing these backports as patches, which
can then be committed/released by developers ... but there is still a
time/effort cost there ... a bigger time/effort cost is the cumulative
time/effort cost of all the people that backport some set of features just
enough to get things working for themselves on their local copy, and don't
contribute those changes back ... that cost gets paid by the community as a
whole over and over again.

I certainly don't want to discourage anyone who *wants* to backport
features, and I would never suggest that Lucene should make it a policy to
not accept patches to previous releases that backport functionality -- i
just think we should do our best to minimize the need/motivation to spend
time/effort on backporting.



-Hoss




rengels at ix

Jan 23, 2008, 1:55 PM

Post #41 of 70 (14268 views)
Permalink
Re: Back Compatibility [In reply to]

I think you are incorrect.

I would guess the number of people/organizations using Lucene vs.
contributing to Lucene is much greater.

The contributors work in head (or should, IMO). The users can select a
particular version of Lucene and code their apps accordingly. They
can also back-port features from a later to an earlier release. If
they have limited development resources, they are probably not
working on Lucene (they are working on their apps), but they can
update their own code to work with later versions - which they would
probably rather do than learning the internals and contributing to
Lucene.

If the users are "just dropping in a new version" they are not
contributing to the community... I think just the opposite, they are
parasites. I think a way to gauge this would be the number of
questions/people on the user list versus the development list.

Lucene is a library, and I believe what I stated earlier is true -
in order to continue to advance it the API needs to be permitted to
change to allow for better functionality and performance. If Lucene
is hand-tied by earlier APIs then this work is either not going to
happen, or be very messy (inefficient).





lucene at mikemccandless

Jan 23, 2008, 2:03 PM

Post #42 of 70 (14261 views)
Permalink
Re: Back Compatibility [In reply to]

robert engels wrote:

> Thanks.
>
> So all writers still need to get the write lock, before opening the
> reader in order to maintain transactional consistency.

I don't understand what you mean by "before opening the reader"? A
writer acquires the write.lock before opening. Readers do not,
unless/until they do their first write operation (deleteDocument/
setNorm).

> Was there performance testing done on the lockless commits with
> heavy contention? I would think that reading the directory to find
> the latest segments file would be slower. Is there a 'latest
> segments' file to avoid this? If not, there probably should be. As
> long as the data fits in a single disk block (which it should), I
> don't think you will have a consistency problem.

Performance tests were done (see LUCENE-710).

And, yes, there is a file segments.gen that records the latest
segment, but it is used along with the directory listing to find the
current segments file.
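
As a rough illustration of the directory-listing half of that lookup, the sketch below picks the highest-generation segments_N name from a list of file names. It assumes the generation suffix is written in base 36 (Character.MAX_RADIX); the class and method names are hypothetical, and the cross-check against segments.gen is deliberately left out.

```java
import java.util.List;

// Hypothetical helper, not Lucene's actual implementation.
final class SegmentsFileNames {
    static final String PREFIX = "segments_";

    private SegmentsFileNames() {}

    // Returns the generation encoded in a name such as "segments_a",
    // or -1 if the name is not a generation-numbered segments file.
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            return -1;   // also skips "segments.gen" and unrelated index files
        }
        try {
            // Assumption: the suffix is the generation in base 36.
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Returns the segments_N name with the highest generation, or null if none exists.
    static String latest(List<String> directoryListing) {
        long best = -1;
        for (String name : directoryListing) {
            best = Math.max(best, generationOf(name));
        }
        return best < 0 ? null : PREFIX + Long.toString(best, Character.MAX_RADIX);
    }
}
```

Because a writer only ever creates a new, higher-numbered file, a reader scanning the listing never needs the old commit.lock to find a complete segments file.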

Mike



lucene at mikemccandless

Jan 23, 2008, 2:16 PM

Post #43 of 70 (14281 views)
Permalink
Re: Back Compatibility [In reply to]

Chris Hostetter wrote:

>
> : I do like the idea of a static/system property to match legacy
> : behavior. For example, the bugs around how StandardTokenizer
> : mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
> : Clearly those are silly bugs that should be fixed, quickly, with this
> : back-compatible mode to keep the bug in place.
> :
> : We might want to, instead, have ctors for many classes take a required
> : arg which states the version of Lucene you are using? So if you are
> : writing a new app you would pass in the current version. Then, on
> : dropping in a future Lucene JAR, we could use that arg to enforce the
> : right backwards compatibility. This would save users from having to
> : realize they are hitting one of these situations and then know to go
> : set the right static/property to retain the buggy behavior.
>
> I'm not sure that this would be better though ... when i write my code, i
> pass "2.3" to all these constructors (or factory methods) and then later i
> want to upgrade to 2.4 to get all the new performance goodness ... i
> shouldn't have to change all those constructor calls to get all the 2.4
> goodness, i should be able to leave my code as is -- but if i do that,
> then i might not get all the 2.4 goodness, (like improved
> tokenization, or more precise segment merging) because some of that
> goodness violates previous assumptions that some code might have had ...
> my code doesn't have those assumptions, i know nothing about them, i'll
> take whatever behavior the Lucene Developers recommend (unless i see
> evidence that it breaks something, in which case i'll happily set a
> system property or something that the release notes say will force the
> old behavior).
>
> The basic principle being: by default, give users the behavior that is
> generally viewed as "correct" -- but give them the option to force
> "uncorrect" legacy behavior.

OK, I agree: the vast majority of users upgrading would in fact want
all of the changes in the new release. And then the rare user who is
affected by that bug fix to StandardTokenizer would have to set the
compatibility mode. So it makes sense for you to get all changes on
upgrading (and NOT specify the legacy version in all ctors).

> : Also, backporting is extremely costly over time. I'd much rather keep
> : compatibility for longer on our forward releases, than spend our
> : scarce resources moving changes back.
>
> +1
>
> : So to summarize ... I think we should have (keep) a high tolerance for
> : cruft to maintain API compatibility. I think our current approach
> : (try hard to keep compatibility during "minor" releases, then
> : deprecate, then remove APIs on a major release; do major releases only
> : when truly required) is a good one.
>
> i'm with you for the most part, it's just the definition of "when truly
> required" that tends to hang people up ... there's a chicken vs egg
> problem of deciding whether the code should drive what the next release
> number is: "i've added a bitch'n feature but it requires adding a method
> to an interface, therefore the next release must be called 4.0" ... vs the
> mindset that "we just had a 3.0 release, it's too soon for another major
> release, the next release should be called 3.1, so we need to hold off on
> committing non backwards compatible changes for a while."
>
> I'm in the first camp: version numbers should be descriptive, information
> carrying, labels for releases -- but the version number of a release
> should be dictated by the code contained in that release. (if that means
> the next version after 3.0.0 is 4.0.0, then so be it.)

Well, I am wary of doing major releases too often. Though I do
agree that the version number should be a "fastmatch" for reading
through CHANGES.txt.

Say we do this, and zoom forward 2 years when we're up to 6.0, then
poor users stuck on 1.9 will dread upgrading, but probably shouldn't.

One of the amazing things about Lucene, to me, is how many really
major changes we have been able to make while not in fact breaking
backwards compatibility (too much). Being very careful not to make
things public, intentionally not committing to things like exactly
when does a flush or commit or merge actually happen, marking new
APIs as experimental and freely subject to change, using abstract
classes not interfaces, are all wonderful tools that Lucene employs
(and should continue to do so), to enable sizable changes in the
future while keeping backwards compatibility.
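
The "abstract classes not interfaces" point can be sketched as follows. All names here are hypothetical and are not Lucene classes. A method added to an abstract class in a later release can ship with a default body, so a subclass written against the earlier release keeps compiling and running; adding a method to an interface would instead break every existing implementor.

```java
// Hypothetical library class, as it might look after a later release.
abstract class DocFilter {
    // Original API, present since the first release.
    abstract boolean accept(int docId);

    // Added later. Because it has a default body built on accept(),
    // subclasses written before it existed need no changes.
    boolean acceptAll(int[] docIds) {
        for (int id : docIds) {
            if (!accept(id)) {
                return false;
            }
        }
        return true;
    }
}

// A user's subclass, written before acceptAll() was added.
final class EvenDocsOnly extends DocFilter {
    @Override
    boolean accept(int docId) {
        return docId % 2 == 0;
    }
}
```

The old subclass picks up the new method for free, which is the kind of room for future change described above.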

Allowing for future backwards compatibility is one of the most
important things we all do when we make changes to Lucene!

Mike



lucene at mikemccandless

Jan 23, 2008, 2:27 PM

Post #44 of 70 (14268 views)
Permalink
Re: Back Compatibility [In reply to]

robert engels wrote:

> I think you are incorrect.
>
> I would guess the number of people/organizations using Lucene vs.
> contributing to Lucene is much greater.
>
> The contributors work in head (or should, IMO). The users can select a
> particular version of Lucene and code their apps accordingly. They
> can also back-port features from a later to an earlier release. If
> they have limited development resources, they are probably not
> working on Lucene (they are working on their apps), but they can
> update their own code to work with later versions - which they
> would probably rather do than learning the internals and
> contributing to Lucene.
>
> If the users are "just dropping in a new version" they are not
> contributing to the community... I think just the opposite, they
> are parasites. I think a way to gauge this would be the number of
> questions/people on the user list versus the development list.

I don't think they are parasites at all. They are users that place
a lot of trust in us and will come to the users list with interesting
issues. Many of the improvements to Lucene are sourced from the
users list. Even if that user doesn't do the actual work to fix the
issue, their innocent question and prodding can inspire someone else
to take the idea forward, make a patch, etc. This is the normal and
healthy way that open source works....

> Lucene is a library, and I believe what I stated earlier is true
> - in order to continue to advance it the API needs to be permitted
> to change to allow for better functionality and performance. If
> Lucene is hand-tied by earlier APIs then this work is either not
> going to happen, or be very messy (inefficient).

The thing is, we have been able to advance lately, sizably, without
breaking APIs, thanks to the "future backwards compatibility
proofing" that Lucene does.

I do agree that if it got to the point where we were forced to make a
hard choice between stunting Lucene's growth to keep backwards
compatibility and letting Lucene grow via a new major release, we
should definitely make a new major release. Search is still young
and if we stunt Lucene now it will slowly die.

It's just that I haven't seen any recent change, except for allowing
JVM 1.5 source, that actually requires a major release, I think.

Mike





sarowe at syr

Jan 23, 2008, 2:29 PM

Post #45 of 70 (14286 views)
Permalink
RE: Back Compatibility [In reply to]

Hi robert,

On 01/23/2008 at 4:55 PM, robert engels wrote:
> If the users are "just dropping in a new version" they are not
> contributing to the community... I think just the opposite, they are
> parasites.

I reject your characterization of passive users as "parasites"; I suspect that you intend your casual use of this highly prejudicial term to license wholesale abandonment of them as a valid constituency.

In my estimation, nearly every active contributor to open source projects, including Lucene, was once a passive user. If you discourage that pipeline, you cut off the supply of fresh perspectives and future contributions. Please, let's not do that.

Steve



rengels at ix

Jan 23, 2008, 2:31 PM

Post #46 of 70 (14270 views)
Permalink
Re: Back Compatibility [In reply to]

You must get the write lock before opening the reader if you want
transactional consistency and are performing updates.

No other way to do it.

Otherwise:

A opens reader.
B opens reader.
A performs query, decides an update is needed based on results
B performs query, decides an update is needed based on results
B gets write lock
B updates
B releases
A gets write lock
A performs update - ERROR. A is performing an update based on stale data

If A & B want to update an index, it must work as:

A gets lock
A opens reader
A updates
A releases lock
B gets lock
B opens reader
B updates
B releases lock

The only way you can avoid this is if the system can determine that B's
query results in the first case would not change based on A's updates.
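
The two orderings above can be mimicked with a toy example. This is only a sketch: ToyIndex is a hypothetical stand-in (one integer guarded by a java.util.concurrent ReentrantLock), not Lucene's IndexReader/IndexWriter or its write.lock. The "update" is "add one unless the value has reached a limit".

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy stand-in for an index: a single value guarded by a write lock.
final class ToyIndex {
    private final ReentrantLock writeLock = new ReentrantLock();
    private volatile int value = 0;

    // Unlocked read, comparable to opening a reader and running a query.
    int read() {
        return value;
    }

    // Safe ordering: get the lock, then read, then update, then release.
    void addOneIfBelow(int limit) {
        writeLock.lock();
        try {
            int current = value;          // read under the lock, so it cannot be stale
            if (current < limit) {
                value = current + 1;
            }
        } finally {
            writeLock.unlock();
        }
    }

    // Unsafe ordering: the decision uses a snapshot taken before the lock was held.
    void addOneIfSnapshotBelow(int snapshot, int limit) {
        writeLock.lock();
        try {
            if (snapshot < limit) {       // stale if another writer updated in between
                value = value + 1;
            }
        } finally {
            writeLock.unlock();
        }
    }
}
```

If A and B both call read() first and then each call addOneIfSnapshotBelow with a limit of 1, the value ends at 2, which is the stale-update error; two calls to addOneIfBelow(1) leave it at 1.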





rengels at ix

Jan 23, 2008, 2:36 PM

Post #47 of 70 (14288 views)
Permalink
Re: Back Compatibility [In reply to]

I don't think I can say that this needs to happen now either. :)

An interesting question to answer would be:

If Lucene did not exist, and given all of the knowledge we have, we
decided to create a Java based search engine, would the API look like
it does today?

The answer may be yes. I doubt it would be in many areas though.

The major releases are where you get to rethink the API and the
approach.

If you don't do this, Lucene will slowly die (as you stated). What
happens is that the developers get tired of the harness and start a
new project. If the API were able to be changed more easily this would
not happen.






rengels at ix

Jan 23, 2008, 2:41 PM

Post #48 of 70 (14286 views)
Permalink
Re: Back Compatibility [In reply to]

The statement upon rereading seems much stronger than intended. You
are correct, but I think the number of users that become contributors
is still far less than the number of users.

The only abandonment of the users was from the standpoint of
maintaining a legacy API. The users are free to update their code to
move with Lucene. They are the ones choosing to stay behind.

Even though I have contributed very little to Lucene, I still fight
for the developers' ability to move it forward - since I do contribute
so little! It is up to me to update my code, or stay where I am
at. Now, if Lucene created a release every week that completely
changed the API and broke everything I wrote, while the old release
still had numerous serious bugs, I would quickly grow frustrated and
find a new library. That is not the case, and I don't think anyone
(especially me) is arguing for that.

On Jan 23, 2008, at 4:29 PM, Steven A Rowe wrote:

> Hi robert,
>
> On 01/23/2008 at 4:55 PM, robert engels wrote:
>> If the users are "just dropping in a new version" they are not
>> contributing to the community... I think just the opposite, they are
>> parasites.
>
> I reject your characterization of passive users as "parasites"; I
> suspect that you intend your casual use of this highly prejudicial
> term to license wholesale abandonment of them as a valid constituency.
>
> In my estimation, nearly every active contributor to open source
> projects, including Lucene, was once a passive user. If you
> discourage that pipeline, you cut off the supply of fresh
> perspectives and future contributions. Please, let's not do that.
>
> Steve


lucene at mikemccandless

Jan 23, 2008, 2:58 PM

Post #49 of 70 (14268 views)
Re: Back Compatibility [In reply to]

Right.

But, that can, and should, be done outside of the Lucene core.

Mike

robert engels wrote:

> You must get the write lock before opening the reader if you want
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise:
>
> A opens reader.
> B opens reader.
> A performs query, decides an update is needed based on results
> B performs query, decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock
> A performs update - ERROR. A is performing an update based on stale
> data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if the system can determine that
> B's query results in the first case would not change based on A's
> updates.
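
A minimal sketch of the two orderings described above, added for
illustration and not taken from the thread. A plain ReentrantLock and
an in-memory counter stand in for Lucene's write lock and index; the
class and method names (LockedUpdateSketch, writeFromSnapshot,
lockedIncrement) are hypothetical and are not Lucene APIs.

```java
import java.util.concurrent.locks.ReentrantLock;

// Stand-in for an index whose only "document" is a counter value.
// None of these names are Lucene APIs; they only model the lock ordering.
class LockedUpdateSketch {
    private final ReentrantLock writeLock = new ReentrantLock();
    private volatile int value = 0;

    // "Open a reader": take a snapshot without holding the write lock.
    int read() {
        return value;
    }

    // Unsafe ordering: the update is computed from a snapshot taken before locking.
    void writeFromSnapshot(int snapshot) {
        writeLock.lock();
        try {
            value = snapshot + 1; // may overwrite a newer value
        } finally {
            writeLock.unlock();
        }
    }

    // Safe ordering: get lock, read, update, release lock.
    void lockedIncrement() {
        writeLock.lock();
        try {
            int current = value; // the read happens under the lock
            value = current + 1;
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedUpdateSketch stale = new LockedUpdateSketch();
        int a = stale.read();        // A opens reader
        int b = stale.read();        // B opens reader
        stale.writeFromSnapshot(b);  // B gets write lock, updates, releases
        stale.writeFromSnapshot(a);  // A updates based on stale data
        System.out.println("read-then-lock: " + stale.read()); // 1, B's update is lost

        LockedUpdateSketch safe = new LockedUpdateSketch();
        safe.lockedIncrement();      // A: lock, read, update, release
        safe.lockedIncrement();      // B: lock, read, update, release
        System.out.println("lock-then-read: " + safe.read());  // 2
    }
}
```

The interleaving is replayed on a single thread so that the lost
update is deterministic; with real concurrent writers the first
ordering would fail only some of the time.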
>
> On Jan 23, 2008, at 4:03 PM, Michael McCandless wrote:
>
>>
>> robert engels wrote:
>>
>>> Thanks.
>>>
>>> So all writers still need to get the write lock, before opening
>>> the reader in order to maintain transactional consistency.
>>
>> I don't understand what you mean by "before opening the reader"?
>> A writer acquires the write.lock before opening. Readers do not,
>> unless/until they do their first write operation (deleteDocument/
>> setNorm).
>>
>>> Was there performance testing done on the lockless commits with
>>> heavy contention? I would think that reading the directory to
>>> find the latest segments file would be slower. Is there a 'latest
>>> segments' file to avoid this? If not, there probably should be.
>>> As long as the data fits in a single disk block (which it
>>> should), I don't think you will have a consistency problem.
>>
>> Performance tests were done (see LUCENE-710).
>>
>> And, yes, there is a file segments.gen that records the latest
>> segment, but it is used along with the directory listing to find
>> the current segments file.
>>
>> Mike
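
A rough sketch of the "directory listing" half of the lookup Mike
describes, added for illustration only. It assumes commit files are
named segments_N with N written in base 36, and it ignores
segments.gen entirely; LatestSegmentsSketch and its methods are
hypothetical helpers, not the code Lucene actually uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Lucene's own code: picks the newest
// "segments_N" file name out of a directory listing.
class LatestSegmentsSketch {
    static final String PREFIX = "segments_";

    // Assumption: N is the commit generation written in base 36.
    // Returns -1 for anything that is not a commit file.
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            return -1; // this also skips "segments.gen"
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Returns the name with the highest generation, or null if none is present.
    static String latest(List<String> listing) {
        String best = null;
        long bestGen = -1;
        for (String name : listing) {
            long gen = generationOf(name);
            if (gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("_0.cfs", "segments.gen", "segments_9", "segments_a");
        System.out.println(latest(listing)); // segments_a ("a" is 10 in base 36)
    }
}
```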


dmsmith555 at gmail

Jan 23, 2008, 3:24 PM

Post #50 of 70 (14295 views)
Re: Back Compatibility [In reply to]

Top posting because this is a response to the thread as a whole.

It appears that this thread has identified some different reasons for
"needing" to break compatibility:
1) A current behavior is now deemed bad or wrong. Examples: the silent
truncation of large documents or an analyzer that works incorrectly.
2) Performance tuning such as seen in Token, allowing reuse.
3) Support of a new language feature, e.g. generics, that makes the
code "better".
4) A new feature requires a change to the existing API.

Perhaps there were others? Maybe specifics are in Jira.

It seems to me that the Lucene developers have done an excellent job
at figuring out how to maintain compatibility. This is a testament to
how well grounded the design of the API actually is, from early on and
even now. And changes seem to be well thought out, well designed and
carefully implemented.

I think that when it really gets down to it, the Lucene API will stay
very stable because of this.

On a side note, the cLucene project seems to be languishing (still
trying to get to 2.0) and any stability of the API is a good thing for
it. And perhaps for the other "ports" as well.

Again many thanks for all your hard work,
DM Smith, a thankful "parasite" :)

On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:

>
> chris Hostetter wrote:
>
>>
>> : I do like the idea of a static/system property to match legacy
>> : behavior. For example, the bugs around how StandardTokenizer
>> : mislabels tokens (eg LUCENE-1100), this would be the perfect
>> : solution. Clearly those are silly bugs that should be fixed,
>> : quickly, with this back-compatible mode to keep the bug in place.
>> :
>> : We might want to, instead, have ctors for many classes take a
>> : required arg which states the version of Lucene you are using? So
>> : if you are writing a new app you would pass in the current
>> : version. Then, on dropping in a future Lucene JAR, we could use
>> : that arg to enforce the right backwards compatibility. This would
>> : save users from having to realize they are hitting one of these
>> : situations and then know to go set the right static/property to
>> : retain the buggy behavior.
>>
>> I'm not sure that this would be better though ... when i write my
>> code, i pass "2.3" to all these constructors (or factory methods)
>> and then later i want to upgrade to 2.4 to get all the new
>> performance goodness ... i shouldn't have to change all those
>> constructor calls to get all the 2.4 goodness, i should be able to
>> leave my code as is -- but if i do that, then i might not get all
>> the 2.4 goodness, (like improved tokenization, or more precise
>> segment merging) because some of that goodness violates previous
>> assumptions that some code might have had ... my code doesn't have
>> those assumptions, i know nothing about them, i'll take whatever
>> behavior the Lucene Developers recommend (unless i see evidence that
>> it breaks something, in which case i'll happily set a system
>> property or something that the release notes say will force the old
>> behavior).
>>
>> The basic principle being: by default, give users the behavior that
>> is generally viewed as "correct" -- but give them the option to
>> force "uncorrect" legacy behavior.
>
> OK, I agree: the vast majority of users upgrading would in fact want
> all of the changes in the new release. And then the rare user who
> is affected by that bug fix to StandardTokenizer would have to set
> the compatibility mode. So it makes sense for you to get all
> changes on upgrading (and NOT specify the legacy version in all
> ctors).
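
A small sketch of the "correct by default, legacy on request" idea
agreed on above, added for illustration only. The property name
sketch.legacyTokenTypes, the class LegacySwitchSketch, and the bug
itself (numbers labeled as <ALPHANUM>) are all invented here; this is
not StandardTokenizer and not the actual LUCENE-1100 behavior.

```java
// Toy illustration of "correct by default, legacy on request".
// The property name and the bug are invented; this is not Lucene code.
class LegacySwitchSketch {
    static final String LEGACY_PROPERTY = "sketch.legacyTokenTypes";

    // Read on every call so the switch can be flipped while the sketch runs.
    static boolean legacyMode() {
        return Boolean.getBoolean(LEGACY_PROPERTY);
    }

    static String tokenType(String token) {
        boolean allDigits = !token.isEmpty() && token.chars().allMatch(Character::isDigit);
        if (allDigits && !legacyMode()) {
            return "<NUM>";   // fixed behavior, what you get by default after upgrading
        }
        return "<ALPHANUM>";  // old behavior, including the invented mislabeling of numbers
    }

    public static void main(String[] args) {
        System.out.println(tokenType("2008"));        // <NUM>
        System.setProperty(LEGACY_PROPERTY, "true");  // same effect as -Dsketch.legacyTokenTypes=true
        System.out.println(tokenType("2008"));        // <ALPHANUM>
    }
}
```

Only the rare user who depends on the old labeling has to do anything;
everyone else picks up the fix just by dropping in the new JAR.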
>
>> : Also, backporting is extremely costly over time. I'd much rather
>> : keep compatibility for longer on our forward releases, than spend
>> : our scarce resources moving changes back.
>>
>> +1
>>
>> : So to summarize ... I think we should have (keep) a high
>> : tolerance for cruft to maintain API compatibility. I think our
>> : current approach (try hard to keep compatibility during "minor"
>> : releases, then deprecate, then remove APIs on a major release; do
>> : major releases only when truly required) is a good one.
>>
>> i'm with you for the most part, it's just the definition of "when
>> truly required" that tends to hang people up ... there's a chicken
>> vs egg problem of deciding whether the code should drive what the
>> next release number is: "i've added a bitch'n feature but it
>> requires adding a method to an interface, therefore the next release
>> must be called 4.0" ... vs the mindset that "we just had a 3.0
>> release, it's too soon for another major release, the next release
>> should be called 3.1, so we need to hold off on committing non
>> backwards compatible changes for a while."
>>
>> I'm in the first camp: version numbers should be descriptive,
>> information carrying, labels for releases -- but the version number
>> of a release should be dictated by the code contained in that
>> release. (if that means the next version after 3.0.0 is 4.0.0, then
>> so be it.)
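
One way to read the "first camp" position, sketched here for
illustration: the next version label is computed from the most
disruptive change in the release rather than chosen up front. The
three-part numbering and the names NextVersionSketch and Change are
assumptions made for this example, not a policy the project adopted.

```java
// Hypothetical helper: derives the next version label from the kind of
// change a release contains ("the code dictates the number").
class NextVersionSketch {
    enum Change { BUG_FIX, COMPATIBLE_FEATURE, BREAKING }

    // current must look like "major.minor.patch", e.g. "3.0.0".
    static String next(String current, Change biggestChange) {
        String[] parts = current.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        switch (biggestChange) {
            case BREAKING:
                return (major + 1) + ".0.0";
            case COMPATIBLE_FEATURE:
                return major + "." + (minor + 1) + ".0";
            default:
                return major + "." + minor + "." + (patch + 1);
        }
    }

    public static void main(String[] args) {
        // Adding a method to a public interface counts as BREAKING here.
        System.out.println(next("3.0.0", Change.BREAKING)); // 4.0.0
    }
}
```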
>
> Well, I am wary of doing major releases too often. Though I do
> agree that the version number should be a "fastmatch" for reading
> through CHANGES.txt.
>
> Say we do this, and zoom forward 2 years when we're up to 6.0, then
> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>
> One of the amazing things about Lucene, to me, is how many really
> major changes we have been able to make while not in fact breaking
> backwards compatibility (too much). Being very careful not to make
> things public, intentionally not committing to things like exactly
> when does a flush or commit or merge actually happen, marking new
> APIs as experimental and freely subject to change, using abstract
> classes not interfaces, are all wonderful tools that Lucene employs
> (and should continue to do so), to enable sizable changes in the
> future while keeping backwards compatibility.
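
A short sketch of why "abstract classes not interfaces" helps, added
for illustration. In the Java of that era interfaces could not carry
method bodies (default methods only arrived in Java 8), so adding a
method to an interface broke every implementor, while an abstract
class can gain a method with a default body. TextFilter,
TrimFirstChar, and ApiEvolutionSketch are hypothetical names, not
Lucene classes.

```java
// Hypothetical API, not a Lucene class: shows how an abstract base class
// can grow without breaking subclasses written against an earlier release.
abstract class TextFilter {
    // Present since the first release; every subclass implements it.
    abstract String apply(String input);

    // Added in a later release with a default body, so old subclasses still compile.
    String applyTwice(String input) {
        return apply(apply(input));
    }

    // Kept for compatibility; would be removed only in the next major release.
    @Deprecated
    String filter(String input) {
        return apply(input);
    }
}

// "User code" written before applyTwice existed; it needs no changes.
class TrimFirstChar extends TextFilter {
    @Override
    String apply(String input) {
        return input.isEmpty() ? input : input.substring(1);
    }
}

class ApiEvolutionSketch {
    public static void main(String[] args) {
        TextFilter f = new TrimFirstChar();
        System.out.println(f.applyTwice("lucene")); // cene
    }
}
```

Had TextFilter been an interface, TrimFirstChar would have stopped
compiling as soon as applyTwice was added.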
>
> Allowing for future backwards compatibility is one of the most
> important things we all do when we make changes to Lucene!
>
> Mike
