
Mailing List Archive: Lucene: Java-Dev

Why release 3.0?

 

 



erickerickson at gmail

Nov 16, 2009, 10:10 AM

Post #1 of 50 (743 views)
Permalink
Why release 3.0?

One of my "specialties" is asking obvious questions just to see if
everyone's assumptions are aligned. So with the discussion about branching
3.0 I have to ask "Is there going to be any 3.0 release intended for
*production*?". And if not, would we save a lot of work by just not
worrying about retrofitting fixes to a 3.0 branch and carrying on with 3.1
as the first *supported* 3.x release?

Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure *as a
user* I see a good reason to upgrade to 3.0. Getting a "beta/snapshot"
release to get a head start on cleaning up my code does seem worthwhile, if
I have the spare time. And having a base 3.0 version that's not changing all
over the place would be useful for that.

That said, I'm also not terribly comfortable with a "release" that's out
there and unsupported.

Apologies if this has already been discussed, but I don't remember it.
Although my memory isn't what it used to be (but some would claim it never
was<G>)...

Erick


jake.mannix at gmail

Nov 16, 2009, 10:15 AM

Post #2 of 50 (711 views)
Permalink
Re: Why release 3.0? [In reply to]

Don't users need to upgrade to 3.0 because 3.1 won't necessarily be able to
read your 2.4 index file formats? I suppose if you've already upgraded to
2.9, then all is well because 2.9 is the same format as 3.0, but we can't
assume all users upgraded from 2.4 to 2.9.

If you've done that already, then 3.0 might not be necessary, but if you're
on 2.4 right now, you will be in for a bad surprise if you try to upgrade
to 3.1.

-jake



uwe at thetaphi

Nov 16, 2009, 11:03 AM

Post #3 of 50 (704 views)
Permalink
RE: Why release 3.0? [In reply to]

Hi Erick,



3.0 is *not* an unsupported or beta release; it is the cleaned-up 2.9.1
release. You are right that 2.9.1 users do not need to upgrade (but they
can), but for new users starting with Lucene, the recommendation is to use
3.0 and not 2.9.

3.0 also contains some cleanups needed for 3.1: compressed fields are no
longer supported, so they must be uncompressed, which is done during
optimizing/merging in 3.0. Later versions will remove support for older
index types, so you should really update your indexes, especially because
flex indexing will possibly remove more support for older indexes (as it
gets more complex to maintain all the different file formats).

So 3.0 is recommended for users who are starting new Java 5 projects and
want a clean API. People needing backwards compatibility can use 2.9.1, but
support for that version will be dropped in the future and bugfixes will
only go into 3.x.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



uwe at thetaphi

Nov 16, 2009, 11:05 AM

Post #4 of 50 (703 views)
Permalink
RE: Why release 3.0? [In reply to]

2.9 does *not* have the same format as 3.0: an index created with 3.0 cannot
be read with 2.9. This is because compressed field support was removed and
therefore the version number of the stored fields file was upgraded. But
indexes from 2.9 can be read with 3.0; that support may get removed in 4.0.
3.0 indexes can be read until version 4.9.
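
To make the version-number point concrete, here is a minimal sketch in Java of the kind of check a reader can run against a file's format version. It is an illustration under stated assumptions, not Lucene's actual code: the class name, the method names and the version numbers 1 and 2 are all hypothetical.

```java
import java.io.IOException;

/** Hypothetical sketch of a reader-side format version check; not Lucene's real implementation. */
final class StoredFieldsVersionCheck {
    // Illustrative numbers only; the real constants are not given in this thread.
    static final int FORMAT_WITH_COMPRESSED_FIELDS = 1;    // assumed: what a 2.9 writer produces
    static final int FORMAT_WITHOUT_COMPRESSED_FIELDS = 2; // assumed: what a 3.0 writer produces

    /** Throws if the file was written in a format newer than this reader knows about. */
    static void checkReadable(int fileVersion, int newestKnownVersion) throws IOException {
        if (fileVersion > newestKnownVersion) {
            throw new IOException("format version " + fileVersion
                    + " is newer than the newest supported version " + newestKnownVersion);
        }
    }

    /** Convenience wrapper that reports the outcome as a boolean instead of an exception. */
    static boolean isReadable(int fileVersion, int newestKnownVersion) {
        try {
            checkReadable(fileVersion, newestKnownVersion);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A "3.0-like" reader opening a "2.9-like" file: the older format is still understood.
        System.out.println(isReadable(FORMAT_WITH_COMPRESSED_FIELDS, FORMAT_WITHOUT_COMPRESSED_FIELDS));
        // A "2.9-like" reader opening a "3.0-like" file: the newer format is rejected.
        System.out.println(isReadable(FORMAT_WITHOUT_COMPRESSED_FIELDS, FORMAT_WITH_COMPRESSED_FIELDS));
    }
}
```

The asymmetry is the point: a newer reader can keep code paths for older versions, but an older reader has no way to interpret a version number it has never seen.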



Uwe



jake.mannix at gmail

Nov 16, 2009, 11:08 AM

Post #5 of 50 (703 views)
Permalink
Re: Why release 3.0? [In reply to]

Yeah, sorry, I just meant that 3.0 can read 2.9 index format, but 3.1 will
not necessarily have that capability (this is the whole point of the
difference between 2.9 and 3.0, in my understanding).



rcmuir at gmail

Nov 16, 2009, 11:08 AM

Post #6 of 50 (703 views)
Permalink
Re: Why release 3.0? [In reply to]

Uwe, on this topic please read my comment on LUCENE-1689. Because the
Unicode version was bumped in JDK 1.5, I believe this index backwards
compatibility is only theoretical.




--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 11:12 AM

Post #7 of 50 (704 views)
Permalink
RE: Why release 3.0? [In reply to]

But a UTF-8 stream from Java 4 can still be read with Java 5, so what is
the problem? Java 5 extended Unicode support, but an index created with
older versions can still be read. UTF-8 is standardized.





rcmuir at gmail

Nov 16, 2009, 11:15 AM

Post #8 of 50 (703 views)
Permalink
Re: Why release 3.0? [In reply to]

The problem is that the properties have changed for various characters, and
new characters were added.

It really has nothing to do with Lucene, but the idea that you can go from
JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.




--
Robert Muir
rcmuir [at] gmail


sarowe at syr

Nov 16, 2009, 11:33 AM

Post #9 of 50 (703 views)
Permalink
RE: Why release 3.0? [In reply to]

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really has nothing to do with Lucene.

The disruption here comes when users upgrade from Java 1.4 to 1.5+, not when they upgrade Lucene. I'd guess with few exceptions that most people have been using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on most Lucene users, assuming that most use Latin-1 exclusively; although I haven't looked, I'd be surprised if Latin-1 characters changed much, if at all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes, since the minimum required Java version, and so also the supported Unicode version, changes then.

Steve



markrmiller at gmail

Nov 16, 2009, 11:36 AM

Post #10 of 50 (703 views)
Permalink
Re: Why release 3.0? [In reply to]

X.n must be able to read (X-1).n - so 3.1 will be able to read 2.9 -
major versions are also for removing deprecations.
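
The rule can be written down as a small predicate. This is only a sketch: `BackCompatRule` and `guaranteedReadable` are hypothetical names, minor versions are ignored, and a `false` result means "not guaranteed by the rule", not "certain to fail".

```java
/** Hypothetical sketch of the "X.n must be able to read (X-1).n" rule stated above. */
final class BackCompatRule {
    /**
     * Returns true when the rule guarantees that a release with major version
     * readerMajor can read an index written by a release with major version writerMajor.
     */
    static boolean guaranteedReadable(int readerMajor, int writerMajor) {
        // Same major version, or exactly one major version older.
        return writerMajor == readerMajor || writerMajor == readerMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println("3.x reading 2.x: " + guaranteedReadable(3, 2));
        System.out.println("4.x reading 2.x: " + guaranteedReadable(4, 2));
        System.out.println("2.x reading 3.x: " + guaranteedReadable(2, 3));
    }
}
```

Under this reading, 3.1 reading a 2.9 index is covered, while 4.x reading a 2.x index and 2.x reading a 3.x index are not.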



--
- Mark

http://www.lucidimagination.com






rcmuir at gmail

Nov 16, 2009, 11:37 AM

Post #11 of 50 (706 views)
Permalink
Re: Why release 3.0? [In reply to]

Right, it's nothing to do with Lucene; it is due to property changes, etc.

I just think we should inform users on Java 1.4/Lucene 2.9 that if they
upgrade to Java 1.5/Lucene 3.0, they should reindex.

The reason I say this about properties is that some of the changes will
affect tokenizers. I give two examples: a hyphen that changes from
punctuation to format (might affect Solr's WordDelimiterFilter), and the
Arabic ayah, which changes from NSM to format, which surely affects
ArabicLetterTokenizer.
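
For anyone who wants to check their own JVM, a small probe like the following prints the general category Java reports for a code point. The two code points are an assumption on the reader's side, not something stated in this message: U+00AD (SOFT HYPHEN) for the hyphen and U+06DD (ARABIC END OF AYAH) for the ayah. The class and method names are made up for illustration, and the int overloads of `Character.getType` and `Character.isLetter` require Java 5 or later.

```java
/** Hypothetical probe that reports how the running JVM classifies a code point. */
final class UnicodeCategoryProbe {
    /** True if the running JVM puts the code point in general category Cf (format). */
    static boolean isFormat(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    /** One-line summary of the properties a tokenizer is likely to consult. */
    static String describe(int codePoint) {
        return String.format("U+%04X type=%d format=%b letter=%b",
                codePoint,
                Character.getType(codePoint),
                isFormat(codePoint),
                Character.isLetter(codePoint));
    }

    public static void main(String[] args) {
        // Assumed candidates for the two characters mentioned above.
        System.out.println(describe(0x00AD)); // SOFT HYPHEN
        System.out.println(describe(0x06DD)); // ARABIC END OF AYAH
    }
}
```

A tokenizer that splits or joins on these properties can emit different terms for the same text once the classification changes, which is why terms already in the index may no longer match what the analyzer produces at query time.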



--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 11:39 AM

Post #12 of 50 (703 views)
RE: Why release 3.0? [In reply to]

But most people already run Java 1.5 or 1.6, even with Lucene 2.9, and they
could have switched earlier. The problem is the JVM in use, not the Lucene
version. You can also run Lucene 1.4.3 on Java 5 and get the same problem.
Anyone who changes their Java version has to check what changed.



The only difference is that we are now forcing people to use Java 5.



-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:16 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



the problem is that the properties have changed for various characters, and
new characters were added.

it really has nothing to do with lucene, but the idea you can go from jdk
1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.

On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

But an UTF-8 stream from Java 4 can still be read with Java 5, what is the
problem? Java 5 extended Unicode support, but an index created with older
versions can still be read. UTF-8 is standardized.



-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:09 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



uwe, on topic please read my comment on LUCENE-1689, because unicode version
was bumped in jdk 1.5, i believe this index backwards compatibility is only
theoretical

On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

2.9 has *not* the same format as 3.0, an index created with 3.0 cannot be
read with 2.9. This is because compressed field support was removed and
therefore the version number of the stored fields file was upgraded. But
indexes from 2.9 can be read with 3.0 and support may get removed in 4.0.
3.0 Indexes can be read until version 4.9.



Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Jake Mannix [mailto:jake.mannix [at] gmail]
Sent: Monday, November 16, 2009 7:15 PM


To: java-dev [at] lucene

Subject: Re: Why release 3.0?



Don't users need to upgrade to 3.0 because 3.1 won't be necessarily able to
read your
2.4 index file formats? I suppose if you've already upgraded to 2.9, then
all is well because
2.9 is the same format as 3.0, but we can't assume all users upgraded from
2.4 to 2.9.

If you've done that already, then 3.0 might not be necessary, but if you're
on 2.4 right now,
you will be in for a bad surprise if you try to upgrade to 3.1.

-jake

On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson <erickerickson [at] gmail>
wrote:

One of my "specialties" is asking obvious questions just to see if
everyone's assumptions

are aligned. So with the discussion about branching 3.0 I have to ask "Is
there going to

be any 3.0 release intended for *production*?". And if not, would we save a
lot of work

by just not worrying about retrofitting fixes to a 3.0 branch and carrying
on with 3.1

as the first *supported* 3.x release?



Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure *as a
user* I see a

good reason to upgrade to 3.0. Getting a "beta/snapshot" release to get a
head start on

cleaning up my code does seem worthwhile, if I have the spare time. And
having a base

3.0 version that's not changing all over the place would be useful for that.



That said, I'm also not terribly comfortable with a "release" that's out
there and unsupported.



Apologies if this has already been discussed, but I don't remember it.
Although my memory

isn't what it used to be (but some would claim it never was<G>)...



Erick










--
Robert Muir
rcmuir [at] gmail




--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 11:42 AM

Post #13 of 50 (703 views)
RE: Why release 3.0? [In reply to]

We tried Character.getType() on these two characters:



Java 5:
'\u00AD' = 16 (Character.FORMAT)
'\u06DD' = 16 (Character.FORMAT)

Java 1.4:
'\u00AD' = 20 (Character.DASH_PUNCTUATION)
'\u06DD' = 7 (Character.ENCLOSING_MARK)



The first is the soft hyphen (U+00AD); the second is the Arabic end of ayah
(U+06DD).
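
For anyone who wants to repeat this check on their own JVM, here is a minimal
sketch. The class name CharTypeProbe and the helper typeOf are made up for
illustration; only Character.getType() and Character.FORMAT come from the JDK.
On Java 5 and later both characters should report 16, as in the numbers above;
the 1.4 values can only be reproduced on an actual 1.4 JRE.

```java
public class CharTypeProbe {
    // Hypothetical helper: returns the general-category code that the
    // running JVM assigns to the given character.
    static int typeOf(char c) {
        return Character.getType(c);
    }

    public static void main(String[] args) {
        // Soft hyphen and Arabic end of ayah, the two characters tested above.
        char[] probes = { '\u00AD', '\u06DD' };
        for (char c : probes) {
            System.out.println(String.format("U+%04X = %d", (int) c, typeOf(c)));
        }
    }
}
```

Running it on both the old and the new JRE and comparing the two outputs is
probably the quickest way to see whether a character you care about moved.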

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:37 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, its nothing to do with lucene, instead due to property changes, etc.

i just think we should inform users on java 1.4/2.9 that if they upgrade to
java 1.5/3.0, they should reindex.

the reason i say this about properties, is there are some that change that
will affect tokenizers, i give two examples, a hyphen that changes from
punctuation to format (might affect SolrWordDelimiterFilter),
and arabic ayah which changes from NSM to format, which surely affects
ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene. I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve


On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
>
> But an UTF-8 stream from Java 4 can still be read with Java 5,
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Robert Muir [mailto:rcmuir [at] gmail]
> Sent: Monday, November 16, 2009 8:09 PM
>
> To: java-dev [at] lucene
> Subject: Re: Why release 3.0?
>
>
>
> uwe, on topic please read my comment on LUCENE-1689, because
> unicode version was bumped in jdk 1.5, i believe this index backwards
> compatibility is only theoretical
>
> On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi>
wrote:
>
> 2.9 has *not* the same format as 3.0, an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Jake Mannix [mailto:jake.mannix [at] gmail]
> Sent: Monday, November 16, 2009 7:15 PM
>
>
> To: java-dev [at] lucene
>
> Subject: Re: Why release 3.0?
>
>
>
> Don't users need to upgrade to 3.0 because 3.1 won't be
> necessarily able to read your
> 2.4 index file formats? I suppose if you've already upgraded to
> 2.9, then all is well because
> 2.9 is the same format as 3.0, but we can't assume all users
> upgraded from 2.4 to 2.9.
>
> If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now,
> you will be in for a bad surprise if you try to upgrade to 3.1.
>
> -jake
>
> On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <erickerickson [at] gmail> wrote:
>
> One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
> That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
> Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
> Erick




--
Robert Muir
rcmuir [at] gmail


rcmuir at gmail

Nov 16, 2009, 11:45 AM

Post #14 of 50 (704 views)
Re: Why release 3.0? [In reply to]

right, my point is that it's true this has nothing to do with Lucene at all,
really.

but the reality is that we should clarify this for users, I think.

It's especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties. Can you tell me whether you should reindex
for a given language X?
I wouldn't want to answer that question right now.
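
To make the concern concrete, here is a rough sketch of how a category change
alone can move token boundaries. It is not Lucene or Solr code: the class
DashSplitSketch, the splitter, and the two lookup functions are hypothetical,
and the "Java 1.4" table is only simulated by overriding the soft hyphen with
the value 20 reported earlier in this thread. It needs Java 8 or later for
IntUnaryOperator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class DashSplitSketch {
    // Hypothetical splitter: starts a new token wherever the supplied type
    // lookup reports DASH_PUNCTUATION, and drops the dash itself.
    static List<String> splitOnDash(String text, IntUnaryOperator typeOf) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (typeOf.applyAsInt(c) == Character.DASH_PUNCTUATION) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Simulated Java 1.4 view: soft hyphen reported as 20 (DASH_PUNCTUATION).
    static int java14Type(int c) {
        return c == 0x00AD ? 20 : Character.getType(c);
    }

    // Simulated Java 5 view: soft hyphen reported as 16 (FORMAT).
    static int java5Type(int c) {
        return c == 0x00AD ? 16 : Character.getType(c);
    }

    public static void main(String[] args) {
        String text = "co\u00ADop";
        System.out.println("1.4 table: " + splitOnDash(text, DashSplitSketch::java14Type).size() + " token(s)");
        System.out.println("5 table:   " + splitOnDash(text, DashSplitSketch::java5Type).size() + " token(s)");
    }
}
```

With the simulated 1.4 table the input splits into two tokens; with the Java 5
table it stays one token. Terms indexed under one table would then not match
queries analyzed under the other, which is the reindexing question above.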

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> We tried out: Character.getType() for these two chars:
>
>
>
> Java 5:
> '\u00AD' = 16
> '\u06DD' = 16
>
> Java 1.4:
> '\u00AD' = 20
> '\u06DD' = 7
>
>
>
> The first is the soft hyphen.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir [at] gmail]
> *Sent:* Monday, November 16, 2009 8:37 PM
>
> *To:* java-dev [at] lucene
> *Subject:* Re: Why release 3.0?
>
>
>
> right, its nothing to do with lucene, instead due to property changes, etc.
>
> i just think we should inform users on java 1.4/2.9 that if they upgrade to
> java 1.5/3.0, they should reindex.
>
> the reason i say this about properties, is there are some that change that
> will affect tokenizers, i give two examples, a hyphen that changes from
> punctuation to format (might affect SolrWordDelimiterFilter),
> and arabic ayah which changes from NSM to format, which surely affects
> ArabicLetterTokenizer.
>
> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:
>
> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say, really
> has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
> they upgrade Lucene. I'd guess with few exceptions that most people have
> been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
> most Lucene users, assuming that most use Latin-1 exclusively; although I
> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
> all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description of
> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
> notes, since the minimum required Java version, and so also the supported
> Unicode version, changes then.
>
> Steve
>
>
> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > the problem is that the properties have changed for various characters,
> > and new characters were added.
> >
> > it really has nothing to do with lucene, but the idea you can go from
> > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
> >
> >
> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
> >
> >
> > But an UTF-8 stream from Java 4 can still be read with Java 5,
> > what is the problem? Java 5 extended Unicode support, but an index
> > created with older versions can still be read. UTF-8 is standardized…
> >
> >
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Robert Muir [mailto:rcmuir [at] gmail]
> > Sent: Monday, November 16, 2009 8:09 PM
> >
> > To: java-dev [at] lucene
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > uwe, on topic please read my comment on LUCENE-1689, because
> > unicode version was bumped in jdk 1.5, i believe this index backwards
> > compatibility is only theoretical
> >
> > On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi>
> wrote:
> >
> > 2.9 has *not* the same format as 3.0, an index created with 3.0
> > cannot be read with 2.9. This is because compressed field support was
> > removed and therefore the version number of the stored fields file was
> > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> > removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >
> >
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Jake Mannix [mailto:jake.mannix [at] gmail]
> > Sent: Monday, November 16, 2009 7:15 PM
> >
> >
> > To: java-dev [at] lucene
> >
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > Don't users need to upgrade to 3.0 because 3.1 won't be
> > necessarily able to read your
> > 2.4 index file formats? I suppose if you've already upgraded to
> > 2.9, then all is well because
> > 2.9 is the same format as 3.0, but we can't assume all users
> > upgraded from 2.4 to 2.9.
> >
> > If you've done that already, then 3.0 might not be necessary,
> > but if you're on 2.4 right now,
> > you will be in for a bad surprise if you try to upgrade to 3.1.
> >
> > -jake
> >
> > On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > <erickerickson [at] gmail> wrote:
> >
> > One of my "specialties" is asking obvious questions just to see
> > if everyone's assumptions are aligned. So with the discussion about
> > branching 3.0 I have to ask "Is there going to be any 3.0 release
> > intended for *production*?". And if not, would we save a lot of
> > work by just not worrying about retrofitting fixes to a 3.0 branch
> > and carrying on with 3.1 as the first *supported* 3.x release?
> >
> > Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > "beta/snapshot" release to get a head start on cleaning up my code
> > does seem worthwhile, if I have the spare time. And having a base
> > 3.0 version that's not changing all over the place would be useful
> > for that.
> >
> > That said, I'm also not terribly comfortable with a "release"
> > that's out there and unsupported.
> >
> > Apologies if this has already been discussed, but I don't
> > remember it. Although my memory isn't what it used to be (but
> > some would claim it never was<G>)...
> >
> > Erick
>
>
>
>
> --
> Robert Muir
> rcmuir [at] gmail
>



--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 11:50 AM

Post #15 of 50 (706 views)
RE: Why release 3.0? [In reply to]

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.



It definitely has nothing to do with 3.0, because users could have switched
Java versions earlier (and most of them have).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:45 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, my point is its true its nothing to do with Lucene at all, really.

but the reality is we should clarify this to users I think.

Its especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties, can you tell me if you should reindex for
given language X?
I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

We tried out: Character.getType() for these two chars:



Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7



The first is the soft hyphen.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:37 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, its nothing to do with lucene, instead due to property changes, etc.

i just think we should inform users on java 1.4/2.9 that if they upgrade to
java 1.5/3.0, they should reindex.

the reason i say this about properties, is there are some that change that
will affect tokenizers, i give two examples, a hyphen that changes from
punctuation to format (might affect SolrWordDelimiterFilter),
and arabic ayah which changes from NSM to format, which surely affects
ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene. I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve


On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
>
> But an UTF-8 stream from Java 4 can still be read with Java 5,
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Robert Muir [mailto:rcmuir [at] gmail]
> Sent: Monday, November 16, 2009 8:09 PM
>
> To: java-dev [at] lucene
> Subject: Re: Why release 3.0?
>
>
>
> uwe, on topic please read my comment on LUCENE-1689, because
> unicode version was bumped in jdk 1.5, i believe this index backwards
> compatibility is only theoretical
>
> On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi>
wrote:
>
> 2.9 has *not* the same format as 3.0, an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Jake Mannix [mailto:jake.mannix [at] gmail]
> Sent: Monday, November 16, 2009 7:15 PM
>
>
> To: java-dev [at] lucene
>
> Subject: Re: Why release 3.0?
>
>
>
> Don't users need to upgrade to 3.0 because 3.1 won't be
> necessarily able to read your
> 2.4 index file formats? I suppose if you've already upgraded to
> 2.9, then all is well because
> 2.9 is the same format as 3.0, but we can't assume all users
> upgraded from 2.4 to 2.9.
>
> If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now,
> you will be in for a bad surprise if you try to upgrade to 3.1.
>
> -jake
>
> On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <erickerickson [at] gmail> wrote:
>
> One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
> That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
> Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
> Erick




--
Robert Muir
rcmuir [at] gmail




--
Robert Muir
rcmuir [at] gmail


rcmuir at gmail

Nov 16, 2009, 11:51 AM

Post #16 of 50 (703 views)
Re: Why release 3.0? [In reply to]

Uwe, that's probably a good solution, I think, just as long as we document it
somewhere.
I think there is already some warning verbiage in StandardTokenizer about
this:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
the tokenizer, remember to use JRE 1.4 to run jflex (before
Lucene 3.0). This grammar now uses constructs (eg :digit:,
:letter:) whose meaning can vary according to the JRE used to
run jflex. See
https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> But it is a general warning that should be placed in the Wiki: If you
> upgrade from Java 1.4 to Java 5, think about reindexing.
>
>
>
> It has definitely nothing to do with 3.0, because uses could have changed
> (and most of them have) before.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir [at] gmail]
> *Sent:* Monday, November 16, 2009 8:45 PM
>
> *To:* java-dev [at] lucene
> *Subject:* Re: Why release 3.0?
>
>
>
> right, my point is its true its nothing to do with Lucene at all, really.
>
> but the reality is we should clarify this to users I think.
>
> Its especially complex in the current StandardTokenizer, which uses a mix
> of hardcoded ranges and properties, can you tell me if you should reindex
> for given language X?
> I wouldn't want to answer that question right now.
>
> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
> We tried out: Character.getType() for these two chars:
>
>
>
> Java 5:
> '\u00AD' = 16
> '\u06DD' = 16
>
> Java 1.4:
> '\u00AD' = 20
> '\u06DD' = 7
>
>
>
> The first is the soft hyphen.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir [at] gmail]
> *Sent:* Monday, November 16, 2009 8:37 PM
>
>
> *To:* java-dev [at] lucene
> *Subject:* Re: Why release 3.0?
>
>
>
> right, its nothing to do with lucene, instead due to property changes, etc.
>
> i just think we should inform users on java 1.4/2.9 that if they upgrade to
> java 1.5/3.0, they should reindex.
>
> the reason i say this about properties, is there are some that change that
> will affect tokenizers, i give two examples, a hyphen that changes from
> punctuation to format (might affect SolrWordDelimiterFilter),
> and arabic ayah which changes from NSM to format, which surely affects
> ArabicLetterTokenizer.
>
> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:
>
> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say, really
> has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
> they upgrade Lucene. I'd guess with few exceptions that most people have
> been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
> most Lucene users, assuming that most use Latin-1 exclusively; although I
> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
> all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description of
> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
> notes, since the minimum required Java version, and so also the supported
> Unicode version, changes then.
>
> Steve
>
>
> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > the problem is that the properties have changed for various characters,
> > and new characters were added.
> >
> > it really has nothing to do with lucene, but the idea you can go from
> > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
> >
> >
> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
> >
> >
> > But an UTF-8 stream from Java 4 can still be read with Java 5,
> > what is the problem? Java 5 extended Unicode support, but an index
> > created with older versions can still be read. UTF-8 is standardized…
> >
> >
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Robert Muir [mailto:rcmuir [at] gmail]
> > Sent: Monday, November 16, 2009 8:09 PM
> >
> > To: java-dev [at] lucene
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > uwe, on topic please read my comment on LUCENE-1689, because
> > unicode version was bumped in jdk 1.5, i believe this index backwards
> > compatibility is only theoretical
> >
> > On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi>
> wrote:
> >
> > 2.9 has *not* the same format as 3.0, an index created with 3.0
> > cannot be read with 2.9. This is because compressed field support was
> > removed and therefore the version number of the stored fields file was
> > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> > removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >
> >
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Jake Mannix [mailto:jake.mannix [at] gmail]
> > Sent: Monday, November 16, 2009 7:15 PM
> >
> >
> > To: java-dev [at] lucene
> >
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > Don't users need to upgrade to 3.0 because 3.1 won't be
> > necessarily able to read your
> > 2.4 index file formats? I suppose if you've already upgraded to
> > 2.9, then all is well because
> > 2.9 is the same format as 3.0, but we can't assume all users
> > upgraded from 2.4 to 2.9.
> >
> > If you've done that already, then 3.0 might not be necessary,
> > but if you're on 2.4 right now,
> > you will be in for a bad surprise if you try to upgrade to 3.1.
> >
> > -jake
> >
> > On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > <erickerickson [at] gmail> wrote:
> >
> > One of my "specialties" is asking obvious questions just to see
> > if everyone's assumptions are aligned. So with the discussion about
> > branching 3.0 I have to ask "Is there going to be any 3.0 release
> > intended for *production*?". And if not, would we save a lot of
> > work by just not worrying about retrofitting fixes to a 3.0 branch
> > and carrying on with 3.1 as the first *supported* 3.x release?
> >
> > Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > "beta/snapshot" release to get a head start on cleaning up my code
> > does seem worthwhile, if I have the spare time. And having a base
> > 3.0 version that's not changing all over the place would be useful
> > for that.
> >
> > That said, I'm also not terribly comfortable with a "release"
> > that's out there and unsupported.
> >
> > Apologies if this has already been discussed, but I don't
> > remember it. Although my memory isn't what it used to be (but
> > some would claim it never was<G>)...
> >
> > Erick
>
>
>
>
> --
> Robert Muir
> rcmuir [at] gmail
>
>
>
>
> --
> Robert Muir
> rcmuir [at] gmail
>



--
Robert Muir
rcmuir [at] gmail


rcmuir at gmail

Nov 16, 2009, 11:53 AM

Post #17 of 50 (704 views)
Re: Why release 3.0? [In reply to]

btw, here's a great example: for StandardTokenizer, backwards compatibility is
broken regardless of JVM, because we used a 1.4 JRE to run jflex in 2.9 but
1.5 in 3.0, right?

On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <rcmuir [at] gmail> wrote:

> Uwe, thats probably a good solution I think. just as long as we document
> somewhere,
> I think there is some warning verbage in StandardTokenizer already about
> this.
>
> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> the tokenizer, remember to use JRE 1.4 to run jflex (before
> Lucene 3.0). This grammar now uses constructs (eg :digit:,
> :letter:) whose meaning can vary according to the JRE used to
> run jflex. See
> https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>
>
> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
>> But it is a general warning that should be placed in the Wiki: If you
>> upgrade from Java 1.4 to Java 5, think about reindexing.
>>
>>
>>
>> It has definitely nothing to do with 3.0, because uses could have changed
>> (and most of them have) before.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe [at] thetaphi
>> ------------------------------
>>
>> *From:* Robert Muir [mailto:rcmuir [at] gmail]
>> *Sent:* Monday, November 16, 2009 8:45 PM
>>
>> *To:* java-dev [at] lucene
>> *Subject:* Re: Why release 3.0?
>>
>>
>>
>> right, my point is its true its nothing to do with Lucene at all, really.
>>
>> but the reality is we should clarify this to users I think.
>>
>> Its especially complex in the current StandardTokenizer, which uses a mix
>> of hardcoded ranges and properties, can you tell me if you should reindex
>> for given language X?
>> I wouldn't want to answer that question right now.
>>
>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>>
>> We tried out: Character.getType() for these two chars:
>>
>>
>>
>> Java 5:
>> '\u00AD' = 16
>> '\u06DD' = 16
>>
>> Java 1.4:
>> '\u00AD' = 20
>> '\u06DD' = 7
>>
>>
>>
>> The first is the soft hyphen.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe [at] thetaphi
>> ------------------------------
>>
>> *From:* Robert Muir [mailto:rcmuir [at] gmail]
>> *Sent:* Monday, November 16, 2009 8:37 PM
>>
>>
>> *To:* java-dev [at] lucene
>> *Subject:* Re: Why release 3.0?
>>
>>
>>
>> right, its nothing to do with lucene, instead due to property changes,
>> etc.
>>
>> i just think we should inform users on java 1.4/2.9 that if they upgrade
>> to java 1.5/3.0, they should reindex.
>>
>> the reason i say this about properties, is there are some that change that
>> will affect tokenizers, i give two examples, a hyphen that changes from
>> punctuation to format (might affect SolrWordDelimiterFilter),
>> and arabic ayah which changes from NSM to format, which surely affects
>> ArabicLetterTokenizer.
>>
>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:
>>
>> Hi Robert,
>>
>> I agree that the Unicode version supported by the JVM, as you say, really
>> has nothing to do with Lucene.
>>
>> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
>> they upgrade Lucene. I'd guess with few exceptions that most people have
>> been using Lucene with 1.5+ for a couple of years now, though.
>>
>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
>> most Lucene users, assuming that most use Latin-1 exclusively; although I
>> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
>> all, from Unicode 3.0 to 4.0.
>>
>> It would be useful, I think, to include (a pointer to?) a description of
>> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
>> notes, since the minimum required Java version, and so also the supported
>> Unicode version, changes then.
>>
>> Steve
>>
>>
>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>> > the problem is that the properties have changed for various characters,
>> > and new characters were added.
>> >
>> > it really has nothing to do with lucene, but the idea you can go from
>> > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>> >
>> >
>> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>> >
>> >
>> > But a UTF-8 stream from Java 1.4 can still be read with Java 5, so
>> > what is the problem? Java 5 extended Unicode support, but an index
>> > created with older versions can still be read. UTF-8 is standardized…
>> >
>> >
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe [at] thetaphi
>> >
>> >
>> > ________________________________
>> >
>> >
>> > From: Robert Muir [mailto:rcmuir [at] gmail]
>> > Sent: Monday, November 16, 2009 8:09 PM
>> >
>> > To: java-dev [at] lucene
>> > Subject: Re: Why release 3.0?
>> >
>> >
>> >
>> > uwe, on topic please read my comment on LUCENE-1689, because
>> > unicode version was bumped in jdk 1.5, i believe this index backwards
>> > compatibility is only theoretical
>> >
>> > On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi>
>> wrote:
>> >
>> > 2.9 does *not* have the same format as 3.0: an index created with 3.0
>> > cannot be read with 2.9. This is because compressed field support was
>> > removed and therefore the version number of the stored fields file was
>> > upgraded. But indexes from 2.9 can be read with 3.0, and that support may
>> > be removed in 4.0. 3.0 indexes can be read until version 4.9.
>> >
>> >
>> >
>> > Uwe
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe [at] thetaphi
>> >
>> >
>> > ________________________________
>> >
>> >
>> > From: Jake Mannix [mailto:jake.mannix [at] gmail]
>> > Sent: Monday, November 16, 2009 7:15 PM
>> >
>> >
>> > To: java-dev [at] lucene
>> >
>> > Subject: Re: Why release 3.0?
>> >
>> >
>> >
>> > Don't users need to upgrade to 3.0 because 3.1 won't be
>> > necessarily able to read your
>> > 2.4 index file formats? I suppose if you've already upgraded to
>> > 2.9, then all is well because
>> > 2.9 is the same format as 3.0, but we can't assume all users
>> > upgraded from 2.4 to 2.9.
>> >
>> > If you've done that already, then 3.0 might not be necessary,
>> > but if you're on 2.4 right now,
>> > you will be in for a bad surprise if you try to upgrade to 3.1.
>> >
>> > -jake
>> >
>> > On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>> > <erickerickson [at] gmail> wrote:
>> >
>> > One of my "specialties" is asking obvious questions just to see
>> > if everyone's assumptions are aligned. So with the discussion about
>> > branching 3.0 I have to ask "Is there going to be any 3.0 release
>> > intended for *production*?". And if not, would we save a lot of
>> > work by just not worrying about retrofitting fixes to a 3.0 branch
>> > and carrying on with 3.1 as the first *supported* 3.x release?
>> >
>> > Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>> > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>> > "beta/snapshot" release to get a head start on cleaning up my code
>> > does seem worthwhile, if I have the spare time. And having a base
>> > 3.0 version that's not changing all over the place would be useful
>> > for that.
>> >
>> > That said, I'm also not terribly comfortable with a "release"
>> > that's out there and unsupported.
>> >
>> > Apologies if this has already been discussed, but I don't
>> > remember it. Although my memory isn't what it used to be (but
>> > some would claim it never was<G>)...
>> >
>> > Erick
>>
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir [at] gmail
>>
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir [at] gmail
>>
>
>
>
> --
> Robert Muir
> rcmuir [at] gmail
>



--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 11:59 AM

Post #18 of 50 (703 views)
Permalink
RE: Why release 3.0? [In reply to]

I think the generated code in StandardTokenizer has not been generated with
1.4 for years :-) Most developers use 1.5 or even 1.6. So it has already
changed incompatibly.



-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:52 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



Uwe, that's probably a good solution, I think, just as long as we document
it somewhere.
I think there is already some warning verbiage in StandardTokenizer about
this.

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
the tokenizer, remember to use JRE 1.4 to run jflex (before
Lucene 3.0). This grammar now uses constructs (eg :digit:,
:letter:) whose meaning can vary according to the JRE used to
run jflex. See
https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.



It definitely has nothing to do with 3.0, because users could have changed
Java versions (and most of them have) before.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:45 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, my point is its true its nothing to do with Lucene at all, really.

but the reality is we should clarify this to users I think.

Its especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties, can you tell me if you should reindex for
given language X?
I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

We tried out: Character.getType() for these two chars:



Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7



The first is the soft hyphen.
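
For anyone who wants to repeat this check on their own JVM, here is a minimal sketch. The class name CharTypeCheck and its helper methods are made up for illustration and are not part of Lucene. It only calls Character.getType(), whose result depends on the Unicode tables shipped with the JRE that runs it. In terms of the java.lang.Character constants, 16 is FORMAT, 20 is DASH_PUNCTUATION and 7 is ENCLOSING_MARK.

```java
// Hypothetical helper, not Lucene code: prints the general category the
// running JRE assigns to the two characters discussed in this thread.
final class CharTypeCheck {

    // Returns the java.lang.Character general-category constant for c.
    static int typeOf(char c) {
        return Character.getType(c);
    }

    // Formats a code point and its category, e.g. "U+00AD -> 16".
    static String describe(char c) {
        return String.format("U+%04X -> %d", (int) c, typeOf(c));
    }

    public static void main(String[] args) {
        // Soft hyphen and Arabic end of ayah.
        System.out.println(describe('\u00AD'));
        System.out.println(describe('\u06DD'));
    }
}
```

Running the same class under two different JREs and comparing the output is one way to see whether a property-based tokenizer might split text differently after a Java upgrade, assuming the tokenizer consults these categories at all.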

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



uwe at thetaphi

Nov 16, 2009, 12:01 PM

Post #19 of 50 (706 views)
Permalink
RE: Why release 3.0? [In reply to]

The JFlex output was not regenerated as far as I know, but if somebody did
regenerate it, it's already broken.



-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:53 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



btw, here's a great example: StandardTokenizer is backwards-broken
regardless of JVM, because we used JRE 1.4 to run jflex in 2.9, but 1.5
in 3.0, right?



rcmuir at gmail

Nov 16, 2009, 12:05 PM

Post #20 of 50 (703 views)
Permalink
Re: Why release 3.0? [In reply to]

i suppose we are ok then, except that StandardTokenizer is now working
with a Unicode 3.0 definition instead of the Unicode version (4.0)
that corresponds to our required minimum JRE (1.5)...

sorry if i raised a stink about nothing, but you see my concerns maybe?

On Mon, Nov 16, 2009 at 3:01 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> The JFlex output was not regenerated as far as I know, but if somebody did
> regenerate it, it's already broken…
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir [at] gmail]
> *Sent:* Monday, November 16, 2009 8:53 PM
>
> *To:* java-dev [at] lucene
> *Subject:* Re: Why release 3.0?
>
>
>
> btw, here's a great example: StandardTokenizer is backwards-broken
> regardless of JVM, because we used JRE 1.4 to run jflex in 2.9, but 1.5
> in 3.0, right?
>



--
Robert Muir
rcmuir [at] gmail


erickerickson at gmail

Nov 16, 2009, 12:06 PM

Post #21 of 50 (704 views)
Permalink
Re: Why release 3.0? [In reply to]

On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> Hi Erick,
>
>
>
> 3.0 is **not** an unsupported or beta release; it is the cleaned-up 2.9.1
> release. You are right, 2.9.1 users do not need to upgrade (but
> they can), but for new users starting with Lucene, the recommendation is to
> use it and not 2.9.
>
> 3.0 also contains some cleanups needed for 3.1, as the compressed fields
> are no longer supported, so they must be uncompressed, which is done during
> optimizing/merging in 3.0. Later versions will remove support for older
> index types, but you should really update your indexes, especially because
> flex indexing will possibly remove more support for older indexes (as it
> gets more complex to maintain all the different file formats).
>
>
>
> So 3.0 is recommended for users who are starting new Java 5 projects and
> want a clean API. People needing backwards compatibility can use 2.9.1, but
> support for that version will be dropped in the future and bugfixes will
> only go into 3.x.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Erick Erickson [mailto:erickerickson [at] gmail]
> *Sent:* Monday, November 16, 2009 7:10 PM
> *To:* java-dev [at] lucene
> *Subject:* Why release 3.0?
>
>
>
> One of my "specialties" is asking obvious questions just to see if
> everyone's assumptions are aligned. So with the discussion about branching
> 3.0 I have to ask "Is there going to be any 3.0 release intended for
> *production*?". And if not, would we save a lot of work by just not
> worrying about retrofitting fixes to a 3.0 branch and carrying on with 3.1
> as the first *supported* 3.x release?
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure *as a
> user* I see a good reason to upgrade to 3.0. Getting a "beta/snapshot"
> release to get a head start on cleaning up my code does seem worthwhile, if
> I have the spare time. And having a base 3.0 version that's not changing
> all over the place would be useful for that.
>
> That said, I'm also not terribly comfortable with a "release" that's out
> there and unsupported.
>
> Apologies if this has already been discussed, but I don't remember it.
> Although my memory isn't what it used to be (but some would claim it never
> was<G>)...
>
> Erick
>


erickerickson at gmail

Nov 16, 2009, 12:12 PM

Post #22 of 50 (706 views)
Permalink
Re: Why release 3.0? [In reply to]

Oops, stupid mouse made me send a blank message.

Ok, I withdraw the question since there *are* good reasons to put
3.0 in a prod environment <G>. It's also an easier thing to say "new Lucene
users should start with 3.0" rather than "new Lucene users should
start with 3.1. Use 3.0 until we release 3.1 but be aware we're not going to
support 3.0...." Yuuuuccckkkk....

Erick

On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> Hi Erick,
>
>
>
> 3.0 is **not** an unsupported or beta release, it is the cleaned up 2.9.1
> release. You are right, it is not needed for 2.9.1 users to upgrade (but
> they can), but for new users starting with Lucene, the recommendation is to
> use it and not 2.9.
>
> 3.0 also contains some cleanups needed for 3.1, as the compressed fields
> are no longer supported, so they must be uncompressed, which is done during
> optimizing/merging in 3.0. Later versions will remove support for older
> index types, but you should really update your indexes, especially because
> flex indexing will possibly remove more support for older indexes (as it
> gets more complex to maintain all the different file formats).
>
>
>
> So 3.0 is recommended for users who start new Java 5 projects and want a
> clean API. People needing backwards compatibility can use 2.9.1, but support
> for that version will be cancelled in future and bugfixes will only go into
> 3.x.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
> ------------------------------
>
> *From:* Erick Erickson [mailto:erickerickson [at] gmail]
> *Sent:* Monday, November 16, 2009 7:10 PM
> *To:* java-dev [at] lucene
> *Subject:* Why release 3.0?
>
>
>
> One of my "specialties" is asking obvious questions just to see if
> everyone's assumptions
>
> are aligned. So with the discussion about branching 3.0 I have to ask "Is
> there going to
>
> be any 3.0 release intended for *production*?". And if not, would we save a
> lot of work
>
> by just not worrying about retrofitting fixes to a 3.0 branch and carrying
> on with 3.1
>
> as the first *supported* 3.x release?
>
>
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure *as a
> user* I see a
>
> good reason to upgrade to 3.0. Getting a "beta/snapshot" release to get a
> head start on
>
> cleaning up my code does seem worthwhile, if I have the spare time. And
> having a base
>
> 3.0 version that's not changing all over the place would be useful for
> that.
>
>
>
> That said, I'm also not terribly comfortable with a "release" that's out
> there and unsupported.
>
>
>
> Apologies if this has already been discussed, but I don't remember it.
> Although my memory
>
> isn't what it used to be (but some would claim it never was<G>)...
>
>
>
> Erick
>
>
>
>
>


uwe at thetaphi

Nov 16, 2009, 12:13 PM

Post #23 of 50 (703 views)
Permalink
RE: Why release 3.0? [In reply to]

I have to regenerate the JFlex files to be sure that they are Java 5. Should
I do that and recreate the artifacts? They are not yet released.



The correct approach would be to copy the current generated Java file and use
it if matchVersion < Version.LUCENE_30. For 3.0 and later we have a new one.
Whether the old one is really Java 1.4 can be checked by trying it out.
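
A rough sketch of that Version-based switch, for illustration only. The names
MatchVersion, LegacyScanner and CurrentScanner are made up for this example and
are not Lucene classes; the real change would pick between two generated
StandardTokenizerImpl variants. It only shows the idea of keeping the old
generated scanner for indexes built before 3.0 and using the regenerated one
otherwise.

```java
public class ScannerSelection {

    // Hypothetical stand-in for Lucene's Version constants; only the ordering matters.
    public enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31 }

    // Hypothetical common interface for the two generated scanners.
    public interface ScannerImpl {
        String unicodeBasis();
    }

    // Copy of the scanner as it was generated for 2.9 (jflex run on JRE 1.4).
    public static final class LegacyScanner implements ScannerImpl {
        public String unicodeBasis() {
            return "Unicode 3.0 (generated with JRE 1.4)";
        }
    }

    // Scanner regenerated for 3.0 (jflex run on JRE 1.5).
    public static final class CurrentScanner implements ScannerImpl {
        public String unicodeBasis() {
            return "Unicode 4.0 (generated with JRE 1.5)";
        }
    }

    // Use the old scanner whenever the caller asks for pre-3.0 behaviour.
    public static ScannerImpl select(MatchVersion matchVersion) {
        if (matchVersion.compareTo(MatchVersion.LUCENE_30) < 0) {
            return new LegacyScanner();
        }
        return new CurrentScanner();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + select(v).unicodeBasis());
        }
    }
}
```

With this shape, an application that keeps passing its old version constant
keeps the old tokenization and does not need to reindex just because of the
Lucene upgrade.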

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 9:06 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



i suppose we are ok then, except for the fact that now StandardTokenizer is
working with a unicode 3.0 definition, instead of the unicode version (4.0)
that corresponds to our required minimum jre (1.5)...

sorry if i raised a stink about nothing, but you see my concerns maybe?

On Mon, Nov 16, 2009 at 3:01 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

JFlex was not regenerated as far as I know, but if somebody did, it's already
broken.



-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:53 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



btw, so heres a great example. you are backwards broken regardless of JVM
for StandardTokenizer, because we used 1.4 JRE to run jflex in 2.9, but 1.5
in 3.0, right?

On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <rcmuir [at] gmail> wrote:

Uwe, that's probably a good solution I think, just as long as we document it
somewhere.
I think there is some warning verbiage in StandardTokenizer already about
this.

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
the tokenizer, remember to use JRE 1.4 to run jflex (before
Lucene 3.0). This grammar now uses constructs (eg :digit:,
:letter:) whose meaning can vary according to the JRE used to
run jflex. See
https://issues.apache.org/jira/browse/LUCENE-1126 for details.



On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.



It definitely has nothing to do with 3.0, because users could have changed
their Java version (and most of them have) before.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:45 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, my point is its true its nothing to do with Lucene at all, really.

but the reality is we should clarify this to users I think.

Its especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties, can you tell me if you should reindex for
given language X?
I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

We tried out: Character.getType() for these two chars:



Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7



The first is the soft hyphen.
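
A minimal sketch of how this check can be reproduced on any JRE. The class and
method names (CharTypeProbe, categoryName, describe) are invented for this
example; Character.getType() and the category constants are the standard
java.lang.Character API. The numeric values above map to FORMAT (16),
DASH_PUNCTUATION (20) and ENCLOSING_MARK (7).

```java
public class CharTypeProbe {

    // Translate the numeric result of Character.getType() into a readable name
    // for the few general categories that appear in this thread.
    public static String categoryName(int type) {
        switch (type) {
            case Character.FORMAT:           return "FORMAT";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK";
            default:                         return "OTHER(" + type + ")";
        }
    }

    // Report code point, numeric category and category name for one char.
    public static String describe(char c) {
        int type = Character.getType(c);
        return String.format("U+%04X = %d (%s)", (int) c, type, categoryName(type));
    }

    public static void main(String[] args) {
        // U+00AD is the soft hyphen, U+06DD is the Arabic end of ayah.
        System.out.println(describe('\u00AD'));
        System.out.println(describe('\u06DD'));
    }
}
```

Running it under the JRE that built an index and under the JRE that will read
it shows whether the character categories a tokenizer relies on differ between
the two.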

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Robert Muir [mailto:rcmuir [at] gmail]
Sent: Monday, November 16, 2009 8:37 PM


To: java-dev [at] lucene
Subject: Re: Why release 3.0?



right, its nothing to do with lucene, instead due to property changes, etc.

i just think we should inform users on java 1.4/2.9 that if they upgrade to
java 1.5/3.0, they should reindex.

the reason i say this about properties, is there are some that change that
will affect tokenizers, i give two examples, a hyphen that changes from
punctuation to format (might affect SolrWordDelimiterFilter),
and arabic ayah which changes from NSM to format, which surely affects
ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene. I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve


On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
>
> But an UTF-8 stream from Java 4 can still be read with Java 5,
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Robert Muir [mailto:rcmuir [at] gmail]
> Sent: Monday, November 16, 2009 8:09 PM
>
> To: java-dev [at] lucene
> Subject: Re: Why release 3.0?
>
>
>
> uwe, on topic please read my comment on LUCENE-1689, because
> unicode version was bumped in jdk 1.5, i believe this index backwards
> compatibility is only theoretical
>
> On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
> 2.9 has *not* the same format as 3.0, an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>
> ________________________________
>
>
> From: Jake Mannix [mailto:jake.mannix [at] gmail]
> Sent: Monday, November 16, 2009 7:15 PM
>
>
> To: java-dev [at] lucene
>
> Subject: Re: Why release 3.0?
>
>
>
> Don't users need to upgrade to 3.0 because 3.1 won't be
> necessarily able to read your
> 2.4 index file formats? I suppose if you've already upgraded to
> 2.9, then all is well because
> 2.9 is the same format as 3.0, but we can't assume all users
> upgraded from 2.4 to 2.9.
>
> If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now,
> you will be in for a bad surprise if you try to upgrade to 3.1.
>
> -jake
>
> On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <erickerickson [at] gmail> wrote:
>
> One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
> Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
> That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
> Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
> Erick




--
Robert Muir
rcmuir [at] gmail






rcmuir at gmail

Nov 16, 2009, 12:15 PM

Post #24 of 50 (706 views)
Permalink
Re: Why release 3.0? [In reply to]

Steven, I think we can be almost sure of no latin-1 changes.

what do you think about this jflex situation though?
it seems like a mess, is there anything we can do before the jflex 1.5 stuff
that is going on now (where we could actually link Version to the unicode
version jflex uses explicitly?)

should we generate a separate jflex for 3.0 based on 1.5 jre and use it
depending on Version for now?

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe [at] syr> wrote:

> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say, really
> has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
> they upgrade Lucene. I'd guess with few exceptions that most people have
> been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
> most Lucene users, assuming that most use Latin-1 exclusively; although I
> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
> all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description of
> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
> notes, since the minimum required Java version, and so also the supported
> Unicode version, changes then.
>
> Steve
>
> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > the problem is that the properties have changed for various characters,
> > and new characters were added.
> >
> > it really has nothing to do with lucene, but the idea you can go from
> > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
> >
> >
> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
> >
> >
> > But an UTF-8 stream from Java 4 can still be read with Java 5,
> > what is the problem? Java 5 extended Unicode support, but an index
> > created with older versions can still be read. UTF-8 is standardized…
> >
> >
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Robert Muir [mailto:rcmuir [at] gmail]
> > Sent: Monday, November 16, 2009 8:09 PM
> >
> > To: java-dev [at] lucene
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > uwe, on topic please read my comment on LUCENE-1689, because
> > unicode version was bumped in jdk 1.5, i believe this index backwards
> > compatibility is only theoretical
> >
> > On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
> >
> > 2.9 has *not* the same format as 3.0, an index created with 3.0
> > cannot be read with 2.9. This is because compressed field support was
> > removed and therefore the version number of the stored fields file was
> > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> > removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >
> >
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe [at] thetaphi
> >
> >
> > ________________________________
> >
> >
> > From: Jake Mannix [mailto:jake.mannix [at] gmail]
> > Sent: Monday, November 16, 2009 7:15 PM
> >
> >
> > To: java-dev [at] lucene
> >
> > Subject: Re: Why release 3.0?
> >
> >
> >
> > Don't users need to upgrade to 3.0 because 3.1 won't be
> > necessarily able to read your
> > 2.4 index file formats? I suppose if you've already upgraded to
> > 2.9, then all is well because
> > 2.9 is the same format as 3.0, but we can't assume all users
> > upgraded from 2.4 to 2.9.
> >
> > If you've done that already, then 3.0 might not be necessary,
> > but if you're on 2.4 right now,
> > you will be in for a bad surprise if you try to upgrade to 3.1.
> >
> > -jake
> >
> > On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > <erickerickson [at] gmail> wrote:
> >
> > One of my "specialties" is asking obvious questions just to see
> > if everyone's assumptions are aligned. So with the discussion about
> > branching 3.0 I have to ask "Is there going to be any 3.0 release
> > intended for *production*?". And if not, would we save a lot of
> > work by just not worrying about retrofitting fixes to a 3.0 branch
> > and carrying on with 3.1 as the first *supported* 3.x release?
> >
> > Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > "beta/snapshot" release to get a head start on cleaning up my code
> > does seem worthwhile, if I have the spare time. And having a base
> > 3.0 version that's not changing all over the place would be useful
> > for that.
> >
> > That said, I'm also not terribly comfortable with a "release"
> > that's out there and unsupported.
> >
> > Apologies if this has already been discussed, but I don't
> > remember it. Although my memory isn't what it used to be (but
> > some would claim it never was<G>)...
> >
> > Erick
>
>


--
Robert Muir
rcmuir [at] gmail


uwe at thetaphi

Nov 16, 2009, 12:15 PM

Post #25 of 50 (706 views)
Permalink
RE: Why release 3.0? [In reply to]

We support 3.0, so why do you suggest otherwise? I will always fix a bug
first in 3.0 and then (perhaps) merge it back to 2.9.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Erick Erickson [mailto:erickerickson [at] gmail]
Sent: Monday, November 16, 2009 9:13 PM
To: java-dev [at] lucene
Subject: Re: Why release 3.0?



Oops, stupid mouse made me send a blank message.



Ok, I withdraw the question since there *are* good reasons to put

3.0 in a prod environment <G>. It's also an easier thing to say "new Lucene

users should start with 3.0" rather than "new Lucene users should

start with 3.1. Use 3.0 until we release 3.1 but be aware we're not going to

support 3.0...." Yuuuuccckkkk....



Erick

On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

Hi Erick,



3.0 is *not* an unsupported or beta release, it is the cleaned up 2.9.1
release. You are right, it is not needed for 2.9.1 users to upgrade (but
they can), but for new users starting with Lucene, the recommendation is to
use it and not 2.9.

3.0 also contains some cleanups needed for 3.1, as the compressed fields are
no longer supported, so they must be uncompressed, which is done during
optimizing/merging in 3.0. Later versions will remove support for older
index types, but you should really update your indexes, especially because
flex indexing will possibly remove more support for older indexes (as it
gets more complex to maintain all the different file formats).



So 3.0 is recommended for users who start new Java 5 projects and want a
clean API. People needing backwards compatibility can use 2.9.1, but support
for that version will be cancelled in future and bugfixes will only go into
3.x.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

_____

From: Erick Erickson [mailto:erickerickson [at] gmail]
Sent: Monday, November 16, 2009 7:10 PM
To: java-dev [at] lucene
Subject: Why release 3.0?



One of my "specialties" is asking obvious questions just to see if
everyone's assumptions

are aligned. So with the discussion about branching 3.0 I have to ask "Is
there going to

be any 3.0 release intended for *production*?". And if not, would we save a
lot of work

by just not worrying about retrofitting fixes to a 3.0 branch and carrying
on with 3.1

as the first *supported* 3.x release?



Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure *as a
user* I see a

good reason to upgrade to 3.0. Getting a "beta/snapshot" release to get a
head start on

cleaning up my code does seem worthwhile, if I have the spare time. And
having a base

3.0 version that's not changing all over the place would be useful for that.



That said, I'm also not terribly comfortable with a "release" that's out
there and unsupported.



Apologies if this has already been discussed, but I don't remember it.
Although my memory

isn't what it used to be (but some would claim it never was<G>)...



Erick
