Mailing List Archive: Lucene: Java-Dev

Having a default constructor in Analyzers

 

 


sanne.grinovero at gmail

Feb 7, 2010, 9:33 AM

Post #1 of 18 (1905 views)
Having a default constructor in Analyzers

Hello,
I've noticed that some core Analyzers no longer have a default
constructor. This prevents many applications from configuring and
loading Analyzers by reflection, a common approach when Analyzers are
chosen in configuration files.

Would it be possible to add, for example, a constructor like

public StandardAnalyzer() {
    this(Version.LUCENE_CURRENT);
}

?

Of course more advanced use cases would need to pass parameters, but
please make the advanced usage optional; I have now seen more than
one project break because of this (and revert to an older Lucene).

Regards,
Sanne

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


simon.willnauer at googlemail

Feb 7, 2010, 9:49 AM

Post #2 of 18 (1869 views)
Re: Having a default constructor in Analyzers [In reply to]

Almost no Analyzer has a default ctor anymore, due to the introduction
of Version. This should not be an issue for API users who load
Analyzers via reflection: you can call the ctor that takes a Version
instead. Providing a default ctor that uses the current version would
be risky in many regards; for example, a version-dependent change in
analysis behavior could break older indexes.

I don't see how this keeps other applications from upgrading Lucene.

simon



uwe at thetaphi

Feb 7, 2010, 9:53 AM

Post #3 of 18 (1870 views)
RE: Having a default constructor in Analyzers [In reply to]

Hi Sanne,

That is exactly the usage we want to prevent. Using Version.LUCENE_CURRENT is the worst thing you can do if you want to update your Lucene version later without reindexing all your indexes (see the javadocs).

It is easy to modify your application so that it still creates analyzers from config files via reflection. Just look up the constructor that takes a Version and call newInstance() on it, rather than directly on the Class. It's just one more line of code.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



uwe at thetaphi

Feb 7, 2010, 10:15 AM

Post #4 of 18 (1873 views)
RE: Having a default constructor in Analyzers [In reply to]

Hi Sanne,

This is example code that I use in my applications. The method creates an analyzer via reflection in both cases: a ctor taking a Version, or a default ctor:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.util.Version;

final Analyzer createAnalyzer(String className, Version matchVersion) throws Exception {
    final Class<? extends Analyzer> clazz = Class.forName(className).asSubclass(Analyzer.class);
    try {
        // First try a ctor with a Version parameter (needed for the many
        // new Analyzers that no longer have a default one).
        return clazz.getConstructor(Version.class).newInstance(matchVersion);
    } catch (NoSuchMethodException nsme) {
        // Otherwise fall back to the default ctor. Class.newInstance() is
        // deprecated on newer JDKs, so go through the Constructor here too.
        return clazz.getConstructor().newInstance();
    }
}

The method takes a Version parameter, which you should really supply explicitly (and not as LUCENE_CURRENT!). Add a configuration option to your software for a default version that is used everywhere you instantiate Lucene objects (query parsers and so on). That way you preserve compatibility after upgrades.

Just add an option somewhere in your config files and assign it to a final Version constant in your application using Version.valueOf(stringFromFile).
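
A minimal sketch of that config lookup, under stated assumptions: the enum MatchVersion is a hypothetical stand-in for Lucene's Version (so the snippet compiles without Lucene on the classpath), and the property key lucene.matchVersion and the class VersionConfig are made-up names. With Lucene present, the same valueOf call would be made on the real Version type.

```java
import java.util.Properties;

// Hypothetical stand-in for org.apache.lucene.util.Version, used only so
// that this sketch is self-contained.
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

final class VersionConfig {
    // Hypothetical property name; pick whatever key fits your config format.
    static final String KEY = "lucene.matchVersion";

    // Reads the pinned version once; fails loudly instead of silently
    // falling back to "whatever is current".
    static MatchVersion load(Properties props) {
        String name = props.getProperty(KEY);
        if (name == null) {
            throw new IllegalStateException("Missing required setting: " + KEY);
        }
        try {
            return MatchVersion.valueOf(name.trim());
        } catch (IllegalArgumentException e) {
            throw new IllegalStateException(
                    "Unknown version '" + name + "' for " + KEY, e);
        }
    }
}
```

The point of failing on a missing key is that the version stays an explicit decision: the value is written down at first deployment and only changes when someone edits the config, not when the jar is swapped.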

Hope that helps,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi




rcmuir at gmail

Feb 7, 2010, 10:33 AM

Post #5 of 18 (1866 views)
Re: Having a default constructor in Analyzers [In reply to]

I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION is
done.


--
Robert Muir
rcmuir [at] gmail


sanne.grinovero at gmail

Feb 7, 2010, 11:17 AM

Post #6 of 18 (1860 views)
Re: Having a default constructor in Analyzers [In reply to]

Thanks for all the quick answers.

Finding the ctor that takes only a Version parameter is fine for me. I had
noticed this "frequent pattern" but didn't realize it was a general rule.
So can I assume there is an implicit contract for all Analyzers to
have either an empty ctor or a ctor with a single parameter of type Version?

I know about the dangers of using LUCENE_CURRENT, but rebuilding the
index is not always something you need to avoid.
Having LUCENE_CURRENT is useful for me, for example, to test Hibernate
Search against the current Lucene on the classpath without having to
rebuild the code.

thanks for all help,
Sanne




rcmuir at gmail

Feb 7, 2010, 11:31 AM

Post #7 of 18 (1862 views)
Re: Having a default constructor in Analyzers [In reply to]

OK, I think you have a valid case for using this for testing. We just have
to be careful about the wording, because I think right now we encourage
users to use this constant. I created a patch under LUCENE-2080 to make it
scary and deprecated. We don't have to remove the constant; we can just keep
it deprecated forever, since using it breaks backwards compatibility.

As for an implicit contract of a no-arg or Version ctor for analyzers, I do
not think there is one. Someone can write an analyzer with a required
argument if they want to, and I think some analyzers already do
(PerFieldAnalyzerWrapper at least).
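
Because no such contract exists, a reflective loader probably wants to fail with a clear message when a class offers neither constructor. The following is only a sketch of that idea: ReflectiveLoader and its instantiate method are hypothetical names, and the helper is written generically so it compiles without Lucene. In the Lucene case, type would be the Analyzer class and argType would be Version.class.

```java
final class ReflectiveLoader {
    /**
     * Tries a public single-argument constructor taking argType first, then
     * a public no-argument constructor. Classes with neither (for example
     * wrappers that require other arguments) are rejected with a message
     * that says so, rather than with a bare NoSuchMethodException.
     */
    static <T, A> T instantiate(Class<T> type, Class<A> argType, A arg) {
        try {
            try {
                return type.getConstructor(argType).newInstance(arg);
            } catch (NoSuchMethodException noSingleArgCtor) {
                // No (argType) constructor: fall back to the no-arg one.
                return type.getConstructor().newInstance();
            }
        } catch (NoSuchMethodException noUsableCtor) {
            throw new IllegalArgumentException(type.getName()
                    + " has neither a (" + argType.getSimpleName()
                    + ") constructor nor a no-argument constructor;"
                    + " it must be wired up explicitly", noUsableCtor);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Could not instantiate " + type.getName(), e);
        }
    }
}

// Tiny demo types (hypothetical) covering the three cases.
class PlainThing {
    public PlainThing() {}
}

class LabeledThing {
    final String label;
    public LabeledThing(String label) { this.label = label; }
}

class WrapperThing {
    public WrapperThing(String label, PlainThing delegate) {}
}
```

Here instantiate(LabeledThing.class, String.class, "x") uses the single-argument constructor, PlainThing falls back to its no-argument constructor, and WrapperThing is rejected with the descriptive IllegalArgumentException.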


--
Robert Muir
rcmuir [at] gmail


simon.willnauer at googlemail

Feb 7, 2010, 11:34 AM

Post #8 of 18 (1855 views)
Re: Having a default constructor in Analyzers [In reply to]

Sanne, I would recommend building a Factory pattern around your
Analyzers / TokenStreams, similar to what Solr does. That way you can
load your own "default ctor" interface via reflection and obtain your
analyzers from those factories. That makes more sense anyway, as you
load only the factory via reflection and not the analyzers.
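
A rough sketch of that factory idea, not Solr's actual API: every name below (AnalyzerFactory, TextAnalyzer, CompatVersion, LowercasingFactory, FactoryLoader) is hypothetical, and TextAnalyzer and CompatVersion are stand-ins for Lucene's Analyzer and Version so that the snippet compiles on its own.

```java
// Hypothetical stand-ins for Lucene's Version and Analyzer.
enum CompatVersion { LUCENE_29, LUCENE_30 }

interface TextAnalyzer {
    String describe();
}

// The only type the application loads reflectively. Implementations must
// have a public no-argument constructor.
interface AnalyzerFactory {
    TextAnalyzer create(CompatVersion matchVersion);
}

// Example factory: it knows which constructor arguments its analyzer needs,
// so the reflective code never has to guess.
final class LowercasingFactory implements AnalyzerFactory {
    public LowercasingFactory() {}

    @Override
    public TextAnalyzer create(CompatVersion matchVersion) {
        return () -> "lowercasing@" + matchVersion;
    }
}

final class FactoryLoader {
    // Loads a factory by class name, as it would appear in a config file.
    static AnalyzerFactory load(String factoryClassName) {
        try {
            return Class.forName(factoryClassName)
                    .asSubclass(AnalyzerFactory.class)
                    .getConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Not a usable AnalyzerFactory: " + factoryClassName, e);
        }
    }
}
```

The config file would then name the factory class rather than the analyzer class, and the application would call FactoryLoader.load(name).create(version) with its pinned version.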

@Robert: I don't know if removing LUCENE_CURRENT is the way to go. On
the one hand it would make our lives easier over time, but on the other it
would make it harder for our users to upgrade. I totally agree that for
upgrade safety it would be much better to enforce an explicit version
number so upgrading can be done step by step. Yet even if we deprecate
LUCENE_CURRENT, people will use it for at least the next 3 to 5 years
(until 4.0) anyway :)

simon



rcmuir at gmail

Feb 7, 2010, 11:38 AM

Post #9 of 18 (1858 views)
Re: Having a default constructor in Analyzers [In reply to]

Simon, can you explain how removing CURRENT makes it harder for users to
upgrade? If you mean the case of people who always re-index all
documents when upgrading the Lucene jar, then this makes sense to me.

I guess as a first step we can at least deprecate this thing and strongly
discourage its use; please see the patch at LUCENE-2080.

Not to pick on Sanne, but his wording, "Of course more advanced use
cases would need to pass parameters but please make the advanced usage
optional", really caused me to rethink CURRENT, because CURRENT itself
should be the advanced use case!


--
Robert Muir
rcmuir [at] gmail


simon.willnauer at googlemail

Feb 7, 2010, 11:46 AM

Post #10 of 18 (1856 views)
Re: Having a default constructor in Analyzers [In reply to]

On Sun, Feb 7, 2010 at 8:38 PM, Robert Muir <rcmuir [at] gmail> wrote:
> Simon, can you explain how removing CURRENT makes it harder for users to
> upgrade? If you mean for the case of people that always re-index all
> documents when upgrading lucene jar, then this makes sense to me.
That is what I was alluding to!
It's not a big deal, though: most IDEs make this upgrade easy via
refactoring, and we can document it too. Still, we won't have a drop-in
upgrade anymore.

>
> I guess as a step we can at least deprecate this thing and strongly
> discourage its use, please see the patch at LUCENE-2080.
>
> Not to pick on Sanne, but his wording about: "Of course more advanced use
> cases would need to pass parameters but please make the advanced usage
> optional", this really caused me to rethink CURRENT, because CURRENT itself
> should be the advanced use case!!!
>
> On Sun, Feb 7, 2010 at 2:34 PM, Simon Willnauer
> <simon.willnauer [at] googlemail> wrote:
>>
>> Sanne, I would recommend you building a Factory pattern around you
>> Analyzers / TokenStreams similar to what solr does. That way you can
>> load you own "default ctor" interface via reflection and obtain you
>> analyzers from those factories. That makes more sense anyway as you
>> only load the factory via reflection an not the analyzers.
>>
>> @Robert: I don't know if removing LUCENE_CURRENT is the way to go. On
>> the one hand it would make our live easier over time but would make it
>> harder for our users to upgrade. I would totally agree that for
>> upgrade safety it would be much better to enforce an explicit version
>> number so upgrading can be done step by step. Yet, if we deprecate
>> LUCENE_CURRENT people will use it for at least the next 3 to 5 years
>> (until 4.0) anyway :)
>>
>> simon
>>
>> On Sun, Feb 7, 2010 at 8:17 PM, Sanne Grinovero
>> <sanne.grinovero [at] gmail> wrote:
>> > Thanks for all the quick answers;
>> >
>> > finding the ctor having only a Version parameter is fine for me, I had
>> > noticed this "frequent pattern" but didn't understand that was a
>> > general rule.
>> > So can I assume this is an implicit contract for all Analyzers, to
>> > have either an empty ctor or a single-parameter of type Version?
>> >
>> > I know about the dangers of using LUCENE_CURRENT, but rebuilding the
>> > index is not always something you need to avoid.
>> > Having LUCENE_CURRENT is for example useful for me to test Hibernate
>> > Search towards the current Lucene on classpath, without having to
>> > rebuild the code.
>> >
>> > thanks for all help,
>> > Sanne
>> >
>> >
>> > 2010/2/7 Robert Muir <rcmuir [at] gmail>:
>> >> I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION
>> >> is
>> >> done.
>> >>
>> >> On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>> >>>
>> >>> Hi Sanne,
>> >>>
>> >>> Exactly that usage we want to prevent. Using Version.LUCENE_CURRENT is
>> >>> the
>> >>> badest thing you can do if you want to later update your Lucene
>> >>> version and
>> >>> do not want to reindex all your indexes (see javadocs).
>> >>>
>> >>> It is easy to modify your application to create analyzers even from
>> >>> config
>> >>> files using the reflection way. Just find a constructor taking Version
>> >>> and
>> >>> call newInstance() on it, not directly on the Class. It's just one
>> >>> line of
>> >>> code more.
>> >>>
>> >>> Uwe
>> >>>
>> >>> -----
>> >>> Uwe Schindler
>> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >>> http://www.thetaphi.de
>> >>> eMail: uwe [at] thetaphi
>> >>>
>> >>> > -----Original Message-----
>> >>> > From: Sanne Grinovero [mailto:sanne.grinovero [at] gmail]
>> >>> > Sent: Sunday, February 07, 2010 6:33 PM
>> >>> > To: java-dev [at] lucene
>> >>> > Subject: Having a default constructor in Analyzers
>> >>> >
>> >>> > Hello,
>> >>> > I've seen that some core Analyzers are now missing a default
>> >>> > constructor; this is preventing many applications to configure/load
>> >>> > Analyzers by reflection, which is a common use case to have
>> >>> > Analyzers
>> >>> > chosen in configuration files.
>> >>> >
>> >>> > Would it be possible to add, for example, a constructor like
>> >>> >
>> >>> > public StandardAnalyzer() {
>> >>> >    this(Version.LUCENE_CURRENT);
>> >>> > }
>> >>> >
>> >>> > ?
>> >>> >
>> >>> > Of course more advanced use cases would need to pass parameters but
>> >>> > please make the advanced usage optional; I have now seen more than a
>> >>> > single project break because of this (and revert to older Lucene).
>> >>> >
>> >>> > Regards,
>> >>> > Sanne
>> >>> >
>> >>> >
>> >>> > ---------------------------------------------------------------------
>> >>> > To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> >>> > For additional commands, e-mail: java-dev-help [at] lucene
>> >>>
>> >>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> >>> For additional commands, e-mail: java-dev-help [at] lucene
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Robert Muir
>> >> rcmuir [at] gmail
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> > For additional commands, e-mail: java-dev-help [at] lucene
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> For additional commands, e-mail: java-dev-help [at] lucene
>>
>
>
>
> --
> Robert Muir
> rcmuir [at] gmail
>



sanne.grinovero at gmail

Feb 7, 2010, 2:32 PM

Post #11 of 18 (1853 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

Does it make sense to use different values across the same
application? Obviously yes in the unlikely case that you want to treat
different indexes in different ways, but does it make sense when
working entirely on the same index?
If not, why not introduce a value like "Version.BY_ENVIRONMENT" which
is statically initialized to be one of the other values, reading from
an environment parameter?
So you get the latest at first deploy, and can then keep compatibility
as long as you need, even when updating Lucene.
This way I could still have the safety of pinning down a specific
version and yet avoid rebuilding the app when changing it.
Of course the default would be LUCENE_CURRENT, so that people trying
out Lucene get all features out of the box, and warn about setting it
(maybe log a warning when not set).
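
The "Version.BY_ENVIRONMENT" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: CompatVersion is a hypothetical stand-in for Lucene's Version enum (so the snippet compiles without Lucene), and the property name "lucene.match.version" is invented here; nothing in Lucene reads it.

```java
import java.util.Locale;

// Hypothetical stand-in for Lucene's Version enum, so the sketch is self-contained.
enum CompatVersion { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_CURRENT }

final class EnvironmentVersion {
    // Assumed property name, made up for this example.
    static final String PROPERTY = "lucene.match.version";

    private EnvironmentVersion() {}

    /** Maps a raw setting to a version; falls back to LUCENE_CURRENT with a warning. */
    static CompatVersion resolve(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            System.err.println("WARNING: " + PROPERTY + " not set, using LUCENE_CURRENT");
            return CompatVersion.LUCENE_CURRENT;
        }
        // valueOf throws IllegalArgumentException on unknown names, so typos fail fast.
        return CompatVersion.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    /** Reads the JVM-wide system property; the value applies to every index in the JVM. */
    static CompatVersion fromSystemProperty() {
        return resolve(System.getProperty(PROPERTY));
    }
}
```

Started with -Dlucene.match.version=LUCENE_29, fromSystemProperty() would keep returning LUCENE_29 after a Lucene jar upgrade, which is the "pin without rebuilding" behaviour described above.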

Also, wouldn't it make sense to be able to read the recommended
version from the Index?
I'd like to have the hypothetical AnalyzerFactory to find out what it
needs to build getting information from the relevant IndexReader; so
in the case I have two indexes using different versions I won't get
mistakes. (For a query on index A I'm creating a QueryParser, so let's
ask the index which kind of QueryParser I should use...)

just some ideas, forgive me if I misunderstood this usage (should
avoid writing late in the night..)
Regards,
Sanne





dmsmith555 at gmail

Feb 7, 2010, 3:09 PM

Post #12 of 18 (1855 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

On Feb 7, 2010, at 5:32 PM, Sanne Grinovero wrote:

> Does it make sense to use different values across the same
> application? Obviously in the unlikely case you want to treat
> different indexes in a different way, but does it make sense when
> working all on the same index?

I think it entirely depends on the use case. In my use case, my app indexes one book per index, with each sentence or paragraph (depending on the book) as a document. The app lives on a user's desktop; users can download books on an as-needed basis and then index them in the app.

I don't have it yet, but need to: Imagine that each index maintains a manifest of the toolchain for the index, which includes the version of each part of the chain. Since the index is created all at once, this probably is the same as the version of lucene. When the user searches the index the manifest is consulted to recreate the toolchain.
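
A minimal sketch of such a manifest, assuming a plain java.util.Properties text format and component names used as keys. The class name ToolchainManifest and the key names are hypothetical; Lucene itself offers nothing like this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical per-index manifest: component name -> version used at index time.
final class ToolchainManifest {
    private final Properties entries = new Properties();

    /** Remembers which version a component (analyzer, filter, ...) was built with. */
    void put(String component, String version) {
        entries.setProperty(component, version);
    }

    /** Returns the stored version, or null if the component is not listed. */
    String versionOf(String component) {
        return entries.getProperty(component);
    }

    /** Serializes the manifest; an application would write this text next to the index. */
    String save() {
        StringWriter out = new StringWriter();
        try {
            entries.store(out, "index toolchain manifest");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Rebuilds a manifest from previously saved text. */
    static ToolchainManifest load(String text) {
        ToolchainManifest manifest = new ToolchainManifest();
        try {
            manifest.entries.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manifest;
    }
}
```

At search time the application would load the manifest first and use versionOf(...) to decide how to rebuild each part of the chain. Separate keys for the index-time and query-time analyzers would cover the case where the two differ.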

Suppose the user has updated the application a couple of times and now is sitting at Lucene 4.7. Any index at VERSION 1.9.x (not that we go back that far) has been obsoleted, but all the 2.x and 3.x are still in play, based upon the backward compatibility policy. (2.x is in play from an index compatibility perspective, but not an API perspective.)

But what does Version 3.2 mean at 4.7? For a given filter, it may not have changed from 3.2 to 3.6. Those versions and everything in between are equivalent for that filter, but another filter in the same toolchain may have changed at 3.4.
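
One way to picture this equivalence is to keep, per filter, the list of releases where its behaviour changed and map any requested version to the latest change at or before it. The sketch below is hypothetical: FilterHistory is not a Lucene class, and versions are encoded as major*100+minor (3.4 becomes 304) purely for illustration.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical record of the releases in which one filter changed behaviour.
final class FilterHistory {
    private final TreeSet<Integer> changePoints = new TreeSet<>();

    FilterHistory(Integer... releases) {
        changePoints.addAll(Arrays.asList(releases));
    }

    /** Latest change point at or before the requested version; 0 means "original behaviour". */
    int effectiveBehavior(int requested) {
        Integer floor = changePoints.floor(requested);
        return floor == null ? 0 : floor;
    }

    /** Two versions are equivalent for this filter if they select the same behaviour. */
    boolean equivalent(int a, int b) {
        return effectiveBehavior(a) == effectiveBehavior(b);
    }
}
```

With this encoding, a filter with no recorded changes treats 302 and 306 as equivalent, while a filter constructed with new FilterHistory(304) does not.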

> If not, why not introduce a value like "Version.BY_ENVIRONMENT" which
> is statically initialized to be one of the other values, reading from
> an environment parameter?

Environment parameters are not per index, but per JVM.

> So you get the latest at first deploy, and can then keep compatibility
> as long as you need, even when updating Lucene.
> This way I could still have the safety of pinning down a specific
> version and yet avoid rebuilding the app when changing it.
> Of course the default would be LUCENE_CURRENT, so that people trying
> out Lucene get all features out of the box, and warn about setting it
> (maybe log a warning when not set).
>
> Also, wouldn't it make sense to be able to read the recommended
> version from the Index?

Absolutely!

> I'd like to have the hypothetical AnalyzerFactory to find out what it
> needs to build getting information from the relevant IndexReader; so
> in the case I have two indexes using different versions I won't get
> mistakes. (For a query on index A I'm creating a QueryParser, so let's
> ask the index which kind of QueryParser I should use...)

IIRC: This is something that Marvin has implemented in Lucy. And what I was talking about above.



rcmuir at gmail

Feb 7, 2010, 3:13 PM

Post #13 of 18 (1858 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

> > Also, wouldn't it make sense to be able to read the recommended
> > version from the Index?
>
> Absolutely!
>
>
How would this work when the Query analyzer differs from the Index analyzer?
For example, using commongrams in Solr means you use a different Query
analyzer from the Index analyzer, and there are some other use cases even in
Solr (synonym expansion and things like that).

--
Robert Muir
rcmuir [at] gmail


marvin at rectangular

Feb 7, 2010, 4:47 PM

Post #14 of 18 (1847 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

DM Smith:

> Imagine that each index maintains a manifest of the toolchain for the index,
> which includes the version of each part of the chain. Since the index is
> created all at once, this probably is the same as the version of lucene.
> When the user searches the index the manifest is consulted to recreate the
> toolchain.

---->8 snip 8<----

> IIRC: This is something that Marvin has implemented in Lucy.

Yes. QueryParser's constructor takes a Schema argument. Furthermore, Schema
definitions are fully externalized and stored as JSON with the index itself.
So you can do stuff like this:

IndexReader reader = IndexReader.open("/path/to/index");
QueryParser qparser = new QueryParser(reader.getSchema());

We haven't got Version for our Analyzers yet, but it's planned. I'm following
this discussion with interest to see how the deployment of Version plays out
with the user base.

However, Lucy's approach won't work for Lucene because Lucene allows you to
have fields with the same name and completely different semantics.

Marvin Humphrey





sanne.grinovero at gmail

Feb 8, 2010, 12:31 AM

Post #15 of 18 (1811 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

2010/2/8 Robert Muir <rcmuir [at] gmail>:
---->8 snip 8<----
>
> how would this work when the Query analyzer differs from the Index analyzer?
> For example, using commongrams in solr means you use a different Query
> analyzer from Index analyzer, and there are some other use cases even in
> solr (synonyms expansion and things like that)
---->8 snip 8<----

They are two different Analyzer types, but I assume they want to use
the same value for Version, right? The same version which was used to
build the rest of the index.

Regards,
Sanne



simon.willnauer at googlemail

Feb 8, 2010, 12:48 AM

Post #16 of 18 (1814 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

On Mon, Feb 8, 2010 at 9:31 AM, Sanne Grinovero
<sanne.grinovero [at] gmail> wrote:
> 2010/2/8 Robert Muir <rcmuir [at] gmail>:
> ---->8 snip 8<----
>>
>> how would this work when the Query analyzer differs from the Index analyzer?
>> For example, using commongrams in solr means you use a different Query
>> analyzer from Index analyzer, and there are some other use cases even in
>> solr (synonyms expansion and things like that)
> ---->8 snip 8<----
>
> They are two different Analyzer types, but I assume they want to use
> the same value for Version, right? The same version which was used to
> build the rest of the index.
So this is tricky: if you have Analyzer A(Version.1) and Analyzer
B(Version.2) and build an index with them, you will likely have to
maintain those versions until you make a clean cut and upgrade to
A(Version.X) B(Version.X). If you are upgrading to 2.9 from 2.x you
can simply use Version.2.x for both of your analyzers. It always
depends on the changes which have been applied to the Analyzers, but it
is very likely that they break your index compat if you blindly
upgrade.

One of the biggest problems is that some of your users might want to
use analyzer X with Version.Y and analyzer X' with Version.Y' because
of some weird "buggy" behavior. You have to find your way through this
unfortunately. You might offer your users to define the versions
themselves and warn them like Robert did in the blinking JavaDoc :)
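
Letting users define the versions themselves could look roughly like the sketch below: one global default plus optional per-analyzer overrides. VersionSettings is a hypothetical helper, and the version names are plain strings so the snippet stays independent of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder: a global default with per-analyzer overrides.
final class VersionSettings {
    private final String defaultVersion;
    private final Map<String, String> overrides = new HashMap<>();

    VersionSettings(String defaultVersion) {
        this.defaultVersion = defaultVersion;
    }

    /** Pins one analyzer to its own version; returns this to allow chaining. */
    VersionSettings override(String analyzerName, String version) {
        overrides.put(analyzerName, version);
        return this;
    }

    /** The override if one was set for this analyzer, otherwise the global default. */
    String versionFor(String analyzerName) {
        return overrides.getOrDefault(analyzerName, defaultVersion);
    }
}
```

An application would then call something like new VersionSettings("LUCENE_29").override("X", "LUCENE_24") and ask versionFor(...) when it builds each analyzer.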

simon


uwe at thetaphi

Feb 8, 2010, 1:57 AM

Post #17 of 18 (1807 views)
Permalink
RE: Having a default constructor in Analyzers [In reply to]

Simon:
> Sanne, I would recommend you building a Factory pattern around you
> Analyzers / TokenStreams similar to what solr does. That way you can
> load you own "default ctor" interface via reflection and obtain you
> analyzers from those factories. That makes more sense anyway as you
> only load the factory via reflection an not the analyzers.

As far as I can see, Hibernate uses Solr factories. On the other hand, instead of creating your own SolrAnalyzer you can also use a "standard" one from Lucene (you can do this in Solr, too):

http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#analyzer

In my opinion, the Factory pattern is OK for your own analyzer definitions. For reusing "standard" analyzers like "StandardAnalyzer" or "TurkishAnalyzer", the ideal case is to use the reflection code I proposed before. This code works for all language-based analyzers having a standard ctor or Version ctor. Solr will also handle this reflection-based instantiation with an optional Version parameter in the future (Erik Hatcher pointed that out to me when working on SOLR-1677: "Another comment on this... Solr supports using an Analyzer also, but only ones with zero-arg constructors. It would be nice if this Version support also allowed for Analyzers (say SmartChineseAnalyzer) to be used also directly. I don't think this patch accounts for this case, does it?").

As Hibernate also uses the factory pattern for custom analyzers, as soon as https://issues.apache.org/jira/browse/SOLR-1677 is in, the version problem for those should be solved, too (as you can specify the parameter for each component). But Hibernate should also think about a global default Version (like Solr via CoreAware or something like that) that is used as a default param for all Tokenizers/TokenFilters and when reflection-based Analyzer subclass instantiation is used.

By the way, Hibernate's reuse of Solr's schema is one of Hoss's arguments for not making it CoreAware.

Uwe




sanne.grinovero at gmail

Feb 8, 2010, 2:35 AM

Post #18 of 18 (1799 views)
Permalink
Re: Having a default constructor in Analyzers [In reply to]

Hi Uwe,
yes, Hibernate is definitely recommending the Solr way for normal and
power users, but we're also taking care of beginners trying it out for
the first time: it should just work out of the box for a simple POC. In
those cases an Analyzer is defined as the global analyzer (used for all
cases where you're not overriding the default); it used to be
possible to specify a single Analyzer by fully qualified name, to be
used globally, or one "per index". Of course this is far from the
flexibility needed for most real-world applications, but it keeps things
simple for beginners taking a first look at introducing Lucene. For
these cases I don't care much about the Version used, though of course
it's important that they can pin it down later.
To be compatible I'll have to change the loader, which is going to
look for a default constructor or a single-parameter Version
constructor; that should be good enough to accommodate the simple goal.
I'll read the Version from a configuration parameter, probably nailing
down the Version to the current latest and/or reading my own environment
parameter.
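
A loader along these lines might be sketched as follows. To keep the snippet self-contained, StubVersion, VersionedAnalyzer and PlainAnalyzer are hypothetical stand-ins for Lucene's Version and for analyzers with and without a Version constructor; only the java.lang.reflect calls are real APIs.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins so the sketch compiles without Lucene on the classpath.
enum StubVersion { LUCENE_29, LUCENE_30 }

class VersionedAnalyzer {
    final StubVersion version;
    public VersionedAnalyzer(StubVersion version) { this.version = version; }
}

class PlainAnalyzer {
    public PlainAnalyzer() {}
}

final class AnalyzerLoader {
    private AnalyzerLoader() {}

    /** Prefers a public single-argument Version ctor; falls back to the no-arg ctor. */
    static Object load(String className, StubVersion version) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor;
            Object[] args;
            try {
                ctor = type.getConstructor(StubVersion.class);
                args = new Object[] { version };
            } catch (NoSuchMethodException noVersionCtor) {
                // No Version ctor: try the default one (throws if that is missing too).
                ctor = type.getConstructor();
                args = new Object[0];
            }
            // newInstance is called on the Constructor, not on the Class.
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

The version passed to load(...) would come from the configuration parameter mentioned above, so the class name and the version can both live in a config file.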

I agree about the factory strategy; in fact it's on HSEARCH-457 since
right before my emails here; I asked here to check we could keep it
simple :-)

Thanks all,
Sanne

