
Mailing List Archive: Lucene: Java-Dev

release & migration plan

 

 



cutting at apache

Jul 12, 2004, 10:09 AM

Post #1 of 16 (2402 views)
release & migration plan

I think perhaps it is time to make some incompatible changes to Lucene's
API. There are a number of places where it is showing its age. I'd
like to try to make as many API changes at once as is possible, so that
folks only have to port application code once.

I propose we do this as follows:

1. Make a 1.9 release which has all the new APIs and deprecates all the
outdated APIs. Existing applications should compile and run fine, but
with lots of deprecation warnings.

2. Make a 2.0 release which removes all deprecated code.

Thus 1.9 would be a migration release. Before an application is moved
to 2.0, folks should first make sure that it compiles against 1.9
without deprecation warnings. Once it does then it should move to 2.0
without incident.
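
A minimal sketch of what the migration release could look like from the API side. `WriterSettings` is a hypothetical class used only for illustration (it is not a Lucene class), and the default value and range check are assumptions. The old public field stays but is marked deprecated, while the new accessors sit next to it, so existing code keeps compiling with warnings and ported code compiles cleanly:

```java
// Hypothetical class for illustration only; not part of Lucene.
class WriterSettings {
    /**
     * Old-style public field, still present in the migration release.
     * @deprecated use {@link #getMergeFactor()} and {@link #setMergeFactor(int)}
     */
    @Deprecated
    public int mergeFactor = 10; // assumed default for this sketch

    /** New-style accessor that would remain after deprecated code is removed. */
    public int getMergeFactor() {
        return mergeFactor;
    }

    /** New-style mutator; the range check is an assumption of this sketch. */
    public void setMergeFactor(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }
}
```

Code that still writes `settings.mergeFactor = 20` would compile with a deprecation warning; code that calls `settings.setMergeFactor(20)` would not, which is the signal that it is ready for the release that drops the field.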

Does this sound like a good plan?

What changes would I like to see in the API? Here are a few candidates:

1. Replace Field factory methods (Field.Text, Field.Keyword, etc.) with
a few methods that use type-safe enumerations, as described in:

http://www.mail-archive.com/lucene-user [at] jakarta/msg08479.html

2. Similarly, replace BooleanQuery.add() with a type-safe enumeration,
also as described in:

http://www.mail-archive.com/lucene-user [at] jakarta/msg08479.html

3. Replace public IndexWriter fields (mergeFactor, minMergeDocs, etc.)
with get/set accessors. Also, minMergeDocs should be renamed
maxBufferedDocs.

4. Rename PhrasePrefixQuery to be something like MultiPhraseQuery. Also
make MultipleTermPositions a private nested class of this, as this is
the only place MultipleTermPositions is used.

5. Rename InputStream to IndexInput and OutputStream to IndexOutput.
Also make both of these interfaces and add BufferedIndexInput and
BufferedIndexOutput as the implementations used by FSDirectory,
RAMDirectory, etc. This would permit unbuffered and native
implementations (e.g., that use mmap) that could potentially speed
things up considerably.

6. Replace DateField with something that formats dates suitably for
RangeQuery.

7. Move language-specific analyzers into separate downloads?

8. Add support for span queries to query parser?
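
For candidates 1 and 2, the class-based type-safe enumeration pattern can be sketched as follows. The names `Occur` and `ClauseList` and the constants are illustrative assumptions, not the API from the linked message; the point is only that a dedicated constant type replaces flags that are easy to mix up:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a type-safe enumeration. The name Occur and its constants are
// assumptions made for this example.
final class Occur {
    public static final Occur MUST = new Occur("MUST");
    public static final Occur SHOULD = new Occur("SHOULD");
    public static final Occur MUST_NOT = new Occur("MUST_NOT");

    private final String name;

    // Private constructor: no instances exist beyond the constants above.
    private Occur(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// Hypothetical stand-in for a boolean query; not Lucene's BooleanQuery.
class ClauseList {
    private final List<String> clauses = new ArrayList<String>();

    // The second argument can only be an Occur constant, so the intent is
    // visible at the call site and cannot be confused with another flag.
    public void add(String term, Occur occur) {
        clauses.add(occur + ":" + term);
    }

    public List<String> clauses() {
        return clauses;
    }
}
```

A call then reads `query.add("lucene", Occur.MUST)`, which is self-describing in a way that a pair of boolean arguments is not.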

Do you have other candidates?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe [at] jakarta
For additional commands, e-mail: lucene-dev-help [at] jakarta


rengels at ix

Jul 12, 2004, 10:15 AM

Post #2 of 16 (2344 views)
RE: release & migration plan [In reply to]

I think IndexReader and IndexWriter should be interfaces, and the
codebase should be changed to use the interfaces where possible.

Robert Engels



julien.nioche at lingway

Jul 12, 2004, 10:19 AM

Post #3 of 16 (2344 views)
Re: release & migration plan [In reply to]

Hello Doug,

I'd like to be able to modify the indexInterval (tii file) from the IndexWriter.
I tried it recently and it improved performance, especially for large
queries. Is there any reason not to give access to that parameter?

Julien



jons at wrq

Jul 12, 2004, 10:33 AM

Post #4 of 16 (2361 views)
RE: release & migration plan [In reply to]

Doug Cutting wrote:

> 3. Replace public IndexWriter fields (mergeFactor, minMergeDocs, etc.)
> with get/set accessors. Also, minMergeDocs should be renamed
> maxBufferedDocs.

I'd also suggest changing the static field maxClauseCount in BooleanQuery to
use a getter/setter and eliminate the System.getProperty call (as in
IndexWriter). The System.getProperty calls prevent the use of Lucene as a
search engine in an unsigned applet.
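
A minimal sketch of that suggestion, assuming a hypothetical `ClauseLimit` class in place of BooleanQuery (the default of 1024 and the range check are assumptions of the sketch). The limit starts from a plain constant instead of a system property lookup, so no property-read permission is needed:

```java
// Hypothetical stand-in for the static limit in BooleanQuery; only the
// limit handling is sketched here.
class ClauseLimit {
    // A plain default replaces the System.getProperty lookup, so loading
    // this class does not require permission to read system properties.
    private static int maxClauseCount = 1024; // assumed default

    public static synchronized int getMaxClauseCount() {
        return maxClauseCount;
    }

    public static synchronized void setMaxClauseCount(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("maxClauseCount must be >= 1");
        }
        maxClauseCount = max;
    }
}
```

Applications that previously set the property on the command line would call `ClauseLimit.setMaxClauseCount(...)` once at startup instead.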

Thanks,
Jon




cutting at apache

Jul 12, 2004, 10:34 AM

Post #5 of 16 (2365 views)
Re: release & migration plan [In reply to]

fp235-5 wrote:
> I'd like to be able to modify the indexInterval (tii file) from the IndexWriter.
> I tried it recently and it improved performance, especially for large
> queries. Is there any reason not to give access to that parameter?

I agree. We should add this, but with caution. I haven't added this in
the past because I feared that, like mergeFactor, it is ripe for abuse:
that folks will set it to very large or small values and then loudly
proclaim that Lucene is broken.

If we make it easily changeable then we should clearly document (a) that
one should change it with caution; (b) what reasonable values are; and
(c) what the effects are of setting it high (low memory use, somewhat
slower search, faster IndexReader creation) and low (high memory use,
somewhat faster search, slower IndexReader creation). A formula for
memory use and IndexReader creation speed should be included
(proportional to total number of terms) and a description of impact on
search performance (constant factor per query term, regardless of index
size: greater impact for small indexes, less for large indexes).
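
The trade-off can be put into rough numbers with a small helper. This is a back-of-the-envelope sketch, not Lucene code: it assumes the term index keeps one in-memory entry per indexInterval terms, and the class and method names are made up for the example.

```java
// Rough estimate only. The "one in-memory entry per indexInterval terms"
// model is an assumption of this sketch.
class IndexIntervalEstimate {
    // Entries loaded when an IndexReader is opened (ceiling division), so
    // memory use and open time grow with totalTerms / indexInterval.
    static long inMemoryEntries(long totalTerms, int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        return (totalTerms + indexInterval - 1) / indexInterval;
    }

    // Worst-case number of terms scanned per query term after the in-memory
    // lookup. It depends on the interval only, not on the index size.
    static int maxScanPerQueryTerm(int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        return indexInterval - 1;
    }
}
```

Under this model, an index with 1,000,000 terms holds 7,813 entries at an interval of 128 and 62,500 entries at an interval of 16, while the per-query-term scan bound drops from 127 to 15.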

In particular, we want to discourage folks with very large indexes or
who re-open indexes frequently from setting this to small values.

Would you like to draft the javadoc and make a patch file for this
change? This would not be an incompatible change and could be done at
any time.

Doug



cutting at apache

Jul 12, 2004, 10:44 AM

Post #6 of 16 (2372 views)
Re: release & migration plan [In reply to]

Robert Engels wrote:
> I think the IndexReader and IndexWriter should be interfaces, and change the
> codebase to use the interface where possible.

I agree that IndexReader should be an interface.

I'm less convinced about IndexWriter. I have a little harder time
imagining alternate, pluggable implementations of IndexWriter. Perhaps
one would want to write something which uses a different algorithm for
merging indexes? So then the interface would be addDocument(),
addIndexes(), optimize(), close(), get/setSimilarity and getAnalyzer(),
with the rest of the methods, especially all the new accessors
(getMergeFactor(), getMaxBufferedDocs, etc.) as implementation-specific?
Is that what you have in mind?
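
As a sketch of that split, the interface below carries only the operations listed above, while tuning accessors stay on one particular implementation. Every type name here (`Doc`, `TextAnalyzer`, `Scoring`, `Dir`, `IndexWriterApi`, `CountingWriter`) is a placeholder invented so the example compiles on its own; none of it is Lucene code.

```java
import java.io.IOException;

// Placeholder types standing in for Lucene's Document, Analyzer,
// Similarity and Directory. The names are made up for this sketch.
interface Doc {}
interface TextAnalyzer {}
interface Scoring {}
interface Dir {}

// Hypothetical interface with only the operations named in the message.
interface IndexWriterApi {
    void addDocument(Doc doc) throws IOException;
    void addIndexes(Dir[] dirs) throws IOException;
    void optimize() throws IOException;
    void close() throws IOException;
    Scoring getSimilarity();
    void setSimilarity(Scoring similarity);
    TextAnalyzer getAnalyzer();
}

// Toy implementation that only counts documents. Tuning accessors such as
// getMergeFactor() live here and not on the interface.
class CountingWriter implements IndexWriterApi {
    private final TextAnalyzer analyzer;
    private Scoring similarity;
    private int mergeFactor = 10; // assumed default
    private int docCount;
    private boolean closed;

    CountingWriter(TextAnalyzer analyzer) { this.analyzer = analyzer; }

    public void addDocument(Doc doc) {
        if (closed) throw new IllegalStateException("writer is closed");
        docCount++;
    }
    public void addIndexes(Dir[] dirs) { /* nothing to merge in this toy */ }
    public void optimize() { /* no-op in this toy */ }
    public void close() { closed = true; }
    public Scoring getSimilarity() { return similarity; }
    public void setSimilarity(Scoring similarity) { this.similarity = similarity; }
    public TextAnalyzer getAnalyzer() { return analyzer; }

    // Implementation-specific settings and helpers.
    public int getMergeFactor() { return mergeFactor; }
    public void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }
    public int docCount() { return docCount; }
}
```

Code written against `IndexWriterApi` would work with any storage mechanism, and only code that tunes merging would need to know the concrete class.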

Doug



rengels at ix

Jul 12, 2004, 10:55 AM

Post #7 of 16 (2373 views)
RE: release & migration plan [In reply to]

Yes. I think if IndexReader is an interface (which might use a completely
different physical storage mechanism than Lucene uses), IndexWriter needs to
be an interface as well (in order to get the documents into the index).

By making the Reader and Writer interfaces, implementations can still use
the Lucene search capabilities, tokenizers, remote searches, etc.

Robert



vauchers at cirano

Jul 12, 2004, 11:52 AM

Post #8 of 16 (2353 views)
RE: release & migration plan [In reply to]

I don't remember what was decided for exception handling. What I do
remember is a discussion about the BooleanQuery$TooManyClauses where
option 3 was popular for the "next big release" ;)

http://www.mail-archive.com/lucene-dev [at] jakarta/msg04050.html

sv

On Mon, 12 Jul 2004, Jon Schuster wrote:

> Doug Cutting wrote:
>
> > 3. Replace public IndexWriter fields (mergeFactor, minMergeDocs, etc.)
> > with get/set accessors. Also, minMergeDocs should be renamed
> > maxBufferedDocs.
>
> I'd also suggest changing the static field maxClauseCount in BooleanQuery to
> use a getter/setter and eliminate the System.getProperty call (as in
> IndexWriter). The System.getProperty calls prevent the use of Lucene as a
> search engine in an unsigned applet.
>
> Thanks,
> Jon


max at osua

Jul 13, 2004, 5:14 AM

Post #9 of 16 (2351 views)
Re: release & migration plan [In reply to]

Hello Doug.

A lot of Lucene classes still use Vector & Hashtable instead of
ArrayList and HashMap, for compatibility with Java 1.

Since the changes proposed and made by Aviran to the FieldInfos class
made Lucene incompatible with Java 1, but can give us a reasonable
performance gain, shouldn't we go ahead with the whole Hashtable ->
HashMap and Vector -> ArrayList replacement around the code to gain
even more performance in other places?

Max



Julien.Nioche at lingway

Jul 13, 2004, 5:22 AM

Post #10 of 16 (2359 views)
Re: release & migration plan [In reply to]

Of course I'd be pleased to make a draft of the javadoc and a patch file.
I'll try to do it, but I can't promise to deliver it soon....

Julien


byronm at gmail

Jul 13, 2004, 5:50 AM

Post #11 of 16 (2361 views)
Re: release & migration plan [In reply to]

I agree. It would be nice to finally migrate to Java 2 code and work
on the optimizations made available in the later JVMs.

On Tue, 13 Jul 2004 15:14:56 +0300, Maxim Patramanskij <max [at] osua> wrote:
> Hello Doug.
>
> There are a lot of Lucene classes still use Vector & Hashtable instead
> of ArrayList and HashMap because of compatibility reason with java 1.
>
> Since the changes, proposed and made by Aviran to FieldInfos class
> made Lucene java 1 incompatible, but can give us some reasonable
> performance gain, shouldn't we go ahead with the whole Hashtable ->
> HashMap and Vector -> ArrayList replacement around the code to have
> even more performance in other places?
>



cutting at apache

Jul 13, 2004, 9:18 AM

Post #12 of 16 (2365 views)
Re: release & migration plan [In reply to]

Maxim Patramanskij wrote:
> Since the changes, proposed and made by Aviran to FieldInfos class
> made Lucene java 1 incompatible, but can give us some reasonable
> performance gain, shouldn't we go ahead with the whole Hashtable ->
> HashMap and Vector -> ArrayList replacement around the code to have
> even more performance in other places?

This cannot be done blindly. Thread safety needs to be carefully
considered in each case. For example, if, when used by an IndexReader,
the Hashtable or Vector is only modified in a constructor and
treated as read-only subsequently, then it is probably safe to make this
conversion. But if it is modified while searching, then access must be
synchronized. IndexWriter is less frequently used in a multi-threaded
manner, but is also written to be thread-safe, so uses in IndexWriter
need to be closely examined too. And so on.
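
The two cases can be sketched side by side. Both classes below are hypothetical examples (they are not Lucene's FieldInfos or any other Lucene class): the first map is filled in the constructor and never changed, so an unsynchronized HashMap is enough; the second is modified while shared, so access stays synchronized.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Case 1: filled only in the constructor and read-only afterwards.
// The final field plus the unmodifiable wrapper make it safe to share.
class FieldNumbers {
    private final Map<String, Integer> byName;

    FieldNumbers(String[] names) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (int i = 0; i < names.length; i++) {
            m.put(names[i], i);
        }
        byName = Collections.unmodifiableMap(m);
    }

    // Returns the field number, or -1 if the name is unknown.
    int number(String name) {
        Integer n = byName.get(name);
        return n == null ? -1 : n;
    }
}

// Case 2: modified while other threads may be reading, so every access
// goes through a synchronized method, as Hashtable did implicitly.
class TermCache {
    private final Map<String, Integer> cache = new HashMap<String, Integer>();

    synchronized void put(String term, int freq) {
        cache.put(term, freq);
    }

    // Returns the cached frequency, or 0 if the term is not cached.
    synchronized int get(String term) {
        Integer f = cache.get(term);
        return f == null ? 0 : f;
    }
}
```

Only the first kind of use gains from dropping Hashtable outright; the second merely moves the locking from the collection into the owning class.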

Doug



goller at detego-software

Jul 14, 2004, 7:21 AM

Post #13 of 16 (2377 views)
Re: release & migration plan [In reply to]

A couple of weeks ago Bernhard proposed a patch concerning
the version number of an index. He proposed to initialize
it with the current time in ms. This simple change solves
a problem that occurs when an index gets deleted, a new index is
generated in the same directory, and an old IndexReader still exists.
But then the version number is not really a version number any more.
We had a discussion with Dmitry about it and then we proposed
an API change:

*) Keep Bernhard's initialization of the version number
*) Remove (deprecate) the static IndexReader.getCurrentVersion() methods
and maybe also the lastModified() methods.
*) Introduce a new public non-static IndexReader member function
boolean isCurrent() that is similar to the current aquireLock and checks
whether the IndexReader is still current or has become stale.
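
A minimal sketch of the proposed check, using hypothetical classes that only model the version comparison (no locking, no files): the reader remembers the version it saw when it was opened, and isCurrent() compares that against the index's version now.

```java
// Hypothetical model of an index's version counter; not Lucene code.
class IndexVersion {
    private long version;

    // In the proposal the initial value would be the current time in ms,
    // e.g. new IndexVersion(System.currentTimeMillis()).
    IndexVersion(long initial) {
        this.version = initial;
    }

    synchronized long current() {
        return version;
    }

    // Called whenever the index is changed.
    synchronized void bump() {
        version++;
    }
}

// Hypothetical reader that records the version at open time.
class VersionedReader {
    private final IndexVersion index;
    private final long versionAtOpen;

    VersionedReader(IndexVersion index) {
        this.index = index;
        this.versionAtOpen = index.current();
    }

    // True while the index has not changed since this reader was opened.
    boolean isCurrent() {
        return index.current() == versionAtOpen;
    }
}
```

Because a re-created index would start from a fresh time-based value instead of a small counter, a reader opened on the deleted index would be unlikely to see a matching version by accident.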

Christoph




erik at ehatchersolutions

Jul 15, 2004, 5:55 AM

Post #14 of 16 (2367 views)
Re: release & migration plan [In reply to]

I have placed Doug's original list on the wiki at
http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard

Perhaps the wiki makes the best "whiteboard" for Lucene 2.0
brainstorming.

Erik




Julien.Nioche at lingway

Jul 20, 2004, 1:23 AM

Post #15 of 16 (2361 views)
Re: release & migration plan [In reply to]

DocumentWriter is typically created with the
ramDirectory field of IndexWriter and not the actual directory field.
So getDirectory() would have to return this ramDirectory in order to work,
which is not very intuitive (one would expect it to return the real
directory). One could change the visibility of ramDirectory to package
so that the DocumentWriter could access it. Would that be a clean way to do it?


----- Original Message -----
From: "Doug Cutting" <cutting [at] apache>
To: "Lucene Users List" <lucene-user [at] jakarta>
Sent: Thursday, July 15, 2004 11:06 PM
Subject: Re: release & migration plan


> fp235-5 wrote:
> > I am looking at the code to implement setIndexInterval() in IndexWriter.
> > I'd like to have your opinion on the best way to do it.
> >
> > Currently the creation of an instance of TermInfosWriter requires the
> > following steps:
> > ...
> > IndexWriter.addDocument(Document)
> > IndexWriter.addDocument(Document, Analyzer)
> > DocumentWriter.addDocument(String, Document)
> > DocumentWriter.writePostings(Posting[],String)
> > TermInfosWriter.<init>
> >
> > To give a different value to indexInterval in TermInfosWriter, we need
> > to add a variable holding this value into IndexWriter and DocumentWriter
> > and modify the constructors for DocumentWriter and TermInfosWriter
> > (quite heavy changes).
>
> I think this is the best approach. I would replace other parameters in
> these constructors which can be derived from an IndexWriter with the
> IndexWriter. That way, if we add more parameters like this, they can
> also be passed in through the IndexWriter.
>
> All of the parameters to the DocumentWriter constructor are fields of
> IndexWriter. So one can instead simply pass a single parameter, an
> IndexWriter, then access its directory, analyzer, similarity and
> maxFieldLength in the DocumentWriter constructor. A public
> getDirectory() method would also need to be added to IndexWriter for
> this to work.
>
> Similarly, two of SegmentMerger's constructor parameters could be
> replaced with an IndexWriter, the directory and boolean useCompoundFile.
>
> In SegmentMerge I would replace the directory parameter with IndexWriter.
>
> Doug
>


cutting at apache

Jul 20, 2004, 11:08 AM

Post #16 of 16 (2358 views)
Re: release & migration plan [In reply to]

For the purposes of this change, the DocumentWriter directory doesn't
actually matter. A persistent index is only written by the segment
merger, so that's the only place the indexInterval really needs to be
specified.
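
A rough sketch of that idea, combined with the earlier suggestion of passing the writer into constructors. All names are hypothetical stand-ins (`WriterSettingsHolder` for the writer, `MergerSketch` for the merger), and the defaults are assumptions; the merger simply reads what it needs from the single writer argument.

```java
// Hypothetical stand-in for the writer; it holds only the settings that
// the merger needs in this sketch.
class WriterSettingsHolder {
    private int indexInterval = 128;        // assumed default
    private boolean useCompoundFile = true; // assumed default

    int getIndexInterval() {
        return indexInterval;
    }

    void setIndexInterval(int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        this.indexInterval = indexInterval;
    }

    boolean getUseCompoundFile() {
        return useCompoundFile;
    }
}

// Hypothetical stand-in for the segment merger.
class MergerSketch {
    private final int indexInterval;
    private final boolean useCompoundFile;

    // One writer parameter instead of several loose ones: a setting added
    // to the writer later reaches the merger without a new signature.
    MergerSketch(WriterSettingsHolder writer) {
        this.indexInterval = writer.getIndexInterval();
        this.useCompoundFile = writer.getUseCompoundFile();
    }

    int indexInterval() {
        return indexInterval;
    }

    boolean useCompoundFile() {
        return useCompoundFile;
    }
}
```

In this arrangement the in-memory document writer never sees the interval at all, which matches the point that only the persistent index written by the merger depends on it.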

Doug

