
Mailing List Archive: Lucene: Java-Dev

IndexWriter forceOptimize() ?

 

 



otis_gospodnetic at yahoo

Jan 11, 2007, 6:25 AM

Post #1 of 14 (816 views)
Permalink
IndexWriter forceOptimize() ?

Hi,

What do people here think about adding forceOptimize() to IndexWriter?

public synchronized void forceOptimize() throws IOException {
  flushRamSegments();
  int minSegment = segmentInfos.size() - mergeFactor;
  mergeSegments(minSegment < 0 ? 0 : minSegment);
}

I need it for https://issues.apache.org/jira/browse/LUCENE-741 (Field Norms Modifier), which I wrote to work with multi-file indices. That means that if an index contains any CFS files, I need to expand those first. Is there a better way to extract a CFS file in an index that may also contain some non-CFS segments? There is now a CfsExtractor tool in https://issues.apache.org/jira/browse/LUCENE-770; maybe that will do for LUCENE-741, but I haven't tried it yet.

Otis




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


DORONC at il

Jan 11, 2007, 9:16 AM

Post #2 of 14 (794 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Otis Gospodnetic <otis_gospodnetic [at] yahoo> wrote on 11/01/2007 06:25:59:

> Hi,
>
> What do people here think about adding forceOptimize() to IndexWriter?
>
> public synchronized void forceOptimize() throws IOException {
> flushRamSegments();
> int minSegment = segmentInfos.size() - mergeFactor;
> mergeSegments(minSegment < 0 ? 0 : minSegment);
> }
>
> I need it for https://issues.apache.org/jira/browse/LUCENE-741
> (Field Norms Modifier), which I wrote to work with multi-file
> indices, which means that if there are any CFS index files in an
> index, I need to expand those first. Is there a better way to
> extract a CFS file in an index that may also contain some non-CFS
> segments? There is a CfsExtractor tool in https://issues.apache.
> org/jira/browse/LUCENE-770 now, maybe that will do for
> LUCENE-741.... haven't tried it yet.
>
> Otis

I think one (non-performant) external way to move an index from CFS to
non-CFS is:
1. open in non-CFS mode
2. add one (empty) doc
3. optimize
4. (optionally) remove last doc and (optionally) optimize again




otis_gospodnetic at yahoo

Jan 11, 2007, 9:30 AM

Post #3 of 14 (800 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Hi Doron,

Yeah, you are right, adding that (empty) Doc would force the optimize to actually optimize. I was trying to avoid doing that and forceOptimize() looked cleaner.... but I'm not sure if others would agree. Are there other situations where one would want to force index optimization even if none of those conditions in optimize() are true?

I'd actually appreciate it if you could look at https://issues.apache.org/jira/browse/LUCENE-741 . The code can completely remove norms for a given field, but this assumes a pre-.nrm index structure (.fN field norms files). I'm not sure yet how to deal with .nrm, so if you have a quick solution to plug into the code in LUCENE-741, that would be great.

Thanks,
Otis



hossman_lucene at fucit

Jan 11, 2007, 12:47 PM

Post #4 of 14 (783 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

: What do people here think about adding forceOptimize() to IndexWriter?

I like the idea, but i don't have any value add to offer to the discussion
of whether the implementation you suggest is "safe" ... in particular i
notice that the current optimize method is an iterative loop, presumably
to make sure that mergeSegments gets called as many times as it needs to
based on segmentInfos.size() .. your version doesn't seem to have that, so
does that mean your new version wouldn't always result in a single
segment?
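
The single-pass question can be explored with a toy model in plain Java. This is only a sketch, not Lucene code: it assumes each merge pass replaces the segments from minSegment to the end with one merged segment, and that mergeFactor is at least 2. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MergePassModel {
    // Toy model of one pass: the segments from minSegment to the end are
    // replaced by a single merged segment (values are document counts).
    // Assumes mergeFactor >= 2, otherwise a pass makes no progress.
    static void mergePass(List<Integer> segments, int mergeFactor) {
        int minSegment = Math.max(0, segments.size() - mergeFactor);
        int merged = 0;
        while (segments.size() > minSegment) {
            merged += segments.remove(segments.size() - 1);
        }
        segments.add(merged);
    }

    // Number of segments left after a single pass (the forceOptimize() shape).
    static int afterSinglePass(int segmentCount, int mergeFactor) {
        List<Integer> segments = newSegments(segmentCount);
        mergePass(segments, mergeFactor);
        return segments.size();
    }

    // Number of passes an iterative loop needs to reach one segment.
    static int passesToSingleSegment(int segmentCount, int mergeFactor) {
        List<Integer> segments = newSegments(segmentCount);
        int passes = 0;
        while (segments.size() > 1) {
            mergePass(segments, mergeFactor);
            passes++;
        }
        return passes;
    }

    private static List<Integer> newSegments(int count) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            segments.add(1); // one document per segment, purely illustrative
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(afterSinglePass(5, 10));        // prints 1
        System.out.println(afterSinglePass(27, 10));       // prints 18
        System.out.println(passesToSingleSegment(27, 10)); // prints 3
    }
}
```

Under these assumptions a single pass yields one segment only when the segment count does not exceed mergeFactor; with more segments the loop is needed.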

another suggestion i have is with the API ... instead of calling it
"forceOptimize" perhaps the current noarg optimize method should be
deprecated, and replaced with a new optimize(boolean force) where
force==true means an optimize will be done, and force==false means an
optimize will be done if the IndexWriter feels it should be done ... this
would also address my above concern (assuming it's valid)...

/** @deprecated use optimize(false) */
public synchronized void optimize() throws IOException { optimize(false); }

public synchronized void optimize(boolean force) throws IOException {
  flushRamSegments();
  // force applies to the first pass only; if it stayed true the loop would never exit
  while (force ||
         (segmentInfos.size() > 1 ||
          (segmentInfos.size() == 1 &&
           (SegmentReader.hasDeletions(segmentInfos.info(0)) ||
            segmentInfos.info(0).dir != directory ||
            (useCompoundFile &&
             (!SegmentReader.usesCompoundFile(segmentInfos.info(0)) ||
              SegmentReader.hasSeparateNorms(segmentInfos.info(0)))))))) {
    int minSegment = segmentInfos.size() - mergeFactor;
    mergeSegments(segmentInfos, minSegment < 0 ? 0 : minSegment, segmentInfos.size());
    force = false;
  }
}
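
One detail about a loop of this shape: the force flag can only apply to the first pass, because a flag that stays true keeps the loop condition true forever. A minimal plain-Java sketch of that control flow follows. It tracks segment counts only; needsOptimize is a hypothetical stand-in for the real condition (deletions and compound-file checks are omitted), and mergeFactor is assumed to be at least 2.

```java
class ForceOptimizeModel {
    // Hypothetical stand-in for the real optimize() condition.
    static boolean needsOptimize(int segmentCount) {
        return segmentCount > 1;
    }

    // Returns the number of merge passes performed.
    static int optimize(int segmentCount, int mergeFactor, boolean force) {
        int passes = 0;
        while (force || needsOptimize(segmentCount)) {
            int minSegment = Math.max(0, segmentCount - mergeFactor);
            // Segments from minSegment to the end collapse into one.
            segmentCount = minSegment + 1;
            passes++;
            force = false; // the forced pass happens exactly once
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(optimize(1, 10, false));  // prints 0
        System.out.println(optimize(1, 10, true));   // prints 1
        System.out.println(optimize(27, 10, true));  // prints 3
    }
}
```

In this model a forced call on an already optimized index does exactly one rewrite pass, which is the behavior the CFS-expansion use case asks for.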


-Hoss




rengels at ix

Jan 11, 2007, 12:55 PM

Post #5 of 14 (783 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

I agree with the boolean addition.

optimize(false) is a request to maybe optimize; optimize(true) should
always optimize to a single segment.

optimize(false) might check some parameter for the maximum number
of segments allowed before an actual optimize is performed.




hossman_lucene at fucit

Jan 11, 2007, 1:05 PM

Post #6 of 14 (807 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

: optimize(false) is a request to maybe optimize, optimize(true) always
: should optimize to a single segment
:
: optimize(false) might check some parameter as to the maximum number
: of segments allowed before an actual optimize if performed.

maybe it should be optimize(int minSegmentCountToSkip), with
optimize(0) forcing an optimize even if there is only 1 segment, and
optimize() remaining undeprecated and using a "sensible default" (whatever
that may be ... 1 perhaps?)
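
One possible reading of the threshold idea, sketched in plain Java: an optimize runs only when the segment count exceeds minSegmentCountToSkip. The helper name shouldOptimize is hypothetical, and the sketch ignores the other conditions (deletions, compound-file state) that the existing optimize() checks.

```java
class OptimizeThreshold {
    // Hypothetical helper: would optimize(minSegmentCountToSkip) do any work?
    // A threshold of 0 forces an optimize even with a single segment;
    // a threshold of 1 skips an index that already has one segment.
    static boolean shouldOptimize(int segmentCount, int minSegmentCountToSkip) {
        return segmentCount > minSegmentCountToSkip;
    }

    public static void main(String[] args) {
        System.out.println(shouldOptimize(1, 0)); // prints true
        System.out.println(shouldOptimize(1, 1)); // prints false
        System.out.println(shouldOptimize(2, 1)); // prints true
    }
}
```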



-Hoss




DORONC at il

Jan 11, 2007, 1:27 PM

Post #7 of 14 (796 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Otis Gospodnetic <otis_gospodnetic [at] yahoo> wrote on 11/01/2007 09:30:08:
>
> I'd actually appreciate it if you could look at https://issues.
> apache.org/jira/browse/LUCENE-741 . The code can completely remove
> norms for a given field, but this assumes a pre-.nrm index structure
> (.fN field norms files). I'm not sure yet how to deal with .nrm, so
> if you have a quick solution to plug into the code in LUCENE-741,
> that would be great.

Okay, sure, see new comments in lucene-741

>
> Thanks,
> Otis




yonik at apache

Jan 11, 2007, 2:08 PM

Post #8 of 14 (789 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

On 1/11/07, Chris Hostetter <hossman_lucene [at] fucit> wrote:
> maybe it should be optimize(int minSegmentCountToSkip), with
> optimize(0) forcing an optimize even if there is only 1 segment, and
> optimize() remaining undeprecated and using a "sensible default" (whatever
> that may be ... 1 perhaps?)

If we are going to expose that there are multiple segments, I wouldn't
mind a direct API to get the number of segments (for IndexReader and
IndexWriter).

-Yonik



otis_gospodnetic at yahoo

Jan 11, 2007, 8:08 PM

Post #9 of 14 (781 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Yeah, I actually had:

public int segments() { return segmentInfos.size(); }

in my IndexReader, but then erased it precisely because I thought this was exposing too much about the impl.
I think optimize(int) that Chris mentioned exposes too much. I thought about having optimize(boolean force) in place of optimize(), but then we'd have to deprecate, so I opted for forceOptimize(), which I feel exposes a little less.
But I'm looking to hear what others think before committing LUCENE-741, which includes this forceOptimize() addition.

Otis



otis_gospodnetic at yahoo

Jan 11, 2007, 8:17 PM

Post #10 of 14 (798 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Doron,

Maybe my browser is misbehaving, but I don't see your comments in http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the JIRA email with them either...

Otis



hossman_lucene at fucit

Jan 11, 2007, 8:54 PM

Post #11 of 14 (799 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

: I think optimize(int) that Chris mentioned exposes too much. I thought
: about having optimize(boolean force) in place of optimize(), but then
: we'd have to deprecate, so I opted for forceOptimize() that, I feel
: exposes a little less.

i have no strong feelings about exposing the number of segments, or having
optimize(int) ... but i would prefer optimize(boolean) over forceOptimize
.. because it saves apps (like Solr) from needing to have code like this
to drive their behavior...

if (someBooleanValue) {
writer.forceOptimize();
} else {
writer.optimize();
}






-Hoss




otis_gospodnetic at yahoo

Jan 11, 2007, 9:14 PM

Post #12 of 14 (784 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

One day I read email in a different order, and I miss replies like this.
If optimize(boolean force) looks more attractive than forceOptimize(), that's fine by me. I just want to be able to force the cfs index to expand, even if it's already optimized. Getting it to have a single segment is just a nice bonus here for me.

Regarding that while loop.... it looks like iteration is not needed to force reoptimization. I've tested it with CFS and non-CFS indices, with optimized and unoptimized indices, with and without deletions, and after forced optimization I always ended up with a single segment:

SegmentInfos sis = new SegmentInfos();
sis.read(dir);
System.out.println("SEGS: " + sis.size());

If nobody speaks up until the weekend, I'll add optimize(boolean force). We can leave optimize() and make it call optimize(false);

Otis



DORONC at il

Jan 11, 2007, 10:40 PM

Post #13 of 14 (796 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

Otis Gospodnetic <otis_gospodnetic [at] yahoo> wrote on 11/01/2007 20:17:31:

> Doron,
>
> Maybe my browser is misbehaving, but I don't see your comments in
> http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the
> JIRA email with them either...
>
> Otis

Otis, your browser is perfect, I was just distracted with something else...
it is there now!




yonik at apache

Jan 12, 2007, 10:04 AM

Post #14 of 14 (795 views)
Permalink
Re: IndexWriter forceOptimize() ? [In reply to]

On 1/11/07, Otis Gospodnetic <otis_gospodnetic [at] yahoo> wrote:
> Yeah, I actually had:
>
> public int segments() { return segmentInfos.size(); }
>
> in my IndexReader, but then erased it precisely because I thought this was exposing too much about the impl.

That was my first instinct, but then again, we do expose mergeFactor
and maxMergeDocs, both of which make no sense w/o understanding the
underlying merge model and the fact that there are multiple segments.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
