
Mailing List Archive: Lucene: Java-Dev

Deprecating IndexModifier

 

 



ning.li.li at gmail

Aug 7, 2007, 12:37 PM

Post #1 of 13 (2219 views)
Deprecating IndexModifier

With the plan towards the 3.0 release laid out, I think it's a good
time to deprecate IndexModifier and eventually remove it.

The only method in IndexModifier which is not implemented in
IndexWriter is "deleteDocument(int doc)". This is because of the
concern that document ids change as documents are deleted and
segments are merged. Should we add "deleteDocument(int doc)" to
IndexWriter, documented as an expert method, and then deprecate
IndexModifier and eventually remove it?

Cheers,
Ning
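
To illustrate the concern about changing document ids, here is a minimal
sketch in plain Java. It is a toy model, not Lucene code: the class
DocIdShiftDemo and its merge() method are hypothetical names, and the
model simply assumes that a doc id is a position in the index and that a
merge compacts away deleted slots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Lucene code): a doc id is a position, and a merge drops deleted slots. */
final class DocIdShiftDemo {

    /** Returns the surviving documents in order, as if a merge had compacted the index. */
    static List<String> merge(List<String> docs, boolean[] deleted) {
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted[i]) {
                merged.add(docs.get(i));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        boolean[] deleted = new boolean[docs.size()];

        int oldIdOfD = docs.indexOf("d"); // id 3 before the merge
        deleted[1] = true;                // delete "b" by its current id

        List<String> merged = merge(docs, deleted);
        System.out.println("id of d before merge: " + oldIdOfD);
        System.out.println("id of d after merge:  " + merged.indexOf("d"));
        System.out.println("doc at old id 3 now:  " + merged.get(oldIdOfD));
    }
}
```

In this model, an id obtained before the merge silently points at a
different document afterwards, which is why a delete-by-id call is only
safe while the numbering is known to be unchanged.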

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


gsingers at apache

Aug 7, 2007, 3:30 PM

Post #2 of 13 (2185 views)
Re: Deprecating IndexModifier [In reply to]

+1




ning.li.li at gmail

Aug 8, 2007, 8:56 AM

Post #3 of 13 (2158 views)
Re: Deprecating IndexModifier [In reply to]

I'm thinking about the impact of adding "deleteDocument(int doc)" on
LUCENE-847, especially on concurrent merge. The semantics of
"deleteDocument(int doc)" is that the document to delete is specified
by the document id on the index at the time of the call. When a merge
is finished and the result is being checked into IndexWriter's
SegmentInfos, document ids may change. Therefore, it may be necessary
to flush buffered delete doc ids (thus buffered docs and delete terms
as well) before a merge result is checked in.

The flush is not necessary if there are no buffered delete doc ids. I
don't think it should be the reason not to support "deleteDocument(int
doc)" in IndexWriter. But its impact on concurrent merge is a concern.

Ning
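
The ordering issue described above can be sketched with a small toy
model in plain Java. This is not IndexWriter code and not the LUCENE-847
design: BufferedDeleteDemo, bufferDelete(), flushBufferedDeletes() and
checkInMerge() are hypothetical names, and the model assumes that a
merge check-in compacts deleted slots and renumbers the survivors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Lucene code) of buffered delete-by-id versus merge check-in. */
final class BufferedDeleteDemo {
    private List<String> docs;
    private boolean[] deleted; // deletions already applied to the index
    private final List<Integer> bufferedDeleteIds = new ArrayList<>();

    BufferedDeleteDemo(List<String> initial) {
        this.docs = new ArrayList<>(initial);
        this.deleted = new boolean[initial.size()];
    }

    /** Buffers a delete; the id is meaningful only in the numbering at call time. */
    void bufferDelete(int docId) {
        bufferedDeleteIds.add(docId);
    }

    /** Applies buffered ids against the CURRENT numbering, then clears the buffer. */
    void flushBufferedDeletes() {
        for (int id : bufferedDeleteIds) {
            if (id >= 0 && id < deleted.length) {
                deleted[id] = true;
            }
        }
        bufferedDeleteIds.clear();
    }

    /** Checks in a merge result: deleted slots are dropped and ids are renumbered. */
    void checkInMerge(boolean flushFirst) {
        if (flushFirst) {
            flushBufferedDeletes(); // resolve ids while they still mean what the caller meant
        }
        docs = liveDocs();
        deleted = new boolean[docs.size()];
    }

    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted[i]) {
                live.add(docs.get(i));
            }
        }
        return live;
    }

    /** Deletes "b", buffers a delete of id 3 (meaning "d"), then checks in a merge. */
    static List<String> run(boolean flushFirst) {
        BufferedDeleteDemo index =
            new BufferedDeleteDemo(Arrays.asList("a", "b", "c", "d", "e"));
        index.bufferDelete(1);
        index.flushBufferedDeletes();
        index.bufferDelete(3);
        index.checkInMerge(flushFirst);
        index.flushBufferedDeletes();
        return index.liveDocs();
    }

    public static void main(String[] args) {
        System.out.println("flush before check-in: " + run(true));
        System.out.println("no flush before check-in: " + run(false));
    }
}
```

With the flush, the buffered id is resolved against the numbering the
caller saw, so "d" is removed. Without it, the same id is applied after
renumbering and removes "e" instead.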




yonik at apache

Aug 8, 2007, 9:35 AM

Post #4 of 13 (2151 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> I'm thinking about the impact of adding "deleteDocument(int doc)" on

To make delete by docid useful, one needs a way to *get* those docids.
A callback after flush that provided a current list of readers for the
segments would serve.

I think IndexWriter.deleteDocument(int doc) is something that wouldn't
be used by expert users, and would be accidentally used by novices.

-Yonik



ning.li.li at gmail

Aug 8, 2007, 10:57 AM

Post #5 of 13 (2160 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> To make delete by docid useful, one needs a way to *get* those docids.
> A callback after flush that provided a current list of readers for the
> segments would serve.

Interesting. That makes sense.

> I think IndexWriter.deleteDocument(int doc) is something that wouldn't
> be used by expert users, and would be accidentally used by novices.

But you still think it's worth including in IndexWriter, right?



yonik at apache

Aug 8, 2007, 11:16 AM

Post #6 of 13 (2157 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> > To make delete by docid useful, one needs a way to *get* those docids.
> > A callback after flush that provided a current list of readers for the
> > segments would serve.
>
> Interesting. That makes sense.
>
> > I think IndexWriter.deleteDocument(int doc) is something that wouldn't
> > be used by expert users, and would be accidentally used by novices.
>
> But you still think it's worth including in IndexWriter, right?

I'm not sure... (unless I'm missing some obvious use-cases).
If one could get a list of IndexReaders, one could directly use those
for deletions, right?

-Yonik



ning.li.li at gmail

Aug 8, 2007, 12:34 PM

Post #7 of 13 (2160 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> > But you still think it's worth including in IndexWriter, right?
>
> I'm not sure... (unless I'm missing some obvious use-cases).
> If one could get a list of IndexReaders, one could directly use those
> for deletions, right?

This is so that users won't have to close IndexWriter to do such
deletes, flush deletes and re-open IndexWriter to continue adding
documents.


On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> To make delete by docid useful, one needs a way to *get* those docids.
> A callback after flush that provided a current list of readers for the
> segments would serve.

Maybe this can wait? After LUCENE-847 (factor merge policy), merge
behaviour could be much more controllable and then we'll see whether
such a mechanism is needed.


Ning



yonik at apache

Aug 8, 2007, 1:03 PM

Post #8 of 13 (2155 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> > On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> > > But you still think it's worth including in IndexWriter, right?
> >
> > I'm not sure... (unless I'm missing some obvious use-cases).
> > If one could get a list of IndexReaders, one could directly use those
> > for deletions, right?
>
> This is so that users won't have to close IndexWriter to do such
> deletes, flush deletes and re-open IndexWriter to continue adding
> documents.

Let's take a simple case of deleting documents in a range, like
date:[2006 TO 2008]
One would currently need to close the writer and open a new reader to
ensure that they can "see" all the documents. Then execute a
RangeQuery, collect the ids, and do deletes.

If one wanted to use IndexWriter.deleteDocument(int doc) instead, then
one would still need to ensure that all documents were first flushed,
and would further need to ensure that no docids changed from when the
reader was opened (basically, that no adds/optimizes had been done via
the writer). So it seems that we are back to square one (closing the
writer, opening a new reader, etc).

Something that allowed one to request a list of readers (perhaps just
at a special point after a flush or before a merge) would get around
these issues I think.

-Yonik



ning.li.li at gmail

Aug 8, 2007, 1:25 PM

Post #9 of 13 (2151 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> Let's take a simple case of deleting documents in a range, like
> date:[2006 TO 2008]
> One would currently need to close the writer and open a new reader to
> ensure that they can "see" all the documents. Then execute a
> RangeQuery, collect the ids, and do deletes.

This reminds me: It'd be nice if we could support delete-by-query someday. :)

I was thinking people use deleteDocument(int docid) when they are sure
the docid hasn't changed since it was obtained. That's why I considered it an
expert method...
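
As a rough illustration of why delete-by-query sidesteps the docid
problem, here is a toy sketch in plain Java. It does not use any Lucene
API: DeleteByQueryDemo and deleteByQuery() are hypothetical names, the
"index" is just a list of year values, and the predicate stands in for a
query such as date:[2006 TO 2008] (assumed inclusive at both ends).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model (not Lucene code): deletion selects by content, never by position. */
final class DeleteByQueryDemo {

    /** Removes every document whose year matches the query; returns how many were removed. */
    static int deleteByQuery(List<Integer> years, IntPredicate query) {
        int before = years.size();
        years.removeIf(year -> query.test(year));
        return before - years.size();
    }

    public static void main(String[] args) {
        List<Integer> years = new ArrayList<>(Arrays.asList(2005, 2006, 2007, 2009));
        int removed = deleteByQuery(years, y -> y >= 2006 && y <= 2008);
        System.out.println("removed " + removed + ", remaining " + years);
    }
}
```

Because the caller never holds a doc id, it makes no difference to this
kind of delete whether a flush or merge renumbers documents between the
request and its application.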



yonik at apache

Aug 8, 2007, 1:29 PM

Post #10 of 13 (2158 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> > Let's take a simple case of deleting documents in a range, like
> > date:[2006 TO 2008]
> > One would currently need to close the writer and open a new reader to
> > ensure that they can "see" all the documents. Then execute a
> > RangeQuery, collect the ids, and do deletes.
>
> This reminds me: It'd be nice if we could support delete-by-query someday. :)
>
> I was thinking people use deleteDocument(int docid) when they are sure
> the docid hasn't changed since it was obtained. That's why I considered it an
> expert method...

Sure, but when do you know that the ids haven't changed? A: When you
haven't added anything with the writer. So you get to keep the writer
open, but you can't really *use* it... that doesn't seem incredibly
useful ;-)

-Yonik



ning.li.li at gmail

Aug 8, 2007, 3:04 PM

Post #11 of 13 (2159 views)
Re: Deprecating IndexModifier [In reply to]

On 8/8/07, Yonik Seeley <yonik [at] apache> wrote:
> On 8/8/07, Ning Li <ning.li.li [at] gmail> wrote:
> > This reminds me: It'd be nice if we could support delete-by-query someday. :)
> >
> > I was thinking people use deleteDocument(int docid) when they are sure
> > the docid hasn't changed since it was obtained. That's why I considered it an
> > expert method...
>
> Sure, but when do you know that the ids haven't changed? A: When you
> haven't added anything with the writer. So you get to keep the writer
> open, but you can't really *use* it... that doesn't seem incredibly
> useful ;-)

I was thinking a user knows the max buffered docs, and he knows the
buffer is far from full so he can continue adding documents and
deleting without triggering flush or merge so document ids won't
change...

But I do see your point.



ning.li.li at gmail

Aug 12, 2007, 12:05 PM

Post #12 of 13 (2131 views)
Re: Deprecating IndexModifier [In reply to]

IndexWriter does everything IndexModifier does and more, except
"deleteDocument(int doc)". Can we reach consensus on: (1) Should we
deprecate IndexModifier before 3.0 and remove it in 3.0? (2) If so, do
we have to add "deleteDocument(int doc)" to IndexWriter?

We know how to support "deleteDocument(int doc)" in IndexWriter even
with concurrent merge (see discussion in LUCENE-847). The main
concern, as Yonik pointed out, is how users can use it, since docids
may change after segment merge. Is it possible to deprecate and remove
IndexModifier without adding "deleteDocument(int doc)" to IndexWriter?
Probably few users use IndexModifier.deleteDocument(int doc) since
it's also the case that docids may change after segment merge.



yonik at apache

Aug 13, 2007, 6:55 PM

Post #13 of 13 (2112 views)
Re: Deprecating IndexModifier [In reply to]

On 8/12/07, Ning Li <ning.li.li [at] gmail> wrote:
> IndexWriter does everything IndexModifier does and more, except
> "deleteDocument(int doc)". Can we reach consensus on: (1) Should we
> deprecate IndexModifier before 3.0 and remove it in 3.0? (2) If so, do
> we have to add "deleteDocument(int doc)" to IndexWriter?

IMO, (1)=yes, (2)=no

> We know how to support "deleteDocument(int doc)" in IndexWriter even
> with concurrent merge (see discussion in LUCENE-847). The main
> concern, as Yonik pointed out, is how users can use it, since docids
> may change after segment merge. Is it possible to deprecate and remove
> IndexModifier without adding "deleteDocument(int doc)" to IndexWriter?
> Probably few users use IndexModifier.deleteDocument(int doc) since
> it's also the case that docids may change after segment merge.

Right.
It's not like adding deleteDocument() to IndexWriter will allow
anything written to IndexModifier to work unchanged. So I think if we
add something like deleteDocument() to IndexWriter it should be on
its own merits.

-Yonik

