
Mailing List Archive: Lucene: Java-User

deleteDocuments() does not work

 

 



pcdinh at gmail

Oct 28, 2009, 3:45 AM

Post #1 of 8 (701 views)
deleteDocuments() does not work

Hi all,

I have a very simple method that deletes a previously indexed document:

/**
 * @param id
 */
public void deleteById(String id) throws IOException {
    IndexWriter writer = IndexWriterFactory.factory();

    try {
        writer.deleteDocuments(new Term(Configuration.Field.ID, String.valueOf(id)));
        writer.commit();
    } catch (ArrayIndexOutOfBoundsException e) {
        // CHECK ignore this. Can happen if index has not been built yet
    } catch (IOException e) {
        System.out.println(e);
    }
}

The problem is that after this method runs without any exception, I come
back and run a search, and the record that should have been deleted is
still there. I have to restart my servlet engine before the record is
really gone. How can this happen?

Thanks

Dinh


anshumg at gmail

Oct 28, 2009, 3:49 AM

Post #2 of 8 (679 views)
Re: deleteDocuments() does not work [In reply to]

Hi Dinh,
Is it that your engine keeps an IndexSearcher[Reader] open all through this
while? For the deleted document to actually reflect in the search (service),
you'd need to reload the index searcher with the latest version.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............
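
Anshum's point can be illustrated with a toy model. The sketch below is not
Lucene code: ToyIndex and ToyReader are hypothetical stand-ins that only
mimic the point-in-time behaviour of a reader, on the assumption that an
open reader keeps seeing the index as it was when the reader was opened.

```java
import java.util.HashSet;
import java.util.Set;

public class SnapshotDemo {

    /** Hypothetical stand-in for an index: holds the committed set of document ids. */
    static class ToyIndex {
        private Set<String> committed = new HashSet<>();

        void add(String id) {
            // Copy-on-write, so snapshots handed to readers never change.
            Set<String> next = new HashSet<>(committed);
            next.add(id);
            committed = next;
        }

        void delete(String id) {
            Set<String> next = new HashSet<>(committed);
            next.remove(id);
            committed = next;
        }

        /** A reader sees the index exactly as it is at the moment it is opened. */
        ToyReader openReader() {
            return new ToyReader(committed);
        }
    }

    /** Hypothetical stand-in for a reader: a frozen view of the index. */
    static class ToyReader {
        private final Set<String> snapshot;

        ToyReader(Set<String> snapshot) {
            this.snapshot = snapshot;
        }

        boolean contains(String id) {
            return snapshot.contains(id);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("42");

        ToyReader longLived = index.openReader(); // e.g. a singleton kept across requests
        index.delete("42");                       // delete and "commit"

        System.out.println("old reader still sees doc: " + longLived.contains("42"));
        System.out.println("new reader sees doc: " + index.openReader().contains("42"));
    }
}
```

In this model the long-lived reader keeps returning the deleted document
until it is replaced by a freshly opened one, which matches the symptom of
the record only disappearing after a servlet engine restart.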




pcdinh at gmail

Oct 28, 2009, 4:17 AM

Post #3 of 8 (672 views)
Re: deleteDocuments() does not work [In reply to]

Hi Anshum,

> Is it that your engine keeps an IndexSearcher[Reader] open all through
> this while?

The answer is yes. I keep a singleton instance of IndexSearcher open
across web requests.

Following your advice, I tried to reopen the IndexReader that is
associated with that IndexSearcher:

public void deleteById(String id) throws IOException {
    IndexWriter writer = IndexWriterFactory.factory();

    try {
        writer.deleteDocuments(new Term(Configuration.Field.ID, String.valueOf(id)));
        writer.commit();
        IndexSearcherFactory.reload();
    } catch (ArrayIndexOutOfBoundsException e) {
        // CHECK ignore this. Can happen if index has not been built yet
    } catch (IOException e) {
        System.out.println(e);
    }
}

Here is how IndexSearcherFactory#reload is defined:

public static void reload() throws CorruptIndexException, IOException {

    Set<Map.Entry<String, IndexReader>> set = readers.entrySet();
    for (Map.Entry<String, IndexReader> entry : set) {
        readers.put(entry.getKey(), entry.getValue().reopen(true));
    }
}

However, it does not work either.

Is there any way to debug this situation?

Thanks,

Dinh




--
Spica Framework: http://code.google.com/p/spica
http://www.twitter.com/pcdinh
http://groups.google.com/group/phpvietnam


lucene at mikemccandless

Oct 28, 2009, 4:47 AM

Post #4 of 8 (672 views)
Re: deleteDocuments() does not work [In reply to]

Can you not suppress the AIOOBE (just in case you're hitting that)?

Also, you are failing to close the old reader after opening a new one.
This shouldn't cause the issue you're seeing, but, will lead
eventually to OOME or file descriptor exhaustion.

Can you verify you are in fact reopening the reader that's reading the
same Directory the writer is writing to?

Finally, are you sure the iteration over the Map entries, that
overwrites each entry, is safe?

Maybe, after writer.commit, try to simply [temporarily] open a new
reader on that Dir and see if the doc is deleted.

Are you sure String.valueOf(id) is giving you the expected result? Eg
does id ever have leading zeros?

Mike
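
To make the leading-zeros question concrete: deleteDocuments(Term) only
removes documents whose indexed term text matches the given text exactly.
In the posted method id is already a String, so String.valueOf(id) returns
it unchanged; zeros would only be lost if the id passed through a numeric
type somewhere along the way. The small sketch below uses plain JDK calls,
and the helper names viaString and viaInt are made up for illustration.

```java
public class IdTermDemo {

    /** Mirrors String.valueOf(id) on a String argument: the text is unchanged. */
    static String viaString(String id) {
        return String.valueOf(id);
    }

    /** Hypothetical caller that parses the id as an int first: leading zeros are dropped. */
    static String viaInt(String id) {
        return String.valueOf(Integer.parseInt(id));
    }

    public static void main(String[] args) {
        String indexed = "007"; // assumed text of the ID term stored in the index

        System.out.println(indexed.equals(viaString(indexed))); // same text, would match
        System.out.println(indexed.equals(viaInt(indexed)));    // "7" vs "007", would not match
    }
}
```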


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


pcdinh at gmail

Nov 3, 2009, 1:24 AM

Post #5 of 8 (588 views)
Re: deleteDocuments() does not work [In reply to]

Hi Michael,

Thanks a lot for your advice.

> Can you verify you are in fact reopening the reader that's reading the
> same Directory the writer is writing to?

Yes. I have a single, configurable index path, so I cannot be mistaken
here.

> Also, you are failing to close the old reader after opening a new one.
> This shouldn't cause the issue you're seeing, but, will lead
> eventually to OOME or file descriptor exhaustion.

I have rewritten the method as follows

/**
 * Reloads searchers after index is changed (added, deleted or updated).
 */
public static synchronized void reload() {

    Set<Map.Entry<String, IndexSearcher>> set = searchers.entrySet();

    for (Map.Entry<String, IndexSearcher> entry : set) {
        try {
            IndexSearcher searcher = entry.getValue();
            IndexReader oldReader = searcher.getIndexReader();
            IndexReader newReader = oldReader.reopen(true);

            if (newReader != oldReader) {
                oldReader.close();
                searcher.close();
                searchers.put(entry.getKey(), new IndexSearcher(newReader));
            }
        } catch (Exception e) {
            log.warn(e.getMessage(), e);
        }
    }
}

And it works now.

> Finally, are you sure the iteration over the Map entries, that
> overwrites each entry, is safe?

Do you think that my iteration is safe now? At least I have closed the
previous searcher and oldReader before creating new ones. However, I don't
know if it is a good practice to do so.

Thanks

Dinh



lucene at mikemccandless

Nov 3, 2009, 1:50 AM

Post #6 of 8 (596 views)
Re: deleteDocuments() does not work [In reply to]

On Tue, Nov 3, 2009 at 4:24 AM, Dinh <pcdinh [at] gmail> wrote:
> Hi Michael,
>
> Thank a lot for your advice
>
>> Can you verify you are in fact reopening the reader that's reading the
>> same Directory the writer is writing to?
>
> Yes. I have a single and configurable index path. So I can not make a
> mistake here

OK.

>> Also, you are failing to close the old reader after opening a new one.
>> This shouldn't cause the issue you're seeing, but, will lead
>> eventually to OOME or file descriptor exhaustion.
>
> I have rewritten the method as follows
>
>    /**
>     * Reloads searchers after index is changed (added, deleted or updated).
>     */
>    public static synchronized void reload() {
>
>        Set<Map.Entry<String, IndexSearcher>> set = searchers.entrySet();
>
>        for (Map.Entry<String, IndexSearcher> entry : set) {
>            try {
>                IndexSearcher searcher = entry.getValue();
>                IndexReader oldReader = searcher.getIndexReader();
>                IndexReader newReader = oldReader.reopen(true);
>
>                if (newReader != oldReader) {
>                    oldReader.close();
>                    searcher.close();
>                    searchers.put(entry.getKey(), new
> IndexSearcher(newReader));
>                }
>            } catch (Exception e) {
>                log.warn(e.getMessage(), e);
>            }
>        }
>    }

Your reload method looks better now! (You are now closing the old reader).

> And it works now.

Does that mean you no longer see the original problem (changes not
being reflected)?

>> Finally, are you sure the iteration over the Map entries, that
>> overwrites each entry, is safe?
>
> Do you think that my iteration is safe now? At least I have closed the
> previous searcher and oldReader before creating new ones. However, I don't
> know if it is a good practice to do so.

You get the entrySet from the Map, you then iterate over its
Map.Entry, then you replace in your original map some entries (the
ones that are opened). So, you are modifying a Java collection while
iterating over elements from its Set view... I just don't know if
that's safe (anyone?). Would be good to instrument/debug and confirm
that the precise IndexReader that's searching the Directory your
IndexWriter just committed to, is in fact reopened.

Mike
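
On the question of whether the iteration is safe, here is a small JDK-only
experiment. It assumes the searchers map is a java.util.HashMap, which the
thread does not state. For a HashMap, replacing the value of a key that is
already present is not a structural modification, so it does not trip the
fail-fast iterator; adding a new key while iterating does. Map.Entry.setValue
is the more direct way to replace a value during iteration. The class and
method names below are made up for illustration, and Strings stand in for
the searchers.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class MapReplaceDemo {

    /** Same shape as the posted reload(): put() on a key that already exists. */
    static void replaceWithPut(Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            map.put(entry.getKey(), entry.getValue() + "-reopened");
        }
    }

    /** Alternative: replace the value through the entry itself. */
    static void replaceWithSetValue(Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            entry.setValue(entry.getValue() + "-reopened");
        }
    }

    /** Adding a brand-new key mid-iteration is a structural change. */
    static boolean addingKeyThrows(Map<String, String> map) {
        try {
            for (Map.Entry<String, String> entry : map.entrySet()) {
                map.put(entry.getKey() + "-copy", entry.getValue());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "r1");
        map.put("b", "r2");
        map.put("c", "r3");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> viaPut = sample();
        replaceWithPut(viaPut);
        System.out.println(viaPut);

        Map<String, String> viaSetValue = sample();
        replaceWithSetValue(viaSetValue);
        System.out.println(viaSetValue);

        System.out.println("adding keys throws: " + addingKeyThrows(sample()));
    }
}
```

Other Map implementations may behave differently (for example, an
access-ordered LinkedHashMap treats lookups and updates as structural), so
the result applies only under the HashMap assumption.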

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


pcdinh at gmail

Nov 3, 2009, 2:21 AM

Post #7 of 8 (589 views)
Re: deleteDocuments() does not work [In reply to]

Hi Michael,

> Does that mean you no longer see the original problem (changes not
> being reflected)?

Yes. The deleted documents no longer appear in search results. I am not
sure whether the deletions have been flushed to disk at that point, but at
least there is a sign that they are "deleted". I stopped and started the
servlet engine to make sure the deleted document is really gone. I think
that Lucene requires the previously opened IndexReader to be closed before
changes can be reflected.

> You get the entrySet from the Map, you then iterate over its
> Map.Entry, then you replace in your original map some entries (the
> ones that are opened). So, you are modifying a Java collection while
> iterating over elements from its Set view... I just don't know if
> that's safe (anyone?)

I am a bit skeptical about my approach, because the IndexSearchers can be
used by other threads (requests) at the same time. So when I close them,
some users can be affected. I will find a better way to do it.

Also, because reload() is synchronized, only a single thread can execute
it at a time. So I think that there will be no
ConcurrentModificationException.
> Would be good to instrument/debug and confirm
> that the precise IndexReader that's searching the Directory your
> IndexWriter just committed to, is in fact reopened.

Do you think this code is enough?

IndexReader oldReader = searcher.getIndexReader();
IndexReader newReader = oldReader.reopen(true);

if (newReader != oldReader) {
    oldReader.close();
    searcher.close();
    searchers.put(entry.getKey(), new IndexSearcher(newReader));
}

Thanks,

Dinh



lucene at mikemccandless

Nov 3, 2009, 2:34 AM

Post #8 of 8 (592 views)
Re: deleteDocuments() does not work [In reply to]

On Tue, Nov 3, 2009 at 5:21 AM, Dinh <pcdinh [at] gmail> wrote:
> Hi Michael,
>
>> Does that mean you no longer see the original problem (changes not
>> being reflected)?
>
> Yes. The deleted documents do not appear in search results any more. I am
> not sure that if they are flushed to disk
> at that time yet but at least there is a sign that they are "deleted". I
> have stopped and started the servlet engine to ensure
> that deleted document is no longer there. I think that Lucene requires the
> previously opened IndexReader be closed before changes
> can be reflected.

Hmmm: closing the old IndexReader shouldn't be necessary in order for
a newly opened IndexReader to see changes. Something else must've
been fixed at the same time (and I'm glad you got it fixed!).

>> You get the entrySet from the Map, you then iterate over its
>> Map.Entry, then you replace in your original map some entries (the
>> ones that are opened).  So, you are modifying a Java collection while
>> iterating over elements from its Set view... I just don't know if
>> that's safe (anyone?)
>
> I am a bit skeptical about my approach. Because the IndexSearchers can be
> used by other threads (requests) at the same time.

That's definitely a problem. [Shameless plug:] the next rev of Lucene
in Action has a class (SearcherManager) which gracefully handles
reopening in the presence of multiple threads still using the old
IndexSearcher. It uses the reference counting already builtin to
IndexReader to keep track of how many queries are still using the old
reader.
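
The general reference-counting idea can be sketched without Lucene. The code
below is not the SearcherManager class from the book and does not use any
Lucene API: ToySearcher and ToyManager are hypothetical names, and the
"close" is just a flag. It only shows how an old searcher can stay usable
until the last in-flight query releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSwapDemo {

    /** Hypothetical searcher stand-in that closes itself when its last reference is released. */
    static class ToySearcher {
        final String version;
        // Starts at 1: that reference belongs to the manager while this searcher is current.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean closed = false;

        ToySearcher(String version) {
            this.version = version;
        }

        void incRef() {
            refs.incrementAndGet();
        }

        void decRef() {
            if (refs.decrementAndGet() == 0) {
                closed = true; // real code would close the underlying reader here
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical manager: hands out the current searcher and swaps in new ones. */
    static class ToyManager {
        private ToySearcher current;

        ToyManager(ToySearcher initial) {
            this.current = initial;
        }

        /** Callers must pair every acquire() with a release(). */
        synchronized ToySearcher acquire() {
            current.incRef();
            return current;
        }

        void release(ToySearcher searcher) {
            searcher.decRef();
        }

        /** Installs a fresh searcher; the old one closes once all users release it. */
        synchronized void swap(ToySearcher fresh) {
            ToySearcher old = current;
            current = fresh;
            old.decRef(); // drop the manager's own reference
        }
    }

    public static void main(String[] args) {
        ToySearcher v1 = new ToySearcher("v1");
        ToyManager manager = new ToyManager(v1);

        ToySearcher inFlight = manager.acquire();   // a query starts on v1
        manager.swap(new ToySearcher("v2"));        // index changed, new searcher installed
        System.out.println("v1 closed during query: " + v1.isClosed());

        manager.release(inFlight);                  // the query finishes
        System.out.println("v1 closed after release: " + v1.isClosed());
    }
}
```

A request thread would call acquire() before searching and release() in a
finally block, so a reload never closes a searcher that is still in use.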

> Also, because reload() is synchronized so there is a single thread accessing
> it only. So I think that there will be no ConcurrentModificationException

Right, but your one thread is both iterating over and modifying the
Map, at once. That's what concerns me (but it could very well be
safe).

>> Would be good to instrument/debug and confirm
>> that the precise IndexReader that's searching the Directory your
>> IndexWriter just committed to, is in fact reopened.
>
> Do you think that these code are enough
>
> IndexReader oldReader = searcher.getIndexReader();
> IndexReader newReader = oldReader.reopen(true);
>
> if (newReader != oldReader) {
>    oldReader.close();
>    searcher.close();
>    searchers.put(entry.getKey(), new IndexSearcher(newReader));
> }

Yes, this code looks fine!

Mike
