
Mailing List Archive: Lucene: Java-User

remove duplicate when merging indexes

 

 



m.harig at gmail

Nov 10, 2009, 1:05 AM

Post #1 of 9
remove duplicate when merging indexes

Hello all,

This is my situation: I have multiple indexes (index1, index2, index3, ...)
that I have to update every night. If I open my IndexWriter with
create=false (since I want to update the existing index), duplicate
documents get appended to the existing indexes. Can anyone help? How do I
remove the duplicate documents when updating an existing index?
--
View this message in context: http://old.nabble.com/remove-duplicate-when-merging-indexes-tp26280244p26280244.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


simon.willnauer at googlemail

Nov 10, 2009, 1:12 AM

Post #2 of 9
Re: remove duplicate when merging indexes

You need some kind of unique ID for your documents, like a primary key in an RDB.
If you have something like that, you can call
IndexWriter#updateDocument(uniqueIDTerm, document). This will delete
the old document and add the new one.

simon
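
To make the delete-then-add behaviour described above concrete, here is a minimal sketch. It is a toy in-memory model, not Lucene code: ToyIndex, add, update and doc are hypothetical names invented for illustration, and the "index" is just a list of field maps. It runs the same nightly load twice, once by appending and once by updating on the "id" field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index (hypothetical, NOT Lucene). */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Models a plain add: always appends, even if the same id is already present. */
    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Models update-by-key: delete every doc whose field equals value, then add. */
    void update(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        add(doc);
    }

    int size() {
        return docs.size();
    }

    /** Helper that builds a two-field document. */
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        ToyIndex appended = new ToyIndex();
        ToyIndex updated = new ToyIndex();
        // Two "nightly" runs over the same three source records.
        for (int night = 0; night < 2; night++) {
            for (int i = 0; i < 3; i++) {
                Map<String, String> d = doc("" + i, "title " + i);
                appended.add(d);
                updated.update("id", "" + i, d);
            }
        }
        System.out.println(appended.size()); // prints 6 (duplicates)
        System.out.println(updated.size());  // prints 3 (one doc per id)
    }
}
```

The point of the sketch is only that the key used for the delete step has to match what was indexed; otherwise the update degrades into a plain add.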



m.harig at gmail

Nov 10, 2009, 1:22 AM

Post #3 of 9
Re: remove duplicate when merging indexes

Thanks simon

How do I get the unique ID? Will it be added to the index?





simon.willnauer at googlemail

Nov 10, 2009, 1:26 AM

Post #4 of 9
Re: remove duplicate when merging indexes

On Tue, Nov 10, 2009 at 10:22 AM, m.harig <m.harig [at] gmail> wrote:
>
> Thanks simon
>
>    How do I get the unique ID? Will it be added to the index?
There is no such thing built into Lucene. You need to generate your
own unique ID. Make sure you do NOT use the document ID, as it is
volatile and is likely to change once you modify your index.

simon
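
One possible way to generate such an ID, sketched under assumptions: if every record already has some stable key in its source system (a database primary key, a file path, a URL), the ID can be derived from that key so that the same record gets the same ID on every nightly run. StableIds and idFor are hypothetical names, and "articles/42" is a made-up key; only java.util.UUID is a real API here.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper: derives a repeatable ID from a source-system key. */
class StableIds {

    /** The same sourceKey always yields the same name-based UUID string. */
    static String idFor(String sourceKey) {
        byte[] bytes = sourceKey.getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes).toString();
    }

    public static void main(String[] args) {
        // "articles/42" is an assumed example key, not something from this thread.
        String first = idFor("articles/42");
        String second = idFor("articles/42");
        System.out.println(first.equals(second)); // prints true
        System.out.println(idFor("articles/43"));
    }
}
```

If the source key is already a short, clean string, it can of course be used directly as the ID; the hashing step only helps when keys are long or awkward.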



m.harig at gmail

Nov 10, 2009, 1:47 AM

Post #5 of 9
Re: remove duplicate when merging indexes

Thanks again

This is my code:

doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("contents", indexForm.getContent(), Field.Store.YES, Field.Index.ANALYZED));

writer.updateDocument(new Term("" + i), doc);

Still no change. Am I doing something wrong? Please help.


m.harig at gmail

Nov 10, 2009, 1:51 AM

Post #6 of 9
Re: remove duplicate when merging indexes

Thanks simon,

This is my code:

doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("contents", indexForm.getContent(), Field.Store.YES, Field.Index.ANALYZED));

writer.updateDocument(new Term("id"), doc);

But still no change. Where am I going wrong?


ian.lea at gmail

Nov 10, 2009, 1:58 AM

Post #7 of 9
Re: remove duplicate when merging indexes

Try updateDocument(new Term("id", ""+i), doc).

See javadocs for Term constructors.



--
Ian.
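
A rough illustration of why the two earlier attempts matched nothing while the two-argument form works. This is a toy model, not Lucene: ToyTerm and matches are hypothetical names, and it assumes (as the Term javadocs of that era describe) that the one-argument constructor takes a field name and leaves the text empty.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a (field, text) term (hypothetical, NOT Lucene's Term class). */
class ToyTerm {
    final String field;
    final String text;

    /** Assumed to mirror the one-argument form: the argument is the FIELD name, text is empty. */
    ToyTerm(String field) {
        this(field, "");
    }

    /** Mirrors the two-argument form: field name plus the exact value to match. */
    ToyTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    /** True when the document holds exactly this text in this field. */
    boolean matches(Map<String, String> doc) {
        return text.equals(doc.get(field));
    }

    public static void main(String[] args) {
        int i = 7;
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "" + i);

        // Field named "7" with empty text: the document has no such field.
        System.out.println(new ToyTerm("" + i).matches(doc));       // prints false
        // Field "id" with empty text: the stored value is "7", not "".
        System.out.println(new ToyTerm("id").matches(doc));         // prints false
        // Field "id" with text "7": this identifies the old document.
        System.out.println(new ToyTerm("id", "" + i).matches(doc)); // prints true
    }
}
```

In the first two cases nothing is found to delete, so each nightly run simply adds another copy of every document.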




simon.willnauer at googlemail

Nov 10, 2009, 2:07 AM

Post #8 of 9
Re: remove duplicate when merging indexes

Ian got it :)

simon



m.harig at gmail

Nov 10, 2009, 2:08 AM

Post #9 of 9
Re: remove duplicate when merging indexes

Thanks Ian, it works. Thanks a lot.

