
Mailing List Archive: Lucene: Java-User

Reusing indexed and analyzed documents

 

 



a.schrijvers at hippo

Jan 21, 2008, 7:37 AM

Post #1 of 4 (1104 views)
Reusing indexed and analyzed documents

Hello,

is there a way to reuse a Lucene document that was indexed and analyzed
before, when only a single Field has changed? The use case (Jackrabbit
indexing) is one where a *lot* of documents have a common field which
changes, while the rest of each document is unchanged. I would guess that
there is a more efficient way than reindexing and analyzing all fields
(they are present in the index already). I understand that I need to add
a new Lucene Document, so the old one needs to be deleted, but I hoped I
could somehow reuse the already analyzed, unchanged fields. Does anybody
know if this is possible?

Do I understand correctly that the per-document payload feature will
give me this possibility in the future? Can I also query on payload
values (this might be a dumb question, but I just learned about the
Payload feature :-) )?

thanks for any pointers,

Regards Ard

--

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers [at] hippo / ard [at] apache / http://www.hippo.nl
--------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


karl.wettin at gmail

Jan 21, 2008, 11:04 PM

Post #2 of 4 (1016 views)
Re: Reusing indexed and analyzed documents [In reply to]

21 jan 2008 kl. 16.37 skrev Ard Schrijvers:

> is there a way to reuse a Lucene document which was indexed and
> analyzed
> before, but only one single Field has changed?

I don't think you can reuse document instances like that. You could,
however, pre-tokenize the fields that will stay the same and reuse the
tokens in all documents (fields), perhaps using a CachingTokenFilter.

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Field.html#Field(java.lang.String,%20org.apache.lucene.analysis.TokenStream)


--
karl
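
A minimal sketch of the pre-tokenizing idea above, in plain Java with no
Lucene dependency. Everything in it is an assumption made for illustration:
TokenCacheSketch, analyze and tokensFor are hypothetical names, and the
stand-in analyzer just lowercases and splits on non-alphanumeric characters.
It only shows the caching pattern (analyze a shared field value once, replay
the tokens for every document that carries it), not how CachingTokenFilter
itself is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; none of these names are Lucene API. */
final class TokenCacheSketch {
    /** Counts how often the (expensive) analysis step actually runs. */
    static int analyzeCalls = 0;

    /** Field text -> tokens produced the first time that text was analyzed. */
    private static final Map<String, List<String>> CACHE = new HashMap<>();

    /** Stand-in analyzer: lowercases and splits on non-letter/digit runs. */
    static List<String> analyze(String text) {
        analyzeCalls++;
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return Collections.unmodifiableList(tokens);
    }

    /** Returns cached tokens, analyzing the text only on first use. */
    static List<String> tokensFor(String fieldText) {
        return CACHE.computeIfAbsent(fieldText, TokenCacheSketch::analyze);
    }

    public static void main(String[] args) {
        String sharedBody = "Reusing indexed and analyzed documents";
        // Two documents share the same body text; it is analyzed once.
        System.out.println(tokensFor(sharedBody));
        System.out.println(tokensFor(sharedBody));
        System.out.println("analyze() ran " + analyzeCalls + " time(s)");
    }
}
```

Note that this only helps while the original field text is still at hand in
the indexing process; it does not recover tokens from an existing index.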




karl.wettin at gmail

Jan 21, 2008, 11:11 PM

Post #3 of 4 (1004 views)
Re: Reusing indexed and analyzed documents [In reply to]

Forget all I said! I managed to answer a question that was not there! :)

If you have the term vectors stored it is fairly quick to re-assemble
a token stream from the document using a TermVectorMapper. Otherwise
it will be really slow.


--
karl
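
To illustrate the re-assembly step described above, here is a rough sketch
in plain Java. It assumes the term vector of one field has already been read
into a map from term to token positions (the kind of data a term vector
stored with positions provides); the class and method names are hypothetical
and this is not the TermVectorMapper API itself, only the inversion it makes
possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not Lucene code. */
final class TermVectorReassemblySketch {
    /**
     * Inverts term -> positions into the terms ordered by position.
     * Simplifications: if two terms share a position (e.g. synonyms), only
     * one of them survives, and position gaps are not preserved in the list.
     */
    static List<String> reassemble(Map<String, int[]> termPositions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : termPositions.entrySet()) {
            for (int position : entry.getValue()) {
                byPosition.put(position, entry.getKey());
            }
        }
        // TreeMap iterates in ascending key order, i.e. token order.
        return new ArrayList<>(byPosition.values());
    }

    public static void main(String[] args) {
        Map<String, int[]> termVector = new HashMap<>();
        termVector.put("to", new int[] {0, 4});
        termVector.put("be", new int[] {1, 5});
        termVector.put("or", new int[] {2});
        termVector.put("not", new int[] {3});
        System.out.println(reassemble(termVector)); // prints [to, be, or, not, to, be]
    }
}
```

The resulting token sequence could then be fed back into indexing for the
unchanged fields instead of running the analyzer over the original text
again.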



a.schrijvers at hippo

Jan 21, 2008, 11:14 PM

Post #4 of 4 (1015 views)
RE: Reusing indexed and analyzed documents [In reply to]

Hello,

> 21 jan 2008 kl. 16.37 skrev Ard Schrijvers:
>
> > is there a way to reuse a Lucene document which was indexed and
> > analyzed before, but only one single Field has changed?

> Karl Wettin wrote:
> I don't think you can reuse document instances like that. You
> could, however, pre-tokenize the fields that will stay the
> same and reuse the tokens in all documents (fields), perhaps
> using a CachingTokenFilter.

I was already afraid I could not reuse document instances. Caching
pre-tokenized fields won't be a solution to our issue: for Jackrabbit
NGP (next generation persistence) we are thinking about storing path
info (a hierarchical data structure) of nodes (node+props = lucene
document) in the lucene index.

Now, to still get reasonably acceptable performance when
moving nodes (and thus possibly changing a large subtree of nodes), we
would like to reindex only the changed path of all the subnodes, i.e. a
single Lucene Field. The other fields stay the same, so re-indexing and
analyzing all the content would be really inefficient (impossible).

Aaah, I just saw your next mail arrive while I was typing :-) I think
that exactly covers my issue.

thanks a lot Karl for your pointers!

Regards Ard
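
For illustration only, the "only the path field changes" part of a subtree
move could look roughly like the following plain-Java sketch. The node ids,
the map from id to path, and the names SubtreeMoveSketch and movedPaths are
all assumptions made for the example; they are not Jackrabbit or Lucene API.
It computes which documents need a new path value and what that value is,
leaving every other node untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; not Jackrabbit or Lucene code. */
final class SubtreeMoveSketch {
    /**
     * Returns nodeId -> new path for every node at or below oldRoot.
     * Nodes outside the moved subtree are not included in the result.
     */
    static Map<String, String> movedPaths(Map<String, String> pathsById,
                                          String oldRoot, String newRoot) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : pathsById.entrySet()) {
            String path = entry.getValue();
            // Match the root itself or a true descendant ("/a/bc" is not under "/a/b").
            if (path.equals(oldRoot) || path.startsWith(oldRoot + "/")) {
                changed.put(entry.getKey(), newRoot + path.substring(oldRoot.length()));
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("n1", "/a");
        paths.put("n2", "/a/b");
        paths.put("n3", "/a/b/c");
        paths.put("n4", "/a/bc");
        // Moving /a/b to /x/b only affects n2 and n3.
        System.out.println(movedPaths(paths, "/a/b", "/x/b"));
    }
}
```

Only the documents returned here would need to be deleted and re-added, with
the path field replaced and the remaining fields rebuilt from whatever is
already stored for them.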

