
phaninra at gmail
Jul 27, 2012, 3:15 PM
Post #6 of 6
(279 views)
Re: Getting terms from unstored fields, doc-wise
Thanks a lot, Aditya and Andrzej. Your responses were really helpful.

On Fri, Jul 27, 2012 at 6:15 AM, Andrzej Bialecki <ab [at] getopt> wrote:

> On 26/07/2012 22:04, Phanindra R wrote:
>
>> Thanks for the reply Abdul.
>>
>> I was exploring the API and I think we can retrieve all those words by
>> using a brute-force approach.
>>
>> 1) Get all the terms using indexReader.terms()
>> 2) Process the term only if it belongs to the target field.
>> 3) Get all the docs using indexReader.termDocs(term);
>> 4) So, we have the term-doc pairs at this point.
>
> This procedure is implemented in Luke (http://code.google.com/p/luke)
> in the "Reconstruct & Edit" function. In case of larger indexes it's indeed
> a time-consuming procedure.
>
>> Is there any better approach other than the above forever-taking
>> procedure?
>
> No. Indexing is usually a lossy process - some data is irretrievably lost
> - and the resulting data structure is not optimized for re-assembling the
> original content. If you need to retrieve the original content you have to
> store it, either using stored fields or in an external system.
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com, blog http://www.sigram.com/blog
> Information Retrieval, System Integration
> Contact: info at sigram dot com
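
For anyone finding this thread later, here is a rough sketch of the four brute-force steps quoted above. It is not Lucene code: `Posting` and `uninvert` are hypothetical stand-ins for a term plus its postings list, so the example runs on plain Java without the Lucene jar. Against a real index the outer loop would correspond to walking `indexReader.terms()` and the inner loop to `indexReader.termDocs(term)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class UninvertSketch {

    /** Hypothetical stand-in for one index term and the docs that contain it. */
    static final class Posting {
        final String field;
        final String text;
        final int[] docIds;

        Posting(String field, String text, int... docIds) {
            this.field = field;
            this.text = text;
            this.docIds = docIds;
        }
    }

    /** Rebuilds docId -> terms for one field by scanning every posting. */
    static Map<Integer, SortedSet<String>> uninvert(List<Posting> index, String targetField) {
        Map<Integer, SortedSet<String>> termsByDoc = new TreeMap<>();
        for (Posting p : index) {                    // step 1: walk all terms
            if (!p.field.equals(targetField)) {      // step 2: keep the target field only
                continue;
            }
            for (int docId : p.docIds) {             // step 3: walk the docs for this term
                // step 4: record the term-doc pair, grouped by document
                termsByDoc.computeIfAbsent(docId, k -> new TreeSet<>()).add(p.text);
            }
        }
        return termsByDoc;
    }

    public static void main(String[] args) {
        List<Posting> index = new ArrayList<>();
        index.add(new Posting("body", "fox", 0, 2));
        index.add(new Posting("body", "quick", 0));
        index.add(new Posting("title", "fox", 1));
        index.add(new Posting("body", "lazy", 2));
        System.out.println(uninvert(index, "body")); // prints {0=[fox, quick], 2=[fox, lazy]}
    }
}
```

Note that the cost is one pass over every term in the index, whatever the field, which is why it gets slow on large indexes. This toy model also keeps no positions, so what comes back per document is only a set of terms: the original word order and repetitions are not recoverable from it.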