
Mailing List Archive: Lucene: Java-Dev

Re: changing index format


lucene at mikemccandless

Jun 25, 2008, 3:40 AM

Post #1 of 3
Re: changing index format

John Wang wrote:

> The problem I am having is stated below: I don't know how to
> add the minDoc and maxDoc values to the index while keeping backward
> compatibility.

Unfortunately, the TermInfo file format just isn't extensible at the
moment, so I think for now you'll have to break backward compatibility
if you really want to store these new fields in the _X.tis/.tii files.

E.g., here is another recent example of wanting to alter what's stored
in TermInfo:

https://issues.apache.org/jira/browse/LUCENE-1278

For flexible indexing we clearly need to fix this, so that any
"plugin" in the indexing chain could stuff whatever it wants into the
TermInfo, and also override how TermInfo is read/written. Even the
things we now store in TermInfo should be optional. E.g., say you choose
not to store positions (prx) for a given field. Then you would not
need the long proxPointer.
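A minimal Java sketch of what such an extensible per-term record could look like, under assumptions of its own: `TermInfoExtension`, `MinMaxDocExtension`, `FlexTermInfo` and `FlexTermInfoDemo` are hypothetical names, not Lucene classes, and plain java.io streams with fixed-width ints/longs stand in for Lucene's IndexOutput and its variable-length, delta-coded values (fixed widths just keep the byte counts easy to check). It shows a plugin appending its own per-term data (here John's minDoc/maxDoc) and proxPointer being left out when a field does not store positions.

```java
import java.io.*;
import java.util.*;

class FlexTermInfoDemo {
    // Hypothetical plugin hook: a component that stores its own per-term data.
    interface TermInfoExtension {
        void write(DataOutput out) throws IOException;
        void read(DataInput in) throws IOException;
    }

    // Hypothetical extension: smallest and largest docID containing the term.
    static final class MinMaxDocExtension implements TermInfoExtension {
        int minDoc, maxDoc;

        MinMaxDocExtension() {}
        MinMaxDocExtension(int minDoc, int maxDoc) { this.minDoc = minDoc; this.maxDoc = maxDoc; }

        public void write(DataOutput out) throws IOException { out.writeInt(minDoc); out.writeInt(maxDoc); }
        public void read(DataInput in) throws IOException { minDoc = in.readInt(); maxDoc = in.readInt(); }
    }

    // Hypothetical per-term record: core fields, optional proxPointer, then extensions.
    static final class FlexTermInfo {
        int docFreq;
        long freqPointer;
        long proxPointer = -1;  // -1 means "no positions stored"
        final List<TermInfoExtension> extensions = new ArrayList<>();

        void write(DataOutput out, boolean storePositions) throws IOException {
            out.writeInt(docFreq);
            out.writeLong(freqPointer);
            if (storePositions) out.writeLong(proxPointer);  // left out entirely otherwise
            for (TermInfoExtension ext : extensions) ext.write(out);
        }

        // Must use the same storePositions flag and extension list as the writer did.
        void read(DataInput in, boolean storePositions) throws IOException {
            docFreq = in.readInt();
            freqPointer = in.readLong();
            proxPointer = storePositions ? in.readLong() : -1;
            for (TermInfoExtension ext : extensions) ext.read(in);
        }
    }

    static byte[] encode(FlexTermInfo ti, boolean storePositions) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ti.write(new DataOutputStream(bytes), storePositions);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void decode(byte[] data, FlexTermInfo into, boolean storePositions) {
        try {
            into.read(new DataInputStream(new ByteArrayInputStream(data)), storePositions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FlexTermInfo ti = new FlexTermInfo();
        ti.docFreq = 3;
        ti.freqPointer = 100;
        ti.proxPointer = 200;
        ti.extensions.add(new MinMaxDocExtension(7, 42));

        // 4 (docFreq) + 8 (freqPointer) + 8 (proxPointer) + 8 (minDoc/maxDoc)
        System.out.println(encode(ti, true).length);   // 28
        System.out.println(encode(ti, false).length);  // 20

        FlexTermInfo back = new FlexTermInfo();
        MinMaxDocExtension minMax = new MinMaxDocExtension();
        back.extensions.add(minMax);
        decode(encode(ti, false), back, false);
        System.out.println(back.docFreq + " " + minMax.minDoc + " " + minMax.maxDoc);  // 3 7 42
    }
}
```

The catch is visible in `read`: the reader has to know which extensions were used and whether positions were stored. In a real format that configuration would need to be recorded somewhere (a header or per-field metadata); the sketch simply passes it in.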

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


john.wang at gmail

Jul 3, 2008, 5:05 PM

Post #2 of 3
Re: changing index format

Hi Michael:

What is the plan/timeline for supporting flexible indexing?

Thanks

-John



lucene at mikemccandless

Jul 4, 2008, 2:41 AM

Post #3 of 3
Re: changing index format

Well... there really is no concrete plan/timeline at this point; that
is the nature of open source.

But there are some issues in flight that take us on the first few
steps towards flexible indexing.

I think LUCENE-1301 (which I'm working on and which should be done
soon, probably in 2.4) is a solid first step forward from a "top down"
standpoint. It breaks DocumentsWriter into an indexing chain where
each plugin in the chain does its own thing. I.e., there are separate
plugins to write stored fields, term vectors, frq/prx, norms, etc.
But it's only a first step. E.g., I'm not exposing any of these plugins
outside of the oal.index (org.apache.lucene.index) package (eventually
I hope to, but for starters we need to let it mature internally).
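The shape of such a chain can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the LUCENE-1301 code: `ChainStage`, `IndexingChain` and the stage classes are hypothetical names, a document is modeled as a map from field name to text, and "analysis" is a lowercase whitespace split. The point is that each stage handles one kind of data and can be added or removed without touching the others.

```java
import java.util.*;

class IndexingChainDemo {
    // Hypothetical stage interface: every stage sees every document and does its own thing.
    interface ChainStage {
        void processDocument(int docID, Map<String, String> doc);
    }

    // Keeps a copy of each document's field values, standing in for a stored-fields writer.
    static final class StoredFieldsStage implements ChainStage {
        final List<Map<String, String>> stored = new ArrayList<>();
        public void processDocument(int docID, Map<String, String> doc) {
            stored.add(new TreeMap<>(doc));
        }
    }

    // Builds "field:term" -> ascending docIDs, a toy stand-in for frq data.
    static final class PostingsStage implements ChainStage {
        final Map<String, List<Integer>> postings = new TreeMap<>();
        public void processDocument(int docID, Map<String, String> doc) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                for (String term : new TreeSet<>(tokenize(field.getValue()))) {
                    postings.computeIfAbsent(field.getKey() + ":" + term, k -> new ArrayList<>()).add(docID);
                }
            }
        }
    }

    // Records a per-field length norm, 1/sqrt(number of tokens), for each document.
    static final class NormsStage implements ChainStage {
        final Map<String, Map<Integer, Float>> norms = new TreeMap<>();
        public void processDocument(int docID, Map<String, String> doc) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                int numTokens = tokenize(field.getValue()).size();
                if (numTokens > 0) {
                    norms.computeIfAbsent(field.getKey(), k -> new TreeMap<>())
                         .put(docID, (float) (1.0 / Math.sqrt(numTokens)));
                }
            }
        }
    }

    // Assigns docIDs and hands each document to every stage, in order.
    static final class IndexingChain {
        private final List<ChainStage> stages;
        private int nextDocID = 0;

        IndexingChain(ChainStage... stages) { this.stages = new ArrayList<>(Arrays.asList(stages)); }

        int addDocument(Map<String, String> doc) {
            int docID = nextDocID++;
            for (ChainStage stage : stages) stage.processDocument(docID, doc);
            return docID;
        }
    }

    // Toy analyzer: lowercase, split on whitespace, drop empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        StoredFieldsStage stored = new StoredFieldsStage();
        PostingsStage postings = new PostingsStage();
        NormsStage norms = new NormsStage();
        IndexingChain chain = new IndexingChain(stored, postings, norms);

        chain.addDocument(Map.of("body", "flexible indexing"));
        chain.addDocument(Map.of("body", "indexing chain"));

        System.out.println(postings.postings);     // {body:chain=[1], body:flexible=[0], body:indexing=[0, 1]}
        System.out.println(stored.stored.size());  // 2
    }
}
```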

LUCENE-1231, which I think Michael Busch is working on, would then
push forward another aspect of flexible indexing by enabling efficient
stored fields (column stride) and by extending
Fieldable/AbstractField/Field so you can separately turn on/off,
à la carte, the details of how a field is indexed.
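Per-field "à la carte" switches might look something like the following Java sketch. The enum `FieldOption`, the class `FieldSpec`, and the mapping from options to the kinds of data written are assumptions for illustration only, not the LUCENE-1231 API. Each option is independent, and the constructor rejects combinations that make no sense, such as positions on a field that is not indexed.

```java
import java.util.*;

class FieldSpecDemo {
    // Hypothetical per-field switches; the real LUCENE-1231 API may look quite different.
    enum FieldOption { INDEXED, POSITIONS, NORMS, TERM_VECTORS, STORED, COLUMN_STRIDE }

    static final class FieldSpec {
        final String name;
        final EnumSet<FieldOption> options;

        FieldSpec(String name, FieldOption first, FieldOption... rest) {
            this.name = name;
            this.options = EnumSet.of(first, rest);
            // Positions, norms and term vectors only make sense for an indexed field.
            boolean needsIndex = options.contains(FieldOption.POSITIONS)
                    || options.contains(FieldOption.NORMS)
                    || options.contains(FieldOption.TERM_VECTORS);
            if (needsIndex && !options.contains(FieldOption.INDEXED)) {
                throw new IllegalArgumentException(name + ": positions/norms/term vectors require INDEXED");
            }
            // Column stride is treated here as an alternative layout for a stored value.
            if (options.contains(FieldOption.COLUMN_STRIDE) && !options.contains(FieldOption.STORED)) {
                throw new IllegalArgumentException(name + ": COLUMN_STRIDE requires STORED");
            }
        }

        // Lists the kinds of per-field data a writer would produce for this spec.
        List<String> dataWritten() {
            List<String> out = new ArrayList<>();
            if (options.contains(FieldOption.INDEXED)) out.add("frq");
            if (options.contains(FieldOption.POSITIONS)) out.add("prx");
            if (options.contains(FieldOption.NORMS)) out.add("norms");
            if (options.contains(FieldOption.TERM_VECTORS)) out.add("term vectors");
            if (options.contains(FieldOption.STORED)) {
                out.add(options.contains(FieldOption.COLUMN_STRIDE) ? "stored (column stride)" : "stored (row)");
            }
            return out;
        }
    }

    public static void main(String[] args) {
        FieldSpec body = new FieldSpec("body", FieldOption.INDEXED, FieldOption.POSITIONS, FieldOption.NORMS);
        FieldSpec id = new FieldSpec("id", FieldOption.INDEXED, FieldOption.STORED);
        FieldSpec price = new FieldSpec("price", FieldOption.STORED, FieldOption.COLUMN_STRIDE);

        System.out.println(body.dataWritten());   // [frq, prx, norms]
        System.out.println(id.dataWritten());     // [frq, stored (row)]
        System.out.println(price.dataWritten());  // [stored (column stride)]
    }
}
```

An EnumSet keeps the switches cheap to test and easy to extend with new options without adding a constructor parameter per detail.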

But neither of these issues addresses making the tii/tis files
extensible. To do that we'd need to take the plugin that writes
frq/prx and make it extensible itself. That would let you not only
write your own stuff into the tii/tis files, but also, e.g., NOT write
prx info if you don't need it, choose whether skip data goes in a
separate file, etc. The recent discussions with Marvin about cleanly
separating the container (tii/tis) from the codecs (frq/prx/skip) also
fit in here. I think of this part as the "bottom up" part of flexible
indexing.
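One way to picture the container/codec split is the Java sketch below, which rests on assumptions of its own: `PostingsCodec`, `FreqOnlyCodec`, `FreqProxCodec` and `TermsContainer` are hypothetical names, and the layout is not the tii/tis format. The container writes each term followed by a length-prefixed blob that only the field's codec understands, so a codec can omit proxPointer (or add anything else) without the container changing, and code that only needs the terms can skip every blob unread.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.util.*;

class TermsContainerDemo {
    // Hypothetical codec: it alone decides what per-term metadata is written.
    interface PostingsCodec {
        byte[] encode(int docFreq, long freqPointer, long proxPointer);
    }

    // For a field without positions: docFreq + freqPointer only (12 bytes); proxPointer is ignored.
    static final class FreqOnlyCodec implements PostingsCodec {
        public byte[] encode(int docFreq, long freqPointer, long proxPointer) {
            return ByteBuffer.allocate(12).putInt(docFreq).putLong(freqPointer).array();
        }
    }

    // For a field with positions: also writes proxPointer (20 bytes).
    static final class FreqProxCodec implements PostingsCodec {
        public byte[] encode(int docFreq, long freqPointer, long proxPointer) {
            return ByteBuffer.allocate(20).putInt(docFreq).putLong(freqPointer).putLong(proxPointer).array();
        }
    }

    // Hypothetical container: sorted terms, each followed by an opaque, length-prefixed blob.
    static final class TermsContainer {
        static byte[] write(SortedMap<String, byte[]> blobs) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeInt(blobs.size());
                for (Map.Entry<String, byte[]> entry : blobs.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeInt(entry.getValue().length);
                    out.write(entry.getValue());
                }
                return bytes.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Reads only the term text; each blob is skipped using its length prefix.
        static List<String> readTerms(byte[] data) {
            try {
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
                int count = in.readInt();
                List<String> terms = new ArrayList<>(count);
                for (int i = 0; i < count; i++) {
                    terms.add(in.readUTF());
                    int length = in.readInt();
                    if (in.skipBytes(length) != length) throw new EOFException("truncated blob");
                }
                return terms;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> blobs = new TreeMap<>();
        blobs.put("body:flexible", new FreqProxCodec().encode(2, 100L, 200L));
        blobs.put("id:42", new FreqOnlyCodec().encode(1, 300L, -1L));

        byte[] data = TermsContainer.write(blobs);
        System.out.println(TermsContainer.readTerms(data));  // [body:flexible, id:42]
    }
}
```

The length prefix costs a few bytes per term; the design choice it buys is that the container never has to be changed, or even recompiled, when a codec's layout changes.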

Mike


