simon.willnauer at googlemail
May 12, 2012, 1:36 AM
Post #3 of 5
On Fri, May 11, 2012 at 7:56 AM, Jong Kim <jong.lucene [at] gmail> wrote:
> When I update a document in Lucene (i.e., re-indexing), I have to delete
> the existing document, and create a new one. My understanding is that this
> assigns a new doc ID for the newly created document. If that is the case,
> is it true that the system can rather quickly run out of doc ID space
> (which is about 2 billion since doc ID data type is integer) if the update
> frequency is extremely high in an application?
Document IDs in Lucene are per segment, i.e. they are always relative
to the segment that holds the document. There are two limits here:
1. in the API, since all methods accepting internal doc IDs expect an
int, not a long; 2. on the segment level. Basically, you are going to
run into problems if you have more than Integer.MAX_VALUE documents in
one index. You can work around that if everything you do is
"per-segment"; in that case the limit only applies to a single
segment.
Running out of "ids" won't be an issue, as they are all relative to
their segment, i.e. you can update a single document forever and
never run out of ids.
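To make "relative per-segment" a bit more concrete, here is a rough sketch. It is a toy model in plain Java, not Lucene code: the class SegmentIdSketch, its Segment type and the indexWidePosition helper are all made-up names for illustration. The only point is that numbering restarts at 0 in every segment, and a position in the whole index is the number of documents in the earlier segments plus the local id.

```java
import java.util.Arrays;
import java.util.List;

// Toy model, NOT Lucene code: illustrates segment-relative document numbering.
final class SegmentIdSketch {

    // Hypothetical stand-in for a segment; it only records how many documents it holds.
    static final class Segment {
        private int docCount;

        // Adds one document and returns its segment-local id (0, 1, 2, ...).
        int add() {
            return docCount++;
        }

        int size() {
            return docCount;
        }
    }

    // Position in the whole index = documents in earlier segments + local id.
    // A long is used so the sum cannot overflow in this toy.
    static long indexWidePosition(List<Segment> segments, int segmentIndex, int localId) {
        long base = 0;
        for (int i = 0; i < segmentIndex; i++) {
            base += segments.get(i).size();
        }
        return base + localId;
    }

    public static void main(String[] args) {
        Segment first = new Segment();
        Segment second = new Segment();
        first.add();
        first.add();
        first.add();
        int local = second.add(); // numbering restarts in the new segment
        System.out.println("local id in second segment: " + local);
        System.out.println("index-wide position: "
                + indexWidePosition(Arrays.asList(first, second), 1, local));
    }
}
```

So the local id never depends on how many documents were ever added or deleted elsewhere, only on the segment the document currently lives in.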
> So, my question is -
> 1. Does Lucene always increment the doc ID for newly created document
> (hence, the risk of running out of ID space) just like auto increment
> column in the database does? Or does it re-use the numbers that are
> currently not in use (i.e. those IDs that were once assigned but since
> 2. If Lucene can recycle old IDs, it would be even better if I could force
> it to re-use a particular doc ID when updating a document by deleting old
> one and creating new one. This scheme will allow me to reference this doc
> ID from another doc in the index as if it was a foreign key value that
> doesn't change upon reindexing. I didn't see anything like this in the API,
> but is it ever possible?
> 3. If Lucene does not recycle old IDs, how do people deal with this issue
> when designing a system with extremely high re-indexing frequency?
The Lucene-internal ids should not be used in the application
integrating Lucene, or at least not in the way you would use an
"auto-incremented" primary key in a DB. You can specify your own "id"
field and reuse those ids (you actually have to if you want to
update).
Does that make sense?
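A small sketch of the "own id field" idea, again as a toy model in plain Java rather than real Lucene code. StableKeySketch and its methods are hypothetical names; in Lucene itself the corresponding call would be IndexWriter.updateDocument with a Term on your id field, which deletes the old document and adds the new one. The sketch assumes that internal positions may change at any time (here via compact()), so lookups and references only ever go through the application key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Lucene code: update = delete the old copy + add a new one,
// addressed by a stable application-level key instead of an internal position.
final class StableKeySketch {
    // Internal position -> content; null marks a deleted slot.
    private final List<String> slots = new ArrayList<>();
    // Application key -> current internal position.
    private final Map<String, Integer> keyToSlot = new HashMap<>();

    // Replaces whatever is stored under the key; returns the new internal position.
    int update(String key, String content) {
        Integer old = keyToSlot.get(key);
        if (old != null) {
            slots.set(old, null); // mark the old copy as deleted
        }
        slots.add(content);
        int pos = slots.size() - 1;
        keyToSlot.put(key, pos);
        return pos;
    }

    // Lookup always goes through the application key.
    String get(String key) {
        Integer pos = keyToSlot.get(key);
        return pos == null ? null : slots.get(pos);
    }

    // Drops deleted slots and renumbers the survivors from 0.
    void compact() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Integer> e : keyToSlot.entrySet()) {
            live.add(slots.get(e.getValue()));
            e.setValue(live.size() - 1);
        }
        slots.clear();
        slots.addAll(live);
    }

    int size() {
        return slots.size();
    }

    public static void main(String[] args) {
        StableKeySketch index = new StableKeySketch();
        index.update("user-42", "v1");
        index.update("user-42", "v2");
        int before = index.update("user-42", "v3");
        index.compact();
        System.out.println("position before compaction: " + before);
        System.out.println("slots after compaction: " + index.size());
        System.out.println("content for user-42: " + index.get("user-42"));
    }
}
```

The internal position moves on every update and again on compaction, while "user-42" stays valid throughout, which is why the foreign-key style reference should point at your own id field.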
> Thanks in advance for help