
erickerickson at gmail
Sep 18, 2008, 6:34 AM
Post #5 of 7
(520 views)
Permalink
|
uuuhhhh, take anything Otis says as *much* more informed than anything I say on this topic <G>. Erick On Thu, Sep 18, 2008 at 2:32 AM, Tobias Larsson Hult < tobias.larsson.hult[at]findwise.se> wrote: > Thanks for the quick responses! > > Good point about the warmup issues Erick, that's something we will > consider. Good to know that this kind of setup has been proved working for > at least one :) I think we will do a small setup and test the performance. > > Thanks again for valuable input! > > Best Regards > Tobias > > > On 16 sep 2008, at 18.10, Otis Gospodnetic wrote: > > Tobias, >> >> That's the approach I took with Simpy.com and it's been working well for >> several years now. You'll have to keep track of searchers and close them >> when appropriate, of course. >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> > On 16 sep 2008, at 17.17, Erick Erickson wrote: > >> >> The main arguments against using many separate indexes are >> 1> search warmup time. That is, each time you open an index >> the first few queries take much longer than subsequent searches. >> 2> Managing a bazillion indexes is non-trivial. >> >> >> That said, in your particular case these may not apply. I guess the >> piece of information that really counts is "how often do you expect >> to update/search a given index"? You could avoid the warmup issue >> by keeping an index open for some period of time after the first >> search on the assumption that the user is going to make multiple >> searches rather than just one. I'm sure there are other tricks >> you can try. >> >> So, how often do you expect >> 1> users to backup date >> 2> users to query data? >> and what is acceptable search response time? and are your >> users willing to live with a significant delay on the first couple >> of queries? >> >> I'd only be comfortable with choosing an approach if I tried >> it out with a single computer's content and generated a few >> stats.... >> >> Best >> Erick >> >> ----- Original Message ---- >> >>> From: Tobias Larsson Hult <tobias.larsson.hult[at]findwise.se> >>> To: java-user[at]lucene.apache.org >>> Sent: Tuesday, September 16, 2008 10:55:09 AM >>> Subject: Using separate index for each user >>> >>> Hi, >>> >>> We're thinking of using Lucene to integrate search in a backup service >>> application. The background is that we have a bunch of users using a >>> backup service, and we want them to be able to search their own, and >>> only their own, backups. >>> >>> The total amount of data that's being backed up is very large (size in >>> terabyte). Even though the index will probably be smaller due to only >>> indexing relevant fields, it is still to much to incorporate in one >>> index. But since a user will only search in his/her own files we're >>> thinking of creating one index for each user. There will be a lot of >>> indexes of course but each index will not span to more than a couple >>> of gigabytes at the most. >>> >>> So when a user searches or adds new content to the backup we will open >>> up his/her index and to a search/update in that particular index. That >>> way, each query/update should not be so performance intense. >>> >>> Does this sound like a reasonable solution? Of course this means >>> creating a lot of IndexReaders/Writers but I prefer that to searching >>> in a huge index everytime when a user only wants to search in a slice >>> of the total index. >>> >>> Best Regards, >>> Tobias Larsson Hult >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org >> For additional commands, e-mail: java-user-help[at]lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org > For additional commands, e-mail: java-user-help[at]lucene.apache.org > >
|