
ian.lea at gmail
May 10, 2012, 1:22 AM
Post #2 of 2
Impossible to say - how big is big? How fast is fast? I'd start with the
simplest option and if it's fast enough, stop.

--
Ian.

On Sat, May 5, 2012 at 12:47 AM, Yang <teddyyyy123 [at] gmail> wrote:
> I have an index containing all students, and now I want to do an index
> search inside an Apache Hadoop mapper,
> i.e.
>
> for each (record from mapper input reader) {
>     output = lucene.search("name:" + record.name + " OR id:" + record.id);
>     emit(output)
> }
>
> My question is whether I should shard the index (across terms, not
> splitting the same postings list for one term) or simply replicate it.
> The index for the entire dataset is not too big, so it can fit on
> my local disk, and I can copy it to every node in the cluster and let
> the copies sit there all the time, so no copy overhead is incurred.
> The only argument in favor of sharding is that a smaller index might
> be faster. But since index search is only O(lg(n)) time, maybe this
> time saving is very small.
>
> So will sharding be worth the effort?
>
> thanks
> yang
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
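
To put a rough number on the O(lg(n)) argument in the quoted question: if a term lookup really costs about log2(n) comparisons, then splitting n terms evenly over k shards saves about log2(n) - log2(n/k) = log2(k) comparisons per lookup, however large n is. The sketch below just does that arithmetic. The class name, method names and the term and shard counts are made up for illustration, and the binary-search cost model is an assumption taken from the question, not a measurement of how Lucene actually behaves.

```java
public class ShardingEstimate {

    /** Approximate comparisons for one term lookup, assuming a log2(n) cost model. */
    static double lookupCost(long terms) {
        return Math.log(terms) / Math.log(2);
    }

    /** Comparisons saved per lookup when the terms are split evenly across the shards. */
    static double comparisonsSaved(long terms, int shards) {
        return lookupCost(terms) - lookupCost(terms / shards);
    }

    public static void main(String[] args) {
        long terms = 1L << 24; // hypothetical number of unique terms in the full index
        int shards = 16;       // hypothetical shard count

        System.out.printf("full index:  ~%.1f comparisons per lookup%n", lookupCost(terms));
        System.out.printf("one shard:   ~%.1f comparisons per lookup%n", lookupCost(terms / shards));
        System.out.printf("saved:       ~%.1f comparisons per lookup%n", comparisonsSaved(terms, shards));
    }
}
```

Under that model, 16 shards save roughly 4 comparisons per lookup, which is likely to be small next to the cost of reading postings and of routing each "name OR id" query to the right shard. That fits the advice above: replicate first, measure, and only shard if the simple option turns out to be too slow.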