
Mailing List Archive: Lucene: Java-Dev

Distributing index over N disks

 

 



otis_gospodnetic at yahoo

Nov 24, 2009, 2:31 PM

Post #1 of 4
Distributing index over N disks

Hello,

Would it make sense and be possible to spread different index files over multiple disks (without resorting to putting an index on a RAID)?
For example, what if the index files didn't live in a single index dir, but were organized by their type in a shallow dir tree, like this:

/path/to/index:
  tis/<tis files here>
  fdt/<fdt files here>
  prx/<prx files here>
  ...

Then one could symlink these tis, fdt, prx, etc. dirs to locations that are really on different disks.
Is this doable, and would it help improve performance? I think it could improve segment merging, index optimization, and searches, because N disk heads working in parallel would be able to do ~N times more work.


But the idea seems so simple that it makes me think I'm missing something; otherwise it would have already been done. :)

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


uwe at thetaphi

Nov 24, 2009, 2:44 PM

Post #2 of 4
RE: Distributing index over N disks

It is technically doable since 2.9 with FileSwitchDirectory, where you
define a set of file name extensions that decides which of the two
underlying directories a request goes to, see
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/store/FileSwitchDirectory.html

To have more directories, just use another FileSwitchDirectory as the
secondary directory, and so on.
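The nesting idea can be illustrated without Lucene on the classpath. The following is a hypothetical, stdlib-only model of the extension-based routing that FileSwitchDirectory performs (it only computes paths; it opens and writes nothing). The class names Node, Leaf and Switch, the /diskN mount points, and the choice of which extensions go to which disk are assumptions for illustration, not Lucene API. In Lucene itself the real class takes a set of primary extensions plus a primary and a secondary Directory, so the analogous construction passes one FileSwitchDirectory as the secondary of another.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

/**
 * HYPOTHETICAL stdlib-only model of FileSwitchDirectory's routing rule:
 * a file whose extension is in the primary set goes to the primary
 * directory, everything else goes to the secondary. Not Lucene classes.
 */
public class ExtensionRouting {

    /** Something that decides where a given index file lives. */
    public interface Node {
        Path resolve(String fileName);
    }

    /** One physical directory, e.g. a mount point on a single disk. */
    public static final class Leaf implements Node {
        private final Path dir;

        public Leaf(Path dir) { this.dir = dir; }

        @Override
        public Path resolve(String fileName) { return dir.resolve(fileName); }
    }

    /** Two-way switch on the file extension, analogous to one FileSwitchDirectory. */
    public static final class Switch implements Node {
        private final Set<String> primaryExtensions;
        private final Node primary;
        private final Node secondary;

        public Switch(Set<String> primaryExtensions, Node primary, Node secondary) {
            this.primaryExtensions = primaryExtensions;
            this.primary = primary;
            this.secondary = secondary;
        }

        @Override
        public Path resolve(String fileName) {
            return primaryExtensions.contains(extension(fileName))
                    ? primary.resolve(fileName)
                    : secondary.resolve(fileName);
        }
    }

    /** Text after the last '.', or "" when the name has no dot (e.g. segments_2). */
    public static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Three disks by nesting: the outer switch's secondary is another switch. */
    public static Node threeDiskLayout() {
        Node disk1 = new Leaf(Paths.get("/disk1/index"));
        Node disk2 = new Leaf(Paths.get("/disk2/index"));
        Node disk3 = new Leaf(Paths.get("/disk3/index"));
        // Inner switch: postings (frq, prx) on disk2, everything else on disk3.
        Node inner = new Switch(Set.of("frq", "prx"), disk2, disk3);
        // Outer switch: term dictionary (tis, tii) on disk1, else defer to inner.
        return new Switch(Set.of("tis", "tii"), disk1, inner);
    }

    public static void main(String[] args) {
        Node root = threeDiskLayout();
        for (String f : new String[] {"_0.tis", "_0.prx", "_0.fdt", "segments_2"}) {
            System.out.println(f + " -> " + root.resolve(f));
        }
    }
}
```

Each extra disk adds one more switch to the chain, so N disks need N-1 switches; any file whose extension matches no set (including extension-less names such as segments_2) ends up in the last secondary directory.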

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi


lucene at mikemccandless

Nov 25, 2009, 1:58 AM

Post #3 of 4
Re: Distributing index over N disks

I think this is a good idea, for indexes that can't fit in IO cache.
Report back if you get good results :) I think FSD opens up all sorts
of interesting possibilities.

Mike



ab at getopt

Nov 25, 2009, 3:40 AM

Post #4 of 4
Re: Distributing index over N disks

Uwe Schindler wrote:
> It is technically doable since 2.9 with FileSwitchDirectory, where you can
> define file name endings as a filter to which underlying directory the
> requests go, see
> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/store/FileSwitchDirectory.html
>
> To have more directories, just use another FileSwitchDirectory as secondary
> and so on.

You guys are too sophisticated ;) I know some people have been using a
lo-tek solution commonly known as symlinks, i.e. they put the prx and frq
files on an SSD and the rest on a regular HDD, and create symlinks to
the prx and frq files in the index dir. This works well with static
indexes (no updates, no merges), since any files written later are
created as regular files on the HDD, and it doesn't require code
modifications in existing apps.

Seriously, though, I agree that FileSwitchDirectory is the way to go.
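The symlink approach described above can be scripted in plain Java (NIO.2, Java 11+). This is a minimal sketch, assuming a static index that no IndexWriter has open and a file system that supports symbolic links; SymlinkRelocator and relocate are hypothetical names, and prx/frq are simply the extensions from the example. It has to be re-run after anything writes new segment files, because those are created as regular files in the index directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** HYPOTHETICAL helper: move selected index files to a fast disk and symlink them back. */
public class SymlinkRelocator {

    /**
     * Moves every regular file in indexDir whose extension is in extensions
     * into fastDir and leaves a symlink at the old location pointing at it.
     * Existing symlinks are skipped, so running it twice is a no-op.
     * Returns the names of the files relocated by this call.
     */
    public static List<String> relocate(Path indexDir, Path fastDir, Set<String> extensions)
            throws IOException {
        Files.createDirectories(fastDir);
        List<Path> toMove = new ArrayList<>();
        // Collect first so the directory is not modified while it is being listed.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "" : name.substring(dot + 1);
                if (Files.isRegularFile(file, LinkOption.NOFOLLOW_LINKS) && extensions.contains(ext)) {
                    toMove.add(file);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path file : toMove) {
            Path target = fastDir.resolve(file.getFileName());
            // Across file systems, Files.move copies the file and then deletes the source.
            Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
            Files.createSymbolicLink(file, target.toAbsolutePath());
            moved.add(file.getFileName().toString());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SymlinkRelocator <indexDir> <fastDir>");
            System.exit(2);
        }
        List<String> moved = relocate(Paths.get(args[0]), Paths.get(args[1]), Set.of("prx", "frq"));
        System.out.println("relocated: " + moved);
    }
}
```

Usage, with illustrative paths: `java SymlinkRelocator /hdd/index /ssd/index`. Readers keep opening files by their usual names in the index dir; the OS follows the links to the SSD.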

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com

