Mailing List Archive: Lucene: Java-User

Can Lucene unite multiple instances run as one ?

 

 


zhaowb at gmail

Nov 15, 2009, 7:12 PM

Post #1 of 10 (982 views)
Can Lucene unite multiple instances run as one ?

Hi, all
I'm facing a large index on an x86 Windows platform, which may not have a
big enough JVM heap to hold the entire index.
So I think it should be possible to split the index into several smaller
indexes and run them in different JVM instances on different machines.
Then, for each query, I can run it concurrently on every index and
merge the results together.
This could be a workaround for the OutOfMemory issue.
But before I start on this, I want to ask whether Lucene already has a
solution for this kind of thing.
Thanks.
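
To make the split-and-merge idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: `Shard`, `Hit`, and `searchAll` are hypothetical names standing in for "a searcher over one smaller index", "one scored result", and "query everything and merge". It only assumes that each smaller index can return its own top hits with a score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ScatterGather {
    /** One scored hit; the shard name records which index it came from. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one of the smaller indexes. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Runs the query on every shard concurrently and keeps the topN hits by score. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Shard shard : shards) {
                tasks.add(() -> shard.search(query, topN));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get()); // a failed shard surfaces as ExecutionException
            }
            // Highest score first; List.sort is stable, so ties keep shard order.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat with this kind of merge: scores computed against separate indexes are not necessarily comparable, because term statistics differ per index, so a real implementation may need to share those statistics or accept approximate ranking.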

--

Best Regards,
ZHAO, Wenbo

=======================

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


jrhoden at unimelb

Nov 15, 2009, 7:22 PM

Post #2 of 10 (956 views)
Re: Can Lucene unite multiple instances run as one ?

Not sure how large your index is, but it might be easier to increase
your memory (if possible) than to develop a fairly complicated
alternative strategy.

____________________________________
Information Technology Services,
The University of Melbourne

Email: jrhoden [at] unimelb
Phone: +61 3 8344 2884
Mobile: +61 4 1095 7575




zhaowb at gmail

Nov 15, 2009, 8:39 PM

Post #3 of 10 (946 views)
Re: Can Lucene unite multiple instances run as one ?

My data is categorized by date. About 14M+ docs per month, with 37M+ terms.
When I searched a 10-month index with a 1 GB heap, I got an OOM.
The problem is that I can't easily increase the heap size.
I have several machines, all 32-bit Windows with 4 GB of RAM.
And my goal is to index 10 years of data, plus more data every day!
If I put it all together, I will need 8 GB+ of RAM to run searches,
and maybe another 8 GB+ to run IndexWriter.

I think splitting the large index into smaller indexes and using a group of
machines that work as one is more flexible and faster than one
machine with a huge amount of RAM.
Any suggestions, besides more RAM?


--

Best Regards,
ZHAO, Wenbo

=======================



jrhoden at unimelb

Nov 15, 2009, 8:44 PM

Post #4 of 10 (943 views)
Re: Can Lucene unite multiple instances run as one ?

Sounds like you may need some sort of distributed system. I
just wanted to make sure you were aware of the costs and benefits of just
buying a big 64-bit machine with 8 GB of RAM, versus having to not only
maintain and power several 32-bit machines, but also maintain and support
your now more complicated code.

I have too often seen developers and companies spend a great deal of money,
not just on the initial development but on long-term support and
maintenance, that could have been saved by just buying a bigger,
more powerful machine in the first place.

I am interested to see what other people have to say about how to
solve your problem.

Best regards,
Jacob

____________________________________
Information Technology Services,
The University of Melbourne

Email: jrhoden [at] unimelb
Phone: +61 3 8344 2884
Mobile: +61 4 1095 7575




zhaowb at gmail

Nov 15, 2009, 11:13 PM

Post #5 of 10 (946 views)
Re: Can Lucene unite multiple instances run as one ?

Yes, exactly: 'distributed'.
From a maintenance point of view, being able to expand 'horizontally' is
very important.
In my case, the data files are a kind of 'history' file, categorized
by date. Once a data file is indexed, it will not change unless
the search fields change.

Say I index the whole ten years of data, generating a 400 GB index that
requires 8 GB of RAM. When I do a backup, I have to back up the entire
400 GB every time. I need another 8 GB machine for backup. And 8 GB will
not stay enough, because the index grows every day.

With a distributed solution, by comparison, I can split the index by year
or by season. Say I have 10 x 40 GB indexes. I can easily run 10 JVM
processes, each with a 1 GB heap, on 3-5 low-cost, non-dedicated x86
machines.
As for backup, 9 of the 10 indexes are old and only need to be backed up
once, since they won't change. Only the 1 hot index changes every day, so
I back up at most 40 GB. A spare machine is also very cheap. And because
the machines are so cheap, I can use VMs to run this, which is more
flexible for resource management. As time goes by, I just install a new
JVM instance when needed. I don't have to worry about RAM and search
speed anymore.

I do think there must be more, and bigger, cases out there just like mine.
A general distributed Lucene would be very useful. It would bring
Lucene to more enterprise applications, or to bigger, industrial
applications.
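
The split-by-year layout and its backup consequence can be sketched as follows. This is only an illustration in plain Java: the `index-<year>` naming scheme and the `YearlyShards` helper are made up for the example, and it assumes documents are routed purely by their date.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class YearlyShards {
    /** Name of the index holding documents dated on the given day (hypothetical naming scheme). */
    static String shardFor(LocalDate docDate) {
        return "index-" + docDate.getYear();
    }

    /** Indexes that a query restricted to [from, to] has to visit, oldest first. */
    static List<String> shardsFor(LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        for (int year = from.getYear(); year <= to.getYear(); year++) {
            names.add("index-" + year);
        }
        return names;
    }

    /** Only the current year's index still receives writes, so only it needs a recurring backup. */
    static boolean needsRecurringBackup(String shard, LocalDate today) {
        return shard.equals(shardFor(today));
    }
}
```

With this routing, every index except the one for the current year is read-only, which is what makes the "back up once, then only the hot index" plan work.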


--

Best Regards,
ZHAO, Wenbo

=======================



erickerickson at gmail

Nov 16, 2009, 5:19 AM

Post #6 of 10 (929 views)
Re: Can Lucene unite multiple instances run as one ?

I confess that I've just skimmed your e-mail, but there's absolutely
no requirement that the entire index fit in RAM. The fact that your
index is larger than available RAM isn't the reason you're hitting OOM.

Typical reasons for this are:
1> you're sorting on a field with many, many, many unique values. If
you're sorting on a fine-grained timestamp, this is quite possible.
2> You've bumped MAX_BOOLEAN_CLAUSES and are searching
on, say, one-letter wildcards.
3> many other reasons.

I agree with Jacob, jumping into a multi-machine solution without
understanding the problem in detail may not be your best course.

So, can you tell us more about the conditions under which you hit
OOM? Maybe with more details we can come up with better solutions.

If you absolutely *must* implement a multi-machine solution, have
you seen ParallelMultiSearcher?

Best
Erick


zhaowb at gmail

Nov 16, 2009, 6:02 AM

Post #7 of 10 (932 views)
Re: Can Lucene unite multiple instances run as one ?

1. No, I'm not using sort. Actually I'm just about to start reading that section.
2. No, I did only one search, '1234567', to 'warm up' the searcher, and then got the OOM.
3. After the IndexReader/searcher is created, I do a finalize and print
the total memory used, then run a search for '1234567' as a warm-up, and
do another finalize + print of the total memory.
I prepared 3 indexes:
A: from 2009/06 to now, 47G. 449M after the reader is created, then 649M
after the first search.
B: from 2009/04 to now, 63G. 592M -> 869M
C: from 2009/01 to now, 100+G. 598M -> OOM
See, I have done nothing yet.
And I have data going back to the year 2000 to index, while I have only
32-bit Windows machines...
That's why I want to build a 'distributed index'.
Another reason: most of the searches will likely be wildcard searches, which
are very slow on a large single index.
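
For anyone who wants to repeat the measurement in point 3, it could be done roughly like this. This is a sketch using only the standard `Runtime` API; `HeapProbe` and its method names are made up for the example, and since `System.gc()` is only a hint to the JVM, the numbers are approximate.

```java
class HeapProbe {
    /** Approximate heap in use, in bytes, after asking the JVM to collect garbage. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; the JVM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Rough change in used heap, in bytes, across the given step (e.g. opening a reader). */
    static long growthDuring(Runnable step) {
        long before = usedHeapBytes();
        step.run();
        return usedHeapBytes() - before;
    }

    /** Formats a byte count in whole megabytes, e.g. "449M". */
    static String toMegabytes(long bytes) {
        return (bytes / (1024 * 1024)) + "M";
    }
}
```

Calling `HeapProbe.toMegabytes(HeapProbe.usedHeapBytes())` once after the reader is opened and again after the warm-up search gives the same kind of before/after pair as the figures above.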

--

Best Regards,
ZHAO, Wenbo

=======================



zhaowb at gmail

Nov 16, 2009, 6:06 AM

Post #8 of 10 (928 views)
Re: Can Lucene unite multiple instances run as one ?

About ParallelMultiSearcher: I don't really know it yet; I've just had a
quick look at the javadoc. It seems to be a searcher that searches other
searchables. If all the searchables are in the same JVM, it won't help. If
there is some Searchable implementation that can work as a proxy for a
'remote' Lucene instance, then it might be what I'm looking for. Is
there such a class?

2009/11/16 Erick Erickson <erickerickson [at] gmail>:
> I confess that I've just skimmed your e-mail, but there's absolutely
> no requirement that the entire index fit in RAM. The fact that your
> index is larger than available RAM isn't the reason you're hitting OOM.
>
> Typical reasons for this are:
> 1> you're sorting on a field with many, many, many unique values. If
> you're sorting on a fine-grained timestamp, this is quite possible.
> 2> You've bumped MAX_BOOLEAN_CLAUSES and are searching
> on, say, one-letter wildcards.
> 3> many other reasons.
>
> I agree with Jacob, jumping into a multi-machine solution without
> understanding the problem in detail may not be your best course.
>
> So, can you tell us more about the conditions under which you hit
> OOM? Maybe with more details we can come up with better solutions.
>
> If you absolutely *must* implement a multi-machine solution, have
> you seen ParallelMultiSearcher?
>
> Best
> Erick



--

Best Regards,
ZHAO, Wenbo

=======================



erickerickson at gmail

Nov 16, 2009, 6:13 AM

Post #9 of 10 (923 views)
Re: Can Lucene unite multiple instances run as one ? [In reply to]

I should have read more carefully.

Look at the Searchable definition. One of the concrete realizations
of that interface is a RemoteSearchable, which is what you're
asking for I think.

Have you thought about SOLR? It's built on top of Lucene and has
lots of stuff built in for handling distributed indexes....
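
For readers following the thread, here is a minimal sketch of the "run the query on every index concurrently and merge the results" idea under discussion. It is not Lucene code: the Shard interface, the Hit class and the fixedShard helper are hypothetical stand-ins, and a real shard would wrap a searcher over one sub-index, possibly in another JVM. It also assumes scores from different shards are comparable, which generally holds only if the shards share term statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSearch {

    /** One scored result; docId is only meaningful inside its own shard. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher over one sub-index. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Sends the query to every shard in parallel, then keeps the global top N by score. */
    static List<Hit> search(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: one task per shard, each asking for that shard's own top N.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            // Gather: wait for every shard and pool the partial results.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            // Merge: highest score first, truncated to the requested size.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    /** Fake shard for the demo; scores must be passed in descending order. */
    static Shard fixedShard(String name, float... scores) {
        return (query, topN) -> {
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < scores.length && i < topN; i++) {
                hits.add(new Hit(name, i, scores[i]));
            }
            return hits;
        };
    }

    public static void main(String[] args) throws Exception {
        List<Shard> shards = Arrays.asList(
                fixedShard("2008", 0.9f, 0.4f),
                fixedShard("2009", 0.7f, 0.6f, 0.1f));
        for (Hit h : search(shards, "any query", 3)) {
            System.out.println(h.shard + " doc=" + h.docId + " score=" + h.score);
        }
    }
}
```

Each shard only has to return its own top N, so the merge step handles at most N hits per shard no matter how large the individual indexes are.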

Best
Erick


On Mon, Nov 16, 2009 at 9:06 AM, Wenbo Zhao <zhaowb [at] gmail> wrote:

> About the ParallelMultiSearcher, I don't really know that yet, just a
> quick look at jdoc. It seems to be a searcher searches other
> searchables. If all searchables are in same jvm, it won't help. If
> there is some searchable implementation can work as proxy for a
> 'remote' lucene instance, then it might be what I'm looking for. Is
> there such a class ?


zhaowb at gmail

Nov 16, 2009, 6:35 AM

Post #10 of 10 (925 views)
Re: Can Lucene unite multiple instances run as one ? [In reply to]

I just checked the 2.9.1 docs at
http://lucene.apache.org/java/2_9_1/api/core/index.html
and I can't find the RemoteSearchable you mentioned.
I don't know SOLR yet; I'll look at it tomorrow.
Thanks

2009/11/16 Erick Erickson <erickerickson [at] gmail>:
> I should have read more carefully.
>
> Look at the Searchable definition. One of the concrete realizations
> of that interface is a RemoteSearchable, which is what you're
> asking for I think.
>
> Have you thought about SOLR? It's built on top of Lucene and has
> lots of stuff built in for handling distributed indexes....
>
> Best
> Erick



--

Best Regards,
ZHAO, Wenbo

=======================

