
Mailing List Archive: Lucene: Java-User

optimized searching

 

 



m.harig at gmail

Jun 29, 2009, 11:01 PM

Post #1 of 7 (453 views)
optimized searching

Hello all,

I've gone through most of the posts on this forum. I need a code
snippet for searching a large index. Currently I am iterating over every hit:

Hits hits = searcher.search(query);
for (int inc = 0; inc < hits.length(); inc++) {
    // Loads all stored fields of every matching document.
    Document doc = hits.doc(inc);
    String title = doc.get("title");
    // etc.
}

This does not work well with a large index. I am running it from Tomcat
6.0 with a Java heap space of 256 MB.
Please, can anyone help me?



--
View this message in context: http://www.nabble.com/optimized-searching-tp24266553p24266553.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


ian.lea at gmail

Jun 30, 2009, 2:02 AM

Post #2 of 7 (427 views)
Re: optimized searching

What exactly is the problem? Are you concerned about the time that
your code snippet takes to run, or how much memory it uses?

If you have a query that matches many documents then iterating through
all of them, as your code does, is inevitably going to take time. See
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed for
suggestions.
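
One way to act on this, sketched in plain Java rather than against the Lucene API: if results are shown a page at a time, only the hits on the requested page need to be visited. The names below (`PageWindow`, `pageOf`) are hypothetical, and a list of document ids stands in for the ranked hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PageWindow {

    /**
     * Returns the slice of rankedDocIds that belongs on the given zero-based
     * page. Only these ids would then have their stored fields loaded.
     */
    static List<Integer> pageOf(List<Integer> rankedDocIds, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so a large page number cannot overflow.
        long start = (long) page * pageSize;
        if (start >= rankedDocIds.size()) {
            return Collections.emptyList();
        }
        int from = (int) start;
        int to = (int) Math.min(start + pageSize, rankedDocIds.size());
        return new ArrayList<>(rankedDocIds.subList(from, to));
    }

    public static void main(String[] args) {
        // Stand-in for a ranked result list: 95 matching document ids.
        List<Integer> ranked = new ArrayList<>();
        for (int id = 0; id < 95; id++) {
            ranked.add(id);
        }
        // Third page of 10 results: only ids 20..29 are touched.
        System.out.println(pageOf(ranked, 2, 10));
        // Last, partial page: ids 90..94.
        System.out.println(pageOf(ranked, 9, 10));
    }
}
```

The cost of the loop then depends on the page size, not on how many documents matched the query.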

It is unclear why your code snippet would use a lot of memory. Are
you maybe storing all the titles in memory?


--
Ian.





erickerickson at gmail

Jun 30, 2009, 4:50 AM

Post #3 of 7 (423 views)
Re: optimized searching

In Ian's link, see in particular the section "Don't iterate over more hits
than necessary".

A couple of other things:
1> Loading the entire document just to get a field or two isn't
very efficient. Think about lazy loading (see FieldSelector).
2> What do you mean when you say "not very good"? Using too
much memory? Slow?

Perhaps if you gave us a higher level idea of what you're trying to
accomplish we could make better suggestions.
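
To illustrate the idea behind lazy loading, here is a minimal plain-Java sketch. It is not Lucene's FieldSelector, only the general pattern it relies on: a small field is read eagerly, while the read of a large field is deferred until something asks for it. `LazyFieldSketch` and `LazyField` are hypothetical names.

```java
import java.util.function.Supplier;

public class LazyFieldSketch {

    /** Defers an expensive read until the value is first requested. */
    static final class LazyField {
        private final Supplier<String> loader;
        private String value;
        private boolean loaded;

        LazyField(Supplier<String> loader) {
            this.loader = loader;
        }

        String get() {
            if (!loaded) {
                value = loader.get(); // the expensive read happens here, once
                loaded = true;
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    public static void main(String[] args) {
        // "title" is small and always displayed, so it is read up front.
        String title = "Some title";
        // "contents" is large and rarely needed, so its read is deferred.
        LazyField contents = new LazyField(() -> "a very large body of text");

        System.out.println(title);
        System.out.println(contents.isLoaded()); // false: nothing read yet
        System.out.println(contents.get().length());
        System.out.println(contents.isLoaded()); // true: read on first access
    }
}
```

If only the title is ever printed, the large field is never read at all.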

Best
Erick



m.harig at gmail

Jun 30, 2009, 5:38 AM

Post #4 of 7 (423 views)
Re: optimized searching

Thanks Erick.

> In Ian's link, see in particular the section "Don't iterate over more hits
> than necessary".
>
> A couple of other things:
> 1> Loading the entire document just to get a field or two isn't
> very efficient, think about lazy loading (See FieldSelector)

I have done it, but I have a couple of questions.

> 2> What do you mean when you say "not very good"? Using too
> much memory? Slow?

Yes, of course: it ran out of Java heap space.

Here is my code:

IndexReader open = IndexReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(open);
final String fName = "title";
QueryParser parser = new QueryParser("contents", new StopAnalyzer());
Query query = parser.parse(qryStr);

// Keep only the 1000 best-scoring hits.
TopDocCollector collector = new TopDocCollector(1000);
searcher.search(query, collector);

// Load "title" eagerly; defer every other stored field until it is read.
FieldSelector selector = new FieldSelector() {
    public FieldSelectorResult accept(String fieldName) {
        // Compare string contents, not references.
        return fName.equals(fieldName) ? FieldSelectorResult.LOAD
                : FieldSelectorResult.LAZY_LOAD;
    }
};

final int totalHits = collector.getTotalHits();
ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;

// scoreDocs holds at most 1000 entries, so bound the loop by its length
// rather than by totalHits.
for (int i = 0; i < scoreDocs.length; i++) {
    Document doc = searcher.doc(scoreDocs[i].doc, selector);

    System.out.println(i + " ) " + doc.get("title"));
    System.out.println(doc.get("path"));
}

Can you please tune my code to work faster and better? Is it possible to
display the total number of hits, like Google does? Since I am using
new TopDocCollector(1000), it won't let me get the total hits. Am I right?

--
View this message in context: http://www.nabble.com/optimized-searching-tp24266553p24271145.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




ian.lea at gmail

Jun 30, 2009, 6:21 AM

Post #5 of 7 (421 views)
Re: optimized searching

Have you read the javadocs? What does collector.getTotalHits() return?
Does it return the same when you use new TopDocCollector(1000) and
some other number? Are you asking basically the same questions in 2
different threads at the same time?

You are still iterating over many hits and that will still take longer
than if you iterate over fewer hits.
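
For readers wondering how a total count and a fixed-size result list can coexist, here is a plain-Java sketch of the general technique. It is not Lucene's implementation, and `TopNSketch`, `TopNCollector` and `Hit` are hypothetical names. The collector counts every hit it sees but retains only the best n, so the total does not depend on n, and the retained list is the only thing that should be iterated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    /** A (document id, score) pair, standing in for a search hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Counts every hit it is given but retains only the n best by score. */
    static final class TopNCollector {
        private final int n;
        private int totalHits;
        // Min-heap on score: the weakest retained hit sits at the head.
        private final PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

        TopNCollector(int n) {
            if (n <= 0) {
                throw new IllegalArgumentException("n must be positive");
            }
            this.n = n;
        }

        void collect(int doc, float score) {
            totalHits++; // counted whether or not the hit is retained
            if (heap.size() < n) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score) {
                heap.poll(); // evict the weakest retained hit
                heap.add(new Hit(doc, score));
            }
        }

        int getTotalHits() {
            return totalHits;
        }

        /** Retained hits, best score first. */
        List<Hit> topHits() {
            List<Hit> out = new ArrayList<>(heap);
            out.sort((a, b) -> Float.compare(b.score, a.score));
            return out;
        }
    }

    public static void main(String[] args) {
        TopNCollector collector = new TopNCollector(1000);
        for (int doc = 0; doc < 5000; doc++) {
            collector.collect(doc, (doc % 97) / 97f);
        }
        List<Hit> top = collector.topHits();
        System.out.println(collector.getTotalHits()); // 5000
        System.out.println(top.size());               // 1000
        // Iterate over the retained hits, never up to getTotalHits().
        for (int i = 0; i < Math.min(3, top.size()); i++) {
            System.out.println(top.get(i).doc + " " + top.get(i).score);
        }
    }
}
```

With this shape, the total can be shown to the user while only as many hits as will actually be displayed are retained and loaded.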


--
Ian.






erickerickson at gmail

Jun 30, 2009, 6:45 AM

Post #6 of 7 (419 views)
Re: optimized searching

<<<can you please tune my code to work it faster and better>>>

Are you willing to pay me to do your job for you? Sorry to be snarky, but
please be aware that we're volunteers here; it's pretty presumptuous to
ask for this.

You still haven't answered what it is you're trying to do. Why are
you collecting 1,000 titles? What's the purpose? Are you just
experimenting? Because I don't understand your use-case. There's
no point in trying to make code efficient unless you're trying to
solve a real problem.

*Of course* you'll see memory grow until the garbage
collector kicks in. This is what java does. System.out.printlns
are pretty slow. First queries are slow. So unless and until you
clearly state the problem you're trying to solve, there's not much
we can do.

Best
Erick



simon.willnauer at googlemail

Jun 30, 2009, 7:22 AM

Post #7 of 7 (422 views)
Re: optimized searching

Funny that you paste the code I sent you as an example and ask in a
separate thread for tuning :D

simon