
Mailing List Archive: Lucene: Java-User

Search Ranking

 

 



meeraj.kunnumpurath at asyska

May 16, 2012, 12:41 PM

Post #1 of 8 (386 views)
Permalink
Search Ranking

Hi,

I am quite new to Lucene. I am trying to use it to index listings of local
businesses. The index has only one field, which stores the attributes of a
listing as well as the email addresses of users who have rated that business.

For example,

Listing 1: "XYZ Takeaway London fred [at] company barney [at] company
fred [at] company"
Listing 2: "ABC Takeaway London fred [at] company barney [at] company"

Now when the user searches for "Takeaway fred [at] company", how do I
get listing 1 to always come before listing 2, given that the term
fred [at] company appears twice in listing 1 but only once in listing 2?

Regards
Meeraj


ivan at brusic

May 16, 2012, 12:49 PM

Post #2 of 8 (374 views)
Permalink
Re: Search Ranking [In reply to]

Use the explain function to understand why the query is producing the
results you see.

http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)

Does your current query return Listing 2 first? That might be because
of term frequencies. Which analyzers are you using?

http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63

Cheers,

Ivan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


meeraj.kunnumpurath at asyska

May 16, 2012, 1:21 PM

Post #3 of 8 (372 views)
Permalink
Re: Search Ranking [In reply to]

Thanks Ivan.

I don't use Lucene directly; it is used behind the scenes by the Neo4J graph
database for full-text indexing. According to their documentation, full-text
indexes use a whitespace tokenizer in the analyzer. Yes, I do get
Listing 2 first now. However, if I drop the term "Takeaway" from the
search string and search for just "fred [at] company", I get Listing 1 first.

Regards
Meeraj



meeraj.kunnumpurath at asyska

May 16, 2012, 1:48 PM

Post #4 of 8 (373 views)
Permalink
Re: Search Ranking [In reply to]

I have tried the same using Lucene directly with the following code:

import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.ScoreDoc;

public class LuceneTest {

    public static void main(String[] args) throws Exception {

        // In-memory index, analyzed with StandardAnalyzer
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        RAMDirectory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter indexWriter = new IndexWriter(index, config);

        // doc1 contains the email address twice, doc2 only once
        Document doc1 = new Document();
        doc1.add(new Field("searchText", "ABC Takeaway fred [at] company fred [at] company",
                Field.Store.YES, Field.Index.ANALYZED));
        Document doc2 = new Document();
        doc2.add(new Field("searchText", "XYZ Takeaway fred [at] company",
                Field.Store.YES, Field.Index.ANALYZED));

        indexWriter.addDocument(doc1);
        indexWriter.addDocument(doc2);
        indexWriter.close();

        Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer)
                .parse("Takeaway");

        // Collect the top hits in score order
        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("searchText"));
        }
    }
}

The output is:

Found 2 hits.
1. XYZ Takeaway fred [at] company
2. ABC Takeaway fred [at] company fred [at] company



meeraj.kunnumpurath at asyska

May 16, 2012, 1:50 PM

Post #5 of 8 (375 views)
Permalink
Re: Search Ranking [In reply to]

The actual query is

    Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer)
            .parse("Takeaway fred [at] company");

If I use

    Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer)
            .parse("fred [at] company");

I get them in the reverse order.

Regards
Meeraj



meeraj.kunnumpurath at asyska

May 16, 2012, 1:52 PM

Post #6 of 8 (376 views)
Permalink
Re: Search Ranking [In reply to]

This is the output I get from explain():

Found 2 hits.
1. XYZ Takeaway fred [at] company
0.5148823 = (MATCH) sum of:
  0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
    0.57735026 = queryWeight(searchText:takeaway), product of:
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.97109574 = queryNorm
    0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
      1.0 = tf(termFreq(searchText:takeaway)=1)
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.5 = fieldNorm(field=searchText, doc=1)
  0.34325486 = (MATCH) sum of:
    0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
      0.57735026 = queryWeight(searchText:fred), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
        1.0 = tf(termFreq(searchText:fred)=1)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5 = fieldNorm(field=searchText, doc=1)
    0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
      0.57735026 = queryWeight(searchText:company.com), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
        1.0 = tf(termFreq(searchText:company.com)=1)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5 = fieldNorm(field=searchText, doc=1)

2. ABC Takeaway fred [at] company fred [at] company
0.49279732 = (MATCH) sum of:
  0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
    0.57735026 = queryWeight(searchText:takeaway), product of:
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.97109574 = queryNorm
    0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
      1.0 = tf(termFreq(searchText:takeaway)=1)
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.375 = fieldNorm(field=searchText, doc=0)
  0.36407676 = (MATCH) sum of:
    0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
      0.57735026 = queryWeight(searchText:fred), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
        1.4142135 = tf(termFreq(searchText:fred)=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.375 = fieldNorm(field=searchText, doc=0)
    0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
      0.57735026 = queryWeight(searchText:company.com), product of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.97109574 = queryNorm
      0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
        1.4142135 = tf(termFreq(searchText:company.com)=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.375 = fieldNorm(field=searchText, doc=0)
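
The individual factors in the explain output above can be reproduced by hand. The sketch below is plain Java with no Lucene dependency. It assumes the classic TF-IDF formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared idf over the three query terms), and a length norm of 1 / sqrt(number of terms in the field) that is stored at reduced precision. The class and method names are made up for illustration, and the precision loss is modeled by simply truncating the float's mantissa, which is only an approximation of the real one-byte norm encoding.

```java
public class ExplainFactors {

    // Assumed classic formula: tf grows with the square root of the term count.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed classic formula: idf = 1 + ln(maxDocs / (docFreq + 1)).
    static float idf(int docFreq, int maxDocs) {
        return (float) (1.0 + Math.log(maxDocs / (double) (docFreq + 1)));
    }

    // Length norm 1/sqrt(numTerms), truncated to the top two mantissa bits to
    // imitate the lossy one-byte storage (simplified model, not Lucene's encoder).
    static float fieldNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        int truncated = Float.floatToRawIntBits(exact) & ~((1 << 21) - 1);
        return Float.intBitsToFloat(truncated);
    }

    // queryNorm = 1 / sqrt(sum of squared term weights); every query term here
    // has the same idf and no boost, so the sum is numQueryTerms * idf^2.
    static float queryNorm(float idf, int numQueryTerms) {
        return (float) (1.0 / Math.sqrt(numQueryTerms * idf * idf));
    }

    public static void main(String[] args) {
        float idf = idf(2, 2);
        System.out.println("idf       = " + idf);
        System.out.println("queryNorm = " + queryNorm(idf, 3));
        System.out.println("tf(2)     = " + tf(2));
        System.out.println("norm(4)   = " + fieldNorm(4)); // XYZ document: 4 terms
        System.out.println("norm(6)   = " + fieldNorm(6)); // ABC document: 6 terms
    }
}
```

Under these assumptions the six-term document gets an exact length norm of about 0.408, which the truncation rounds down to 0.375. That drop from 0.5 applies to every term in the longer document, including "takeaway", which is why repeating the email address does not automatically win.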



meeraj.kunnumpurath at asyska

May 16, 2012, 1:54 PM

Post #7 of 8 (382 views)
Permalink
Re: Search Ranking [In reply to]

Also, if I boost the term as below,

    Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer)
            .parse("Takeaway fred [at] company^100");

I get them in the reverse order. Do I need to boost the term even though it
appears more than once in the document?

Regards
Meeraj
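
To see why a boost flips the order, the fieldWeight values from the explain output can be combined by hand. The sketch below is plain Java, not Lucene code. It assumes that the boost scales the contribution of the boosted clause linearly and that queryNorm and the query-side idf are shared by both documents, so they do not affect the ordering. The class name and constants are illustrative only.

```java
public class BoostBreakEven {

    // fieldWeight values copied from the explain output (tf * idf * fieldNorm).
    static final double SHORT_TAKEAWAY = 0.29726744;  // "XYZ ..." document
    static final double SHORT_EMAIL = 2 * 0.29726744; // fred + company.com
    static final double LONG_TAKEAWAY = 0.22295058;   // "ABC ..." document
    static final double LONG_EMAIL = 2 * 0.31529972;  // fred + company.com

    // Relative score, up to the factor shared by both documents.
    static double relativeScore(double takeaway, double email, double boost) {
        return takeaway + boost * email;
    }

    // Boost on the email clause at which both documents would tie.
    static double breakEvenBoost() {
        return (SHORT_TAKEAWAY - LONG_TAKEAWAY) / (LONG_EMAIL - SHORT_EMAIL);
    }

    public static void main(String[] args) {
        for (double boost : new double[] {1.0, 100.0}) {
            double shortDoc = relativeScore(SHORT_TAKEAWAY, SHORT_EMAIL, boost);
            double longDoc = relativeScore(LONG_TAKEAWAY, LONG_EMAIL, boost);
            System.out.println("boost " + boost + ": "
                    + (longDoc > shortDoc ? "ABC (two occurrences) first" : "XYZ first"));
        }
        System.out.println("break-even boost = " + breakEvenBoost());
    }
}
```

Under these assumptions a boost of 100 is far more than needed; anything a little above 2 on the email clause would already put the document with two occurrences first for this two-document index. The threshold depends on the document lengths, so it is not a general constant.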



ivan at brusic

May 17, 2012, 3:52 PM

Post #8 of 8 (362 views)
Permalink
Re: Search Ranking [In reply to]

If you read the explain output, you can see where the scores
differ. One difference with a noticeable effect is:

1.0 = tf(termFreq(searchText:fred)=1)
0.5 = fieldNorm(field=searchText, doc=1)
vs.
1.4142135 = tf(termFreq(searchText:fred)=2)
0.375 = fieldNorm(field=searchText, doc=0)

As predicted, the term frequencies and norms are affecting the
scoring. Try omitting norms on the field and run your query again:

field.setOmitNorms(true) or Field.Index.ANALYZED_NO_NORMS

Cheers,

Ivan
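
The effect of omitting norms can be checked with the numbers from the explain output. The sketch below is plain Java rather than Lucene code; it assumes each term contributes queryWeight * tf * idf * fieldNorm with tf = sqrt(freq), and that omitting norms amounts to a fieldNorm of 1.0 for every document. The class and method names are hypothetical.

```java
public class OmitNormsRanking {

    static final double IDF = 0.5945349;           // idf(docFreq=2, maxDocs=2)
    static final double QUERY_WEIGHT = 0.57735026; // idf * queryNorm

    // Score of one document: sum over the query terms of
    // queryWeight * tf * idf * fieldNorm, with tf = sqrt(termFreq).
    static double score(int[] termFreqs, double fieldNorm) {
        double sum = 0.0;
        for (int freq : termFreqs) {
            sum += QUERY_WEIGHT * Math.sqrt(freq) * IDF * fieldNorm;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] xyz = {1, 1, 1}; // takeaway, fred, company.com in the XYZ document
        int[] abc = {1, 2, 2}; // the same terms in the ABC document

        System.out.println("with norms:    XYZ=" + score(xyz, 0.5)
                + " ABC=" + score(abc, 0.375));
        System.out.println("without norms: XYZ=" + score(xyz, 1.0)
                + " ABC=" + score(abc, 1.0));
    }
}
```

With the norms from the explain output the XYZ document scores higher, matching the reported totals. With the norm fixed at 1.0 only term frequency separates the two documents, so the ABC document, which contains the address twice, comes out ahead.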

On Wed, May 16, 2012 at 1:54 PM, Meeraj Kunnumpurath
<meeraj.kunnumpurath [at] asyska> wrote:
> Also, if I do the below
>
> Query q = new QueryParser(Version.LUCENE_35, "searchText",
> analyzer).parse("Takeaway fred [at] company^100")
>
> I get them in reverse order. Do I need to boost the term, even if it
> appears more than once in the document?
>
> Regards
> Meeraj
>
> On Wed, May 16, 2012 at 9:52 PM, Meeraj Kunnumpurath <
> meeraj.kunnumpurath [at] asyska> wrote:
>
>> This is the output I get from explaining the plan ..
>>
>>
>> Found 2 hits.
>> 1. XYZ Takeaway fred [at] company
>> 0.5148823 = (MATCH) sum of:
>>   0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.5 = fieldNorm(field=searchText, doc=1)
>>   0.34325486 = (MATCH) sum of:
>>     0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
>>         1.0 = tf(termFreq(searchText:fred)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>     0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
>>         1.0 = tf(termFreq(searchText:company.com)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>
>>
>> 2. ABC Takeaway fred [at] company fred [at] company
>> 0.49279732 = (MATCH) sum of:
>>   0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.375 = fieldNorm(field=searchText, doc=0)
>>   0.36407676 = (MATCH) sum of:
>>     0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:fred)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>     0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:company.com)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>
>>
>> On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <
>> meeraj.kunnumpurath [at] asyska> wrote:
>>
>>> The actual query is
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText",
>>> analyzer).parse("Takeaway fred [at] company");
>>>
>>> If I use
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText",
>>> analyzer).parse("fred [at] company");
>>>
>>> I get them in the reverse order.
>>>
>>> Regards
>>> Meeraj
>>>
>>>
>>> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <
>>> meeraj.kunnumpurath [at] asyska> wrote:
>>>
>>>> I have tried the same using Lucene directly with the following code,
>>>>
>>>> import org.apache.lucene.store.RAMDirectory;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>> import org.apache.lucene.util.Version;
>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>> import org.apache.lucene.index.IndexWriter;
>>>> import org.apache.lucene.queryParser.QueryParser;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.Query;
>>>> import org.apache.lucene.search.TopScoreDocCollector;
>>>> import org.apache.lucene.search.ScoreDoc;
>>>>
>>>> public class LuceneTest {
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>
>>>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>>>         RAMDirectory index = new RAMDirectory();
>>>>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>>>
>>>>         Document doc1 = new Document();
>>>>         doc1.add(new Field("searchText", "ABC Takeaway fred [at] company fred [at] company", Field.Store.YES, Field.Index.ANALYZED));
>>>>         Document doc2 = new Document();
>>>>         doc2.add(new Field("searchText", "XYZ Takeaway fred [at] company", Field.Store.YES, Field.Index.ANALYZED));
>>>>
>>>>         indexWriter.addDocument(doc1);
>>>>         indexWriter.addDocument(doc2);
>>>>         indexWriter.close();
>>>>
>>>>         Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway");
>>>>
>>>>         int hitsPerPage = 10;
>>>>         IndexReader reader = IndexReader.open(index);
>>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>>         TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>>>>         searcher.search(q, collector);
>>>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>>
>>>>         System.out.println("Found " + hits.length + " hits.");
>>>>         for(int i=0;i<hits.length;++i) {
>>>>             int docId = hits[i].doc;
>>>>             Document d = searcher.doc(docId);
>>>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>>>         }
>>>>
>>>>     }
>>>>
>>>> }
>>>>
>>>> The output is ..
>>>>
>>>> Found 2 hits.
>>>> 1. XYZ Takeaway fred [at] company
>>>> 2. ABC Takeaway fred [at] company fred [at] company
>>>>
>>>>
>>>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <
>>>> meeraj.kunnumpurath [at] asyska> wrote:
>>>>
>>>>> Thanks Ivan.
>>>>>
>>>>> I don't use Lucene directly, it is used behind the scene by the Neo4J
>>>>> graph database for full-text indexing. According to their documentation for
>>>>> full text indexes they use white space tokenizer in the analyser. Yes, I do
>>>>> get Listing 2 first now. Though if I exclude the term "Takeaway" from the
>>>>> search string, and just put "fred [at] company", I get Listing 1 first.
>>>>>
>>>>> Regards
>>>>> Meeraj
>>>>>
>>>>>
>>>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <ivan [at] brusic> wrote:
>>>>>
>>>>>> Use the explain function to understand why the query is producing the
>>>>>> results you see.
>>>>>>
>>>>>>
>>>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>>>
>>>>>> Does your current query return Listing 2 first? That might be because
>>>>>> of term frequencies. Which analyzers are you using?
>>>>>>
>>>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>>>> <meeraj.kunnumpurath [at] asyska> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am quite new to Lucene. I am trying to use it to index listings of local
>>>>>> > businesses. The index has only one field, that stores the attributes of a
>>>>>> > listing as well as email addresses of users who have rated that business.
>>>>>> >
>>>>>> > For example,
>>>>>> >
>>>>>> > Listing 1: "XYZ Takeaway London fred [at] company barney [at] company
>>>>>> > fred [at] company"
>>>>>> > Listing 2: "ABC Takeaway London fred [at] company barney [at] company"
>>>>>> >
>>>>>> > Now when the user does a search with "Takeaway fred [at] company", how do I
>>>>>> > get listing 1 to always come before listing 2, because it has the term
>>>>>> > fred [at] company appear twice where as listing 2 has it only once?
>>>>>> >
>>>>>> > Regards
>>>>>> > Meeraj
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>>>>>> For additional commands, e-mail: java-user-help [at] lucene
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

