Mailing List Archive: Lucene: Java-User

PhoneticFilterFactory's inject parameter


evanchastelet at gmail

Apr 23, 2012, 12:27 PM

Post #1 of 7 (398 views)
PhoneticFilterFactory's inject parameter

Hi all,

(scroll to bottom for question)

I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create a document for each word in the English
dictionary, each document containing a single search field holding the
value after it is preprocessed using the following analyzer definition
(in our own DSL syntax, which gets transformed to Java):

analyzer soundslike{
   tokenizer = KeywordTokenizer
   tokenfilter = LowerCaseFilter
   tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}

I can run the web app and I get results that indeed (in some way) sound
like the original query term.

But what confuses me is the ranking of the results, knowing that I set
the inject param to true. If I search for the query term 'compete', the
parsed query becomes '(value:KMPT value:compete)', and therefore I
expect the word 'compete' to be ranked higher than any other word...
but this wasn't the case.
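
To make that expectation concrete, here is a minimal, library-free Java sketch of what inject is expected to do. It is not Lucene or Solr code: the class InjectSketch, its methods, and the lookup table standing in for the DoubleMetaphone encoder are all hypothetical, and the table holds only the two codes implied by this thread ('compete' in the parsed query, COMPETITOR in the explanation, both KMPT). Passing unknown words through unchanged is a choice of the toy, not a statement about the real filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class InjectSketch {
    // Hypothetical stand-in for a phonetic encoder: only the codes seen in the thread.
    private static final Map<String, String> CODES = new HashMap<>();
    static {
        CODES.put("compete", "KMPT");
        CODES.put("competitor", "KMPT");
    }

    // Treat the whole input as one lower-cased token (keyword-style), then
    // emit the phonetic code and, if inject is true, the original token too.
    public static List<String> analyze(String input, boolean inject) {
        String token = input.toLowerCase(Locale.ROOT);
        String code = CODES.get(token);
        List<String> terms = new ArrayList<>();
        if (code != null) {
            terms.add(code);
        }
        if (inject || code == null) {
            terms.add(token); // toy behavior: unknown words pass through unchanged
        }
        return terms;
    }

    // Count how many analyzed query terms also occur among the indexed terms.
    public static int matchingTerms(String query, String indexedWord, boolean inject) {
        List<String> indexed = analyze(indexedWord, inject);
        int matches = 0;
        for (String term : analyze(query, inject)) {
            if (indexed.contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(analyze("compete", true));
        System.out.println(matchingTerms("compete", "compete", true));
        System.out.println(matchingTerms("compete", "competitor", true));
    }
}
```

Under these assumptions the document for 'compete' matches both query terms while 'competitor' matches only the phonetic one, which is why the exact word would be expected to score higher when inject is true.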

Looking further at the explanation of results, I saw that the term
'compete' in the parsed query is totally absent, and only the phonetic
encoding seems to affect the ranking:

 * COMPETITOR
     o 4.368826 = (MATCH) sum of:
         + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
             # 0.52838135 = queryWeight(value:KMPT), product of:
                 * 8.26832 = idf(docFreq=150, maxDocs=216555)
                 * 0.063904315 = queryNorm
             # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
                 * 1.0 = tf(termFreq(value:KMPT)=1)
                 * 8.26832 = idf(docFreq=150, maxDocs=216555)
                 * 1.0 = fieldNorm(field=value, doc=3174)
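
The numbers in this explanation can be reproduced by hand. The sketch below assumes Lucene's DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq), queryNorm = 1 / sqrt(sum of squared idf values), all boosts 1); the thread does not say which Similarity is in use, and the class and method names are made up for illustration. One further assumption is labeled in the code: the quoted queryNorm comes out right when the second query term is given docFreq=0.

```java
public class ExplainArithmetic {
    // DefaultSimilarity-style inverse document frequency.
    public static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // DefaultSimilarity-style term frequency factor.
    public static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Query normalization over the idf values of all query terms (boosts of 1 assumed).
    public static double queryNorm(double... idfs) {
        double sumOfSquares = 0.0;
        for (double value : idfs) {
            sumOfSquares += value * value;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // Score of one term: queryWeight * fieldWeight, as in the explanation tree.
    public static double termScore(double idf, double queryNorm, double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        int maxDocs = 216555;
        double idfKmpt = idf(150, maxDocs);
        // Assumption: the second query term ('compete') is counted with docFreq=0.
        double idfSecondTerm = idf(0, maxDocs);
        double norm = queryNorm(idfKmpt, idfSecondTerm);
        double score = termScore(idfKmpt, norm, tf(1), 1.0);
        System.out.printf("idf=%.5f queryNorm=%.7f score=%.5f%n", idfKmpt, norm, score);
    }
}
```

Under these assumptions the computed idf, queryNorm and score agree with 8.26832, 0.063904315 and 4.368826 to within rounding. Using docFreq=1 for the second term gives a noticeably different queryNorm, so the output is at least consistent with a two-term query whose second term matched no documents in the index being searched.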

The next thing I did was running our friend Luke. In Luke, I opened the
documents tab, and started iterating over some terms for the field
'value' until I found 'compete'. When I hit 'Show All Docs', the search
tab opens and it displays the one and only document holding this value
(i.e. the document representing the word 'compete'). It shows the query:
'value:compete '. Then, when I hit the search button again (query is
still 'value:compete '), it says that there are no results !?

Probably, the 'Show All Docs' button does something different than
performing a query using the search tab in Luke.

Q: Can somebody explain why the injected original terms seem to get
ignored at query time? Or could it be related to the name of the search
field ('value'), or something else?

We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).

-Elmer


evanchastelet at gmail

Apr 24, 2012, 2:12 AM

Post #2 of 7 (396 views)
Re: PhoneticFilterFactory's inject parameter

A small correction:

> Looking further at the explanation of results, I saw that the term
> 'compete' in the parsed query is totally absent, and only the phonetic
> encoding seems affect the ranking...

should be:
> Looking further at the explanation of results, I saw that the term
> 'compete' is totally absent _in the scoring_*, and only the phonetic
> encoding seems to affect the ranking...

* and /present/ in the parsed query, as previously stated.

-Elmer


evanchastelet at gmail

Apr 25, 2012, 3:25 AM

Post #3 of 7 (381 views)
Re: PhoneticFilterFactory's inject parameter

Problem solved. Long story short: for some reason I had deleted
documents in the index and the non-deleted documents used the phonetic
filter with inject set to false.

Works fine now :)



evanchastelet at gmail

Apr 25, 2012, 5:22 AM

Post #4 of 7 (384 views)
Re: PhoneticFilterFactory's inject parameter

I keep replying to myself, and it is all getting a bit confusing.
The problem still exists and I don't understand why, nor why it worked once.

I have the same behavior again as posted in my first mail:
- Inject parameter is set to true.
- The index has _no deleted documents_ and is optimized.
- The term 'compete' is in there.
- If I ask Luke to show all docs for term 'compete' it shows me the one
and only document that represents this word. But...
- If I perform the query 'value:compete' in Luke again, it says there
are no results.

Here is the index I'm currently using. It contains various fields for
the available phonetic filter encoders:
https://www.box.com/s/34212e82227e102f6734

Can somebody explain this behavior? What's the real use of the inject
parameter of the PhoneticFilterFactory?

Thanks in advance.

-Elmer




ian.lea at gmail

Apr 25, 2012, 5:53 AM

Post #5 of 7 (379 views)
Re: PhoneticFilterFactory's inject parameter

You seem to be quietly going round in circles, by yourself! I suggest
a small self-contained program/test case with a RAM index created from
scratch. You can then experiment with inject on or off and if you
still can't figure it out, post the code and hopefully someone will be
able to help you make sense of it.

Make sure you tell us what version of Lucene you are using. If not
the latest, wouldn't hurt to try with the latest.


--
Ian.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


evanchastelet at gmail

Apr 25, 2012, 8:02 AM

Post #6 of 7 (378 views)
Re: PhoneticFilterFactory's inject parameter

Thanks for your suggestion Ian, but I just found out that if I replace
the KeywordTokenizer with a WhitespaceTokenizer, all seems to work fine.

Just to test what happens, I created another field 'orig', using this
analyzer:
analyzer KeywordLowered{
   tokenizer = KeywordTokenizer
   tokenfilter = LowerCaseFilter
}

Guess what... exactly the same problem, also in Luke.
It finds no documents for the query:
orig:strange
even though the term 'strange' is in the index for the field 'orig'.

Does anybody have a clue why documents are not matched when using the
KeywordTokenizer? Remember that none of the queries or terms contain
whitespace.


Thanks again.
-Elmer




ian.lea at gmail

Apr 26, 2012, 5:51 AM

Post #7 of 7 (377 views)
Re: PhoneticFilterFactory's inject parameter

There are useful tips in the FAQ,
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.

I still think you should come up with small self-contained example code.


--
Ian.


