Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-736) Sloppy Phrase Scoring Misbehavior


jira at apache

Apr 16, 2007, 11:25 AM

Post #1 of 2 (288 views)
[jira] Commented: (LUCENE-736) Sloppy Phrase Scoring Misbehavior

[ https://issues.apache.org/jira/browse/LUCENE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489187 ]

Otis Gospodnetic commented on LUCENE-736:
-----------------------------------------

Doron, sounds like this is ripe for a commit now to take care of both this and LUCENE-697.


> Sloppy Phrase Scoring Misbehavior
> ---------------------------------
>
> Key: LUCENE-736
> URL: https://issues.apache.org/jira/browse/LUCENE-736
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Reporter: Doron Cohen
> Assigned To: Doron Cohen
> Priority: Minor
> Attachments: perf-search-new.log, perf-search-orig.log, res-search-new2.log, res-search-orig2.log, sloppy_phrase.patch2.txt, sloppy_phrase.patch3.txt, sloppy_phrase_java.patch.txt, sloppy_phrase_tests.patch.txt
>
>
> This is an extension of https://issues.apache.org/jira/browse/LUCENE-697
> In addition to the abnormalities Yonik pointed out in 697, there seem to be other issues with sloppy phrase search and scoring.
> 1) A phrase with a repeated word would be detected in a document although it is not there.
> I.e. document = A B D C E , query = "B C B" would not find this document (as expected), but query "B C B"~2 would find it.
> I think that no matter how large the slop is, this document should not be a match.
> 2) A document containing both orders of a query, symmetrically, would score differently for the query and for its reversed form.
> I.e. document = A B C B A would score differently for queries "B C"~2 and "C B"~2, although it is symmetric to both.
> I will attach test cases that show both these problems and the one reported by Yonik in 697.
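
The two expectations in the description above can be checked against a small brute-force model. The following is a minimal sketch in plain Java, not Lucene's SloppyPhraseScorer and not the attached patch. It assumes that every query term must bind to its own distinct document position, that the distance of a binding is the spread of (position minus offset in the phrase), and that each binding within the slop contributes 1/(distance + 1), which mirrors the weighting in DefaultSimilarity.sloppyFreq. The class and method names (SloppyPhraseSketch, matchDistances, sloppyFreq) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force reference model for illustration only; NOT Lucene's implementation.
class SloppyPhraseSketch {

    // Enumerates every way to bind each query term to a distinct document
    // position holding the same term, and returns the distances that fit the slop.
    // Assumes a non-empty query.
    static List<Integer> matchDistances(String[] doc, String[] query, int slop) {
        List<Integer> out = new ArrayList<>();
        bind(doc, query, slop, 0, new int[query.length], new boolean[doc.length], out);
        return out;
    }

    private static void bind(String[] doc, String[] query, int slop, int i,
                             int[] chosen, boolean[] used, List<Integer> out) {
        if (i == query.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int k = 0; k < chosen.length; k++) {
                int adjusted = chosen[k] - k; // document position minus phrase offset
                min = Math.min(min, adjusted);
                max = Math.max(max, adjusted);
            }
            int distance = max - min; // 0 means an exact phrase match
            if (distance <= slop) {
                out.add(distance);
            }
            return;
        }
        for (int p = 0; p < doc.length; p++) {
            // A document position may serve only one query term (assumption of this model).
            if (!used[p] && doc[p].equals(query[i])) {
                used[p] = true;
                chosen[i] = p;
                bind(doc, query, slop, i + 1, chosen, used, out);
                used[p] = false;
            }
        }
    }

    // Sums 1/(distance + 1) over all bindings, the same weighting shape as
    // DefaultSimilarity.sloppyFreq; summing over every binding is a simplification.
    static double sloppyFreq(List<Integer> distances) {
        double sum = 0.0;
        for (int d : distances) {
            sum += 1.0 / (d + 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] doc1 = {"A", "B", "D", "C", "E"};
        String[] doc2 = {"A", "B", "C", "B", "A"};
        // Case 1: "B C B"~2 against a document with a single B.
        System.out.println(matchDistances(doc1, new String[] {"B", "C", "B"}, 2));
        // Case 2: "B C"~2 and "C B"~2 against the symmetric document.
        System.out.println(sloppyFreq(matchDistances(doc2, new String[] {"B", "C"}, 2)));
        System.out.println(sloppyFreq(matchDistances(doc2, new String[] {"C", "B"}, 2)));
    }
}
```

Under these assumptions, "B C B" can never bind to A B D C E, because the document holds only one B, so the slop value is irrelevant. For A B C B A, both "B C"~2 and "C B"~2 find one exact binding and one reversed binding, so the model gives them the same frequency.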

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Apr 18, 2007, 10:23 PM

Post #2 of 2 (251 views)
[jira] Commented: (LUCENE-736) Sloppy Phrase Scoring Misbehavior [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489930 ]

Doron Cohen commented on LUCENE-736:
------------------------------------

Need to see if the parts of the test (in QueryUtils) that were disabled by LUCENE-730 (BooleanScorer2 sometimes falls back to BooleanScorer) can be re-enabled. One possibility is to have two versions of this test: a BooleanScorer version and a version for everything else. This issue (736) is about sloppy/exact phrase scoring, so it would fall into the "rest", and the test would still catch it.
