Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

 

 



jira at apache

Feb 3, 2012, 5:39 PM

Post #1 of 4 (32 views)
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200258#comment-13200258 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

bq. changing the visibility of "field" in the QueryParser base class to "protected".

This seems reasonable?

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/queryparser
> Affects Versions: 2.4
> Reporter: Mark Harwood
> Priority: Minor
> Fix For: 4.0
>
> Attachments: ComplexPhraseQueryParser.java, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, Lucene-1486 non default field.patch, TestComplexPhraseQuery.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include:
> checkMatches("\"j* smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
> checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
>
> checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Feb 7, 2012, 6:11 AM

Post #2 of 4 (25 views)
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202409#comment-13202409 ]

Tomás Fernández Löbbe commented on LUCENE-1486:
-----------------------------------------------

Hi, I'm working on this change to allow field queries. I noticed that queries like:
name:"de*"
name:de*
fail due to the exception thrown in the "rewrite" method:
{code:java}
public Query rewrite(IndexReader reader) throws IOException {
  // ArrayList spanClauses = new ArrayList();
  if (contents instanceof TermQuery) {
    return contents;
  }

  // Build a sequence of Span clauses arranged in a SpanNear - child
  // clauses can be complex
  // Booleans e.g. nots and ors etc
  int numNegatives = 0;
  if (!(contents instanceof BooleanQuery)) {
    throw new IllegalArgumentException("Unknown query type \""
        + contents.getClass().getName()
        + "\" found in phrase query string \"" + phrasedQueryStringContents
        + "\"");
  }
  ...
{code}
By changing it to something like:
{code:java}
if (!(contents instanceof BooleanQuery)) {
  return contents;
}
{code}
queries like the ones above work, along with all the other queries in the unit test. Is there something I'm missing with this change? I know the ComplexPhraseQueryParser is not intended for queries like the ones I'm proposing, but why does it need to fail in those cases?
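
To make the difference between the two guards concrete, here is a minimal, self-contained sketch. It does not use Lucene: the `Query`, `TermQuery`, `PrefixQuery` and `BooleanQuery` types below are hypothetical stand-ins, and it assumes that a lone wildcard term such as de* reaches the guard as some query that is neither a term query nor a boolean query (modelled here as a prefix query). The span-building step elided in the snippet above is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the guard discussed above. These nested types are
// hypothetical stand-ins, NOT Lucene's real query classes.
class RewriteGuardSketch {
    interface Query {}

    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    static final class PrefixQuery implements Query {
        final String prefix;
        PrefixQuery(String prefix) { this.prefix = prefix; }
    }

    static final class BooleanQuery implements Query {
        final List<Query> clauses = new ArrayList<>();
    }

    // Mirrors the quoted guard: anything that is neither a TermQuery
    // nor a BooleanQuery is rejected with an exception.
    static Query rewriteStrict(Query contents) {
        if (contents instanceof TermQuery) {
            return contents;
        }
        if (!(contents instanceof BooleanQuery)) {
            throw new IllegalArgumentException("Unknown query type \""
                + contents.getClass().getName()
                + "\" found in phrase query string");
        }
        return contents; // placeholder for the elided span-building step
    }

    // Mirrors the proposed change: non-boolean contents pass through as-is.
    static Query rewriteLenient(Query contents) {
        if (contents instanceof TermQuery) {
            return contents;
        }
        if (!(contents instanceof BooleanQuery)) {
            return contents;
        }
        return contents; // placeholder for the elided span-building step
    }

    public static void main(String[] args) {
        Query prefix = new PrefixQuery("de");
        System.out.println("lenient passes through: " + (rewriteLenient(prefix) == prefix));
        try {
            rewriteStrict(prefix);
        } catch (IllegalArgumentException e) {
            System.out.println("strict guard rejected: " + e.getMessage());
        }
    }
}
```

In this model the only observable difference is whether a single non-boolean clause is returned unchanged or rejected; whether returning it unchanged is safe in the real parser is exactly the open question raised above.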




jira at apache

Feb 8, 2012, 4:33 AM

Post #3 of 4 (25 views)
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203514#comment-13203514 ]

Ahmet Arslan commented on LUCENE-1486:
--------------------------------------

Thanks for looking into this, Mark and Tomás. Do you think this issue is the right place to introduce a boolean inOrder parameter? Currently, inOrder=true is always passed to SpanNearQuery's constructor.
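
For readers unfamiliar with the flag, the sketch below illustrates what an ordered versus unordered proximity check means for two terms. It is a deliberately simplified model, not Lucene's span-matching code: the `near` helper and its parameters are hypothetical, positions are plain token offsets, and slop is taken to be the number of positions allowed between the two terms.

```java
// Simplified two-term model of ordered vs. unordered proximity matching.
// The helper below is hypothetical and is NOT how Lucene implements spans.
class InOrderSketch {
    /**
     * @param firstPos  token position of the first term of the phrase
     * @param secondPos token position of the second term of the phrase
     * @param slop      maximum number of positions allowed between the terms
     * @param inOrder   if true, the second term must come after the first
     */
    static boolean near(int firstPos, int secondPos, int slop, boolean inOrder) {
        if (firstPos == secondPos) {
            return false; // this toy model treats the terms as distinct tokens
        }
        if (inOrder && secondPos < firstPos) {
            return false; // reversed order is rejected when inOrder is set
        }
        int gap = Math.abs(secondPos - firstPos) - 1; // positions between the terms
        return gap <= slop;
    }

    public static void main(String[] args) {
        // Text "smith john": a phrase "john smith" sees john at 1 and smith at 0.
        System.out.println("ordered:   " + near(1, 0, 0, true));
        System.out.println("unordered: " + near(1, 0, 0, false));
    }
}
```

With inOrder fixed to true, a document containing the terms in reverse order cannot match regardless of slop, which is the limitation a configurable parameter would lift.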



jira at apache

Feb 8, 2012, 7:02 AM

Post #4 of 4 (25 views)
[jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203652#comment-13203652 ]

Tomás Fernández Löbbe commented on LUCENE-1486:
-----------------------------------------------

Ahmet, I created a separate JIRA issue for the "inOrder" parameter in the ComplexPhraseQueryParser. See LUCENE-3758.

