Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

 

 


jira at apache

Feb 23, 2007, 12:02 PM

Post #1 of 5 (992 views)

[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475452 ]

Doron Cohen commented on LUCENE-800:
------------------------------------

Michael, I've been looking into this and think I made some progress. Are you just starting, or do you have it solved already?

> Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
> ----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-800
> URL: https://issues.apache.org/jira/browse/LUCENE-800
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Reporter: Dilip Nimkar
> Assigned To: Michael Busch
>
> Test code and output follow. Tested with Lucene 1.9 only. Affects those who index or search for Lucene's reserved characters.
> Description: When an input search string has a sequence of N (java-escaped) backslashes, where N >= 2, the QueryParser will produce a query in which that sequence has N-1 backslashes.
> TEST CODE:
> Analyzer analyzer = new WhitespaceAnalyzer();
> String[] queryStrs = {"item:\\\\",
>                       "item:\\\\*",
>                       "(item:\\\\ item:ABCD\\\\))",
>                       "(item:\\\\ item:ABCD\\\\)"};
> for (String queryStr : queryStrs) {
>     System.out.println("--------------------------------------");
>     System.out.println("String queryStr = " + queryStr);
>     Query luceneQuery = null;
>     try {
>         luceneQuery = new QueryParser("_default_", analyzer).parse(queryStr);
>         System.out.println("luceneQuery.toString() = " + luceneQuery.toString());
>     } catch (Exception e) {
>         System.out.println(e.getClass().toString());
>     }
> }
> OUTPUT (with remarks in comment notation):
> --------------------------------------
> String queryStr = item:\\
> luceneQuery.toString() = item:\ //One backslash has disappeared. Searcher will fail on this query.
> --------------------------------------
> String queryStr = item:\\*
> luceneQuery.toString() = item:\* //One backslash has disappeared. This query will search for something unintended.
> --------------------------------------
> String queryStr = (item:\\ item:ABCD\\))
> luceneQuery.toString() = item:\ item:ABCD\) //This should have thrown a ParseException because of an unescaped ')'. It did not.
> --------------------------------------
> String queryStr = (item:\\ item:ABCD\\)
> class org.apache.lucene.queryParser.ParseException //...and this one should not have, but it did.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Feb 23, 2007, 12:10 PM

Post #2 of 5 (897 views)

[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475457 ]

Michael Busch commented on LUCENE-800:
--------------------------------------

Hi Dilip,

The backslash is the escape character in Lucene's QueryParser syntax, so if you want to search for a backslash you have to escape it. That means the first two examples you provided are working as expected:

item:\\ -> item:\ is correct
item:\\* -> item:\* is correct too

If you want to search for two backslashes you have to escape both, meaning you have to put four backslashes in the query string:
item:\\\\* -> item:\\*
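
The two layers of escaping (Java string literal, then query syntax) can be sketched in plain Java. This is only an illustration: `unescape` is a hypothetical helper written for this example, not Lucene's implementation, and it handles nothing beyond "a backslash protects the next character".

```java
final class EscapeLayers {
    // Hypothetical helper (not Lucene's API): drops each escaping backslash
    // and keeps the character that follows it.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i)); // keep the escaped character
            } else {
                out.append(c);             // ordinary character or trailing backslash
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Four backslashes in Java source are two backslashes at runtime.
        String oneEscaped = "item:\\\\";
        // Eight backslashes in Java source are four backslashes at runtime.
        String twoEscaped = "item:\\\\\\\\*";
        System.out.println(oneEscaped + " -> " + unescape(oneEscaped));
        System.out.println(twoEscaped + " -> " + unescape(twoEscaped));
    }
}
```

Under these assumptions the first line prints `item:\\ -> item:\` and the second prints `item:\\\\* -> item:\\*`, matching the mappings above.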


But you indeed found two other problems. You are right, the last example should not throw a ParseException.
In (item:\\ item:ABCD\\) the query parser wrongly treats the closing parenthesis as escaped, when in fact it is the preceding backslash that is escaped. I will provide a patch for this problem soon.
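
One way to reason about whether a character is really escaped is to count the backslashes directly in front of it: an odd count means the character is escaped, an even count means the backslashes escape each other. The sketch below shows that rule only; `isEscaped` is a hypothetical helper and is not how the parser or the planned patch works.

```java
final class EscapeCheck {
    // Hypothetical helper: the character at 'index' is escaped only if an odd
    // number of consecutive backslashes comes directly before it.
    static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        // Runtime value: (item:\\ item:ABCD\\)
        String query = "(item:\\\\ item:ABCD\\\\)";
        // Two backslashes precede ')', so the parenthesis itself is not escaped.
        System.out.println(isEscaped(query, query.length() - 1));

        // Runtime value: item:ABCD\)
        String escapedParen = "item:ABCD\\)";
        // One backslash precedes ')', so here the parenthesis is escaped.
        System.out.println(isEscaped(escapedParen, escapedParen.length() - 1));
    }
}
```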

And as you said, the third example should throw a ParseException because there are too many closing parentheses. There is already a patch for this problem in JIRA:
http://issues.apache.org/jira/browse/LUCENE-372

I will commit fixes for both problems soon.

Thanks again, Dilip! Good catches :-)




jira at apache

Feb 23, 2007, 12:12 PM

Post #3 of 5 (895 views)

[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475461 ]

Michael Busch commented on LUCENE-800:
--------------------------------------

Doron,

The problem here is that a backslash is both a valid TERM_CHAR and an ESCAPE_CHAR. The fix is to exclude \ from the TERM_CHAR list. I tried this fix and it works fine for me. I'm going to attach a patch today. It would be great if you could review it before I commit it, Doron!
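
The ambiguity can be reproduced outside the parser with two regular expressions. This is a rough sketch: the patterns are simplified stand-ins chosen for this example, not the real token definitions from the QueryParser grammar, and only whitespace and parentheses are treated as term delimiters.

```java
import java.util.regex.Pattern;

final class TermCharAmbiguity {
    // Assumed simplification: an escaped char is a backslash plus any character.
    // Ambiguous variant: a plain term char may itself be a backslash.
    static final Pattern AMBIGUOUS_TERM = Pattern.compile("(?:\\\\.|[^\\s()])+");
    // Fixed variant: the backslash is excluded from the plain term chars.
    static final Pattern FIXED_TERM = Pattern.compile("(?:\\\\.|[^\\s()\\\\])+");

    // True if the whole string can be read as one term under the given pattern.
    static boolean isSingleTerm(Pattern termPattern, String s) {
        return termPattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "ABCD\\\\)"; // runtime value: ABCD\\)
        // Ambiguous: '\' can be a plain char, leaving "\)" as an escaped paren.
        System.out.println(isSingleTerm(AMBIGUOUS_TERM, input));
        // Fixed: "\\" must be an escaped backslash, so ')' stays outside the term.
        System.out.println(isSingleTerm(FIXED_TERM, input));
        // The term without the parenthesis is still accepted after the fix.
        System.out.println(isSingleTerm(FIXED_TERM, "ABCD\\\\"));
    }
}
```

With the ambiguous pattern the closing parenthesis can be swallowed into the term, which is the behaviour seen in the last test case; once the backslash is no longer a plain term character, only one reading remains.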





jira at apache

Feb 23, 2007, 1:58 PM

Post #4 of 5 (899 views)

[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475513 ]

Dilip Nimkar commented on LUCENE-800:
-------------------------------------

In my test code, I took care of the difference between \ as the Java escape character and \ as the Lucene escape character.

System.out.println(new QueryParser("_default_", analyzer).parse("item:\\\\")); // note the 4 backslashes
should print on the console item:\\
But it is printing item:\
Same is the case with the second string in the test code.

In general, the boolean test
str.equals(new QueryParser("_default_", analyzer).parse(str).toString())
should always evaluate to true if the analyzer is not changing the string. But in our case it is evaluating to false.
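
Whether such a round-trip comparison can hold depends on whether the printed form is escaped again. The sketch below uses two hypothetical helpers, `unescape` and `escape`, plus an assumed list of special characters; it says nothing about what Query.toString() actually does in any Lucene version.

```java
final class RoundTripCheck {
    // Characters treated as special in this sketch (an assumption).
    static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    // Hypothetical: remove one level of backslash escaping.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical: put a backslash in front of every special character.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String term = "\\\\\\\\";       // runtime value: four backslashes
        String parsed = unescape(term); // runtime value: two backslashes
        System.out.println(term.equals(parsed));         // false: escapes were consumed
        System.out.println(term.equals(escape(parsed))); // true: re-escaped form matches
    }
}
```

For the escaping direction Lucene provides a static QueryParser.escape(String); the helper above only stands in for it so that the example runs without Lucene on the classpath.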

The behavior I have consistently found is this: "Whenever and wherever a Java String contains an unbroken sequence of N escaped backslashes (that is, N pairs of unescaped backslashes, totalling 2N backslashes) where N >= 2, the parse() method creates a Query that has only N-1 escaped backslashes in the corresponding place." If you have 20 escaped backslashes in a Java string, the Lucene query will end up with 19.

Thank you much for your time, attention and efforts.
Thanks.




jira at apache

Feb 23, 2007, 2:53 PM

Post #5 of 5 (921 views)

[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475529 ]

Michael Busch commented on LUCENE-800:
--------------------------------------

Dilip,

Are you using Lucene 1.9? The problem you are referring to (a sequence of N escaped backslashes) has been fixed in Lucene 2.1:
http://issues.apache.org/jira/browse/LUCENE-573

Could you test your code with the new version, please?

However, the two other problems you pointed out, which I discussed in my previous comment, are still there (but I'm working on them ;))

Thanks,
Michael


