
Mailing List Archive: Lucene: Java-User

When does Query Parser do its analysis ?

 

 



paul_t100 at fastmail

Feb 1, 2012, 1:32 PM

Post #1 of 6
When does Query Parser do its analysis ?

So I subclass QueryParser and give it the query

dug up

Debugging then shows that it calls getFieldQuery(String field, String
queryText, boolean quoted) twice,
once with

queryText=dug

and once with

queryText=up

but when I run it with the query dúg up, the first call is

queryText=dúg

even though the analyser I use removes accents.

So it seems that it just breaks the text up at spaces and does the text
analysis within getFieldQuery(), but how can it make the assumption that
text should only be broken at whitespace?
This seems to be confirmed by the fact that when I pass it the query
'dug/up' it passes it through as one string, which then seems to get
converted to 'dug up' within getFieldQuery().


Sorry I don't get it.

Paul





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


rcmuir at gmail

Feb 1, 2012, 2:03 PM

Post #2 of 6
Re: When does Query Parser do its analysis ?

On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor <paul_t100 [at] fastmail> wrote:
>
> So it seems like it just broke the text up at spaces, and does text analysis
> within getFieldQuery(), but how can it make the assumption that text should
> only be broken at whitespace ?

you are right, see this bug report:
https://issues.apache.org/jira/browse/LUCENE-2605

--
lucidimagination.com



hossman_lucene at fucit

Feb 1, 2012, 3:02 PM

Post #3 of 6
Re: When does Query Parser do its analysis ?

: So it seems like it just broke the text up at spaces, and does text analysis
: within getFieldQuery(), but how can it make the assumption that text should
: only be broken at whitespace ?

whitespace is a significant metacharacter to the QueryParser - it is used
to distinguish multiple clauses of a BooleanQuery.

if you want whitespace to be treated as a literal part of the query, you
need to either escape it, or quote it...

dug\ up
"dug up"

: This seemed to be confirmed that when i pass it query 'dug/up' it just passes
: it as one string, but then its seems to get converted to 'dug up' within the
: getFieldQuery()

getFieldQuery is responsible for calling the analyzer - so in your
'dug/up' example the analyzer you are using in your QueryParser instance
is evidently tokenizing on "/"
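
To make the two stages concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: splitClauses() and analyze() are hypothetical stand-ins for the parser grammar and for an accent-stripping analyzer, and the real grammar handles many more metacharacters. It only models the order of operations described above: clauses are split at unescaped, unquoted whitespace first, and each clause is analyzed afterwards.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model only; class and method names are assumptions, not Lucene API.
final class QueryParserSketch {

    // Stage 1 (parser): split the raw query into clauses at whitespace,
    // unless the whitespace is backslash-escaped or inside double quotes.
    static List<String> splitClauses(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                current.append(query.charAt(++i)); // keep the escaped character literally
            } else if (c == '"') {
                inQuotes = !inQuotes;              // quotes group text, they are not part of it
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    clauses.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    // Stage 2 (stand-in for the analyzer called per clause): strip accents,
    // lower-case, and tokenize on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        String stripped = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        List<String> tokens = new ArrayList<>();
        for (String t : stripped.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] queries = {"d\u00fag up", "dug/up", "\"dug up\"", "dug\\ up"};
        for (String q : queries) {
            for (String clause : splitClauses(q)) {
                System.out.println(q + " -> clause [" + clause + "] -> tokens " + analyze(clause));
            }
        }
    }
}
```

In this model 'dúg up' reaches the analysis stage as two separate clauses with the accent still present, while 'dug/up', "dug up" and dug\ up each arrive as a single clause and are only broken up (if at all) by the analyzer.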


-Hoss



paul_t100 at fastmail

Feb 1, 2012, 3:45 PM

Post #4 of 6
Re: When does Query Parser do its analysis ?

On 01/02/2012 22:03, Robert Muir wrote:
> On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor<paul_t100 [at] fastmail> wrote:
>> So it seems like it just broke the text up at spaces, and does text analysis
>> within getFieldQuery(), but how can it make the assumption that text should
>> only be broken at whitespace ?
> you are right, see this bug report:
> https://issues.apache.org/jira/browse/LUCENE-2605
>
I've voted on it, although having read Hoss's reply I now understand the
issue.

In my particular case I add album catalog numbers to my index as a keyword
field, but of course if the catalog number contains a space, as they
often do (e.g. cad 6), there is a mismatch. I've now changed my indexing
to index the value as 'cad6', removing spaces. Now if the query sent to
the query parser is just

cad 6

there is the issue that it breaks it up into two separate terms,
but I thought that if the query sent to the parser was

"cad 6"

then the complete string would be passed to the analyzer, but it
doesn't seem to quite work: it creates a TermQuery instead of a
PhraseQuery, yet the explain shows the query to have the value

catno:cad 6

rather than

catno:cad6

and I don't get a match. What does that mean?

Paul



cdoronc at gmail

Feb 1, 2012, 11:27 PM

Post #5 of 6
Re: When does Query Parser do its analysis ?

>
> In my particular case I add album catalog numbers to my index as a keyword
> field, but of course if the catalog number contains a space, as they often
> do (e.g. cad 6), there is a mismatch. I've now changed my indexing to index
> the value as 'cad6', removing spaces. Now if the query sent to the query
> parser is just
>
> cad 6
>
> there is the issue that it breaks it up into two separate terms, but
> I thought that if the query sent to the parser was
>
> "cad 6"
>
> then the complete string would be passed to the analyzer, but it
> doesn't seem to quite work: it creates a TermQuery instead of a PhraseQuery,
> yet the explain shows the query to have the value
>
> catno:cad 6
>
> rather than
>
> catno:cad6
>
> and I don't get a match. What does that mean?


Seems like at query time a KeywordAnalyzer was applied, while at indexing
time additional logic of removing spaces was (first) applied, therefore the
different results at indexing and search.
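
A minimal sketch of that mismatch, again in plain Java rather than Lucene code. The names normalize() and keyword() are hypothetical; they stand in for the index-time space-removing logic and for a keyword-style analysis that passes the whole input through unchanged. The index is modelled as a simple set of terms.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified model only; names are assumptions, not Lucene API.
final class CatalogNumberSketch {

    // Index-time logic assumed from the thread: lower-case and drop all
    // whitespace, so "cad 6" is stored as the single term "cad6".
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    // Keyword-style analysis: the whole input becomes one term, unchanged.
    static String keyword(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>();
        index.add(normalize("cad 6")); // stored as "cad6"

        // What the quoted query "cad 6" hands to the analysis step.
        String queryText = "cad 6";

        // Different analysis at query time: the term "cad 6" is not in the index.
        System.out.println(index.contains(keyword(queryText)));   // prints false

        // Same analysis on both sides: the term "cad6" matches.
        System.out.println(index.contains(normalize(queryText))); // prints true
    }
}
```

The point of the sketch is only that the term produced at query time has to be identical to the term stored at indexing time, so whatever removes the space when indexing must also run when searching.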

Doron


paul_t100 at fastmail

Feb 2, 2012, 12:26 AM

Post #6 of 6
Re: When does Query Parser do its analysis ?

On 02/02/2012 07:27, Doron Cohen wrote:
> Seems like at query time a KeywordAnalyzer was applied, while at
> indexing time additional logic of removing spaces was (first) applied,
> therefore the different results at indexing and search.
>
> Doron
Hi, sort of. I had an error in the reusableTokenStream() method of my
analyzer, so it wasn't doing the full analysis at query time. It's working now.

thanks Paul
