
Mailing List Archive: Lucene: Java-User

keyword indexing

 

 



jan at agermose

Jul 16, 2003, 10:03 AM

Post #1 of 3 (799 views)
keyword indexing

I'm having some problems with chars in keywords that are not a-z0-9 chars...

If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name "Jan Agermose" - well besides the fact I need to lowercase the keywords as the querystring is lowercased by lucene, I still cannot get any hits on the keywords.

"Det Naturvidenskabelige Fakultet" - hits = 0
Det* - hits!
Det Naturvidenskabelige Fakultet - hits = 0

I can understand the last one - but shouldn't the first one return hits? If not, using keywords seems to be limited to keywords composed of [a-z0-9]+ ???

For now I do a string replace of [^a-z0-9]+ with "" (removing all those chars) on the keywords, but I would think this gives the QueryParser some problems - except in my special case, where the user is not really free to compose queries on their own, so I can do the same string replace on the input :-D But I would like the power user to be able to enter real queries, and that leaves me with the problem of parsing queries: I need to do the string replace only within double quotes... This should be Lucene's problem, not mine :-D
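
[Editor's sketch] The workaround described above could look roughly like the following, using only java.util.regex. The class and method names are hypothetical and not part of Lucene; the quoted-phrase handling assumes phrases contain no escaped quotes. Note that a replace on [^a-z0-9]+ also drops non-ASCII letters such as the Danish æ, ø and å.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class.
class KeywordNormalizer {
    // Every run of chars outside a-z0-9 is removed, as in the post.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-z0-9]+");
    // A double-quoted phrase without embedded quotes (assumption).
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Lowercases the keyword, then strips everything outside a-z0-9. */
    static String normalizeKeyword(String keyword) {
        return NON_ALNUM.matcher(keyword.toLowerCase(Locale.ROOT)).replaceAll("");
    }

    /** Applies normalizeKeyword only to text inside double quotes. */
    static String normalizeQuotedPhrases(String query) {
        Matcher m = QUOTED.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "\"" + normalizeKeyword(m.group(1)) + "\"";
            // quoteReplacement guards against '$' and '\' in the phrase.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the unquoted remainder unchanged
        return out.toString();
    }
}
```

The same normalizeKeyword call would have to be applied to the keyword before it is indexed, so that both sides agree; text outside the quotes is left for the query parser as typed.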

Am I missing something ??

Jan Agermose


amordo at infosciences

Jul 16, 2003, 10:23 AM

Post #2 of 3 (751 views)
RE: keyword indexing [In reply to]

If you are searching on a Keyword field, you might need to use a TermQuery
in order to get an exact match.
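
[Editor's sketch] A toy model, in plain Java rather than Lucene, of why the three queries in the original post may behave as reported. It assumes a keyword is stored as one unanalyzed term, while a quoted phrase is lowercased and split into word tokens before lookup. The class and method names are hypothetical. In Lucene itself the exact lookup would be built as new TermQuery(new Term(field, text)), with the field name being whatever the keyword was indexed under.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical model of a single Keyword field; not a Lucene class.
class KeywordLookupModel {
    // Each keyword value is stored whole, as a single term.
    private final TreeSet<String> terms = new TreeSet<>();

    void addKeyword(String value) {
        terms.add(value);
    }

    /** Exact lookup on the whole value, analogous to a TermQuery. */
    boolean exactMatch(String text) {
        return terms.contains(text);
    }

    /** Prefix lookup, analogous to a query such as Det*. */
    boolean prefixMatch(String prefix) {
        // The smallest term >= prefix starts with it if any term does.
        String candidate = terms.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    /** Analyzed phrase: each lowercased word must be a term by itself. */
    boolean analyzedPhraseMatch(String phrase) {
        String[] tokens = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (String token : tokens) {
            if (!terms.contains(token)) {
                return false;
            }
        }
        return true;
    }
}
```

With "Det Naturvidenskabelige Fakultet" added as a keyword, the exact and prefix lookups succeed while the analyzed phrase does not, because no term "det", "naturvidenskabelige" or "fakultet" exists alone. This is a simplification: a real phrase query also checks term positions.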


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe [at] jakarta
For additional commands, e-mail: lucene-user-help [at] jakarta


jan at agermose

Jul 16, 2003, 10:56 AM

Post #3 of 3 (757 views)
Re: keyword indexing [In reply to]

So you cannot use the QueryParser if you are using keywords - is that it?

Jan


----- Original Message -----
From: "Aviran Mordo" <amordo [at] infosciences>
To: "'Lucene Users List'" <lucene-user [at] jakarta>
Sent: Wednesday, July 16, 2003 7:23 PM
Subject: RE: keyword indexing


> If you are searching on keyword you might need to use TermQuery in order
> to have an exact match
