
Mailing List Archive: Lucene: Java-User

Lucene fields not analyzed

 

 



iamrohitbanga at gmail

Feb 8, 2010, 11:26 PM

Post #1 of 8 (1377 views)
Lucene fields not analyzed

Hello

I have a field that stores names of people. I used the NOT_ANALYZED
parameter to index the names.

This is what happens during indexing:

doc.add(new Field("name", "\"" + name + "\"", Field.Store.YES,
Field.Index.NOT_ANALYZED));



When I search, I create a QueryParser using StandardAnalyzer and append
~0.5 to the search query.

The problem is that if the indexed name is "Mr. Kumar", my search does not
work for "Mr. Kumar", while it does work for "Mr.Kumar" (without the space).

// searching code
File index_directory = new File(INDEX_DIR_PATH);
IndexReader reader = IndexReader.open(FSDirectory.open(index_directory), true);
Searcher searcher = new IndexSearcher(reader);

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "name", analyzer);

Query query = parser.parse(text + "~0.5");

How can I make it work?

Rohit Banga


uwe at thetaphi

Feb 8, 2010, 11:35 PM

Post #2 of 8 (1351 views)
RE: Lucene fields not analyzed [In reply to]

QueryParser uses the given Analyzer when constructing the query, so it will never hit a NOT_ANALYZED term. In general, it is a bad idea to use QueryParser on fields that are not analyzed. There are two ways to solve the problem:

- Instantiate the query that matches the not-analyzed (but indexed) field directly as a TermQuery.
- Use a PerFieldAnalyzerWrapper and choose a specific analyzer for this field that does not touch your names (e.g. KeywordAnalyzer). Use this wrapped analyzer for both searching and indexing (and use Field.Index.ANALYZED!).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi
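
The mismatch described above can be illustrated without Lucene. The following is only a minimal sketch in plain Java: `simpleAnalyze` is a hypothetical stand-in for an analyzer that splits on non-alphanumeric characters and lowercases (roughly the effect the thread attributes to StandardAnalyzer), and `keepWhole` is a hypothetical stand-in for a not-analyzed field, where the whole value is a single term. Neither is a Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalyzerMismatchSketch {
    // Hypothetical analyzer stand-in: split on anything that is not a
    // letter or digit, then lowercase each piece.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical not-analyzed stand-in: the whole value is one term.
    static List<String> keepWhole(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        List<String> indexedTerms = keepWhole("Mr. Kumar");
        List<String> queryTerms = simpleAnalyze("Mr. Kumar");
        System.out.println("indexed terms: " + indexedTerms);
        System.out.println("query terms:   " + queryTerms);
        // No analyzed query term equals the single indexed term.
        for (String q : queryTerms) {
            System.out.println(q + " matches: " + indexedTerms.contains(q));
        }
    }
}
```

The point of the sketch is only that the index side and the query side must produce the same terms, which is what both suggestions above (a direct TermQuery, or the same per-field analyzer on both sides) aim for.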



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


markharw00d at yahoo

Feb 8, 2010, 11:36 PM

Post #3 of 8 (1338 views)
Re: Lucene fields not analyzed [In reply to]

I suspect it is because QueryParser uses space characters to separate different clauses in a query string while you want the space to represent some content in your "name" field. Try escaping the space character.

Cheers
Mark
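
To make the suggestion concrete, here is a small plain-Java sketch of the idea, not of QueryParser itself. `splitClauses` is a hypothetical, heavily simplified model of splitting a query string into clauses on unescaped whitespace, and `escapeSpaces` is a hypothetical helper that puts a backslash in front of each space.

```java
import java.util.ArrayList;
import java.util.List;

class ClauseSplitSketch {
    // Hypothetical model: whitespace separates clauses unless it is
    // preceded by a backslash, in which case it is kept literally.
    static List<String> splitClauses(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Escaped character: copy it and skip the backslash.
                current.append(query.charAt(++i));
            } else if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    clauses.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    // Hypothetical helper: escape every space with a backslash.
    static String escapeSpaces(String text) {
        return text.replace(" ", "\\ ");
    }

    public static void main(String[] args) {
        System.out.println(splitClauses("Mr. Kumar"));
        System.out.println(splitClauses(escapeSpaces("Mr. Kumar")));
    }
}
```

In this model the unescaped input yields two clauses, while the escaped input stays together as one clause containing the space.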





iamrohitbanga at gmail

Feb 9, 2010, 12:03 AM

Post #4 of 8 (1333 views)
Re: Lucene fields not analyzed [In reply to]

Let us assume this is the only field that is relevant (others are stored and
not indexed).
I tried TermQuery and it does not work.
I also tried KeywordAnalyzer and still could not make it work.

@Mark
I cannot escape the spaces in my query, as I am using Lucene to identify
occurrences of names among other things in an unstructured sentence.
So while adding names to the index, I used KeywordAnalyzer and changed the
name to be added to the index to "Mr.\\ Kumar",
but I still couldn't get it to work.






Rohit Banga




markharw00d at yahoo

Feb 9, 2010, 12:23 AM

Post #5 of 8 (1334 views)
Re: Lucene fields not analyzed [In reply to]

Use Luke.

It can show you the index contents and your parsed query and should show what is breaking down here.



uwe at thetaphi

Feb 9, 2010, 12:27 AM

Post #6 of 8 (1343 views)
RE: Lucene fields not analyzed [In reply to]

If you don't get it working that way, then you have to ask yourself the question: why do you want it indexed that way? Is it because you don't want to find all people in that field when you add only "Mr." to a search query? It looks like you use StandardAnalyzer, and in this case, I would add "mr", not "mr!", to the stop word list and index the name field like any other field. Before doing this, it would be good to explain what you intend to do or prevent by indexing with NOT_ANALYZED, which is the source of your problem.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi
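
The stop-word idea can be sketched in plain Java, independent of Lucene. Everything here is an assumption for illustration: `analyze` is a hypothetical tokenizer that splits on non-alphanumeric characters and lowercases, and `STOP_WORDS` is a hypothetical list containing only "mr". It is not StandardAnalyzer, only a model of why the lowercased token "mr" is the form that belongs in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordSketch {
    // Hypothetical stop word list: the lowercased token without the period.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("mr"));

    // Hypothetical analysis chain: split, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Mr. Arun Kumar"));
        System.out.println(analyze("Mr."));
    }
}
```

With the title removed on both the index and the query side, a query consisting only of "Mr." produces no terms in this model, so it cannot match every person in the field.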




iamrohitbanga at gmail

Feb 9, 2010, 1:12 AM

Post #7 of 8 (1329 views)
Re: Lucene fields not analyzed [In reply to]

I'll try using Luke.

How do I want to use Lucene?

There is a sentence that may contain the names of some people from among
those in a list. The names may be incomplete or may have spelling mistakes.

So I created a Lucene index, with each person as a document.

E.g.

Mr. Arun Kumar

With a hit highlighter I get

<B>Mr</B>. <B>Arun</B> <B>Kumar</B>

What I want is
<B>Mr. Arun Kumar</B>

even when there are spelling mistakes.


Rohit Banga
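
The goal described in this post (find a listed name as one unit inside a sentence, tolerating spelling mistakes) can be sketched in plain Java without Lucene. This is only an illustration under stated assumptions, not the approach anyone in the thread proposed: `findName` is a hypothetical helper that slides a window of as many words as the name has over the sentence and keeps the window with the smallest Levenshtein edit distance, and `maxEdits` is a hypothetical tolerance parameter.

```java
import java.util.Arrays;
import java.util.Locale;

class FuzzyNameSketch {
    // Standard Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the run of consecutive words closest to the name
    // (case-insensitive), or null if no run is within maxEdits.
    static String findName(String sentence, String name, int maxEdits) {
        String[] words = sentence.trim().split("\\s+");
        int n = name.trim().split("\\s+").length;
        String target = name.toLowerCase(Locale.ROOT);
        String best = null;
        int bestDist = maxEdits + 1;
        for (int start = 0; start + n <= words.length; start++) {
            String candidate =
                String.join(" ", Arrays.copyOfRange(words, start, start + n));
            int d = editDistance(candidate.toLowerCase(Locale.ROOT), target);
            if (d < bestDist) {
                bestDist = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String hit = findName("Yesterday Mr. Arun Kumer spoke", "Mr. Arun Kumar", 2);
        if (hit != null) {
            System.out.println("<B>" + hit + "</B>");
        }
    }
}
```

Because the whole window is compared against the whole name, a match is highlighted as a single span. The fixed window size means incomplete names (for example, a missing title) are not handled by this sketch.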




iamrohitbanga at gmail

Feb 9, 2010, 1:16 AM

Post #8 of 8 (1325 views)
Re: Lucene fields not analyzed [In reply to]

Moreover, a search for "Mr. Arun Kumar" also matches other names because "Mr."
matches.
I am ready to use "Mr." as a stop word in an analyzer.

Rohit Banga

