
Mailing List Archive: Lucene: Java-User

help required urgent!!!!!!!!!!!

 

 



Shakti_Sareen at satyam

Nov 22, 2007, 7:46 AM

Post #1 of 7 (478 views)
help required urgent!!!!!!!!!!!

Hi,
I am using StandardAnalyzer to index the data,
but I want to do a "like" (wildcard) search on a word containing a hyphen.
For example, I want to search for "soft-wa*".

I am getting no hits for that. I have read that if a word contains a hyphen,
it should be enclosed in double quotes ("). But
enclosing the word in double quotes means an exact-word search.

How can I perform a "like" search on a word containing a hyphen?

Please help.

Regards,
Shakti Sareen





DISCLAIMER:
This email (including any attachments) is intended for the sole use of the intended recipient/s and may contain material that is CONFIDENTIAL AND PRIVATE COMPANY INFORMATION. Any review or reliance by others or copying or distribution or forwarding of any or all of the contents in this message is STRICTLY PROHIBITED. If you are not the intended recipient, please contact the sender by email and delete all copies; your cooperation in this regard is appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


serera at gmail

Nov 22, 2007, 7:54 AM

Post #2 of 7 (462 views)
Re: help required urgent!!!!!!!!!!! [In reply to]

Hi

You can simply create a PrefixQuery. However, if you're using
StandardAnalyzer, and the word is added as Index.TOKENIZED,
soft-wa<something> will be broken into 'soft' and 'wa<something>'. Therefore
you'll need to add the word as Index.UN_TOKENIZED, or use a different
Analyzer when you index the data (for this field at least).

Here's some sample code:

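import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

// Indexing: add the value as a single, unanalyzed term.
Document doc = new Document();
doc.add(new Field("field", "soft-wash", Field.Store.NO, Field.Index.UN_TOKENIZED));

// Search: matches every term in "field" that starts with "soft-wa".
Query q = new PrefixQuery(new Term("field", "soft-wa"));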
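
To see why the tokenized field yields no hits, here is a minimal plain-Java sketch (no Lucene involved) that models a field's term dictionary as a sorted set and a prefix query as a scan over it. The class `PrefixSketch` and the method `termsWithPrefix` are hypothetical names made up for illustration, and the two term sets are assumed, simplified results of indexing "soft-wash" with and without tokenization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: a toy stand-in for a field's term dictionary, not Lucene code.
class PrefixSketch {

    // Returns every term in the sorted dictionary that starts with the given prefix.
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix, so matches are contiguous.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last possible match
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed terms when "soft-wash" is split at the hyphen.
        SortedSet<String> tokenized = new TreeSet<>(Arrays.asList("soft", "wash"));
        // Assumed single term when the value is indexed without tokenization.
        SortedSet<String> untokenized = new TreeSet<>(Arrays.asList("soft-wash"));

        System.out.println(termsWithPrefix(tokenized, "soft-wa"));   // []
        System.out.println(termsWithPrefix(untokenized, "soft-wa")); // [soft-wash]
    }
}
```

In the tokenized case no single term starts with "soft-wa", so a prefix scan finds nothing, however the query is written.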

Does that help?



--
Regards,

Shai Erera


Shakti_Sareen at satyam

Nov 22, 2007, 8:02 AM

Post #3 of 7 (468 views)
RE: help required urgent!!!!!!!!!!! [In reply to]

Hi

But the file I am indexing is very big, and I don't know which words will
contain a hyphen. What you suggest can be implemented only if
the hyphenated words in the file are known in advance.

I have no option other than StandardAnalyzer.

Thanks a lot for your reply.

Please suggest how I can proceed.


SHAKTI SAREEN
GE-GDC
STC HYDERABAD
9948777794



markharw00d at yahoo

Nov 22, 2007, 8:19 AM

Post #4 of 7 (463 views)
Re: help required urgent!!!!!!!!!!! [In reply to]

>>Re: help required urgent!!!!!!!!!!!

Yikes!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I'm guessing that the question was more about how to support this in the standard query syntax where there are multiple words.

i.e. http://www.google.com/search?q=lucene+wildcard+in+phrase

This post seems close to a solution to that problem:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200607.mbox/%3CB50FE8BF4F9CF24FB800E28DBB1A2FE026D681 [at] gaia%3E


Cheers,
Mark




serera at gmail

Nov 22, 2007, 8:23 AM

Post #5 of 7 (459 views)
Re: help required urgent!!!!!!!!!!! [In reply to]

The thing is - StandardAnalyzer breaks on hyphens. You'll need to work around
this, e.g. by extending StandardAnalyzer.

From StandardTokenizer's documentation (StandardAnalyzer uses StandardTokenizer):
"Splits words at hyphens, unless there's a number in the token, in which case
the whole token is interpreted as a product number and is not split."
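
The documented rule can be approximated in a few lines of plain Java. This is a simplified sketch of the quoted sentence only, not StandardTokenizer's actual grammar; the class and method names are hypothetical, and lowercasing, stop words and other token types are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: approximates the quoted rule, not StandardTokenizer itself.
class HyphenRuleSketch {

    // Splits a whitespace-free token at hyphens unless it contains a digit.
    static List<String> split(String token) {
        boolean hasDigit = token.chars().anyMatch(Character::isDigit);
        if (hasDigit) {
            // Treated like a product number: kept as one token.
            return new ArrayList<>(Arrays.asList(token));
        }
        List<String> parts = new ArrayList<>();
        for (String part : token.split("-")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("soft-wash")); // [soft, wash]
        System.out.println(split("x-200"));     // [x-200]
    }
}
```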

I've investigated StandardAnalyzer's tokenization, and it doesn't look simple
to disable that behavior. What you can do is extend StandardAnalyzer and
override its tokenStream method to create a TokenStream of your own. If you
know your text is space-separated, you can use StringTokenizer to split the
text on spaces. If a token contains '-', don't break it; otherwise pass it
on to the TokenStream returned by StandardAnalyzer.
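
The tokenization strategy described above might look roughly like this in plain Java. It is a sketch, not a Lucene TokenStream: `HyphenKeepingSketch`, `tokenize` and the `fallback` function (a stand-in for whatever StandardAnalyzer would produce for a non-hyphenated token) are hypothetical, and it assumes the text is whitespace-separated. Lowercasing the hyphenated token is also an assumption, made so that both branches produce terms in the same case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.function.Function;

// Illustration only: models the idea outside Lucene's TokenStream API.
class HyphenKeepingSketch {

    // Splits text on whitespace; hyphenated tokens stay whole, all others
    // are delegated to the fallback (a stand-in for StandardAnalyzer).
    static List<String> tokenize(String text, Function<String, List<String>> fallback) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text); // default delimiters: whitespace
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.indexOf('-') >= 0) {
                // Keep e.g. "soft-wash" as one term (lowercased by assumption).
                tokens.add(token.toLowerCase(Locale.ROOT));
            } else {
                tokens.addAll(fallback.apply(token));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical fallback: just lowercase the token.
        Function<String, List<String>> lowercase =
                t -> Collections.singletonList(t.toLowerCase(Locale.ROOT));
        System.out.println(tokenize("A Soft-Wash cycle", lowercase)); // [a, soft-wash, cycle]
    }
}
```

Note that punctuation attached to a hyphenated word (for example a trailing comma) would stay in the token here, so real text would need extra cleanup.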

Maybe someone else has a better answer, but if you insist on using
StandardAnalyzer, I have a feeling it will be problematic.




--
Regards,

Shai Erera


serera at gmail

Nov 22, 2007, 8:31 AM

Post #6 of 7 (460 views)
Re: help required urgent!!!!!!!!!!! [In reply to]

Yep - that's a good one. Only it may be a very heavy query and may throw
BooleanQuery.TooManyClauses if too many terms start with the prefix.
But that certainly would work.
BTW - MultiPhraseQuery's documentation specifically explains how to use it
for exactly this purpose.
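
The idea of "an exact term followed by any term with a given prefix" can be modelled without Lucene. The sketch below is a toy under stated assumptions, not MultiPhraseQuery itself: `PhrasePrefixSketch`, `expand` and `matches` are hypothetical names, the token list is an assumed result of splitting "soft-wash cycle" at the hyphen, and `maxClauses` is a stand-in for the kind of clause limit mentioned above (1024 is used only as an example value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: a toy model of "exact term followed by prefix-expanded terms".
class PhrasePrefixSketch {

    // Expands a prefix into the matching dictionary terms, refusing to go past
    // maxClauses (a stand-in for a clause limit).
    static List<String> expand(SortedSet<String> dictionary, String prefix, int maxClauses) {
        List<String> expanded = new ArrayList<>();
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no further matches possible
            }
            if (expanded.size() == maxClauses) {
                throw new IllegalStateException("too many terms for prefix " + prefix);
            }
            expanded.add(term);
        }
        return expanded;
    }

    // True if `first` is immediately followed by any of the expanded terms.
    static boolean matches(List<String> docTokens, String first, List<String> expanded) {
        for (int i = 0; i + 1 < docTokens.size(); i++) {
            if (docTokens.get(i).equals(first) && expanded.contains(docTokens.get(i + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed tokens after "soft-wash cycle" has been split at the hyphen.
        List<String> doc = Arrays.asList("soft", "wash", "cycle");
        SortedSet<String> dictionary = new TreeSet<>(doc);
        List<String> expanded = expand(dictionary, "wa", 1024);
        System.out.println(matches(doc, "soft", expanded)); // true
    }
}
```

The point of the sketch is that the hyphenated word does not have to survive tokenization: "soft" and a term starting with "wa" in adjacent positions are enough to find it.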



--
Regards,

Shai Erera


matthijs at impressie

Nov 22, 2007, 8:51 AM

Post #7 of 7 (460 views)
Re: help required urgent!!!!!!!!!!! [In reply to]

Hi

Simply create your own analyzer with JavaCC. See the repository for the
latest StandardAnalyzer.jj file, and make sure the Analyzer accepts anything
with a hyphen as a single token.
And try not to yell, please. Most of the questions are urgent; there is
no need for emphasis - especially in this manner.
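
On the custom-analyzer suggestion: the token rule "anything with a hyphen is a single token" can be prototyped with a regular expression before touching a grammar file. This is a plain-Java sketch under assumptions, not a JavaCC grammar or a Lucene Analyzer; the class name, the pattern and the lowercasing step are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a regex stand-in for a grammar rule that keeps hyphenated words whole.
class HyphenTokenPatternSketch {

    // One or more letter/digit runs joined by single hyphens, e.g. "soft-wash".
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    // Extracts lowercased tokens; punctuation and whitespace act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Soft-wash, then rinse.")); // [soft-wash, then, rinse]
    }
}
```

With tokens like these in the index, a prefix search for "soft-wa" has a whole term to match against.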

Good luck,
Matthijs



