
Mailing List Archive: Lucene: Java-User

Range queries in successive positions

 

 



s_hangal at yahoo

Mar 1, 2012, 11:22 PM


Hi,
I'm new to Lucene. I've indexed some documents with Lucene and need to sanitize them to ensure
that they do not contain any social security numbers (3 digits, 2 digits, 4 digits).

(How) Can I write a query (with the QueryParser) that searches for this pattern?

e.g. I can do [000 TO 999] or [00 TO 99] or [0000 TO 9999], but this causes hits with any 2-, 3- or 4-digit number.
With something like "[000 TO 999] [00 TO 99] [0000 TO 9999]", I get no hits at all.

Is this possible with the default QueryParser?
Or is there some other programmatic way to do it?
thanks,
Sandeep


trejkaz at trypticon

Mar 1, 2012, 11:26 PM

Re: Range queries in successive positions

On Fri, Mar 2, 2012 at 6:22 PM, su ha <s_hangal [at] yahoo> wrote:
> Hi,
> I'm new to Lucene. I've indexed some documents with Lucene and need to sanitize them to ensure
> that they do not contain any social security numbers (3 digits, 2 digits, 4 digits).
>
> (How) Can I write a query (with the QueryParser) that searches for this pattern?
>
> e.g. I can do [000 TO 999] or [00 TO 99] or [0000 TO 9999], but this causes hits with any 2-, 3- or 4-digit number.
> With something like "[000 TO 999] [00 TO 99] [0000 TO 9999]", I get no hits at all.
>
> Is this possible with the default QueryParser?
> Or is there some other programmatic way to do it?

The programmatic way is to wrap each range query (TermRangeQuery in
current releases) in a SpanMultiTermQueryWrapper and then put a
SpanNearQuery, with a slop of 0 and in-order matching, around the lot.
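
A rough sketch of that approach, assuming the Lucene 3.x API (3.1 or
later, where SpanMultiTermQueryWrapper is available) and an analyzer that
splits "123-45-6789" into the three tokens 123, 45 and 6789. The class
name, the helper names and the field name passed in are made up for
illustration. Term ranges compare strings lexicographically, so "000" to
"999" also matches terms such as "45" or "1234"; hits from this query are
only candidates and would still need to be checked.

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

// Hypothetical helper class; not part of Lucene.
public class SsnSpanQuery {

    // Wrap an inclusive, lexicographic term range so it can be used as a span clause.
    static SpanQuery digits(String field, String lower, String upper) {
        TermRangeQuery range = new TermRangeQuery(field, lower, upper, true, true);
        return new SpanMultiTermQueryWrapper<TermRangeQuery>(range);
    }

    // Three range clauses that must occur in adjacent positions, in this order.
    static Query build(String field) {
        SpanQuery[] parts = {
            digits(field, "000", "999"),
            digits(field, "00", "99"),
            digits(field, "0000", "9999")
        };
        // slop = 0 and inOrder = true: no gaps between the clauses, fixed order.
        return new SpanNearQuery(parts, 0, true);
    }
}
```

The result of build("contents") (field name assumed) can be passed to
IndexSearcher.search like any other Query.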

The default QueryParser probably can't do it. I believe someone was
enhancing it for wildcards but I'm not sure if range queries were
included in all that.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ian.lea at gmail

Mar 2, 2012, 1:21 AM

Re: Range queries in successive positions

Or take a look at RegexQuery (org.apache.lucene.search.regex) in the
contrib queries module. You won't be able to use that via QueryParser
either.

It might make more sense to do the sanitizing before indexing rather than after.
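
As an illustration of sanitizing before indexing, here is a minimal
sketch that uses only java.util.regex. The class name, the method names
and the replacement text are made up, and the pattern assumes the three
digit groups are separated by a hyphen or a single space; real documents
may need a looser pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper; run it on the raw text before handing it to the IndexWriter.
public class SsnSanitizer {

    // 3 digits, 2 digits, 4 digits, separated by a hyphen or a single space.
    // The lookarounds stop a match from starting or ending inside a longer digit run.
    private static final Pattern SSN =
        Pattern.compile("(?<!\\d)\\d{3}[- ]\\d{2}[- ]\\d{4}(?!\\d)");

    // True if the text contains something shaped like a social security number.
    static boolean containsSsn(String text) {
        return SSN.matcher(text).find();
    }

    // Replace every SSN-shaped match with a fixed placeholder.
    static String redact(String text) {
        return SSN.matcher(text).replaceAll("XXX-XX-XXXX");
    }

    public static void main(String[] args) {
        String doc = "Applicant 123-45-6789 called from 555-1234.";
        System.out.println(containsSsn(doc));
        System.out.println(redact(doc));
    }
}
```

Whether to redact the match or reject the whole document is a policy
decision; containsSsn alone is enough for the second option.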


--
Ian.


