
ronchalant at gmail
Jul 21, 2008, 12:05 PM
Post #9 of 9
(448 views)
Permalink
|
|
Re: Boolean expression for no terms OR matching a wildcard
[In reply to]
|
|
Thanks Steve, this looks promising even if it doesn't perform the best. I'll run some tests on what produces the best results. -Ron On Jul 21, 2008, at 3:00 PM, Steven A Rowe wrote: > Hi Ronald, > > Caveat - I haven't tested this, but: > > With a RegexQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html > >, I think you can do something like (using your example): > > +abc*123 -{Regex}(?!abc.*123$) > > This query would include all documents that have terms that match > the wildcard "abc*123", and exclude all documents containing terms > that don't match regex "^abc.*123$". > > Note that the Lucene QueryParser doesn't handle regex queries (and > if it did, the syntax would probably be different than "{Regex}" - > this was intended solely for purposes of exposition). As a result, > you would have to manually construct the RegexQuery and combine it > using BooleanQuery clauses with your wildcard query. > > The "(?!...)" syntax is a negative lookahead assertion - this is a > Java 1.4+ java.util.regex.Pattern feature. Note that wildcard > expressions are easily programmatically converted to regular > expressions by substituting "*"->".*" and "?"->".", and then adding > the "$" anchor. The "^" anchor is not required with RegexQuery's, > because when using the java.util.regex engine (the default engine), > j.u.r.Matcher.lookingAt() is used; from <http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt() > >: > > Attempts to match the input sequence, starting at the > beginning, against the pattern. > > Like the matches method, this method always starts at the > beginning of the input sequence; unlike that method, it > does not require that the entire input sequence be matched. > > Caveat #2: RegexQuery's are relatively slow, since *all* index terms > have to be tested against the regular expression, so you may have to > use some other method if query response time turns out to be a > problem. > > Steve > > On 07/20/2008 at 8:29 AM, Ronald Rudy wrote: >> A query solution is preferable.. but I can programmatically >> filter my results after the fact, it just seems like something that >> the Lucene team should consider adding.. I think it would only have >> value for wildcard queries, but nonetheless it would have some value >> I think.. >> >> -Ron >> >> On Jul 18, 2008, at 6:24 PM, eks dev wrote: >> >>> Analyzer that detects your condition "ALL match something", if >>> possible at all... >>> e.g. "800123456 80034543534 80023423423" -> 800 >>> >>> than you put it in ALL_MATCH field and match this condition against >>> it... if this prefix needs to be variable, you could extract all >>> matching prefixes to this fiield an make your query work like >>> "ALL_MATCH:800" and care not for the rest :) than yo would not need >>> field1 at all for these queries >>> >>> you were looking for something like this or you need "Query >>> solution"? >>> >>> ----- Original Message ---- >>>> From: Chris Hostetter <hossman_lucene[at]fucit.org> >>>> To: java-user[at]lucene.apache.org >>>> Sent: Saturday, 19 July, 2008 12:00:39 AM >>>> Subject: Re: Boolean expression for no terms OR matching a wildcard >>>> >>>>> Maybe this is easier ... suppose what I'm indexing is a phone >>>>> number, >>>>> and there are multiple phone numbers for what I'm indexing under >>>>> the >>>>> same field (phone) and I want the wildcard query to match only >>>>> records that have either no phone numbers at all OR where ALL >>>>> phone >>>>> numbers are in a specific area code (e.g. 800* would match all >>>>> in the >>>>> 800 area code). >>>> >>>> i can't think of anyway to accomplish the second part of your >>>> query. >>>> specificly, given the following records... >>>> >>>> Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, >>>> field3:Y >>>> Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z >>>> >>>> ...i can't think of any type of query like field1:A* which would >>>> match >>>> Doc2 but not Doc1 (because there are other field1 values that do >>>> not start with 'A') >>>> >>>> -Hoss > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org > For additional commands, e-mail: java-user-help[at]lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org For additional commands, e-mail: java-user-help[at]lucene.apache.org
|