
jack at basetechnology
Oct 25, 2012, 6:18 PM
Post #10 of 10
Right, another level of BooleanQuery that is a SHOULD clause, with TWO terms:
a MUST of MatchAllDocsQuery and a MUST_NOT of the TermRangeQuery for
"allergies" with null for both start and end.

Actually, there is a new filter that you can use to detect empty fields down
at that level. See https://issues.apache.org/jira/browse/LUCENE-4386

I think it is:

new ConstantScoreQuery(new FieldValueFilter(fieldname, false))

Use a SHOULD of that rather than a second level of BooleanQuery.

Let us know if it actually works!

-- Jack Krupansky

-----Original Message-----
From: Vitaly Funstein
Sent: Thursday, October 25, 2012 8:55 PM
To: java-user [at] lucene
Subject: Re: query for documents WITHOUT a field?

This is the QueryParser syntax, right? So an API equivalent for the not null
case would be something like this?

BooleanQuery q = new BooleanQuery();
q.add(new BooleanClause(new TermQuery(new Term("first_name", "Zed")), Occur.SHOULD));
q.add(new BooleanClause(new TermRangeQuery("allergies", null, null, true, true), Occur.SHOULD));

Whereas, for "IS NULL" the TermRangeQuery above would need to be wrapped in
another BooleanClause with Occur.MUST_NOT?

On Thu, Oct 25, 2012 at 5:29 PM, Jack Krupansky <jack [at] basetechnology> wrote:

> "OR allergies IS NULL" would be "OR (*:* -allergies:[* TO *])" in
> Lucene/Solr.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Vitaly Funstein
> Sent: Thursday, October 25, 2012 8:25 PM
> To: java-user [at] lucene
> Subject: Re: query for documents WITHOUT a field?
>
> Sorry for resurrecting an old thread, but how would one go about writing a
> Lucene query similar to this?
>
> SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL
>
> An AND case would be easy since one would just use a simple TermQuery with
> a FieldValueFilter added, but what about other boolean cases?
> Admittedly, this is a contrived example, but the point here is that it
> seems that since filters are always applied to results after they are
> returned, how would one go about making the null-ness of a field part of
> the query logic?
>
> On Thu, Feb 16, 2012 at 1:45 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>
>> I already mentioned that pseudo NULL term, but the user asked for another
>> solution...
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de
>>
>> Jamie Johnson <jej2003 [at] gmail> wrote:
>>
>> Another possible solution is, while indexing, to insert a custom token
>> which is impossible to show up in the index otherwise, then do the
>> filter based on that token.
>>
>> On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uwe [at] thetaphi> wrote:
>> > As the documentation states:
>> > Lucene is an inverted index that does not have per-document fields. It
>> > only knows terms pointing to documents. The query you are searching for
>> > is a query that returns all documents which have no term. To execute
>> > this query, it will get the term index and iterate all terms of a field,
>> > mark those in a bitset and negate that. The filter/query I told you
>> > about uses the FieldCache to do this. Since 3.6 (also in 3.5, but there
>> > it is buggy/API different) there is another fieldcache that returns
>> > exactly that bitset. The filter mentioned only uses that bitset from
>> > this new fieldcache. Fieldcache is populated on first access and stays
>> > alive as long as the underlying index segment is open (means as long as
>> > IndexReader is open and that part of the index is not refreshed). If you
>> > are also sorting against your fields or doing other queries using
>> > FieldCache, there is no overhead, otherwise the bitset is populated on
>> > first access to the filter.
>> >
>> > Lucene 3.5 has no easy way to implement that filter, a "NULL" pseudo
>> > term is the only solution (and also much faster on the first access in
>> > Lucene 3.6). Later accesses hitting the cache in 3.6 will be faster, of
>> > course.
>> >
>> > Another hacky way to achieve the same results is (works with almost any
>> > Lucene version):
>> > BooleanQuery consisting of: MatchAllDocsQuery() as MUST clause and
>> > PrefixQuery(field, "") as MUST_NOT clause. But the PrefixQuery will do a
>> > full term index scan without caching :-). You may use
>> > CachingWrapperFilter with PrefixFilter instead.
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe [at] thetaphi
>> >
>> >> -----Original Message-----
>> >> From: Tim Eck [mailto:timeck [at] gmail]
>> >> Sent: Thursday, February 16, 2012 10:14 PM
>> >> To: java-user [at] lucene
>> >> Subject: RE: query for documents WITHOUT a field?
>> >>
>> >> Thanks for the fast response. I'll certainly have a look at the
>> >> upcoming 3.6.x release. What is the expected performance for using a
>> >> negated filter? In particular does it defeat the index in any way and
>> >> require a full index scan? Is it different between regular fields and
>> >> numeric fields?
>> >>
>> >> For 3.5 and earlier though, is there any suggestion other than magic
>> >> values?
>> >>
>> >> -----Original Message-----
>> >> From: Uwe Schindler [mailto:uwe [at] thetaphi]
>> >> Sent: Thursday, February 16, 2012 1:07 PM
>> >> To: java-user [at] lucene
>> >> Subject: RE: query for documents WITHOUT a field?
>> >>
>> >> Lucene 3.6 will have a FieldValueFilter that can be negated:
>> >>
>> >> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>> >>
>> >> (see http://goo.gl/wyjxn)
>> >>
>> >> Lucene 3.5 does not yet have it, you can download 3.6 snapshots from
>> >> Jenkins: http://goo.gl/Ka0gr
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: uwe [at] thetaphi
>> >>
>> >> > -----Original Message-----
>> >> > From: Tim Eck [mailto:teck [at] terracottatech]
>> >> > Sent: Thursday, February 16, 2012 9:59 PM
>> >> > To: java-user [at] lucene
>> >> > Subject: query for documents WITHOUT a field?
>> >> >
>> >> > My apologies if this answer is readily available someplace, I've
>> >> > searched around and not found a definitive answer.
>> >> >
>> >> > I'd like to run a query for documents that _do not_ contain
>> >> > particular indexed fields to implement something like a SQL-like
>> >> > query where a column is null.
>> >> >
>> >> > I understand I could possibly use a magic value to represent "null",
>> >> > but the data I'm searching doesn't lend itself to reserving a value
>> >> > for null.
>> >> > I also understand I could add an extra field to hold this boolean
>> >> > isNull state but would love a better solution :-)
>> >> >
>> >> > TIA

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene
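
The bitset approach described in the thread (mark every document that has any term in the field, negate that set, then combine it with other clauses) can be sketched in plain Java. This is only an illustration of the idea and not Lucene code: the class MissingFieldSketch and its methods index, term, hasField, and missingField are hypothetical names, and the "index" is just nested maps, assumed small enough to hold in memory.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy inverted index (hypothetical, NOT Lucene's API): field -> term -> doc ids. */
final class MissingFieldSketch {
    private final int maxDoc;
    private final Map<String, Map<String, BitSet>> postings = new HashMap<>();

    MissingFieldSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Record that document docId has the given term in the given field. */
    void index(int docId, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new BitSet(maxDoc))
                .set(docId);
    }

    /** Documents containing the exact term (rough analogue of a TermQuery). */
    BitSet term(String field, String term) {
        Map<String, BitSet> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new BitSet(maxDoc);
        }
        return (BitSet) terms.get(term).clone();
    }

    /** Documents with at least one term in the field: iterate all terms, mark the docs. */
    BitSet hasField(String field) {
        BitSet any = new BitSet(maxDoc);
        Map<String, BitSet> terms = postings.get(field);
        if (terms != null) {
            for (BitSet docs : terms.values()) {
                any.or(docs);
            }
        }
        return any;
    }

    /** "field IS NULL": start from all documents and remove those that have the field. */
    BitSet missingField(String field) {
        BitSet missing = new BitSet(maxDoc);
        missing.set(0, maxDoc);
        missing.andNot(hasField(field));
        return missing;
    }

    public static void main(String[] args) {
        MissingFieldSketch idx = new MissingFieldSketch(4);
        idx.index(0, "first_name", "Zed");
        idx.index(0, "allergies", "pollen");
        idx.index(1, "first_name", "Amy"); // no allergies field
        idx.index(2, "first_name", "Bob");
        idx.index(2, "allergies", "nuts");
        idx.index(3, "first_name", "Zed"); // no allergies field

        // first_name = 'Zed' OR allergies IS NULL
        BitSet result = idx.term("first_name", "Zed");
        result.or(idx.missingField("allergies"));
        System.out.println(result); // prints {0, 1, 3}
    }
}
```

The OR in the SQL example becomes a plain union of two document sets, which is why the "IS NULL" part has to be expressed as a clause in its own right (all documents minus those with the field) rather than as a filter applied to the other clause's results.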