
Mailing List Archive: Lucene: Java-User

is it possible to make lucene searches match based on per doc field:termcount?

 

 



jason at hardlight

Nov 5, 2009, 4:31 PM

Post #1 of 3 (127 views)
is it possible to make lucene searches match based on per doc field:termcount?

Hi All, I hope someone can offer some advice.
I want to extend Lucene to search in a particular way (if it can't already):

I want to index docs, each with a field containing several terms, something
like:
doc1=>myfield:a
doc2=>myfield:a,b
doc3=>myfield:a,b,c
doc4=>myfield:a,b,c,d
so far nothing new.

I want to query for matching docs such that a query like
myfield:(a or b) only returns a doc if the doc itself is FULLY
matched.
i.e., for the query myfield:(a or b), only doc1 and doc2 should match.
So the rule is: it's only a match if the term count of the doc is <= the
term count of the query (for that field) AND ALL the terms in the doc were
matched.

a few more examples just to clarify:
myfield:(a or b or d) would match doc1 and doc2, but not doc4 (its term c is unmatched)
myfield:(a or b or c or d) would match ALL the docs here (this one works
anyway but only because it uses all the terms that exist)
myfield:(a) would match doc1

Order is not important (but might be nice to have).

Can anyone tell me if it's possible to make Lucene do this, and perhaps
offer a starting point?
thanks
Jason.
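
The rule in the post above can be read as a subset test: a doc matches when every term in its field also appears in the query. The following is a minimal plain-Java sketch of that reading, using the four sample docs from the post. It does not use Lucene at all; the class and method names (FullMatchSketch, fullyMatched, search, sampleIndex) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FullMatchSketch {

    // Sample data from the post: doc1..doc4 with their myfield terms.
    static Map<String, Set<String>> sampleIndex() {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("doc1", terms("a"));
        index.put("doc2", terms("a", "b"));
        index.put("doc3", terms("a", "b", "c"));
        index.put("doc4", terms("a", "b", "c", "d"));
        return index;
    }

    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // A doc is fully matched when it has at least one term and every one of
    // its terms appears in the query. For distinct terms, the condition
    // "doc term count <= query term count" then holds automatically.
    static boolean fullyMatched(Set<String> docTerms, Set<String> queryTerms) {
        return !docTerms.isEmpty() && queryTerms.containsAll(docTerms);
    }

    // Returns the ids of fully matched docs, in index order.
    static List<String> search(Map<String, Set<String>> index, String... queryTerms) {
        Set<String> query = terms(queryTerms);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            if (fullyMatched(entry.getValue(), query)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = sampleIndex();
        System.out.println(search(index, "a", "b"));           // [doc1, doc2]
        System.out.println(search(index, "a"));                // [doc1]
        System.out.println(search(index, "a", "b", "c", "d")); // [doc1, doc2, doc3, doc4]
    }
}
```

A linear scan like this only pins down the intended semantics; it says nothing about how to get the same behaviour efficiently out of an inverted index.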



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


gsingers at apache

Nov 5, 2009, 5:11 PM

Post #2 of 3 (122 views)
Re: is it possible to make lucene searches match based on per doc field:termcount?

On Nov 5, 2009, at 4:31 PM, Jason Eacott wrote:

> can anyone tell me if its possible to make lucene do this, and
> perhaps offer a starting point?

Would overriding coord(int, int) in DefaultSimilarity help?
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DefaultSimilarity.html#coord%28int,%20int%29


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
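
To make the coord suggestion concrete: coord(int overlap, int maxOverlap) receives the number of query terms found in the doc and the total number of query terms, and the default returns their ratio. It is not passed the doc itself, so the doc's own term count would have to come from somewhere else. The plain-Java sketch below assumes that count is recorded at index time and made available at scoring time; that is an assumption of this sketch, not something the thread states. Nothing here extends a Lucene class, and CoordSketch, overlap, defaultCoord and fullMatchFactor are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CoordSketch {

    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Number of distinct query terms that occur in the doc.
    static int overlap(Set<String> docTerms, Set<String> queryTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Mirrors the overlap / maxOverlap fraction of the default coord.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical factor: 1 only when every term of the doc was hit.
    // docTermCount is NOT a coord parameter; it is assumed to be available.
    static float fullMatchFactor(int overlap, int docTermCount) {
        return (docTermCount > 0 && overlap == docTermCount) ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        Set<String> query = terms("a", "b");
        Set<String> doc2 = terms("a", "b");
        Set<String> doc3 = terms("a", "b", "c");

        // Both docs contain both query terms, so the plain ratio cannot tell them apart.
        System.out.println(defaultCoord(overlap(doc2, query), query.size()));
        System.out.println(defaultCoord(overlap(doc3, query), query.size()));

        // With the doc term count, doc2 passes and doc3 does not.
        System.out.println(fullMatchFactor(overlap(doc2, query), doc2.size()));
        System.out.println(fullMatchFactor(overlap(doc3, query), doc3.size()));
    }
}
```

Whether a zero factor actually removes a doc from the results, rather than only lowering its score, depends on how hits are collected, so that part would need checking against the Lucene version in use.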



jason at hardlight

Nov 6, 2009, 12:50 AM

Post #3 of 3 (113 views)
Re: is it possible to make lucene searches match based on per doc field:termcount?

Looks like it, thanks!
But what if I index multiple copies of the same value, e.g. myfield:a,a,b,
and search with myfield:(a or b)
or perhaps myfield:(a or a or b)?

Will I be able to tell the difference? Is this apparently duplicate data
kept as part of the query? (I'd like to be able to do this too.)

Cheers
Jason.
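
One possible reading of the duplicate-value question is a multiset version of the same rule: a doc matches only if the query mentions each of its terms at least as many times as the doc contains it. The plain-Java sketch below illustrates that reading only. It does not answer whether Lucene keeps repeated query terms, and DuplicateTermsSketch, counts and fullyMatched are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateTermsSketch {

    // Counts how often each term occurs, e.g. (a, a, b) -> {a=2, b=1}.
    static Map<String, Integer> counts(String... terms) {
        Map<String, Integer> result = new HashMap<>();
        for (String term : terms) {
            result.merge(term, 1, Integer::sum);
        }
        return result;
    }

    // Multiset containment: every doc term must be covered by the query
    // at least as many times as it occurs in the doc.
    static boolean fullyMatched(Map<String, Integer> doc, Map<String, Integer> query) {
        if (doc.isEmpty()) {
            return false;
        }
        for (Map.Entry<String, Integer> entry : doc.entrySet()) {
            if (query.getOrDefault(entry.getKey(), 0) < entry.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = counts("a", "a", "b");
        System.out.println(fullyMatched(doc, counts("a", "b")));      // false
        System.out.println(fullyMatched(doc, counts("a", "a", "b"))); // true
    }
}
```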




