Mailing List Archive: Lucene: Java-User

filter by term frequency

 

 



sokolov at ifactory

Jun 16, 2012, 11:33 AM

Post #1 of 3
filter by term frequency

I imagine this is a question that comes up from time to time, but I
haven't been able to find a definitive answer anywhere, so...

I'm wondering whether there is some type of Lucene query that filters by
term frequency. For example, suppose I want to find all documents that
have exactly 2 occurrences of some word. I know that the frequency is
stored and used in scoring, but I don't think it is exposed in a simple
way at the query level. It looks to me as if CustomScoreQuery might be
a convenient way to monkey with scores, but it seems to affect only how
results are sorted, not which documents match. Perhaps a Collector could
then impose a score threshold later? Any suggestions here?

-Mike
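
To make the requirement concrete, here is a minimal plain-Java sketch of the behaviour being asked for: count the occurrences of a term in each document and keep only the documents where the count is exactly 2. It deliberately does not use Lucene. The class name, the method names, and the naive tokenization are assumptions made for illustration, and a real index would read the stored frequencies instead of re-scanning text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
public class TermFrequencyFilter {

    // Counts how often `term` occurs in `text`. Tokens are produced by
    // lowercasing and splitting on anything that is not a letter or digit;
    // a real analyzer would tokenize differently.
    public static int termFrequency(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Returns the ids of the documents whose frequency for `term` equals `wanted`.
    public static List<String> docsWithExactFrequency(
            Map<String, String> docs, String term, int wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (termFrequency(doc.getValue(), term) == wanted) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "memory is cheap");
        docs.put("doc2", "Memory leaks waste memory");
        docs.put("doc3", "memory, memory, memory");
        // Only doc2 contains "memory" exactly twice.
        System.out.println(docsWithExactFrequency(docs, "memory", 2)); // prints [doc2]
    }
}
```

The point of the sketch is that the frequency acts as a match condition, not as a score: doc3 contains the term more often than doc2 but is still excluded.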

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


jack at basetechnology

Jun 16, 2012, 2:26 PM

Post #2 of 3
Re: filter by term frequency

If you were a *Solr* user, I could say "try the 'termfreq' function query":

termfreq(field,term) returns the number of times the term appears in the
field for that document.
Example Syntax: termfreq(text,'memory')

See:
http://wiki.apache.org/solr/FunctionQuery#tf

Lucene does have "FunctionQuery", "ValueSource", and "TermFreqValueSource".

See:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html

-- Jack Krupansky
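
On the Solr side, a function value is usually turned into a filter with the function range query parser, `frange`. The request parameters below are a sketch, assuming a field named `text` as in the example syntax above and a Solr version that provides `termfreq`; the match-all main query is only a placeholder.

```
q=*:*&fq={!frange l=2 u=2}termfreq(text,'memory')
```

Here `l` and `u` are the lower and upper bounds, both inclusive by default, so only documents in which the term occurs exactly twice pass the filter. When sent as a URL, the parameter values need to be URL-encoded.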



sokolov at ifactory

Jun 17, 2012, 7:57 PM

Post #3 of 3
Re: filter by term frequency

Thanks, Jack!

