
Mailing List Archive: Lucene: Java-User

Matching w/in X% ?

 

 



michael_prichard at mac

Jan 21, 2008, 12:38 PM

Post #1 of 3
Matching w/in X% ?

Say I have a field of To addresses from an email archive. I run a search and a single hit has 10 To addresses. I then want to find similar emails whose To field contains roughly 75% of those addresses as well. How would I do this?

In other words:
I get a result with:
To: foo [at] bar, foo2 [at] bar, foo3 [at] bar, foo4 [at] bar, foo5 [at] bar, foo6 [at] bar

Now I want to find similar emails with 75% of these addresses in the To field.

Thanks!
Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


otis_gospodnetic at yahoo

Jan 21, 2008, 2:12 PM

Post #2 of 3
Re: Matching w/in X% ?

I think you'll have to go with MoreLikeThis (assuming your emails are tokenized suitably) and then go through the matches yourself to check the percentage match.
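
The "check the % match yourself" step could look roughly like the sketch below. It is plain Java with no Lucene dependency and assumes you have already pulled the stored To value out of each candidate hit as a comma-separated string. The class name `ToOverlap`, its methods, and the sample addresses are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: post-filters candidate hits by address overlap.
class ToOverlap {
    /** Splits a comma-separated To value into trimmed, lower-cased addresses. */
    static Set<String> parse(String toField) {
        Set<String> addresses = new HashSet<>();
        for (String part : toField.split(",")) {
            String address = part.trim().toLowerCase(Locale.ROOT);
            if (!address.isEmpty()) {
                addresses.add(address);
            }
        }
        return addresses;
    }

    /** Fraction (0.0 to 1.0) of the source addresses that the candidate also contains. */
    static double overlap(Set<String> source, Set<String> candidate) {
        if (source.isEmpty()) {
            return 0.0;
        }
        int shared = 0;
        for (String address : source) {
            if (candidate.contains(address)) {
                shared++;
            }
        }
        return (double) shared / source.size();
    }

    public static void main(String[] args) {
        // Placeholder addresses; real values would come from the original hit and each candidate.
        Set<String> source = parse("foo@bar, foo2@bar, foo3@bar, foo4@bar");
        Set<String> candidate = parse("foo@bar, foo2@bar, foo3@bar, other@bar");
        // Keep the candidate only if it shares at least 75% of the source addresses.
        System.out.println(overlap(source, candidate) >= 0.75);
    }
}
```

The overlap is measured against the original hit's addresses only, so extra recipients on the candidate do not count against it. If you want a symmetric measure instead, that is a different calculation.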

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



markharw00d at yahoo

Jan 21, 2008, 3:02 PM

Post #3 of 3
Re: Matching w/in X% ?

See BooleanQuery.setMinimumNumberShouldMatch.
Add the addresses as "SHOULD" TermQuery clauses and set
minimumNumberShouldMatch to the required value.
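
For the "required value", a rough sketch of turning the 75% into a clause count follows. The runnable part is plain Java; the Lucene calls appear only as comments, since they need Lucene on the classpath. The field name "to", the class name `MinShouldMatch`, and the sample addresses are assumptions for illustration. TermQuery matches whole indexed terms, so this presumes each address is indexed as a single term.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: converts a percentage into a clause count.
class MinShouldMatch {
    /** Smallest clause count that covers at least `percent` percent of `clauseCount`, rounded up. */
    static int required(int clauseCount, int percent) {
        if (clauseCount < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("clauseCount must be >= 0 and percent in 0..100");
        }
        // Integer ceiling of clauseCount * percent / 100, avoiding floating-point rounding.
        return (clauseCount * percent + 99) / 100;
    }

    public static void main(String[] args) {
        // Placeholder addresses taken from the original hit.
        List<String> addresses = Arrays.asList(
                "foo@bar", "foo2@bar", "foo3@bar", "foo4@bar", "foo5@bar", "foo6@bar");
        int minimum = required(addresses.size(), 75);

        // With Lucene available, the query would be built roughly like this:
        //   BooleanQuery query = new BooleanQuery();
        //   for (String address : addresses) {
        //       query.add(new TermQuery(new Term("to", address)), BooleanClause.Occur.SHOULD);
        //   }
        //   query.setMinimumNumberShouldMatch(minimum);
        System.out.println(minimum + " of " + addresses.size() + " clauses must match");
    }
}
```

Rounding up is a choice: 75% of 6 addresses is 4.5, which becomes 5 here. Round down instead if "roughly 75%" should also accept 4 of 6.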

Cheers
Mark
