Mailing List Archive: Lucene: Java-User

Doc-Doc Similarity Matrix Construction

amir.jadidi at yahoo

Jun 29, 2009, 12:14 PM

Post #1 of 3
Doc-Doc Similarity Matrix Construction

Hi,
This is my first experiment with Lucene, so please help me.
I'm going to index a set of documents and create a feature vector for each of them. The vector contains all the terms in the document, weighted using TF-IDF.
After that I want to compute the cosine similarity between all pairs of documents and produce a doc-doc similarity matrix. My document set is large, so a scalable implementation is important.
Could you please give me a guideline or a to-do list?
Thank you and kind regards.
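The computation described in this question can be sketched without Lucene at all. Below is a minimal plain-Java sketch, assuming lowercase whitespace tokenization and a textbook weight of raw term count times ln(N/df); Lucene's own scoring formula is different, and the class and method names are hypothetical. In a Lucene 2.x index, the per-document term counts could instead be read from stored term vectors (IndexReader.getTermFreqVector) and document frequencies from IndexReader.docFreq, but that wiring is left out here. Rather than comparing every pair of documents directly, the sketch accumulates dot products through an inverted index, so pairs of documents that share no terms are never visited; because each vector is normalized to unit length, the dot product equals the cosine similarity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: TF-IDF document vectors and a doc-doc cosine similarity matrix.
 * Assumptions (not Lucene's API or scoring): lowercase whitespace tokenization,
 * weight = raw term count * ln(N / df). Class and method names are hypothetical.
 */
public class DocDocSimilarity {

    /** Builds one L2-normalized sparse TF-IDF vector (term -> weight) per document. */
    static List<Map<String, Double>> tfidfVectors(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
            }
            // Each distinct term counts once towards its document frequency.
            for (String term : tf.keySet()) docFreq.merge(term, 1, Integer::sum);
            counts.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new HashMap<>();
            double sumSquares = 0.0;
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double w = e.getValue() * Math.log((double) n / docFreq.get(e.getKey()));
                if (w > 0) {            // a term found in every document gets weight 0; drop it
                    vec.put(e.getKey(), w);
                    sumSquares += w * w;
                }
            }
            double length = Math.sqrt(sumSquares);
            // Unit length means a plain dot product later equals the cosine.
            if (length > 0) vec.replaceAll((term, weight) -> weight / length);
            vectors.add(vec);
        }
        return vectors;
    }

    /**
     * Cosine similarity for all pairs, accumulated through an inverted index
     * (term -> ids of documents containing it), so pairs with no shared term are never visited.
     */
    static double[][] similarityMatrix(List<Map<String, Double>> vectors) {
        int n = vectors.size();
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int d = 0; d < n; d++) {
            for (String term : vectors.get(d).keySet()) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(d);
            }
        }
        double[][] sim = new double[n][n];
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            String term = e.getKey();
            List<Integer> ids = e.getValue();   // ascending, since docs were added in order
            for (int i = 0; i < ids.size(); i++) {
                int a = ids.get(i);
                double wa = vectors.get(a).get(term);
                for (int j = i; j < ids.size(); j++) {
                    int b = ids.get(j);
                    double product = wa * vectors.get(b).get(term);
                    sim[a][b] += product;
                    if (a != b) sim[b][a] += product;   // keep the matrix symmetric
                }
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene index search",
                "lucene similarity search",
                "cooking pasta recipe");
        double[][] sim = similarityMatrix(tfidfVectors(docs));
        for (double[] row : sim) System.out.println(Arrays.toString(row));
    }
}
```

With the three sample documents in `main`, documents 0 and 1 (which share "lucene" and "search") score about 0.21, and document 2 scores 0 against both. The full N x N matrix still needs O(N^2) memory, so for a large collection, keeping only the top few neighbours per document or applying a similarity threshold is a common compromise.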


markharw00d at yahoo

Jun 29, 2009, 12:22 PM

Post #2 of 3
Re: Doc-Doc Similarity Matrix Construction

See MoreLikeThis in the contrib/queries folder. It speeds up
similarity comparisons by using only the most significant words
from a document as search terms.
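The idea in this reply, scoring a document's terms and keeping only the most significant ones as the query, can be illustrated in plain Java. This is a simplified sketch under stated assumptions, not MoreLikeThis's own code: the scoring formula (raw count times ln(N/df)), the cutoff k, and the class and method names are all hypothetical. The second method shows why this helps with scale: only documents containing at least one selected term need a full cosine comparison, and those come straight from the inverted index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch of the idea behind MoreLikeThis, not its real API: keep only a
 * document's k highest-scoring terms and use them as the "query".
 * Assumed score: raw term count * ln(numDocs / df). Names are hypothetical.
 */
public class SignificantTerms {

    static List<String> topTerms(Map<String, Integer> termCounts,
                                 Map<String, Integer> docFreq,
                                 int numDocs, int k) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 1);   // unseen term: treat as rare
            score.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        List<String> terms = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically for a deterministic result.
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    /** Documents containing at least one query term: the only ones worth scoring. */
    static Set<Integer> candidates(List<String> queryTerms,
                                   Map<String, List<Integer>> postings) {
        Set<Integer> result = new TreeSet<>();
        for (String term : queryTerms) {
            result.addAll(postings.getOrDefault(term, Collections.emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("lucene", 3);
        counts.put("the", 5);
        counts.put("similarity", 2);
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("lucene", 10);
        docFreq.put("the", 1000);
        docFreq.put("similarity", 50);
        List<String> query = topTerms(counts, docFreq, 1000, 2);
        System.out.println(query);                          // [lucene, similarity]

        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("lucene", Arrays.asList(1, 4));
        postings.put("similarity", Arrays.asList(4, 7));
        postings.put("the", Arrays.asList(0, 1, 2));
        System.out.println(candidates(query, postings));    // [1, 4, 7]
    }
}
```

In `main`, the very common word "the" has the highest raw count but scores 0 because it occurs in every document, so the two-term query is "lucene" and "similarity". The contrib MoreLikeThis class exposes a comparable cutoff through setMaxQueryTerms; check the Javadoc of your Lucene version for its defaults and its other term filters.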




On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:

> [...]
> After that I want to compute the cosine similarity between all
> documents and produce a doc-doc similarity matrix. My document set
> is large and it's important to have a scalable implementation.
> [...]




amir.jadidi at yahoo

Jun 29, 2009, 1:26 PM

Post #3 of 3
Re: Doc-Doc Similarity Matrix Construction

This is exactly my question: http://www.mail-archive.com/lucene-user[at]jakarta.apache.org/msg04915.html

--- On Mon, 6/29/09, Amir Hossein Jadidinejad <amir.jadidi[at]yahoo.com> wrote:

From: Amir Hossein Jadidinejad <amir.jadidi[at]yahoo.com>
Subject: Doc-Doc Similarity Matrix Construction
To: java-user[at]lucene.apache.org
Date: Monday, June 29, 2009, 3:14 PM
