
markharw00d at yahoo
Jun 29, 2009, 12:22 PM
Post #2 of 3
Re: Doc-Doc Similarity Matrix Construction
See MoreLikeThis in the contrib/queries folder. It optimizes the speed of similarity comparisons by taking the most significant words only from a document as search terms.

On 29 Jun 2009, at 20:14, Amir Hossein Jadidinejad wrote:

> Hi,
> It's my first experiment with Lucene. Please help me.
> I'm going to index a set of documents and create a feature vector for each
> of them. This vector contains all terms belong to the document that
> weight using TFIDF.
> After that I want to compute the cosine similarity between all
> documents and produce a doc-doc similarity matrix. My document set
> is large and it's important to have a scalable implementation.
> Would you please provide me a guideline or to-do list?
> Thank you and kind regards.
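
To make the idea behind the MoreLikeThis suggestion concrete, here is a minimal, Lucene-free sketch in plain Java: weight each document's terms by TF-IDF, keep only the k heaviest terms per document, and compute cosine similarity on the pruned vectors. The class name `TopTermSimilarity`, its methods, the smoothed IDF formula and the parameter `k` are all assumptions made for illustration. This is not how MoreLikeThis is implemented (it builds a query from a document's significant terms and runs it against the index), and the all-pairs loop is quadratic in the number of documents, so a large collection would still need something like a per-document top-N search instead of a full matrix.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): cosine similarity on top-k TF-IDF terms. */
class TopTermSimilarity {

    /** Builds one sparse TF-IDF vector per tokenized document, keeping only the k heaviest terms. */
    static List<Map<String, Double>> topTermVectors(List<List<String>> docs, int k) {
        Map<String, Integer> df = new HashMap<>();          // document frequency per term
        List<Map<String, Integer>> tfs = new ArrayList<>(); // raw term counts per document
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
            tfs.add(tf);
        }

        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : tfs) {
            List<Map.Entry<String, Double>> weighted = new ArrayList<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                // Smoothed IDF (an assumption of this sketch, not Lucene's exact formula).
                double idf = Math.log((1.0 + n) / (1.0 + df.get(e.getKey()))) + 1.0;
                weighted.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() * idf));
            }
            // Heaviest terms first; ties are broken alphabetically so the result is deterministic.
            weighted.sort((a, b) -> {
                int byWeight = Double.compare(b.getValue(), a.getValue());
                return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
            });
            Map<String, Double> vector = new HashMap<>();
            for (Map.Entry<String, Double> e : weighted.subList(0, Math.min(k, weighted.size()))) {
                vector.put(e.getKey(), e.getValue());
            }
            vectors.add(vector);
        }
        return vectors;
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity of two sparse vectors; returns 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Fills a symmetric doc-doc matrix; O(n^2) pairs, so only practical for modest n. */
    static double[][] similarityMatrix(List<Map<String, Double>> vectors) {
        int n = vectors.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(vectors.get(i), vectors.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }
}
```

Calling `TopTermSimilarity.similarityMatrix(TopTermSimilarity.topTermVectors(docs, k))` on already-tokenized documents returns the matrix. A smaller `k` makes each comparison cheaper but discards more of each document's vocabulary.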