erickerickson at gmail
Dec 16, 2011, 12:04 PM
Post #4 of 5
Re: Using Lucene to match document sets to each other
Have you looked at Lucene's "MoreLikeThis"? I confess I haven't
worked with this enough to recommend *how* to use it, but it seems
like it's in the general area you're talking about.
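
For anyone unfamiliar with the idea: MoreLikeThis works by picking the most distinctive terms out of a source document and searching with those. The sketch below is NOT Lucene's API. It is a plain-Java toy illustration of that term-selection step, assuming a simple tf * idf score; the class and method names (SimilarTermsSketch, interestingTerms) are made up for this example.

```java
import java.util.*;

/** Toy illustration of MoreLikeThis-style term selection; not Lucene's API. */
class SimilarTermsSketch {

    /** Lowercases the text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /**
     * Returns up to maxTerms terms of sourceText, ranked by tf * idf,
     * where idf is computed over the given corpus (hypothetical scoring).
     */
    static List<String> interestingTerms(String sourceText, List<String> corpus, int maxTerms) {
        // Document frequency: in how many corpus documents does each term occur?
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(tokenize(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        // Term frequency inside the source document.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : tokenize(sourceText)) {
            termFreq.merge(term, 1, Integer::sum);
        }
        // Score each source term; skip terms that occur nowhere in the corpus.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) continue;
            double idf = Math.log((double) corpus.size() / df) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }
        // Highest score first; ties broken alphabetically for a stable order.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

The selected terms would then be OR-ed together into a query against the other document set. A term that appears in every document (a shared brand name, say) scores low, so it ends up near the bottom of the list.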
On Fri, Dec 16, 2011 at 12:53 PM, Josh Stone <pacesysjosh [at] gmail> wrote:
> Thanks for the response Donna. That would make more sense, but the items
> I'm pulling in from the web contain large bodies of text (descriptions)
> whereas the products in my catalog consist of shorter fields such as
> product name, manufacturer, product code, etc. So using the smaller fields
> from my catalog to build queries against the larger fields in the items I
> pull in seems to be the only way to do things (that I can think of).
> And this brings up my exact problem. I have a document (set of fields) that
> I want to use as search criteria for a search against another set of
> documents. Can something like this be done?
> On Fri, Dec 16, 2011 at 5:02 AM, Donna L Gresh <gresh [at] us> wrote:
>> Maybe I'm misunderstanding what you're trying to do, but why not do it the
>> other way around; that is, index the items in your catalog, and use the
>> items on the web as the query into the catalog. I have an analogous process
>> (though different application area) and I index the stuff that doesn't
>> change much, and use the things that are constantly changing as the query.
>> Donna L. Gresh
>> Business Analytics and Mathematical Sciences
>> IBM T.J. Watson Research Center
>> (914) 945-2472
>> gresh [at] us
>> From: Josh Stone <pacesysjosh [at] gmail>
>> To: java-user [at] lucene
>> Date: 12/15/2011 04:57 PM
>> Subject: Using Lucene to match document sets to each other
>> I have a use case for which I'm trying to figure out the best way to use
>> Lucene and could use some guidance.
>> I have a set of documents representing products in a catalog (name,
>> description, etc.). I then pull down data from different sources such as
>> Ebay and Amazon and need to determine if the items retrieved from those
>> sources match any of the products in the catalog. So I'm essentially
>> attempting to take many items and many products and determine where I have
>> matches.
>> I'm not sure the best way to go about this, but one questionable approach
>> is to index the items as I pull them in (to RAM) and do one search for
>> every product in my catalog, looking for matching names or descriptions.
>> This means an almost exponential number of queries though. Is there a
>> better approach? Any help is appreciated.
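
The suggestion quoted above (index the catalog, which rarely changes, and use each incoming item as the query) can be sketched in plain Java. This is a toy inverted index, not Lucene; the class name CatalogMatcherSketch, its methods, and the "count of shared terms" score are all assumptions made for illustration.

```java
import java.util.*;

/** Toy inverted index over catalog products; incoming items act as queries. */
class CatalogMatcherSketch {
    // term -> ids of the products whose fields contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> productNames = new ArrayList<>();

    /** Indexes one product: its name plus any other short fields. */
    void addProduct(String name, String... otherFields) {
        int id = productNames.size();
        productNames.add(name);
        indexText(id, name);
        for (String field : otherFields) {
            indexText(id, field);
        }
    }

    private void indexText(int id, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    /** Distinct lowercase terms, split on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /**
     * Uses the item's text as the query. Returns the name of the product that
     * shares the most distinct terms with it, or null when no product shares
     * at least minShared terms. Ties go to the product added first.
     */
    String bestMatch(String itemText, int minShared) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (String term : tokenize(itemText)) {
            for (int id : postings.getOrDefault(term, Collections.emptySet())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        int bestId = -1;
        int bestHits = 0;
        for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
            int count = e.getValue();
            if (count > bestHits || (count == bestHits && e.getKey() < bestId)) {
                bestId = e.getKey();
                bestHits = count;
            }
        }
        return (bestId >= 0 && bestHits >= minShared) ? productNames.get(bestId) : null;
    }
}
```

With this layout the catalog is indexed once and each item costs one lookup pass, rather than one search per product for every batch of items. A real Lucene setup would replace the shared-term count with Lucene's own scoring.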