
Mailing List Archive: Lucene: Java-User

Reverse keyword search?



unclelonghair0 at gmail

Apr 27, 2012, 4:35 AM

Reverse keyword search?

Hello,

I am relatively new to Lucene, so this may be a beginner question; if so, please point me elsewhere. I'd like some guidance on how to use Lucene to solve a problem.

I have a set of a few hundred (and growing) user-defined keywords, such as "spain" and "volkswagen", each of which is associated with one of about 20 categories, such as "world" and "automotive". My challenge is to take the summary (title, description, caption, meta tags and keywords, but not the full content) of a news article, such as one you might find on cnn.com, and look for those keywords in it to identify the article's category. The summary is often "dirty", full of special characters, commas, hash tags, etc., so it needs to be tokenized. I would also like to use Lucene's natural language processing to match "spanish" to "spain", for example.
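To make the cleanup step concrete, here is a minimal plain-Java sketch of what an analyzer has to do with such a summary: lowercase it, split on anything that is not a letter or digit (which drops commas, `#` and other punctuation), and normalize surface forms. It is an illustration, not Lucene's API; in Lucene this job belongs to an `Analyzer` chain. Note that stemming on its own is unlikely to turn "spanish" into "spain", so the sketch assumes a hand-maintained synonym table; the `SYNONYMS` map and its entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SummaryTokenizer {
    // Hypothetical synonym table mapping surface forms to a canonical keyword.
    // A real Lucene setup would express this as part of its analyzer chain.
    static final Map<String, String> SYNONYMS = Map.of(
        "spanish", "spain",
        "vw", "volkswagen");

    // Lowercases the text, splits on runs of non-letter/non-digit characters
    // (so commas, '#' from hash tags, quotes, etc. disappear) and maps each
    // token through the synonym table.
    static List<String> tokenize(String summary) {
        List<String> tokens = new ArrayList<>();
        for (String raw : summary.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter produces an empty first element
            }
            tokens.add(SYNONYMS.getOrDefault(raw, raw));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [spain, court, fines, volkswagen, again]
        System.out.println(tokenize("#Spanish court fines VW, again!"));
    }
}
```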

This appears to be somewhat the reverse of the typical Lucene use case: rather than indexing a set of, say, 1000 articles and then issuing a query with a few keywords to search them, I have a set of, say, 1000 keywords and a single article, and I want to determine which keyword best fits the article's summary. What is the best way to use Lucene for this?

I have considered:

1) Creating a Lucene index of the keywords and topics, tokenizing the summaries with Lucene's tokenizers, then issuing queries with the tokens to find the best match.
2) Indexing the article summary, then iterating over all of the keywords, issuing a query for each one and keeping the best match.
3) Learning how Lucene does the individual keyword-to-keyword matching and writing a custom solution.
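Option 2 can be sketched in plain Java to show the shape of the loop: build a one-document "index" (a term-frequency map) from the already tokenized summary, run every keyword against it, and add each hit to the keyword's category. This is an illustration under simplifying assumptions, not Lucene code: it uses raw hit counts instead of Lucene's relevance scoring, handles only single-word keywords (a phrase like "new york" would need phrase matching), and breaks ties between equally scored categories arbitrarily. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoryMatcher {
    // Returns the category whose keywords occur most often in the summary
    // tokens, or null if no keyword occurs at all.
    static String bestCategory(List<String> summaryTokens, Map<String, String> keywordToCategory) {
        // "Index" the single summary: term -> frequency.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String token : summaryTokens) {
            termFreq.merge(token, 1, Integer::sum);
        }

        // Treat every keyword as a query against that one-document index
        // and accumulate its hits under the keyword's category.
        Map<String, Integer> categoryScore = new HashMap<>();
        for (Map.Entry<String, String> e : keywordToCategory.entrySet()) {
            int hits = termFreq.getOrDefault(e.getKey(), 0);
            if (hits > 0) {
                categoryScore.merge(e.getValue(), hits, Integer::sum);
            }
        }

        // Keep the highest-scoring category (ties resolved by map iteration order).
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : categoryScore.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> keywords = Map.of(
            "spain", "world",
            "volkswagen", "automotive",
            "recall", "automotive");
        List<String> tokens = List.of("volkswagen", "recall", "in", "spain");
        // automotive scores 2 (volkswagen + recall), world scores 1
        System.out.println(bestCategory(tokens, keywords)); // automotive
    }
}
```

With a few hundred keywords the per-article loop is cheap, since each keyword lookup is a single hash-map probe.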

I'd appreciate it if someone could point me in the right direction.

Randy


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


iorixxx at yahoo

Apr 27, 2012, 5:18 AM

Re: Reverse keyword search? [In reply to]

> This appears to be somewhat the reverse of the typical
> Lucene use case -- rather than having a set of say 1000 of
> articles which are indexed, then issuing a query using a few
> keywords to search on those articles, I have a set of say
> 1000 keywords, and a single article, and I want to determine
> which keyword best fits the article's summary.  How to
> best use Lucene to handle this?

I have not used it myself, but MemoryIndex seems to be what you are after.

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/memory/MemoryIndex.html
