
Mailing List Archive: Lucene: Java-User

Indexing with Semantics

 

 



kasunp at opensource

Apr 27, 2012, 8:02 PM

Post #1 of 3 (318 views)
Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between
documents. Say my documents contain these three terms: "owe", "owed", "owing". Lucene
treats them as three separate terms, but all three mean the same thing, "owe". Is there
any functionality in Lucene that can be used to index by semantics, so that
it indexes "owe", "owed", "owing" as one word "owe" with term frequency = 3?

If not, I'd welcome any suggestions for achieving this.

--
Regards

Kasun Perera


fancyerii at gmail

Apr 27, 2012, 8:06 PM

Post #2 of 3 (310 views)
Re: Indexing with Semantics

Use a stemmer.
"Semantics" is a big word; be careful how you use it.
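
To illustrate what stemming does to the term counts, here is a minimal, self-contained sketch in plain Java. It does not use Lucene: the stem() method is a hypothetical toy suffix stripper written for this example (not the Porter algorithm), and the class and method names are assumptions made up here. In Lucene itself the same effect would normally come from an analyzer that includes a stemming filter (for example EnglishAnalyzer or PorterStemFilter), applied at index time so that the stored term vectors already hold stems.

```java
import java.util.HashMap;
import java.util.Map;

public class StemmedTermFrequency {

    // Hypothetical toy stemmer: strips "ing" or "ed", then a trailing "e".
    // A real stemmer has many more rules; this only covers the example.
    static String stem(String token) {
        String t = token.toLowerCase();
        if (t.length() > 4 && t.endsWith("ing")) {
            t = t.substring(0, t.length() - 3);
        } else if (t.length() > 3 && t.endsWith("ed")) {
            t = t.substring(0, t.length() - 2);
        }
        if (t.length() > 2 && t.endsWith("e")) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Counts how often each stem occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(stem(token), 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term frequency maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("owe owed owing"));
        System.out.println(cosine(termFrequencies("He owed money"),
                                  termFrequencies("She is owing money")));
    }
}
```

With this toy stemmer all three forms collapse to the single stem "ow" with frequency 3. Note that a stem does not have to be a dictionary word; the only requirement is that related forms map to the same string, which is enough for the cosine calculation.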


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ykesten at yahoo-inc

May 3, 2012, 1:59 AM

Post #3 of 3 (296 views)
RE: Indexing with Semantics

Hi,
What you are looking for is lemmatization: http://en.wikipedia.org/wiki/Lemmatisation
I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is an open source project:
http://gate.ac.uk
http://gate.ac.uk/gate/doc/plugins.html
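
As a rough illustration of how lemmatization differs from suffix stripping, the sketch below maps each word form to its dictionary form through a lookup table. It does not use GATE or Lucene; the class name, the lemma() method, and the tiny table are all hypothetical and exist only for this example. A real lemmatizer relies on a full lexicon and usually on part-of-speech information.

```java
import java.util.Map;

public class ToyLemmatizer {

    // Hypothetical lookup table: inflected form -> dictionary form (lemma).
    private static final Map<String, String> LEMMAS = Map.of(
            "owed", "owe",
            "owing", "owe",
            "owes", "owe",
            "went", "go");

    // Returns the lemma if the form is known, otherwise the lowercased token.
    static String lemma(String token) {
        String t = token.toLowerCase();
        return LEMMAS.getOrDefault(t, t);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"owe", "owed", "owing", "went"}) {
            System.out.println(w + " -> " + lemma(w));
        }
    }
}
```

Unlike a stemmer, this approach returns real words such as "owe" and can handle irregular forms such as "went", at the cost of needing a dictionary for the language.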

Enjoy!




