
Mailing List Archive: Lucene: General

N-gram

 

 



rajeshm at dessci

Jul 18, 2005, 1:41 PM

Post #1 of 2
N-gram

At what point should I add n-grams? Does the order in which I add them
affect exact phrase queries later? Specifically:

(1) Should I add all the 1-grams, followed by the 2-grams, then the
3-grams, etc., sentence by sentence, OR
(2) Should I add all the 1-grams of the entire document first, before
starting on the 2-grams for the entire document?

What is the generally accepted practice for adding the n-grams of a document?

thanks,

Rajesh
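
To make the two orderings in the question concrete, here is a minimal sketch in plain Python. It is not Lucene code: the function names, the naive sentence splitter, and the choice to keep n-grams from crossing sentence boundaries are all assumptions made for illustration. It only shows that both orderings produce the same n-grams in a different sequence; whether that sequence matters to a phrase query depends on how positions are recorded at index time, which this sketch does not settle.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break on ., ! or ? and tokenize on whitespace.
    return [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]


def word_ngrams(tokens, n):
    # All contiguous runs of n tokens, in left-to-right order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_sentence_by_sentence(text, max_n):
    # Option (1): for each sentence, emit its 1-grams, then 2-grams, and so on.
    out = []
    for sentence in split_sentences(text):
        for n in range(1, max_n + 1):
            out.extend(word_ngrams(sentence, n))
    return out


def ngrams_by_size(text, max_n):
    # Option (2): emit every 1-gram of the document, then every 2-gram, and so on.
    sentences = split_sentences(text)
    out = []
    for n in range(1, max_n + 1):
        for sentence in sentences:
            out.extend(word_ngrams(sentence, n))
    return out


if __name__ == "__main__":
    text = "the cat sat. a dog ran."
    print(ngrams_sentence_by_sentence(text, 3))
    print(ngrams_by_size(text, 3))
```

Both functions return the same collection of n-grams, so the choice between (1) and (2) only changes the order in which they are handed to whatever consumes them.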


madhusasidhar at gmail

Jul 18, 2005, 1:56 PM

Post #2 of 2
Re: N-gram [In reply to]

Rajesh,
I am not sure what your eventual goal is, but it looks like you are using
Lucene in some sort of natural language processing environment. I am doing
something similar with dotLucene. SpanQuery may be what you want: it lets
you specify the span, and hence whether you match a 1-gram, 2-gram, etc.
Email me if you want samples (C#).
Madhu
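
The idea behind a span-style match can be sketched without any library: record the position of each single term, then ask whether the terms occur in order within some distance. The code below is a hypothetical plain-Python illustration, not the Lucene or dotLucene API. The names build_positions and in_order_within are made up, and "slop" here is defined for this sketch as the number of extra positions allowed between the first and last term, which may differ from how a real search library defines it.

```python
from collections import defaultdict


def build_positions(tokens):
    # Map each term to the list of positions where it occurs.
    positions = defaultdict(list)
    for pos, tok in enumerate(tokens):
        positions[tok].append(pos)
    return positions


def in_order_within(positions, terms, slop):
    # True if the terms occur in the given order with at most `slop`
    # extra positions between the first and the last term.
    if not terms:
        return False

    def search(idx, prev_pos, start):
        if idx == len(terms):
            # Width of the match minus the width of an exact phrase.
            return (prev_pos - start) - (len(terms) - 1) <= slop
        for p in positions.get(terms[idx], []):
            if p > prev_pos and search(idx + 1, p, start):
                return True
        return False

    return any(search(1, p, p) for p in positions.get(terms[0], []))


if __name__ == "__main__":
    index = build_positions("the quick brown fox jumps".split())
    print(in_order_within(index, ["quick", "brown"], 0))  # adjacent terms
    print(in_order_within(index, ["quick", "fox"], 1))    # one term in between
```

With slop set to 0 the check behaves like an exact 2-gram or 3-gram match over single-term positions, so under this scheme only 1-grams need to be stored.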


