Mailing List Archive: Lucene: General
Index writer for Ngram
 



phuctvcc at gmail

Mar 26, 2012, 7:31 AM



I want to create an IndexWriter that indexes character n-grams. For example,
given the text "Lucene is a great language", I want to use n-grams with n=3
so that it becomes: Luc uce cen ene is a gre rea eat....
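To make the expected output concrete, here is a minimal plain-Java sketch of the transformation described above. It does not use Lucene at all. The class name CharNGrams and the method ngrams are hypothetical names chosen for illustration, and two behaviours are assumptions read off the example rather than anything Lucene guarantees: the text is split on whitespace first, and words shorter than n are kept whole.

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {
    // Returns the character n-grams of each whitespace-separated word.
    // Assumption: words no longer than n are emitted unchanged, as in the
    // "is a" part of the example in the post.
    public static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // input was empty or only whitespace
            }
            if (word.length() <= n) {
                grams.add(word);
                continue;
            }
            // Slide a window of width n across the word, one character at a time.
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", ngrams("Lucene is a great language", 3)));
    }
}
```

For the sentence above this produces Luc uce cen ene is a gre rea eat, followed by the trigrams of "language".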

My code:


IndexWriter writer = new IndexWriter(INDEX_DIR, new PositionalPorterStopAnalyzer(), true,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setMaxFieldLength(100000);
Reader reader = new FileReader(f);
Document doc = new Document();
// Tokenize the file contents into n-grams with minGram = 3 and maxGram = 3
NGramTokenizer token = new NGramTokenizer(reader, 3, 3);
doc.add(new Field("contents", new FileReader(f)));
doc.add(new Field("vector", token, Field.TermVector.YES));


With the code above, the index only contains tokens that are 3-character
extracts of the text, not the n-grams I expect.
Can anyone help me with this issue? It seems the NGramTokenizer above only
extracts 3 characters instead of producing all the 3-character n-grams.
Thanks very much in advance for your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/Index-writer-for-Ngram-tp3858312p3858312.html
Sent from the Lucene - General mailing list archive at Nabble.com.
