
phuctvcc at gmail
Mar 26, 2012, 7:31 AM
I want to create an index with IndexWriter over character n-grams. For example, for the text "Lucene is a great language", I want to use n-grams with n = 3 to get: Luc uce cen ene is a gre rea eat ....

My code:

    IndexWriter writer = new IndexWriter(INDEX_DIR, new PositionalPorterStopAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setMaxFieldLength(100000);
    Reader reader = new FileReader(f);
    Document doc = new Document();
    // Character n-grams with minGram = maxGram = 3, read from the file
    NGramTokenizer token = new NGramTokenizer(reader, 3, 3);
    // "contents" needs its own reader: a Reader can only be consumed once
    doc.add(new Field("contents", new FileReader(f)));
    doc.add(new Field("vector", token, Field.TermVector.YES));

With the code above, the tokenizer does produce 3-character tokens, but they are not the n-grams I want. Can anyone help me with this? Why does the NGramTokenizer above only cut the text into 3-character pieces instead of producing the 3-grams shown in my example?

Thanks very much in advance for your help!

--
View this message in context: http://lucene.472066.n3.nabble.com/Index-writer-for-Ngram-tp3858312p3858312.html
Sent from the Lucene - General mailing list archive at Nabble.com.
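The example in the question takes 3-grams within each word and keeps words shorter than three characters ("is", "a") whole. The plain-Java sketch below uses no Lucene classes; `CharNGrams`, `wholeStringGrams`, and `perWordGrams` are hypothetical names. It contrasts that per-word behavior with sliding one 3-character window over the entire string, spaces included. The assumption (not verified against the Lucene version in use) is that the unexpected tokens come from the tokenizer treating the whole field value as a single string.

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {

    // Hypothetical helper: slides an n-character window over the whole string,
    // spaces included. Approximates a tokenizer that treats the entire input as
    // one string (an assumption, not Lucene's actual code).
    public static List<String> wholeStringGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Hypothetical helper: splits on whitespace first, then slides the window
    // over each word separately. Words shorter than n are kept whole, as with
    // "is" and "a" in the example.
    public static List<String> perWordGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty "word"
            }
            if (word.length() < n) {
                grams.add(word);
            } else {
                grams.addAll(wholeStringGrams(word, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String text = "Lucene is a great language";
        System.out.println(wholeStringGrams(text, 3));
        // prints [Luc, uce, cen, ene, is, a, gre, rea, eat, lan, ang, ngu, gua, uag, age]
        System.out.println(perWordGrams(text, 3));
    }
}
```

The first list contains grams that span spaces ("ne ", "e i", " is", ...), while the second matches the example. In Lucene terms the per-word variant corresponds roughly to a whitespace tokenizer followed by an n-gram token filter, although such a filter may drop tokens shorter than n instead of keeping them whole.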