
kuro at basistech
Apr 9, 2012, 11:57 PM
Please disregard this suggestion; it is a bad idea. Almost every text contains a verb, a noun, and so on, so searching a field that holds only part-of-speech tags makes little sense (text_pos:noun would match nearly every document). Perhaps the parallel field should instead hold the lemma (dictionary form) and the part-of-speech tag joined into a single token, such as "like_verb" or "lemming_propernoun".

On 4/9/12 1:10 PM, T. Kuro Kurosaka wrote:
> If you want to search on part-of-speech tag, I'd just make a parallel
> field ("text_pos" for the field "text", for example) and search on
> that field (text_pos:noun).
>
> Kuro
>
> On 3/14/12 9:37 AM, Mark McGuire wrote:
>> I'm working on a project where I need to tag both the part of speech
>> and other syntactic information on tokens so that this information is
>> searchable. I have read the threads on the mailing list regarding
>> part of speech tagging here
>> <http://mail-archives.apache.org/mod_mbox/lucene-java-user/201105.mbox/%3CBANLkTimwqcQ_GF2pxE8Hyc_R75NcWDRWbQ [at] mail%3E>
>> and the many responses to similar questions. To me, inserting 0
>> increment tokens seems rather clunky, especially when TypeAttributes
>> appear to be what one would want to use. Does Lucene do anything
>> extra when the Type is set to, or not set to, its default, "word"? Is
>> it possible to write a search that uses multiple attributes from
>> TokenAttributes (i.e. a search for the CharTermAttribute "dog"
>> followed by a TypeAttribute of verb)?
>>
>> Also, if I were to use 0 increment tokens for tagging, would data like
>> document length or sumTotalTermFreq be different from a document
>> indexed without these tags? How would I counteract these differences,
>> if any occur?
>>
>> Thanks,
>> Mark McGuire
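A minimal sketch of the lemma-plus-tag idea, assuming the lemmas and part-of-speech tags come from some external tagger. It is plain Java rather than Lucene API code; the class LemmaPosTokens, its methods, and the lower-case tag names are hypothetical, chosen only for illustration. It shows that a combined token such as "like_verb" is far more selective than a bare tag like "verb".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: builds the tokens for a parallel "text_pos" field
 * by joining each word's lemma and part-of-speech tag, e.g. "like_verb".
 * Lemmas and tags are assumed to be supplied by an external tagger.
 */
public class LemmaPosTokens {

    /** One tagged word: surface form, lemma (dictionary form), POS tag. */
    public static final class TaggedWord {
        final String surface;
        final String lemma;
        final String pos;

        public TaggedWord(String surface, String lemma, String pos) {
            this.surface = surface;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    /** Joins lemma and tag with "_"; lower-cases both (an assumed convention) so queries match consistently. */
    public static String combine(String lemma, String pos) {
        return lemma.toLowerCase() + "_" + pos.toLowerCase();
    }

    /** Returns one combined token per tagged word, in the original order. */
    public static List<String> parallelFieldTokens(List<TaggedWord> words) {
        List<String> tokens = new ArrayList<>();
        for (TaggedWord w : words) {
            tokens.add(combine(w.lemma, w.pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "like" is tagged as a preposition here, so a query for like_verb would not match it.
        List<TaggedWord> sentence = List.of(
            new TaggedWord("Time", "time", "noun"),
            new TaggedWord("flies", "fly", "verb"),
            new TaggedWord("like", "like", "preposition"),
            new TaggedWord("an", "an", "determiner"),
            new TaggedWord("arrow", "arrow", "noun"));
        // Prints [time_noun, fly_verb, like_preposition, an_determiner, arrow_noun]
        System.out.println(parallelFieldTokens(sentence));
    }
}
```

In an actual Lucene analyzer this logic would more likely live in a custom TokenFilter that reads the term from CharTermAttribute and the tag from wherever the tagger stores it (for example TypeAttribute), then rewrites the term before it is indexed into text_pos. A query such as text_pos:like_verb then matches only documents where "like" is used as a verb.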