
paul at hoplahup
Mar 27, 2012, 11:34 PM
Nilesh,

The StandardAnalyzer is full of generally useful special cases, including detection of e-mail addresses and numbers. I suspect you ran into one of these special cases, which presumably has some justification. I can't tell you exactly why it behaves this way, but I think it would be really hard to change, because other users probably rely on the current behaviour in some way.

paul

On 27 March 2012 at 20:03, Nilesh Vijaywargiay wrote:

> I have a string 01a_b-_-c-d which is tokenized as
> 01a_b
> c
> d
>
> and the string a_b-_-c_d which is tokenized as
> a
> b
> c
> d
>
> Why is there a difference when there is a digit at the beginning? I am
> using the standard unstemmed tokenizer.
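The reply does not name the rule involved, but the behaviour in the quoted message looks consistent with the "NUM" rule in the classic StandardTokenizer grammar (the JFlex grammar used before Lucene 3.1 and kept afterwards as ClassicTokenizer). That rule targets serial numbers, model numbers, IP addresses and the like: alphanumeric segments joined by a single `_`, `-`, `/`, `.` or `,` stay in one token as long as every other segment contains a digit. The sketch below is a hypothetical, much-simplified approximation of only that rule. The class name `ClassicNumRuleSketch` is made up, the other special cases (e-mail addresses, hosts, acronyms, apostrophes) are ignored, and it is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified approximation of the "NUM" rule from the classic
 * (pre-3.1) StandardTokenizer grammar. Not Lucene code; other rules are ignored.
 */
class ClassicNumRuleSketch {

    // Separators the classic grammar allows inside number-like tokens.
    private static boolean isSep(char c) {
        return c == '_' || c == '-' || c == '/' || c == '.' || c == ',';
    }

    private static boolean isAlnum(char c) {
        return Character.isLetterOrDigit(c);
    }

    private static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) return true;
        }
        return false;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip anything that cannot start a token.
            if (!isAlnum(text.charAt(i))) { i++; continue; }

            // Collect a chain of alphanumeric segments joined by exactly one separator.
            List<String> segs = new ArrayList<>();
            List<Integer> ends = new ArrayList<>();
            int j = i;
            while (true) {
                int start = j;
                while (j < n && isAlnum(text.charAt(j))) j++;
                segs.add(text.substring(start, j));
                ends.add(j);
                if (j + 1 < n && isSep(text.charAt(j)) && isAlnum(text.charAt(j + 1))) {
                    j++; // step over the single separator and keep chaining
                } else {
                    break; // end of text, or a run like "-_-" breaks the chain
                }
            }

            // Longest-match: the longest chain of >= 2 segments in which every
            // even-indexed or every odd-indexed segment has a digit; else one segment.
            int best = 1;
            for (int k = 2; k <= segs.size(); k++) {
                boolean evenOk = true, oddOk = true;
                for (int s = 0; s < k; s++) {
                    boolean d = hasDigit(segs.get(s));
                    if (s % 2 == 0) evenOk &= d; else oddOk &= d;
                }
                if (evenOk || oddOk) best = k;
            }
            int end = ends.get(best - 1);
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01a_b-_-c-d")); // [01a_b, c, d]
        System.out.println(tokenize("a_b-_-c_d"));   // [a, b, c, d]
    }
}
```

Under this approximation "01a_b" survives as one token because its first segment, "01a", contains a digit. In "a_b", "c-d" and "c_d" no segment has a digit, so they split at the separator, and the "-_-" run always breaks the chain because only a single separator is allowed between segments.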