
lucene at mikemccandless
Aug 27, 2008, 4:26 PM
Post #36 of 47
(1410 views)
Permalink
|
OK I'll open an issue to do this renaming in 3.0, which actually means we do the renaming in 2.4 or 2.9 (deprecating the old ones) then in 3.0 removing the old ones. Mike On Aug 27, 2008, at 11:08 AM, Otis Gospodnetic wrote: > Nah, I think the names are fine, I simply forgot. I looked at the > javadocs, it clearly says NO_NORMS doesn't get passed through an > Analyzer. Maybe in 3.0 we can switch to NOT_ANALYZED, as suggested, > to reflect reality more closely. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > ----- Original Message ---- >> From: Michael McCandless <lucene [at] mikemccandless> >> To: java-user [at] lucene >> Sent: Wednesday, August 27, 2008 5:36:46 AM >> Subject: Re: Case Sensitivity >> >> >> Actually, as confusing as it is, Field.Index.NO_NORMS means >> Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). >> >> Probably we should rename it to Field.Index.UN_TOKENiZED_NO_NORMS? >> >> Mike >> >> Otis Gospodnetic wrote: >> >>> Dino, you lost me half-way through your email :( >>> >>> NO_NORMS does not mean the field is not tokenized. >>> UN_TOKENIZED does mean the field is not tokenized. >>> >>> >>> Otis-- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> ----- Original Message ---- >>>> From: Dino Korah >>>> To: java-user [at] lucene >>>> Sent: Tuesday, August 26, 2008 9:17:49 AM >>>> Subject: RE: Case Sensitivity >>>> >>>> I think I should rephrase my question. >>>> >>>> [. Context: Using out of the box StandardAnalyzer for indexing and >>>> searching. >>>> ] >>>> >>>> Is it right to say that a field, if either UN_TOKENIZED or >>>> NO_NORMS- >>>> ized ( >>>> field.setOmitNorms(true) ), it doesn't get analyzed while indexing? >>>> Which means that when we search, it gets thru the analyzer and we >>>> need to >>>> analyze them differently in the analyzer we use for searching? >>>> Doesn't it mean that a setOmitNorms(true) field also doesn't get >>>> tokenized? >>>> >>>> What is the best solution if one was to add a set of fields >>>> UN_TOKENIZED and >>>> others TOKENIZED, of the later set a few with setOmitNorms(true) >>>> (the index >>>> writer is plain StandardAnalyzer based)? A per field analyzer at >>>> query time >>>> ?! >>>> >>>> Many thanks, >>>> Dino >>>> >>>> >>>> -----Original Message----- >>>> From: Dino Korah [mailto:dckorah [at] gmail] >>>> Sent: 26 August 2008 12:12 >>>> To: 'java-user [at] lucene' >>>> Subject: RE: Case Sensitivity >>>> >>>> A little more case sensitivity questions. >>>> >>>> Based on the discussion on http://markmail.org/message/q7dqr4r7o6t6dgo5 >>>> and >>>> on this thread, is it right to say that a field, if either >>>> UN_TOKENIZED or >>>> NO_NORMS-ized, it doesn't get analyzed while indexing? Which means >>>> we need >>>> to case-normalize (down-case) those fields before hand? >>>> >>>> Doest it mean that if I can afford, I should use norms. >>>> >>>> Many thanks, >>>> Dino >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: Steven A Rowe [mailto:sarowe [at] syr] >>>> Sent: 19 August 2008 17:43 >>>> To: java-user [at] lucene >>>> Subject: RE: Case Sensitivity >>>> >>>> Hi Dino, >>>> >>>> I think you'd benefit from reading some FAQ answers, like: >>>> >>>> "Why is it important to use the same analyzer type during indexing >>>> and >>>> search?" >>>> >>>> 4472d10961ba63c> >>>> >>>> Also, have a look at the AnalysisParalysis wiki page for some >>>> hints: >>>> >>>> >>>> On 08/19/2008 at 8:57 AM, Dino Korah wrote: >>>>> From the discussion here what I could understand was, if I am >>>>> using >>>>> StandardAnalyzer on TOKENIZED fields, for both Indexing and >>>>> Querying, >>>>> I shouldn't have any problems with cases. >>>> >>>> If by "shouldn't have problems with cases" you mean "can match >>>> case-insensitively", then this is true. >>>> >>>>> But if I have any UN_TOKENIZED fields there will be problems if >>>>> I do >>>>> not case-normalize them myself before adding them as a field to >>>>> the >>>>> document. >>>> >>>> Again, assuming that by "case-normalize" you mean "downcase", and >>>> that you >>>> want case-insensitive matching, and that you use the >>>> StandardAnalyzer (or >>>> some other downcasing analyzer) at query-time, then this is true. >>>> >>>>> In my case I have a mixed scenario. I am indexing emails and the >>>>> email >>>>> addresses are indexed UN_TOKENIZED. I do have a second set of >>>>> custom >>>>> tokenized field, which keep the tokens in individual fields with >>>>> same >>>>> name. >>>> [...] >>>>> Does it mean that where ever I use UN_TOKENIZED, they do not get >>>>> through the StandardAnalyzer before getting Indexed, but they do >>>>> when >>>>> they are searched on? >>>> >>>> This is true. >>>> >>>>> If that is the case, Do I need to normalise them before adding to >>>>> document? >>>> >>>> If you want case-insensitive matching, then yes, you do need to >>>> normalize >>>> them before adding them to the document. >>>> >>>>> I also would like to know if it is better to employ an >>>>> EmailAnalyzer >>>>> that makes a TokenStream out of the given email address, rather >>>>> than >>>>> using a simplistic function that gives me a list of string pieces >>>>> and >>>>> adding them one by one. With searches, would both the approaches >>>>> give >>>>> same result? >>>> >>>> Yes, both approaches give the same result. When you add string >>>> pieces >>>> one-by-one, you are adding multiple same-named fields. By contrast, >>>> the >>>> EmailAnalyzer approach would add a single field, and would allow >>>> you to >>>> control positions (via Token.setPositionIncrement(): >>>> >>>> ml#setPositionIncrement(int)>), e.g. to improve phrase handling. >>>> Also, if >>>> you make up an EmailAnalyzer, you can use it to search against your >>>> tokenized email field, along with other analyzer(s) on other >>>> field(s), using >>>> the PerFieldAnalyzerWrapper >>>> >>>> AnalyzerWrapper.html>. >>>> >>>> Steve >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene >>>> For additional commands, e-mail: java-user-help [at] lucene >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene >>>> For additional commands, e-mail: java-user-help [at] lucene >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene >>> For additional commands, e-mail: java-user-help [at] lucene >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene >> For additional commands, e-mail: java-user-help [at] lucene > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe [at] lucene > For additional commands, e-mail: java-user-help [at] lucene > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe [at] lucene For additional commands, e-mail: java-user-help [at] lucene
|