Mailing List Archive: Lucene: Java-User

lucene 2.9+ numeric indexing

 

 

john.wang at gmail

Nov 8, 2009, 12:36 AM


Hi guys:

Running into a strange problem:

I am indexing into a field a numeric string:

int n = Math.abs(rand.nextInt(1000000));

Field myField = new Field(MY_FIELD, String.valueOf(n), Store.NO, Index.NOT_ANALYZED_NO_NORMS);

myField.setOmitTermFreqAndPositions(true);

doc.add(myField);



I am trying to load this field into a FieldCache, e.g.:


int[] data = FieldCache.DEFAULT.getInts(reader, MY_FIELD);


and I get: Exception in thread "main" java.lang.NumberFormatException:
Invalid shift value in prefixCoded string (is encoded value really an INT?)

After further examination, I see the original Integer.parseInt failed
because the termText was:

java.lang.NumberFormatException: For input string: "77886$"


I am not clear why the term text became: 77886$ instead of a number.


I examined the index with Luke, and at least according to Luke the number
displayed was 77886. Searching for MY_FIELD:77886$ does yield a doc, and
Luke's document reconstruction shows the value as 77886.


Ideas?


Thanks


-John


uwe at thetaphi

Nov 8, 2009, 6:30 AM

RE: lucene 2.9+ numeric indexing

That's indeed strange. The problem has nothing to do with
NumericField/NumericUtils or the corresponding FieldCache parsing as such.
It comes from the autodetection, which falls back to the NumericField parser
when the first term cannot be parsed as an old-style numeric string. That is
why you get this error message: the second attempt (parsing the term as a
NumericField value) fails as well.
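
To make the fallback easier to follow, here is a minimal, self-contained
sketch of the two-step behaviour described above. It is a simplified
stand-in and not the Lucene source: the class name, the method names and the
SHIFT_START_INT constant are assumptions made for illustration, and the
decoding loop is only an approximation of a prefix-coded int format. The
point is the control flow: the plain parse fails first, and the exception
the caller finally sees comes from the second parser.

```java
// Simplified stand-in for the two-step autodetection; not Lucene source code.
class AutodetectSketch {
    // Assumed offset of the leading "shift" character in a prefix-coded int term.
    static final char SHIFT_START_INT = 0x60;

    // First attempt: treat the term as an old-style plain decimal string.
    static int parsePlain(String term) {
        return Integer.parseInt(term);
    }

    // Second attempt: treat the term as a prefix-coded int (approximation).
    static int parsePrefixCoded(String term) {
        if (term.isEmpty()) {
            throw new NumberFormatException("Empty term");
        }
        int shift = term.charAt(0) - SHIFT_START_INT;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException(
                "Invalid shift value in prefixCoded string (is encoded value really an INT?)");
        }
        int sortableBits = 0;
        for (int i = 1; i < term.length(); i++) {
            // Each following char is assumed to carry 7 bits of the value.
            sortableBits = (sortableBits << 7) | term.charAt(i);
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    // Autodetect: try the plain parser, fall back to the prefix-coded parser.
    static int parseWithAutodetect(String firstTerm) {
        try {
            return parsePlain(firstTerm);
        } catch (NumberFormatException e) {
            // The original "For input string" exception is swallowed here,
            // so only the fallback parser's message reaches the caller.
            return parsePrefixCoded(firstTerm);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithAutodetect("77886"));
        try {
            parseWithAutodetect("77886$");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the term "77886$", the first character '7' is not a valid shift marker
under the assumed encoding, which is why the message mentions a prefixCoded
string even though no NumericField was ever indexed.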

Could it be that only this one term is corrupt (the first one in this field,
since the autodetection only looks there), and that you did not spot it in
Luke?
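
One way to check for a corrupt term is to look at the raw characters instead
of the rendered text. The sketch below is only an illustration: it takes the
term texts as plain strings (reading them out of the index is left out), and
the class and method names are hypothetical. It lists the terms that do not
parse as ints and prints each character as a code point, so a trailing or
invisible character becomes obvious.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for spotting terms that are not plain decimal ints.
class TermInspector {
    // Renders every char as U+XXXX so unexpected characters stand out.
    static String toCodePoints(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) term.charAt(i)));
        }
        return sb.toString();
    }

    // Returns the terms that Integer.parseInt rejects, in input order.
    static List<String> findNonNumeric(List<String> terms) {
        List<String> bad = new ArrayList<>();
        for (String term : terms) {
            try {
                Integer.parseInt(term);
            } catch (NumberFormatException e) {
                bad.add(term);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Sample term texts; in practice these would come from the index.
        List<String> terms = Arrays.asList("12", "77886$", "99");
        for (String term : findNonNumeric(terms)) {
            System.out.println(term + " -> " + toCodePoints(term));
        }
    }
}
```

For the term from the original report this shows the extra '$' as U+0024 at
the end of the digits.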

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



