Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] Issue Comment Edited: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Nov 16, 2009, 2:03 PM

Post #1 of 2 (228 views)
Permalink
[jira] Issue Comment Edited: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778582#action_12778582 ]

Uwe Schindler edited comment on LUCENE-2074 at 11/16/09 10:01 PM:
------------------------------------------------------------------

bq. Uwe, also, just checking, i don't know javacc at all, does it use unicode properties? We have a lot of queryparsers out there...

I do not know it, too :)

The only query parser using jflex is the new one. And the new one should normally use no unicode properties. Can you check the JFlex file? All other query parsers use JavaCC

*edit*

No Jflex is used by any query parser. But WikipediaTokenizer uses JFlex...

was (Author: thetaphi):
bq. Uwe, also, just checking, i don't know javacc at all, does it use unicode properties? We have a lot of queryparsers out there...

I do not know it, too :)

The only query parser using jflex is the new one. And the new one should normally use no unicode properties. Can you check the JFlex file? All other query parsers use JavaCC.

> Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
> -------------------------------------------------------------------------------
>
> Key: LUCENE-2074
> URL: https://issues.apache.org/jira/browse/LUCENE-2074
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 3.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: jflexwarning.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file.
> After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 is used as matchVersion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 16, 2009, 2:19 PM

Post #2 of 2 (214 views)
Permalink
[jira] Issue Comment Edited: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778598#action_12778598 ]

Uwe Schindler edited comment on LUCENE-2074 at 11/16/09 10:18 PM:
------------------------------------------------------------------

It uses hardcode char ranges, the parser is therefore not JDK-dependent. Let's keep it as it is for now. Mediawiki itsself is not unicode conform, because it's written in PHP and PHP only gets unicode in 6.0 (*lol*) (that says a PHP core committer named U.S. from Bremen in Germany...)

was (Author: thetaphi):
It uses hardcode char ranges, the parser is therefore not JDK-dependent. Let's keep it as it is for now. Mediawiki itsself is not unicode conform, because it's written in PHP and PHP only gets unicode in 6.0 *lol* (that says a PHP core committer names U.S. from Bremen in Germany...)

> Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
> -------------------------------------------------------------------------------
>
> Key: LUCENE-2074
> URL: https://issues.apache.org/jira/browse/LUCENE-2074
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 3.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: jflexwarning.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file.
> After regeneration the Tokenizer behaves different for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 is used as matchVersion.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.