Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

 

 


jira at apache

Jul 1, 2009, 11:01 AM

Post #1 of 27 (969 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726155#action_12726155 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, what do you think about a name for the new folder?

I am concerned users will be confused between analyzers/cjk, analyzers/cn and analyzer-cn, all of which are different.
Should we name the new package analyzer-cnhmm or something to help clarify it?
I intend to also add a little wordage to the javadocs to help disambiguate this, whatever we decide to name it.

thanks

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Priority: Minor
> Fix For: 2.9
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer jar to grow up to 3MB. The dictionary is quite big compared to all the other resources / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive users (e.g. using Lucene on a mobile phone) to include analyzer.jar without getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small refactoring, as Robert mentioned in [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722]: several classes should be package protected, members and classes could be final, commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9; if we cannot make it by then, feel free to move it to 3.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jul 1, 2009, 11:59 AM

Post #2 of 27 (940 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726184#action_12726184 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. I am concerned users will be confused between analyzers/cjk, analyzers/cn and analyzer-cn, all of which are different. Should we name the new package analyzer-cnhmm or something to help clarify it?

The name analyzer-cn was just an example, but I don't like analyzer-cnhmm. What about analyzer-smartcn? Definitely +1 for a less ambiguous name.

bq. I intend to also add a little wordage to the javadocs to help disambiguate this, whatever we decide to name it.
+1

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 2.9


jira at apache

Jul 1, 2009, 12:09 PM

Post #3 of 27 (941 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726187#action_12726187 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, analyzer-smartcn works, and it's consistent with the name of the analyzer.

If I "svn move" the files in my local checkout and submit a patch, will that ensure that history is preserved? I am not an svn expert.




jira at apache

Jul 1, 2009, 9:14 PM

Post #4 of 27 (932 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726314#action_12726314 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. If i "svn move" the files in my local, and submit a patch, will it ensure that history is preserved? I am not an svn expert.

No, an svn copy (or svn move) will not be reflected in a patch. I guess I should do the move and commit it, and the refactoring should be done afterwards. Would that make sense to you?

simon



jira at apache

Jul 1, 2009, 10:20 PM

Post #5 of 27 (931 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726319#action_12726319 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, I think I've almost got it ready. I've got a set of svn moves etc. that you can run first, then a patch to apply over it.

This way the patch reflects only the real changes, and you can review what these are before changing anything in SVN...

I'll upload it soon...




jira at apache

Jul 1, 2009, 11:44 PM

Post #6 of 27 (930 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726354#action_12726354 ]

Uwe Schindler commented on LUCENE-1728:
---------------------------------------

After creating the new contrib, do not forget to add it to the javadocs generation for the "all/" subdir in the main build.xml! Also, new contribs should be added to the developers part of the site docs and so on. I can do that if you like after the whole thing is committed (I have done it several times in the last few months for spatial, trie, ...).

Another idea: we can do it without creating a new contrib and instead do it like contrib-bdb, which consists of two sub-contribs. There, the bdb contrib folder is divided into two sub-folders; the build.xml of the main folder is just a "delegator" (or whatever you would call it) that delegates the ant targets to the build.xml files in the sub-folders. Using this approach we would still have only one contrib-analyzers main folder with two subdirs, which are two separate contrib modules (like the two bdb ones) but live in one folder.

This approach only helps for the source code; the user still gets the jar files in the main build folder directly under contrib. So I am not sure whether this is really better than two truly separate contribs.
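
For illustration, a "delegator" build.xml of the kind described above might look roughly like the following. This is only a minimal sketch and not the actual contrib-bdb build file: the project name, the sub-folder names `common` and `smartcn`, and the set of targets are assumptions made for the example. It relies only on Ant's standard `<ant>` task, which runs the build.xml found in the given `dir` and uses that project's default target when no `target` attribute is given.

```xml
<?xml version="1.0"?>
<!-- Hypothetical contrib/analyzers/build.xml: contains no build logic
     of its own and only forwards targets to the two sub-folders.
     The folder names "common" and "smartcn" are assumptions. -->
<project name="analyzers" default="default">

  <!-- Run the default target of each sub-build in turn. -->
  <target name="default">
    <ant dir="common"/>
    <ant dir="smartcn"/>
  </target>

  <!-- Forward "clean" to the same-named target in each sub-build
       (assumes both sub-builds define a "clean" target). -->
  <target name="clean">
    <ant dir="common" target="clean"/>
    <ant dir="smartcn" target="clean"/>
  </target>

  <!-- Forward "test" likewise (assumes both define "test"). -->
  <target name="test">
    <ant dir="common" target="test"/>
    <ant dir="smartcn" target="test"/>
  </target>

</project>
```

With a layout like this, running ant in the main folder would build both modules, while each sub-folder could still be built on its own from its own build.xml.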



jira at apache

Jul 2, 2009, 12:16 AM

Post #7 of 27 (929 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726362#action_12726362 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. After creating the new contrib, do not forget to add the javadocs generation of the "all/" subdir in the main build.xml! Also new contribs should be added to the developers part in the site docs and so on. I can do that if you like after committing the whole thing (I have done it several times the last months for spatial, trie,...).
Uwe, I guess I will not be able to commit those changes. This reminds me that contrib committers should have access to those files too. Once I get this change in, I will send you a patch so you can get it in.

bq. This approach is only good for source code, the user still gets the jar files in the main build folder directly under contrib. So I am not sure, if this is really better than two really separate contribs.
I really like this approach as it keeps the code logically consistent. I think we should go for this approach, that makes much more sense to me.




jira at apache

Jul 2, 2009, 5:11 AM

Post #8 of 27 (921 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726453#action_12726453 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

bq. I really like this approach as it keeps the code logically consistent. I think we should go for this approach, that makes much more sense to me.

Simon, are you referring to Uwe's approach of splitting the analyzers contrib into two, or your (previous) approach of analyzer-smartcn contrib?




jira at apache

Jul 2, 2009, 5:21 AM

Post #9 of 27 (920 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726456#action_12726456 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. Simon, are you referring to Uwe's approach of splitting the analyzers contrib into two, or your (previous) approach of analyzer-smartcn contrib?

I refer to Uwe's approach.



jira at apache

Jul 2, 2009, 5:37 AM

Post #10 of 27 (919 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726458#action_12726458 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Great, I like this too. Any preference on names? contrib/analyzers/analyzers and contrib/analyzers/smartcn?

Once we figure that out I can create an updated set of svn moves + patch for this approach.



jira at apache

Jul 3, 2009, 12:55 AM

Post #11 of 27 (894 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726818#action_12726818 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. contrib/analyzers/analyzers and contrib/analyzers/smartcn?
+1

go ahead!



jira at apache

Jul 7, 2009, 11:08 AM

Post #12 of 27 (814 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728264#action_12728264 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, thanks for all that! I just had a brief look at it, but it looks good so far. I need to look over it again in the next few days.
I plan to commit it this week.

simon

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1728.txt


jira at apache

Jul 15, 2009, 1:57 AM

Post #13 of 27 (768 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731354#action_12731354 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I don't think we should rename the directory analyzers to analysis. I would rather go for analyzers/common and analyzers/smartcn or a similar scheme.


simon



jira at apache

Jul 15, 2009, 2:05 AM

Post #14 of 27 (769 views)

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731357#action_12731357 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, sounds good. I will update the patch / svn commands with that scheme (hopefully in the next day or 2)


> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer jar to grow up to 3MB. The dictionary is quite big compared to all the other resouces / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive users (e.g. using lucene on a mobile phone) to include analyzer.jar without getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small refactoring as Robert mentioned in [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722] several classes should be package protected, members and classes could be final, commented syserr and logging code should be removed etc.
> I set this issue target to 2.9 - if we can not make it until then feel free to move it to 3.0



jira at apache

Jul 15, 2009, 2:23 AM

Post #15 of 27 (767 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731360#action_12731360 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

cool thanks!



jira at apache

Jul 21, 2009, 1:54 AM

Post #16 of 27 (729 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I have looked at this patch and, more importantly, at the source itself, and I increasingly get the impression that this analyzer and its related classes need more work than just moving them into one package and making everything package private. From my understanding, the Hidden Markov Model segmenter is a feature that could be replaced by some other algorithm. Once you have such a feature relationship, I would prefer packages by feature, which lets you remove a single feature just by removing a whole package.
In other words, I would love to see a general refactoring of the code that exposes a tiny but common API in the base package, which is subsequently used by the HHMM "feature". That is quite a bit of work, and I do not consider it 2.9 work.
So here is the question: do we keep the structure as it is and just move it to a new subdir to build a separate jar, or do we move the classes into one single package (as you did in the patch) and build up a clean HHMM package later in 3.*?
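
As a rough illustration of the "packages by feature" idea in this comment, the sketch below shows a tiny segmenter interface that would sit in the base package, with an implementation that could live in its own feature package and be swapped or removed as a unit. All names here (WordSegmenter, SingleCodePointSegmenter, PluggableAnalyzerSketch) are hypothetical and are not part of Lucene; the stand-in implementation is deliberately trivial and is not an HHMM segmenter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the small common API that would live in the base package.
interface WordSegmenter {
    List<String> segment(String sentence);
}

// Hypothetical stand-in for a feature package (where an HHMM segmenter would go).
// It simply emits one token per Unicode code point.
final class SingleCodePointSegmenter implements WordSegmenter {
    @Override
    public List<String> segment(String sentence) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < sentence.length(); ) {
            int cp = sentence.codePointAt(i);
            words.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return words;
    }
}

// Hypothetical consumer: depends only on the interface, never on a concrete feature.
final class PluggableAnalyzerSketch {
    private final WordSegmenter segmenter;

    PluggableAnalyzerSketch(WordSegmenter segmenter) {
        this.segmenter = segmenter;
    }

    List<String> tokens(String sentence) {
        return segmenter.segment(sentence);
    }

    public static void main(String[] args) {
        PluggableAnalyzerSketch analyzer =
            new PluggableAnalyzerSketch(new SingleCodePointSegmenter());
        System.out.println(analyzer.tokens("abc"));
    }
}
```

With this shape, replacing the segmentation algorithm means supplying another WordSegmenter; nothing in the consumer changes.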

Besides the packaging, I found heaps of things I do not like very much in the code (not your patch :) and my fingertips get nervous when I see stuff like the AbstractDictionary hierarchy or those singletons. I would really like to have this separation of CN and common analyzers in for 2.9; we just need to decide which way we go. I guess moving it over without changing code would be easiest.

simon




jira at apache

Jul 21, 2009, 2:20 AM

Post #17 of 27 (729 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733544#action_12733544 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, I agree with you, there is a ton of work to be done.

I also did not particularly like my method of moving everything into one package to hide the internals... and I 100% agree that a "correct" refactoring is quite a bit of work.

I don't want to sound like a complainer since I don't have a patch to fix these things, but I want to list some things that I would like to fix/refactor also.
* removal of the GB2312 dictionary dependency: this limits functionality to Simplified Chinese.
* use of Unicode categories (the Java Character class, etc.) instead of Utility.getCharType()
* support for code points outside the BMP, which is necessary to support Traditional Chinese.
* a little more flexibility with tokenization. Honestly, I'm really not sold on indexing "words" for Chinese in the first place, but words + bigrams (overlapping tokens) would be nice.
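
To make two of the points above concrete, here is a minimal sketch of overlapping bigrams built over code points rather than chars, so that characters outside the BMP are kept whole. It is not taken from the patch or from Lucene: the class HanBigramSketch and its methods are hypothetical, it covers only the bigram half of "words + bigrams", and it assumes a JDK that provides Character.UnicodeScript (Java 7 or later) as a stand-in for a hand-written character-type table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class HanBigramSketch {

    // Classify by the JDK's Unicode script data instead of a custom lookup table.
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Emit overlapping bigrams for every run of Han code points.
    // A run of length one is emitted as a single-character token.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        int prev = -1;            // previous Han code point in the current run, or -1
        boolean prevUsed = false; // whether prev already appears in an emitted bigram
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);  // reads a surrogate pair as one code point
            i += Character.charCount(cp);  // advance by 1 or 2 chars
            if (isHan(cp)) {
                if (prev != -1) {
                    out.add(new StringBuilder()
                        .appendCodePoint(prev).appendCodePoint(cp).toString());
                    prevUsed = true;
                }
                prev = cp;
            } else {
                if (prev != -1 && !prevUsed) {
                    out.add(new String(Character.toChars(prev)));
                }
                prev = -1;
                prevUsed = false;
            }
        }
        if (prev != -1 && !prevUsed) {
            out.add(new String(Character.toChars(prev)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four BMP Han characters, then a CJK Extension B character (U+20000).
        System.out.println(bigrams("\u6211\u7231\u5317\u4eac"));
        System.out.println(bigrams("\uD840\uDC00\u4e2d"));
    }
}
```

Iterating with codePointAt and charCount is what keeps a supplementary character from being split into two meaningless surrogate halves.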

In the future it would be nice to add support for Traditional Chinese, and there is frequency data out there (libtabe: BSD license, etc.), but we need to refactor first.

As far as what to do for 2.9... I really don't know either, just let me know if you need a new patch :)




jira at apache

Jul 21, 2009, 2:36 AM

Post #18 of 27 (729 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733547#action_12733547 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. I don't want to sound like a complainer since I don't have a patch to fix these things, but I want to list some things that I would like to fix/refactor also.

:) Pushing things forward is not complaining to me. I agree with your points; I did not look closely into implementation details but rather at structural things. Apparently we both agree that we have work to do on this, and I guess we can work out good solutions together in the future. Let's just move the classes into their own subdir as you already did and keep the structure as it is (with the smallest changes, since some classes have to be moved). If you could provide a patch, I will commit the refactoring and we open a new issue for 3.*.
This solution seems ideal, as the 2.9 release is quite close...


simon



jira at apache

Jul 21, 2009, 2:48 AM

Post #19 of 27 (728 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733552#action_12733552 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, OK, I will work on a patch that tries to maintain the package structure.

Other than package structure, is there anything in the patch you are uncomfortable with?
I can either try to unfix any small fixes you don't like or create more testcases, whatever makes sense.




jira at apache

Jul 21, 2009, 2:58 AM

Post #20 of 27 (729 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733556#action_12733556 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. Other than package structure, is there anything in the patch you are uncomfortable with?
Not that I could tell. You can keep whatever fits the package structure, which means we might have to keep some classes public, etc.

thanks for your patience! Good job!

simon



jira at apache

Jul 21, 2009, 3:02 AM

Post #21 of 27 (728 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733560#action_12733560 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, yes some things may have to be public that should not be due to the package structure.

I'll see if I can improve the javadocs for anything that falls in this situation as a short-term workaround.




jira at apache

Jul 21, 2009, 9:59 PM

Post #22 of 27 (713 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733986#action_12733986 ]

Michael Busch commented on LUCENE-1728:
---------------------------------------

So we are going to move everything currently under contrib/analyzers to contrib/analyzers/common?



jira at apache

Jul 21, 2009, 10:15 PM

Post #23 of 27 (713 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733989#action_12733989 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

yeah, except smart chinese analyzer. I am testing the latest patch (that keeps the previous smart chinese analyzer package structure), regenerating docs, etc etc.

I will upload it in a few when I think it is good to go.



jira at apache

Jul 23, 2009, 10:21 AM

Post #24 of 27 (671 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734647#action_12734647 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, thanks!

Oh, the equals and hashCode methods were commented out in the original source (I removed the commented lines).

I was afraid to uncomment them (I didn't know why they were commented out),
but I shouldn't have deleted the commented lines... thanks for resolving this.




jira at apache

Jul 23, 2009, 11:30 AM

Post #25 of 27 (670 views)
Permalink
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734685#action_12734685 ]

Michael McCandless commented on LUCENE-1728:
--------------------------------------------

I'll commit the top-level changes for the web-site. Thanks Robert!

