Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow


jira at apache

Jul 24, 2007, 10:12 AM

Post #1 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515032 ]

Mark Miller commented on LUCENE-871:
------------------------------------

It would be a nice courtesy to document that this filter is more than twice as fast if you use a StringBuilder rather than a StringBuffer: up to 2.8 times faster in my measurements.

- Mark
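The change Mark describes amounts to swapping the buffer class inside the filter's character-mapping loop. The sketch below is illustrative only and is not the filter's actual source: the class `AccentFoldingSketch` and the method `removeAccents` are hypothetical names, and only a few Latin-1 mappings are included. StringBuilder offers the same `append`/`toString` methods as StringBuffer but does not synchronize them, which is a plausible source of the measured difference.

```java
// Minimal sketch, NOT the actual ISOLatin1AccentFilter source: the class and
// method names are hypothetical and only a few mappings are included.
public class AccentFoldingSketch {

    public static String removeAccents(String input) {
        // StringBuilder (Java 1.5+) has the same append/toString API as
        // StringBuffer, but its methods are not synchronized.
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00E0': case '\u00E1': case '\u00E2':
                case '\u00E3': case '\u00E4': case '\u00E5':
                    out.append('a');
                    break;
                case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                    out.append('e');
                    break;
                case '\u00E7':
                    out.append('c');
                    break;
                case '\u00DF': // sharp s expands to two characters
                    out.append("ss");
                    break;
                default:
                    out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("caf\u00E9 gar\u00E7on")); // prints "cafe garcon"
    }
}
```

Reverting to StringBuffer for Java 1.4 builds would only require changing the declared type of `out`.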

> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
> Key: LUCENE-871
> URL: https://issues.apache.org/jira/browse/LUCENE-871
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
> Reporter: Ian Boston
> Attachments: ISOLatin1AccentFilter.java.patch
>
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Jul 24, 2007, 11:12 AM

Post #2 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515053 ]

Grant Ingersoll commented on LUCENE-871:
----------------------------------------

Right, but StringBuilder requires Java 1.5. Sigh...
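For code that must still compile on Java 1.4, one possible workaround (an assumption, not something proposed in this thread) is to skip both buffer classes and write into a plain `char[]` sized for the worst case. The class and method names are hypothetical, and the mapping table is a small illustrative subset. In that subset each input character expands to at most two output characters (ß becomes "ss"), so twice the input length is always enough.

```java
// Hypothetical sketch: avoids StringBuilder entirely so it compiles on Java 1.4.
public class CharArrayFoldingSketch {

    public static String removeAccents(String input) {
        // Worst case: every character is a sharp s that expands to "ss".
        char[] out = new char[input.length() * 2];
        int len = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                    out[len++] = 'e';
                    break;
                case '\u00E7':
                    out[len++] = 'c';
                    break;
                case '\u00DF':
                    out[len++] = 's';
                    out[len++] = 's';
                    break;
                default:
                    out[len++] = c;
            }
        }
        // Only the filled prefix of the array becomes the result.
        return new String(out, 0, len);
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("caf\u00E9 gar\u00E7on")); // prints "cafe garcon"
    }
}
```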


jira at apache

Jul 24, 2007, 11:39 AM

Post #3 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515060 ]

Mark Miller commented on LUCENE-871:
------------------------------------

Yes, I feel that sigh. So perhaps the point is moot. I was just thinking that a lot of code runs on 1.5 or later, and while it's obviously crazy to point out every class that could be changed to use a StringBuilder, I thought it might be nice to warn somehow that this filter is roughly twice as fast if you can use StringBuilder. Even this sped-up version takes more than its fair share of time in my filter chain.

- Mark
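To reproduce a comparison like this on a given JVM, a rough timing harness might look like the following. It is a hypothetical sketch: the two methods are identical except for the buffer class, and only é is folded so the loop body stays trivial. Hand-rolled timings are sensitive to JIT warm-up and dead-code elimination (hence the warm-up loop and the checksum), and newer JVMs may reduce or remove the cost of uncontended locks, so a dedicated harness such as JMH gives more trustworthy numbers. No particular ratio should be expected from running it.

```java
// Rough, hypothetical timing harness: results depend on JVM, JIT warm-up and
// hardware, so treat the numbers as indicative only.
public class BufferTimingSketch {

    // Identical loops that differ only in the buffer class used.
    public static String foldWithBuffer(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c == '\u00E9' ? 'e' : c);
        }
        return out.toString();
    }

    public static String foldWithBuilder(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c == '\u00E9' ? 'e' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "caf\u00E9 au lait";
        int iterations = 2000000;
        long sink = 0; // consumed below so the JIT cannot discard the work

        // Warm up both paths so they are compiled before timing starts.
        for (int i = 0; i < 100000; i++) {
            sink += foldWithBuffer(word).length();
            sink += foldWithBuilder(word).length();
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += foldWithBuffer(word).length();
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += foldWithBuilder(word).length();
        long t2 = System.nanoTime();

        System.out.println("StringBuffer:  " + (t1 - t0) / 1000000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```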


jira at apache

Jul 24, 2007, 12:52 PM

Post #4 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515077 ]

Michael McCandless commented on LUCENE-871:
-------------------------------------------

I think we can likely get a sizable speedup here by using the char[] termBuffer in Token plus the new "re-use single Token" API being discussed here:

http://www.gossamer-threads.com/lists/lucene/java-dev/51283
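As a rough illustration of the char[] idea, the sketch below folds accents directly in a reusable buffer instead of building a new String per token. `TermBuffer` is a hypothetical stand-in for a token that exposes its text as a `char[]` plus a length; it is not Lucene's `Token` class, and the mapping table is a small hypothetical subset. Terms with no character at or above U+00C0 (where the accented Latin-1 letters start) are left untouched, and the scratch array is kept between calls, so folding allocates nothing once the buffers are large enough.

```java
// Hypothetical sketch: TermBuffer is NOT Lucene's Token class.
public class TermBufferFoldingSketch {

    // Token-like holder exposing its text as a char[] plus a length.
    public static final class TermBuffer {
        public char[] buffer = new char[16];
        public int length;

        public void set(String text) {
            ensureCapacity(text.length());
            text.getChars(0, text.length(), buffer, 0);
            length = text.length();
        }

        public void ensureCapacity(int needed) {
            if (buffer.length < needed) {
                char[] bigger = new char[Math.max(needed, buffer.length * 2)];
                System.arraycopy(buffer, 0, bigger, 0, length);
                buffer = bigger;
            }
        }

        @Override
        public String toString() {
            return new String(buffer, 0, length);
        }
    }

    // Scratch space reused across calls.
    private char[] scratch = new char[32];

    public void fold(TermBuffer term) {
        // Fast path: skip terms whose characters are all below U+00C0.
        int i = 0;
        while (i < term.length && term.buffer[i] < '\u00C0') i++;
        if (i == term.length) return;

        int worstCase = term.length * 2; // sharp s may expand to "ss"
        if (scratch.length < worstCase) scratch = new char[worstCase];
        int out = 0;
        for (int j = 0; j < term.length; j++) {
            char c = term.buffer[j];
            switch (c) {
                case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB':
                    scratch[out++] = 'e';
                    break;
                case '\u00E7':
                    scratch[out++] = 'c';
                    break;
                case '\u00DF':
                    scratch[out++] = 's';
                    scratch[out++] = 's';
                    break;
                default:
                    scratch[out++] = c;
            }
        }
        // Copy the folded text back into the same term buffer.
        term.ensureCapacity(out);
        System.arraycopy(scratch, 0, term.buffer, 0, out);
        term.length = out;
    }

    public static void main(String[] args) {
        TermBufferFoldingSketch filter = new TermBufferFoldingSketch();
        TermBuffer term = new TermBuffer(); // one instance reused for every word
        String[] words = {"caf\u00E9", "stra\u00DFe", "plain"};
        for (int k = 0; k < words.length; k++) {
            term.set(words[k]);
            filter.fold(term);
            System.out.println(term); // prints cafe, strasse, plain
        }
    }
}
```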


jira at apache

Jul 24, 2007, 3:06 PM

Post #5 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515103 ]

Mark Miller commented on LUCENE-871:
------------------------------------

Whoa! I thought this patch was already on the trunk; I should have looked more closely at the patch file. I basically just duplicated the patch. Belay all my previous comments.

> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
> Key: LUCENE-871
> URL: https://issues.apache.org/jira/browse/LUCENE-871
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
> Reporter: Ian Boston
> Attachments: fasterisoremove1.patch, ISOLatin1AccentFilter.java.patch
>
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow


jira at apache

Jul 30, 2007, 7:40 AM

Post #6 of 6
[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516403 ]

Michael McCandless commented on LUCENE-871:
-------------------------------------------

OK, for LUCENE-969 I made yet a third option for optimizing
ISOLatin1AccentFilter.

In that patch I reuse the Token instance, using the char[] API for the
Token's text instead of String, and I also re-use a single TokenStream
instance (I did this for all core tokenizers).
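The reuse pattern described here can be sketched independently of Lucene: the caller creates one token object and one tokenizer object, and each call fills the existing token instead of returning a new one. All names below (`ReuseSketch`, `ReusableToken`, `WhitespaceTokenizerSketch`, `reset`, `next`, `countTokens`) are hypothetical and are not meant to reproduce the API in the LUCENE-969 patch.

```java
// Hypothetical sketch of the reuse pattern: one token object and one tokenizer
// object are recycled for every input instead of being allocated per token.
public class ReuseSketch {

    public static final class ReusableToken {
        public char[] buffer = new char[16];
        public int length;

        public String text() {
            return new String(buffer, 0, length);
        }
    }

    public static final class WhitespaceTokenizerSketch {
        private String input = "";
        private int pos;

        // Point the same tokenizer instance at new text instead of constructing a new one.
        public void reset(String text) {
            input = text;
            pos = 0;
        }

        // Fills the caller-supplied token and returns true, or returns false at end of input.
        public boolean next(ReusableToken token) {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos == input.length()) return false;
            int start = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) pos++;
            int len = pos - start;
            if (token.buffer.length < len) token.buffer = new char[len * 2]; // grow only when needed
            input.getChars(start, pos, token.buffer, 0);
            token.length = len;
            return true;
        }
    }

    // Counts tokens across several documents using a single tokenizer and a single token.
    public static int countTokens(String[] docs) {
        WhitespaceTokenizerSketch tokenizer = new WhitespaceTokenizerSketch();
        ReusableToken token = new ReusableToken();
        int count = 0;
        for (int i = 0; i < docs.length; i++) {
            tokenizer.reset(docs[i]);
            while (tokenizer.next(token)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] docs = {"caf\u00E9 au lait", "  stra\u00DFe  "};
        System.out.println(countTokens(docs)); // prints 4
    }
}
```

Returning a boolean rather than a fresh token object is what lets the loop in `countTokens` run without per-token allocation, apart from occasional buffer growth.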

I just tested the total time to tokenize all Wikipedia content with
current trunk (1116 sec) vs. with LUCENE-969 (500 sec), using a
WhitespaceTokenizer -> ISOLatin1AccentFilter chain.

I separately timed just creating the documents (112 sec) and
subtracted that from the above times, so that only the cost of
tokenization is measured.

This gives a net speedup for this filter of about 2.59X (1004 sec ->
388 sec).
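Subtracting the 112 sec document-creation time from both totals isolates the tokenization cost, and the ratio of the two net times is the speedup. The symbols $T_{\text{trunk}}$, $T_{\text{969}}$ and $T_{\text{docs}}$ are introduced here only for clarity:

```latex
\text{speedup}
  = \frac{T_{\text{trunk}} - T_{\text{docs}}}{T_{\text{969}} - T_{\text{docs}}}
  = \frac{1116 - 112}{500 - 112}
  = \frac{1004}{388}
  \approx 2.59
```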


> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
> Key: LUCENE-871
> URL: https://issues.apache.org/jira/browse/LUCENE-871
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
> Reporter: Ian Boston
> Attachments: fasterisoremove1.patch, fasterisoremove2.patch, ISOLatin1AccentFilter.java.patch
>
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow
