
Mailing List Archive: Lucene: Java-Dev

[jira] Created: (LUCENE-1758) improve arabic analyzer: light8 -> light10

 

 



jira at apache

Jul 23, 2009, 11:01 AM


improve arabic analyzer: light8 -> light10
------------------------------------------

Key: LUCENE-1758
URL: https://issues.apache.org/jira/browse/LUCENE-1758
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
Attachments: LUCENE-1758.txt

Someone mentioned on the java-user list that the Arabic analysis was not as good as they would like.

This patch adds the لل- prefix to the set of prefixes the stemmer strips (the light10 algorithm, rather than light8).
In the light10 paper, this improves precision from .390 to .413.
The authors note that the difference is not statistically significant, but it makes linguistic sense and has at least been shown not to hurt.
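
To illustrate the difference, here is a minimal sketch of light prefix stripping. It is not the code in the attached patch: the class name, the method, and the minimum remaining length of 2 are assumptions made for the example, and the light8 prefix list (wal-, bal-, kal-, fal-, al-, w-) is given as I understand it from the light stemming literature. Only the addition of the ll- prefix comes from this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Lucene's ArabicStemmer.
public class LightPrefixSketch {

    // Assumed light8 prefixes, longest first: wal-, bal-, kal-, fal-, al-, w-
    static final List<String> LIGHT8 = Collections.unmodifiableList(Arrays.asList(
            "\u0648\u0627\u0644",
            "\u0628\u0627\u0644",
            "\u0643\u0627\u0644",
            "\u0641\u0627\u0644",
            "\u0627\u0644",
            "\u0648"));

    // light10 = light8 plus the ll- prefix, inserted before the one-letter w-
    static final List<String> LIGHT10;
    static {
        List<String> prefixes = new ArrayList<>(LIGHT8);
        prefixes.add(prefixes.size() - 1, "\u0644\u0644");
        LIGHT10 = Collections.unmodifiableList(prefixes);
    }

    // Assumption: keep at least this many characters after stripping.
    static final int MIN_STEM_LENGTH = 2;

    // Strips the first matching prefix, or returns the word unchanged.
    static String stripPrefix(String word, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (word.startsWith(prefix)
                    && word.length() - prefix.length() >= MIN_STEM_LENGTH) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // "llktab" ("for the book"), which should reduce to "ktab" ("book")
        String word = "\u0644\u0644\u0643\u062A\u0627\u0628";
        String stem = "\u0643\u062A\u0627\u0628";
        System.out.println(stripPrefix(word, LIGHT8).equals(word));  // true: light8 leaves it alone
        System.out.println(stripPrefix(word, LIGHT10).equals(stem)); // true: light10 strips ll-
    }
}
```

With light8 the word keeps its ll- prefix and so would not match a query for the bare noun. With light10 both forms reduce to the same stem.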

In the future, I hope openrelevance will allow us to try some more approaches.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


