
jira at apache
Nov 7, 2009, 6:53 AM
[jira] Resolved: (LUCENE-2016) replace invalid U+FFFF character during indexing
[ https://issues.apache.org/jira/browse/LUCENE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2016.
----------------------------------------

    Resolution: Fixed

> replace invalid U+FFFF character during indexing
> ------------------------------------------------
>
>                 Key: LUCENE-2016
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2016
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 2.9.1, 3.0
>
>         Attachments: LUCENE-2016.patch
>
>
> If the invalid U+FFFF character is embedded in a token, it actually causes indexing to silently corrupt the index by writing duplicate terms into the terms dict. CheckIndex will catch the error, and merging will hit exceptions (I think).
> We already replace invalid surrogate pairs with the replacement character U+FFFD, so I'll just do the same with U+FFFF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org
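The fix described in the issue, substituting the replacement character U+FFFD for U+FFFF just as is already done for invalid surrogates, can be sketched as a single in-place pass over a token's characters. This is a minimal illustration, not the code in LUCENE-2016.patch: the class `TermSanitizer` and its `sanitize` method are hypothetical names, and the sketch assumes the token text is available as a `char[]` buffer plus a valid length.

```java
/**
 * Hypothetical helper (not Lucene's actual code) illustrating the fix
 * described in LUCENE-2016: before a token's chars reach the terms
 * dictionary, replace U+FFFF and any unpaired surrogate with U+FFFD.
 */
public final class TermSanitizer {

    private static final char REPLACEMENT = '\uFFFD';

    private TermSanitizer() {}

    /**
     * Rewrites buf[0..len) in place and returns how many chars were replaced.
     * Characters at index len and beyond are never touched.
     */
    public static int sanitize(char[] buf, int len) {
        int replaced = 0;
        int i = 0;
        while (i < len) {
            char c = buf[i];
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < len && Character.isLowSurrogate(buf[i + 1])) {
                    i += 2; // well-formed surrogate pair: keep both halves
                    continue;
                }
                buf[i] = REPLACEMENT; // high surrogate with no low partner
                replaced++;
            } else if (Character.isLowSurrogate(c)) {
                buf[i] = REPLACEMENT; // low surrogate with no preceding high
                replaced++;
            } else if (c == '\uFFFF') {
                buf[i] = REPLACEMENT; // the invalid U+FFFF named in the issue
                replaced++;
            }
            i++;
        }
        return replaced;
    }

    public static void main(String[] args) {
        // One U+FFFF and one lone high surrogate: both become U+FFFD.
        char[] term = "ab\uFFFFc\uD800d".toCharArray();
        int n = sanitize(term, term.length);
        System.out.println(n + " " + new String(term).equals("ab\uFFFDc\uFFFDd")); // prints "2 true"
    }
}
```

Replacing in place keeps the token's length unchanged, so no offsets shift; the trade-off is that distinct invalid inputs (for example a term containing U+FFFF and the same term containing a lone surrogate) collapse to the same indexed term.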