
jira at apache
Jul 4, 2012, 11:39 PM
[jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
[ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406874#comment-13406874 ]

Shai Erera commented on LUCENE-4190:
------------------------------------

What if we had an object called IndexFileNames with a method accept(String name), that returns true if the file is recognized, false otherwise - that could give applications a way to create a recognized-set of index files:

* Lucene would provide a DefaultIndexFileNames which recognizes all non-codec files
* Either the app would provide an extension to the default (or a wrapper) which recognizes its codec files as well
** Or, we make the Codec responsible for recognizing files too, and then the code would just query the Codec for non-default index files.

Either way, it seems like we can very easily recognize what are index files and what aren't.

When files need to be deleted, it seems simple as well:

* Lucene lists all files in the directory
* Any file that is referenced by the index (I assume we still know which files are needed right?) is kept
* Any other file is queried against IndexFileNames.accept and if it is accepted, it's deleted, otherwise it's left alone.

Since this looks too simple to me, I'm assuming that I'm missing something. If so, can someone please clarify the problem to me?

> IndexWriter deletes non-Lucene files
> ------------------------------------
>
>                 Key: LUCENE-4190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4190
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4190.patch, LUCENE-4190.patch
>
>
> Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
> IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to "bound".
> But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff.
> I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _<base36>(_X).Y), so we are much less likely to delete a non-Lucene file....

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene
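
For readers following the thread, the proposal in the comment above (an accept(String name) check plus the three-step deletion pass) might look roughly like the sketch below. It is a stand-alone illustration, not Lucene's actual API and not the attached patch: the IndexFileNames interface and DefaultIndexFileNames here only borrow the names used in the comment, UnreferencedFileSweeper and filesToDelete are hypothetical, and the regular expression is one possible reading of the `_<base36>(_X).Y` pattern mentioned in the issue description.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical interface named after the object proposed in the comment.
// It is NOT the real org.apache.lucene.index.IndexFileNames class.
interface IndexFileNames {
    // Returns true if the file name is recognized as an index file.
    boolean accept(String name);
}

// Assumption: per-segment files follow _<base36>(_X).Y, i.e. an underscore,
// a base-36 segment name, optional _suffix parts, then an extension.
// Files such as segments_N are deliberately not covered by this sketch.
final class DefaultIndexFileNames implements IndexFileNames {
    private static final Pattern SEGMENT_FILE =
        Pattern.compile("_[0-9a-z]+(_[A-Za-z0-9]+)*\\.[A-Za-z0-9]+");

    @Override
    public boolean accept(String name) {
        // matches() requires the whole name to fit the pattern.
        return SEGMENT_FILE.matcher(name).matches();
    }
}

// Hypothetical helper implementing the deletion pass from the comment.
final class UnreferencedFileSweeper {
    private UnreferencedFileSweeper() {}

    // listing:    every file name found in the directory
    // referenced: file names the current index still needs
    // names:      the recognizer deciding what counts as an index file
    static List<String> filesToDelete(Collection<String> listing,
                                      Set<String> referenced,
                                      IndexFileNames names) {
        List<String> doomed = new ArrayList<>();
        for (String name : listing) {
            if (referenced.contains(name)) {
                continue; // still referenced by the index: keep
            }
            if (names.accept(name)) {
                doomed.add(name); // recognized but unreferenced: delete
            }
            // anything else is a foreign file and is left alone
        }
        return doomed;
    }
}
```

With this shape, pointing the sweep at the wrong directory would only ever select names that happen to fit the pattern, which is the narrowing the issue description asks for. A codec-aware variant could wrap DefaultIndexFileNames and also accept that codec's own file names.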