
gsingers at apache
Dec 18, 2007, 10:09 AM
Post #8 of 15
I think the issue is my fault, but I am not exactly sure how it happened. I deleted my index and have not been able to reproduce the problem since. However, here's what I can tell from some debugging I did before that:

The field that is causing the problem in the stack trace is neither binary nor compressed, nor is it even stored. So, the fact that it is being merged (see the stack trace) is just wrong, since it isn't stored. I start out with 6 fields, 2 of which are stored. When I come into FieldsReader, it gets the correct number of Fields; however, they must be out of order from when I originally indexed, or something like that. AFAICT, FieldsWriter is also correctly writing out the Fields.

In looking at SegmentMerger, we are in the else clause of:

    if (matchingSegmentReader != null) {
      // We can optimize this case (doing a bulk
      // byte copy) since the field numbers are
      // identical
      int start = j;
      int numDocs = 0;
      do {
        j++;
        numDocs++;
      } while(j < maxDoc && !matchingSegmentReader.isDeleted(j) && numDocs < MAX_RAW_MERGE_DOCS);

      IndexInput stream = matchingFieldsReader.rawDocs(rawDocLengths, start, numDocs);
      fieldsWriter.addRawDocuments(stream, rawDocLengths, numDocs);
      docCount += numDocs;
    } else {
      fieldsWriter.addDocument(reader.document(j, fieldSelectorMerge)); /////// HERE
      j++;
      docCount++;
    }

Based on the comment in the if condition, I am assuming the field numbers are not identical in this clause, which would explain the fact that the Fields info is being misinterpreted. I still wonder if there isn't a problem in that somehow the index got corrupted such that the Field numbering was off between various runs of the IndexWriter? Does that even seem possible in the code? I am just thinking out loud here, not sure if it even makes sense.
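To make the batching in the bulk-copy branch above easier to follow, here is a minimal standalone sketch of how that do/while groups contiguous live documents into runs. It is not Lucene code: the class `RawMergeRuns`, the method `liveRuns`, the `BitSet` standing in for `isDeleted`, the outer loop that skips deleted documents, and the cap of 4 are all assumptions made for illustration (the real value of MAX_RAW_MERGE_DOCS is not given in this thread).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class RawMergeRuns {
    // Hypothetical cap for the sketch; not the real MAX_RAW_MERGE_DOCS value.
    static final int MAX_RAW_MERGE_DOCS = 4;

    /**
     * Returns {start, numDocs} pairs, one per run of contiguous
     * non-deleted documents, each run capped at maxRun documents.
     */
    static List<int[]> liveRuns(BitSet deleted, int maxDoc, int maxRun) {
        List<int[]> runs = new ArrayList<>();
        int j = 0;
        while (j < maxDoc) {
            if (deleted.get(j)) {
                j++;            // assumed outer loop: deleted docs are skipped
                continue;
            }
            int start = j;
            int numDocs = 0;
            // Same shape as the loop quoted from SegmentMerger: extend the
            // run until a deleted doc, the end of the segment, or the cap.
            do {
                j++;
                numDocs++;
            } while (j < maxDoc && !deleted.get(j) && numDocs < maxRun);
            runs.add(new int[] {start, numDocs});
        }
        return runs;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(7);
        for (int[] run : liveRuns(deleted, 10, MAX_RAW_MERGE_DOCS)) {
            System.out.println("raw copy docs [" + run[0] + ", " + (run[0] + run[1]) + ")");
        }
    }
}
```

Each run corresponds to one rawDocs/addRawDocuments pair in the quoted code. The point of the sketch is only that a run is a plain byte copy, so it is safe only when the field numbers in the source segment already match the merged numbering.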
I think we can just put this on hold for now and see if it comes up again, since I can't reproduce it (and I forgot to save the mislabeled index).

-Grant

On Dec 18, 2007, at 7:27 AM, Grant Ingersoll wrote:

> No, there were not any exceptions during indexing. I am still
> trying to work up some test cases using open documents (i.e.
> wikipedia)
>
> -Grant
>
> On Dec 18, 2007, at 6:09 AM, Michael McCandless wrote:
>
>> Grant,
>>
>> Do you know whether you hit any exceptions while adding docs,
>> before you hit those merge exceptions?
>>
>> I have found one case where an exception that runs back through
>> DocumentsWriter (during addDocument()) can produce a corrupt fdt
>> (stored field) file. I have a test case that shows this, and a fix.
>>
>> Mike
>>
>> Grant Ingersoll wrote:
>>
>>> I will try to work up a test case that I can share and will double
>>> check that I have all the right pieces in place.
>>>
>>> -Grant
>>>
>>> On Dec 17, 2007, at 2:50 PM, Michael McCandless wrote:
>>>
>>>> Yonik Seeley wrote:
>>>>
>>>>> On Dec 17, 2007 2:15 PM, Michael McCandless <lucene [at] mikemccandless> wrote:
>>>>>>
>>>>>> Not good!
>>>>>>
>>>>>> It's almost certainly a bug with Lucene, I think, because Solr is
>>>>>> just a consumer of Lucene's API, which shouldn't ever cause
>>>>>> something like this.
>>>>>
>>>>> Yeah... a solr level commit should just translate into
>>>>>   writer.close
>>>>>   reader.open // assuming there are "overwrites"
>>>>>   delete duplicates via TermDocs
>>>>>   reader.close
>>>>>   writer.open
>>>>>   writer.optimize
>>>>>   writer.close
>>>>
>>>> Seems fine!
>>>>
>>>>>> Apparently, while merging stored fields, SegmentMerger tried to
>>>>>> read too far.
>>>>>
>>>>> The code to merge stored fields was recently optimized to do
>>>>> bulk copy of contiguous fields, right?
>>>>
>>>> Yes, I'm wondering the same thing... though Grant's exception is
>>>> on the un-optimized case, because the field name->number mapping
>>>> differed for that segment. I'll scrutinize that change some
>>>> more...
>>>>
>>>> Mike

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene
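For readers following the thread, Mike's remark that the field name->number mapping differed for that segment can be illustrated with a small standalone sketch. It is a toy model, not Lucene's FieldInfos or stored-fields format: a segment's numbering is modeled as a list of field names (index = field number), a stored document as a map from field number to value, and the names `FieldNumberingSketch`, `sameNumbering`, `decode`, and `encode` are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldNumberingSketch {
    /** True when both tables assign the same number to every field name. */
    static boolean sameNumbering(List<String> a, List<String> b) {
        return a.equals(b);
    }

    /** Resolves a stored doc (field number -> value) to names via a name table. */
    static Map<String, String> decode(Map<Integer, String> storedDoc, List<String> names) {
        Map<String, String> byName = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : storedDoc.entrySet()) {
            byName.put(names.get(e.getKey()), e.getValue());
        }
        return byName;
    }

    /** Re-numbers a by-name doc against the merged segment's name table. */
    static Map<Integer, String> encode(Map<String, String> byName, List<String> mergedNames) {
        Map<Integer, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : byName.entrySet()) {
            out.put(mergedNames.indexOf(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> segmentNames = Arrays.asList("id", "title", "body");
        List<String> mergedNames = Arrays.asList("title", "id", "body");

        Map<Integer, String> storedDoc = new LinkedHashMap<>();
        storedDoc.put(0, "doc-1");   // field 0 is "id" in this segment
        storedDoc.put(1, "Hello");   // field 1 is "title" in this segment

        System.out.println("same numbering: " + sameNumbering(segmentNames, mergedNames));
        // A raw copy keeps the old numbers, so the merged table misreads them.
        System.out.println("raw copy read back: " + decode(storedDoc, mergedNames));
        // The per-document path resolves names first, then re-numbers.
        Map<Integer, String> remapped = encode(decode(storedDoc, segmentNames), mergedNames);
        System.out.println("remapped read back: " + decode(remapped, mergedNames));
    }
}
```

Under these assumptions, the raw copy swaps the id and title values when read back, while the decode-then-encode path preserves them. That is roughly why a merge has to fall back to the per-document branch when the numberings differ, and why any mismatch between the numbering a segment was written with and the one it is read with would show up as misinterpreted fields.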