
Mailing List Archive: Lucene: Java-User

Error while closing IndexWriter

 

 



shivani at netedgecomputing

Oct 12, 2006, 10:54 PM

Post #1 of 3 (654 views)
Error while closing IndexWriter

Hi All,

I am facing a peculiar problem.

I am trying to index a file. The indexing code executes without any error,
but when I try to close the IndexWriter, I get the error below. It occurs
very rarely, but once it does, no document indexing code works any more, and
I finally have to delete all the indexes and run a re-indexing utility.

Can anyone please suggest what might be the problem?

Thanks a ton,

Shivani



Stack Trace:

java.lang.ArrayIndexOutOfBoundsException: 97 >= 17
    at java.util.Vector.elementAt(Vector.java:432)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:169)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:425)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:373)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:193)
    at rd.admin.Indexer.indexFile(Indexer.java:150)
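
A note on reading the first line of this trace: java.util.Vector.elementAt
formats its message as "index >= size", so "97 >= 17" means that element 97
was requested from a Vector holding only 17 elements. Since the caller is
FieldInfos.fieldInfo, this suggests (it is an inference from the trace, not
something the thread confirms) that a field number of 97 was read while only
17 fields were known for that segment. The sketch below only reproduces the
message format; VectorBoundsSketch and messageFor are made-up names and no
Lucene code is involved.

```java
import java.util.Vector;

class VectorBoundsSketch {
    /**
     * Returns the message of the exception thrown by elementAt(index) on a
     * Vector of the given size, or null if the index is in range.
     */
    static String messageFor(int size, int index) {
        Vector<String> fields = new Vector<>();
        for (int i = 0; i < size; i++) {
            fields.add("field" + i);
        }
        try {
            fields.elementAt(index);
            return null; // index was in range, nothing thrown
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the stack trace above.
        System.out.println(messageFor(17, 97));
    }
}
```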





Code:

public void indexFile(File indexDirFile, File resumeFile) throws IOException, FileNotFoundException
{
    IndexWriter indexwriter = null;

    File afile[] = indexDirFile.listFiles();
    boolean flag = false;

    if (afile.length <= 0)
        flag = true;

    indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);

    doIndexing(indexwriter, resumeFile);

    closeIndexWriter(indexwriter);
}

//------------------------------------------------------------------------//

public void doIndexing(IndexWriter indexwriter, File resumeFile) throws FileNotFoundException
{
    System.out.println("Indexing of File " + resumeFile.getName() + "Started...");
    Document document = new Document();

    if (resumeFile.getName().endsWith(".pdf"))
    {
        FileInputStream fileinputstream;
        try
        {
            fileinputstream = new FileInputStream(resumeFile);
        }
        catch (FileNotFoundException e1)
        {
            e1.printStackTrace();
            throw new MyRuntimeException(e1.getMessage(), e1);
        }
        Object obj = null;
        InputStreamReader inputstreamreader = null;
        OutputStreamWriter outputstreamwriter = null;
        PDDocument pddocument = null;
        try
        {
            PDFParser pdfparser = new PDFParser(fileinputstream);
            pdfparser.parse();
            pddocument = pdfparser.getPDDocument();
            ByteArrayOutputStream bytearrayoutputstream = new ByteArrayOutputStream();
            outputstreamwriter = new OutputStreamWriter(bytearrayoutputstream);
            PDFTextStripper pdftextstripper = new PDFTextStripper();
            pdftextstripper.writeText(pddocument.getDocument(), outputstreamwriter);
            byte abyte0[] = bytearrayoutputstream.toByteArray();
            inputstreamreader = new InputStreamReader(new ByteArrayInputStream(abyte0));
            document.add(Field.Text(IndexerColumns.contents, inputstreamreader));
            abyte0 = bytearrayoutputstream.toByteArray();
        }
        catch (Exception e)
        {
            System.out.println("error in indexFile " + e.getMessage());
            e.printStackTrace();
        }
        finally
        {
            if (inputstreamreader != null)
            {
                inputstreamreader = null;
            }

            if (outputstreamwriter != null)
            {
                try
                {
                    outputstreamwriter.close();
                }
                catch (IOException e2)
                {
                    e2.printStackTrace();
                }
            }
            if (pddocument != null)
            {
                try
                {
                    pddocument.close();
                }
                catch (IOException e2)
                {
                    e2.printStackTrace();
                }
            }
            inputstreamreader = null;
        }
    }
    else
    {
        document.add(Field.Text(IndexerColumns.contents, new FileReader(resumeFile)));
    }
    document.add(Field.Keyword(IndexerColumns.id, String.valueOf(mapLuceneParams.get(IndexerColumns.id))));

    for (int i = 0; i < this.columnInfos.length; i++)
    {
        ColumnInfo columnInfo = columnInfos[i];
        String value = String.valueOf(mapLuceneParams.get(columnInfo.columnName));

        if (value != null)
        {
            value = value.trim();
            if (value.length() != 0)
            {
                if (columnInfo.istokenized)
                {
                    document.add(Field.Text(columnInfo.columnName, value));
                }
                else
                {
                    document.add(Field.Keyword(columnInfo.columnName, value.toLowerCase()));
                }
            }
        }
    }
    document.add(Field.Keyword(IndexerColumns.filePath, String.valueOf(mapLuceneParams.get(IndexerColumns.filePath))));

    try
    {
        indexwriter.addDocument(documentWithCustomFields);
    }
    catch (IOException e)
    {
        closeIndexWriter(indexwriter);
        e.printStackTrace();
        throw new MyRuntimeException(e.getMessage(), e);
    }
}

//------------------------------------------------------------------------//

private void closeIndexWriter(IndexWriter indexwriter)
{
    if (indexwriter != null)
    {
        System.out.println("going to close index writer");
        try
        {
            indexwriter.close();
        }
        catch (IOException e1)
        {
            e1.printStackTrace();
        }
    }
}


DORONC at il

Oct 12, 2006, 11:46 PM

Post #2 of 3 (595 views)
Re: Error while closing IndexWriter

I am far from an expert in PDF text extraction, but I noticed something in
your code that you may want to check to narrow down the reason for this
failure; see below.

"Shivani Sawhney" <shivani [at] netedgecomputing> wrote on 12/10/2006
22:54:07:
> Hi All,
>
> I am facing a peculiar problem.
>
> I am trying to index a file and the indexing code executes without any error
> but when I try to close the indexer, I get the following error and the error
> comes very rarely but when it does, no code on document indexing works and I
> finally have to delete all indexes and run a re-indexing utility.
>
> Can anyone please suggest what might be the problem?
>
> Stack Trace:
>
> java.lang.ArrayIndexOutOfBoundsException: 97 >= 17
>     at java.util.Vector.elementAt(Vector.java:432)
>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:169)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:425)
>     at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:373)
>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:193)
>     at rd.admin.Indexer.indexFile(Indexer.java:150)
> ...
> ...
> try {
>     indexwriter.addDocument(documentWithCustomFields);

Here documentWithCustomFields is added to the index, but the code you
provided never declares or populates this variable; all the fields are
added to "document" instead. This might be just a typo in the code snippet
(e.g. if you cleaned it up for the mail), or else a real problem in the
code: is it adding the wrong, perhaps null, document to the index?

> } catch (IOException e) {
> closeIndexWriter(indexwriter);

Here, after catching an exception, you first attempt to close the index and
only then print the exception e. So if this close call is the one throwing
the exception with the stack trace above, you can't really know that
addDocument "executes without any error". Better to swap these 2 lines.
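
To illustrate the ordering, here is a minimal sketch in which the original
failure is reported before the close is attempted, and a failure of the
close itself is reported separately. FakeWriter is a hypothetical stand-in
that always fails, not the Lucene IndexWriter, and the list-based log is
only there to make the order visible.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CloseOrderSketch {
    /** Hypothetical stand-in for an index writer; both operations always fail. */
    static class FakeWriter implements Closeable {
        void addDocument(String doc) throws IOException {
            throw new IOException("addDocument failed for " + doc);
        }

        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    /** Returns what was reported, in the order it was reported. */
    static List<String> index(FakeWriter writer, String doc) {
        List<String> log = new ArrayList<>();
        try {
            writer.addDocument(doc);
        } catch (IOException e) {
            // Report the original failure first...
            log.add("add: " + e.getMessage());
            try {
                // ...and only then attempt the close.
                writer.close();
            } catch (IOException closeFailure) {
                // A failing close can no longer hide the addDocument failure.
                log.add("close: " + closeFailure.getMessage());
            }
        }
        return log;
    }
}
```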

> e.printStackTrace();
> throw new MyRuntimeException(e.getMessage(), e);
> }
>

There are 2 calls to closeIndexWriter() in your code, and I can't tell which
line is Indexer.java:150. Is it this one:
    doIndexing(indexwriter, resumeFile);
    closeIndexWriter(indexwriter);
Or this one:
    } catch (IOException e) {
    closeIndexWriter(indexwriter);
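
One way to remove this ambiguity is to keep a single close call in a finally
block, so that any trace through close() can only come from one line. A
minimal sketch, assuming a hypothetical CountingWriter stand-in rather than
the real Lucene IndexWriter; it shows the structure only and does not
address whatever is causing the exception:

```java
import java.io.Closeable;
import java.io.IOException;

class SingleCloseSketch {
    /** Hypothetical stand-in for an index writer that counts close() calls. */
    static class CountingWriter implements Closeable {
        int closeCalls = 0;

        void addDocument(String doc) throws IOException {
            if (doc.isEmpty()) {
                throw new IOException("empty document");
            }
        }

        @Override
        public void close() {
            closeCalls++;
        }
    }

    /** Adds one document; the writer is closed exactly once on every path. */
    static void indexFile(CountingWriter writer, String doc) throws IOException {
        try {
            writer.addDocument(doc);
        } finally {
            writer.close(); // the only close site in this sketch
        }
    }
}
```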

Hope this helps.

If not, please add some more info on the exception, on the code path taken
before this error occurs (there are many ifs in this code), and on the
Lucene version used.

Regards,
Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


shivani at netedgecomputing

Oct 15, 2006, 10:53 PM

Post #3 of 3 (589 views)
RE: Error while closing IndexWriter

Hi,

Sorry, Doron, if the code in my last mail was confusing, and thanks for the
reply. That code was not exactly the version that was causing the problem;
the version below is.

The Lucene version is 1.2.

Waiting for a suggestion.

Code:

public void indexFile(File indexDirFile, File resumeFile) throws IOException
{
    IndexWriter indexwriter = null;
    try
    {
        File afile[] = indexDirFile.listFiles();
        boolean flag = false;
        if (afile.length <= 0)
            flag = true;
        indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
        doIndexing(indexwriter, resumeFile); // following method

        if (indexwriter != null)
        {
            indexwriter.close(); // <--Indexer.java:150 (Error here)
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
        throw new Error(e);
    }
}

//------------------------------------------------------------------------//

public void doIndexing(IndexWriter indexwriter, File resumeFile)
{
    Document document = new Document();
    if (resumeFile.getName().endsWith(".pdf"))
    {
        ...
        // Code for indexing PDF docs. Right now the inputs are not PDF docs,
        // so I have removed this piece as it could not have been causing problems.
    }
    else
    {
        try
        {
            document.add(Field.Text(IndexerColumns.contents, new FileReader(resumeFile)));
        }
        catch (FileNotFoundException e)
        {
            e.printStackTrace();
            throw new MyRuntimeException(e.getMessage(), e);
        }
    }

    for (int i = 0; i < this.columnInfos.length; i++)
    {
        ColumnInfo columnInfo = columnInfos[i];
        String value = String.valueOf(mapLuceneParams.get(columnInfo.columnName));

        if (value != null)
        {
            value = value.trim();
            if (value.length() != 0)
            {
                document.add(Field.Text(columnInfo.columnName, value));
            }
        }
    }

    try
    {
        indexwriter.addDocument(document);
    }
    catch (IOException e)
    {
        e.printStackTrace();
        throw new MyRuntimeException(e.getMessage(), e);
    }
}

}



Regards,

Shivani Sawhney
NetEdge Computing Global Services Private Limited
A-14, Sector-7, NOIDA U.P. 201-301
Tel # 91-120-2423281, 2423282
Fax # 91-120-2423279
www.netedgecomputing.com <http://www.netedgecomputing.com/>
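
A side note on the columnInfos loop in the code above, not necessarily
related to the exception: String.valueOf(Object) returns the four-character
string "null" when its argument is null, so the "value != null" check can
never fail, and a missing map entry would be indexed as the text "null". A
minimal sketch of that behaviour and of one possible alternative follows;
ValueOfNullSketch, viaValueOf and nullSafe are made-up names for
illustration only.

```java
class ValueOfNullSketch {
    /** Mirrors the conversion used in the loop: String.valueOf on a possibly missing value. */
    static String viaValueOf(Object raw) {
        // For a null argument this yields the string "null", never a null reference.
        return String.valueOf(raw);
    }

    /** One possible alternative: keep a missing value as null so a null check can catch it. */
    static String nullSafe(Object raw) {
        return raw == null ? null : raw.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(viaValueOf(null) != null);  // the null check passes
        System.out.println(nullSafe(null) == null);    // the missing value stays null
    }
}
```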

