
Mailing List Archive: Lucene: Java-User

Move from RAMDirectory to FSDirectory causing problem sometimes

 

 


paul_t100 at fastmail

Jul 8, 2008, 1:01 AM

Post #1 of 8 (2235 views)
Permalink
Move from RAMDirectory to FSDirectory causing problem sometimes

Hi, I have been using a RAMDirectory for indexing without any problem,
but I then moved to a file-based directory to reduce memory usage. This
has been working fine on Windows, OS X, and my version of Linux (Red Hat),
but it is failing on another Linux distribution (Arch Linux) with 'Too many
files opened', even though it is only indexing 32 documents; I can index
thousands without a problem. The Lucene FAQ mentions this error, but I am
not dealing with the filesystem directly myself. This is my code for
creating an index — is it okay, or is there some kind of close that I am
missing?

thanks for any help Paul

public synchronized void reindex()
{
    MainWindow.logger.info("Reindex start:" + new Date());
    TableModel tableModel = table.getModel();
    try
    {
        //Recreate the RAMDirectory uses too much memory
        //directory = new RAMDirectory();
        directory = FSDirectory.getDirectory(
            Platform.getPlatformLicenseFolder() + "/" + TAG_BROWSER_INDEX);
        IndexWriter writer = new IndexWriter(directory, analyzer, true);

        //Iterate through all rows
        for (int row = 0; row < tableModel.getRowCount(); row++)
        {
            //for each row make a new document
            Document document = createDocument(row);
            writer.addDocument(document);
        }
        writer.optimize();
        writer.close();
    }
    catch (Exception e)
    {
        throw new RuntimeException("Problem indexing Data:" + e.getMessage());
    }
}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Jul 8, 2008, 2:56 AM

Post #2 of 8 (2128 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

Technically you should call directory.close() as well, but missing
that will not lead to too many open files.

How often is that RuntimeException being thrown? EG if a single
document is frequently hitting an exception during analysis, your code
doesn't close the IndexWriter in that situation. It's better to use a
try/finally and close the IndexWriter in the finally clause, to cover
that case.

Are you sure nothing else is using up file descriptors? EG the
createDocument call does not open any files?

Mike
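A minimal sketch of the try/finally pattern suggested above, with a hypothetical TrackingWriter standing in for Lucene's IndexWriter so the example is self-contained: if addDocument throws, the finally clause still closes the writer, so no file handles can leak.

```java
// TrackingWriter is a hypothetical stand-in for Lucene's IndexWriter.
class TrackingWriter {
    boolean closed = false;

    void addDocument(Object doc) {
        // Simulate a document that fails during analysis.
        if (doc == null) throw new RuntimeException("analysis failed");
    }

    void close() { closed = true; }
}

class ReindexSketch {
    // Close the writer in finally, so an exception thrown while adding
    // a document cannot leave it (and its file handles) open.
    static void reindex(TrackingWriter writer, Object[] rows) {
        try {
            for (Object row : rows) {
                writer.addDocument(row);
            }
        } finally {
            writer.close();
        }
    }
}
```

The exception still propagates to the caller; the finally clause only guarantees the close runs on the way out.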




paul_t100 at fastmail

Jul 8, 2008, 3:03 AM

Post #3 of 8 (2115 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

The RuntimeException is occurring all the time; I'm waiting for some more
information from the user. Since the post I've added directory.close()
too. I thought this would cause a problem when I call IndexSearcher with
it as a parameter, but it seems to still work - the documentation is not
very clear on this point. I see your point about the try/finally; I'll
make that change.

There are many other parts of the code that use file descriptors, but the
problem never occurred before moving to an FSDirectory.

thanks paul

here's an example of my search code, is this ok?

public boolean recNoColumnMatchesSearch(Integer columnId, Integer recNo, String search)
{
    try
    {
        IndexSearcher is = new IndexSearcher(directory);

        //Build a query based on the fields, searchString and standard analyzer
        QueryParser parser = new QueryParser(String.valueOf(columnId) + INDEXED, analyzer);
        Query query = parser.parse(search);
        MainWindow.logger.finer("Parsed Search Query Is" + query.toString()
            + "of type:" + query.getClass());

        //Create a filter to restrict search to one row
        Filter filter = new QueryFilter(new TermQuery(
            new Term(ROW_NUMBER, String.valueOf(recNo))));

        //run the search
        Hits hits = is.search(query, filter);
        Iterator i = hits.iterator();
        if (i.hasNext())
        {
            return true;
        }
    }
    catch (ParseException pe)
    {
        //Problem with syntax: rather than throwing an exception and
        //causing everything to stop we just log and return false
        MainWindow.logger.warning("Search Query invalid:" + pe.getMessage());
        return false;
    }
    catch (IOException e)
    {
        MainWindow.logger.warning("DataIndexer.Unable to perform recNo match search:"
            + search + ":" + e);
    }
    return false;
}





lucene at mikemccandless

Jul 8, 2008, 3:14 AM

Post #4 of 8 (2166 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

Hmmm, you should not close the directory if you are then going to use
it to instantiate a searcher.

Your code below never closes the searcher? I think that is most
likely the source of your file descriptor leaks.

Mike
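The leak described above can be sketched with a hypothetical StubSearcher in place of Lucene's IndexSearcher; openHandles models the process's file-descriptor count. The leaky version mirrors the original recNoColumnMatchesSearch, and the fixed one closes the searcher in a finally clause:

```java
// StubSearcher is a hypothetical stand-in for Lucene's IndexSearcher;
// openHandles models the process's open file descriptors.
class StubSearcher {
    static int openHandles = 0;

    StubSearcher() { openHandles++; }   // opening consumes descriptors

    boolean search(String query) { return query != null && !query.isEmpty(); }

    void close() { openHandles--; }
}

class SearchSketch {
    // Like the original code: the searcher is never closed, so every
    // call leaves descriptors behind.
    static boolean searchLeaky(String query) {
        StubSearcher is = new StubSearcher();
        return is.search(query);
    }

    // Fixed: close the searcher in finally, even if search() throws.
    static boolean searchFixed(String query) {
        StubSearcher is = new StubSearcher();
        try {
            return is.search(query);
        } finally {
            is.close();
        }
    }
}
```

With a per-process limit of, say, 1024 descriptors, even a modest number of unclosed searchers (each holding several index files) can hit 'Too many open files' quickly.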





lucene at mikemccandless

Jul 8, 2008, 3:15 AM

Post #5 of 8 (2116 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

Also, if possible, you should share the IndexSearcher across multiple
searches (ie, don't open/close a new one per search). Opening an
IndexSearcher can be a resource intensive operation, so you'll see
better throughput if you share. (Though in your particular situation
it may not matter).

Mike
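Sharing could look like the sketch below: a single lazily created searcher reused across calls instead of a new one per search. CountingSearcher and SharedSearcherSketch are hypothetical; in real code the shared instance would be a long-lived IndexSearcher field, reopened only when the index changes.

```java
// CountingSearcher is a hypothetical stand-in for IndexSearcher that
// counts how many searchers have been opened.
class CountingSearcher {
    static int opened = 0;

    CountingSearcher() { opened++; }

    boolean search(String query) { return query != null && !query.isEmpty(); }
}

class SharedSearcherSketch {
    private static CountingSearcher shared;

    // Open the searcher once and reuse it for every subsequent search.
    static synchronized CountingSearcher getSearcher() {
        if (shared == null) {
            shared = new CountingSearcher();
        }
        return shared;
    }

    static boolean matches(String query) {
        return getSearcher().search(query);
    }
}
```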

Paul Taylor wrote:

> Michael McCandless wrote:
>>
>> Technically you should call directory.close() as well, but missing
>> that will not lead to too many open files.
>>
>> How often is that RuntimeException being thrown? EG if a single
>> document is frequently hitting an exception during analysis, your
>> code doesn't close the IndexWriter in that situation. It's better
>> to use a try/finally and close the IndexWriter in the finally
>> clause, to cover that case.
>>
>> Are you sure nothing else is using up file descriptors? EG the
>> createDocument call does not open any files?
>>
>> Mike
>>
> The runtimeException is occurring all the time, Im waiting for some
> more information from the user. Since the post I've since added
> directory.close() too, I thought this would cause a problem when I
> call IndexSearcher with it as a parameter but it seems to still work
> - the documentation is not very clear on this point. I see your
> poibnt about the try/finally I'll make that chnage.
>
> There are many other parts of the code that use filedescriptors, but
> the problem has never occurred before moving to a FSDirectory
>
> thanks paul
>
> heres an example of my search code, is this ok ?
>
> public boolean recNoColumnMatchesSearch(Integer columnId, Integer
> recNo, String search)
> { try
> {
> IndexSearcher is = new IndexSearcher(directory);
>
> //Build a query based on the fields, searchString and
> standard analyzer
> QueryParser parser = new
> QueryParser(String.valueOf(columnId) + INDEXED, analyzer);
> Query query = parser.parse(search);
> MainWindow.logger.finer("Parsed Search Query Is" +
> query.toString() + "of type:" + query.getClass());
>
> //Create a filter,to restrict search to one row
> Filter filter = new QueryFilter(new TermQuery(new
> Term(ROW_NUMBER, String.valueOf(recNo))));
>
> //run the search
> Hits hits = is.search(query, filter);
> Iterator i = hits.iterator();
> if (i.hasNext())
> {
> return true;
> }
> }
> catch (ParseException pe)
> {
> //Problem with syntax rather than throwing exception and
> causing everything to stop we just
> //log and return false
> MainWindow.logger.warning("Search Query invalid:" +
> pe.getMessage());
> return false;
> }
> catch (IOException e)
> {
> MainWindow.logger.warning("DataIndexer.Unable to do
> perform reno match search:" + search + ":" + e);
> }
> return false;
>
>> Paul Taylor wrote:
>>
>>> Hi, I have been using a RAMDirectory for indexing without any
>>> problem, but I then moved to a file based directory to reduce
>>> memory usage. this has been working fine on Windows and OSX and my
>>> version of linux (redhat) but is failing on a version of linux
>>> (archlinux) with 'Too many files opened' , but they are only
>>> indexing 32 documents , I can index thousands without a problem.
>>> It mentions this error in the Lucene FAQ but I am not dealing
>>> directly with the filesystem myself, this is my code for creating
>>> an index is it okay or is there some kind of close that I am missing
>>>
>>> thanks for any help Paul
>>>
>>> public synchronized void reindex()
>>> {
>>> MainWindow.logger.info("Reindex start:" + new Date());
>>> TableModel tableModel = table.getModel();
>>> try
>>> {
>>> //Recreate the RAMDirectory uses too much memory
>>> //directory = new RAMDirectory();
>>> directory =
>>> FSDirectory.getDirectory(Platform.getPlatformLicenseFolder()+ "/"
>>> + TAG_BROWSER_INDEX);
>>> IndexWriter writer = new IndexWriter(directory, analyzer,
>>> true);
>>>
>>> //Iterate through all rows
>>> for (int row = 0; row < tableModel.getRowCount(); row++)
>>> {
>>> //for each row make a new document
>>> Document document = createDocument(row);
>>> writer.addDocument(document);
>>>
>>> }
>>> writer.optimize();
>>> writer.close();
>>> }
>>> catch (Exception e)
>>> {
>>> throw new RuntimeException("Problem indexing Data:" +
>>> e.getMessage());
>>> }
>>> }
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>>> For additional commands, e-mail: java-user-help [at] lucene
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>




paul_t100 at fastmail

Jul 8, 2008, 3:38 AM

Post #6 of 8 (2128 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

Michael McCandless wrote:
>
> Hmmm, you should not close the directory if you are then going to use
> it to instantiate a searcher.
how come it works?
>
> Your code below never closes the searcher? I think that is most
> likely the source of your file descriptor leaks.
Ok fixed

paul



lucene at mikemccandless

Jul 8, 2008, 8:39 AM

Post #7 of 8 (2121 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

It works because Lucene doesn't currently check for it, and because
closing an FSDirectory does not actually make it unusable. In fact it
also doesn't catch a double-close call.

But it may cause subtle problems, because FSDirectory has this
invariant: only a single instance of FSDirectory exists per canonical
directory in the filesystem. This allows code to synchronize on that
instance and be sure no other code in the same JVM is also working in
that canonical directory.

When you close an FSDirectory but keep using it you can get yourself
to a point where this invariant is broken. That said, besides
IndexModifier (which is now deprecated), I can't find anything that
would actually break when this invariant is broken.

Still I think we should put protection in to catch double-closing and
prevent using a closed directory. I'll open an issue.

Mike
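The invariant described above, one directory instance per canonical filesystem path, can be sketched as a synchronized cache; DirHandle and DirCacheSketch are hypothetical stand-ins, but FSDirectory.getDirectory kept a similar internal map at the time:

```java
import java.util.HashMap;
import java.util.Map;

// DirHandle is a hypothetical stand-in for FSDirectory.
class DirHandle {
    final String canonicalPath;

    DirHandle(String canonicalPath) { this.canonicalPath = canonicalPath; }
}

class DirCacheSketch {
    private static final Map<String, DirHandle> CACHE =
        new HashMap<String, DirHandle>();

    // Return the one instance for this canonical path, creating it on
    // first use; callers can then synchronize on the returned instance
    // to coordinate access to that directory within the JVM.
    static synchronized DirHandle getDirectory(String canonicalPath) {
        DirHandle dir = CACHE.get(canonicalPath);
        if (dir == null) {
            dir = new DirHandle(canonicalPath);
            CACHE.put(canonicalPath, dir);
        }
        return dir;
    }
}
```

Closing an instance without evicting it from such a cache (or continuing to use a closed instance) is exactly how the invariant can be silently broken.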





lucene at mikemccandless

Jul 8, 2008, 9:14 AM

Post #8 of 8 (2100 views)
Permalink
Re: Move from RAMDirectory to FSDirectory causing problem sometimes [In reply to]

OK I opened:

https://issues.apache.org/jira/browse/LUCENE-1331

Mike



