4 votes

The project I'm working on indexes a certain amount of data (with long texts) and compares it against a list of words at each interval (about every 15 to 30 minutes).

After some time, say the 35th round, this error occurred while starting to index a new set of data on the 36th round:

    [ERROR] (2011-06-01 10:08:59,169) org.demo.service.LuceneService.countDocsInIndex(?:?) : Exception on countDocsInIndex: 
    java.io.FileNotFoundException: /usr/share/demo/index/tag/data/_z.tvd (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
        at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:81)
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:299)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:580)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:556)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:736)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:274)
        at org.demo.service.LuceneService.countDocsInIndex(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.indexTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.processTweets(Unknown Source)
        at org.demo.processing.worker.DataFilterWorker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:636)

I've already tried raising the maximum number of open files with:

        ulimit -n <number>

But after some time, when an interval contains about 1,050 rows of long texts, the same error occurs. It has only occurred once so far, though.

Should I follow the advice to modify Lucene IndexWriter's mergeFactor from (Too many open files) - SOLR, or is this an issue with the amount of data being indexed?
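
(In case it matters, my understanding is that the mergeFactor change would look roughly like this in Lucene 3.0; just a sketch of what I'd try, not something I've tested:)

    writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setMergeFactor(5); // default is 10; lower means fewer segment files open at once, but more frequent merges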

I've also read that it's a choice between batch indexing and interactive indexing. How would one determine whether indexing is interactive, just by the frequency of updates? Should I categorize this project under interactive indexing, then?

UPDATE: I'm adding a snippet of my IndexWriter:

        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);

It seems like maxMerge (or is it the field length?...) is already set to unlimited.


3 Answers

2 votes

I had already used ulimit, but the error still showed up. Then I inspected the customized core adapters for the Lucene functions. It turned out there were too many IndexWriter.open directories that were LEFT OPEN.

Worth noting: after processing, always make sure to close the directory that was opened.
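
For example, the per-round indexing now follows roughly this pattern (a simplified sketch with made-up method and variable names, assuming Lucene 3.0):

    // Sketch: open the writer for one round and always close it (and the
    // directory) when the round is done, even if indexing throws.
    void indexOneRound(File indexPath, List<Document> docs) throws IOException {
        Directory dir = FSDirectory.open(indexPath);
        IndexWriter writer = null;
        try {
            writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            for (Document doc : docs) {
                writer.addDocument(doc);
            }
        } finally {
            if (writer != null) {
                writer.close(); // releases the files held by this writer
            }
            dir.close();        // release the directory handle as well
        }
    }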

1 vote

You need to double-check whether the ulimit value has actually been persisted and set to a proper value (whatever the maximum is).

It is very likely that your app is not closing index readers/writers properly. I've seen many stories like this on the Lucene mailing list, and it was almost always the user's app that was to blame, not Lucene itself.
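
For the read side, the usual safe pattern in Lucene 3.0 looks something like this (a sketch only; I'm guessing at what countDocsInIndex does from the stack trace):

    // Open a read-only reader, use it, and always close it in finally;
    // otherwise every call leaks file descriptors.
    public int countDocsInIndex(Directory dir) throws IOException {
        IndexReader reader = IndexReader.open(dir, true); // true = read-only
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }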

0 votes

Use the compound file format to reduce the file count. When this flag is set, Lucene will write each segment as a single .cfs file instead of multiple files, which reduces the number of open files significantly.

    IndexWriter.setUseCompoundFile(true)
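
For example, applied to the writer from the question (assuming Lucene 3.0, where this setter still lives on IndexWriter):

    writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setUseCompoundFile(true); // each new segment gets written as a single .cfs file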