1
votes

A bit of background first: I'm currently using Hibernate Search 4.2 in my Java web application, and I deal with write-heavy Lucene indexes in which I store quite a bit of data. Indexing a single object in my biggest index takes around 1 second when using the directory-based index manager.

To improve performance, I switched the index manager to near-real-time, and performance improved by leaps and bounds. Now I would like to implement zero-downtime deployments using Tomcat Parallel Deployments, which allows me to run 2 different versions of the same application side by side in a single Tomcat. I have found out that I can't use the near-real-time index manager anymore, because it buffers modifications in memory and avoids flushing to disk until the RAM buffer is full or the application shuts down.
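For reference, the switch between the two index managers is a one-property change. The following is a minimal sketch of how the settings might be assembled, assuming a filesystem directory provider; the class name, the helper method and the index path are hypothetical and only there for illustration.

```java
import java.util.Properties;

public class IndexManagerSettings {

    // Hypothetical helper: collects the Hibernate Search settings for one
    // index manager ("directory-based" or "near-real-time").
    public static Properties forIndexManager(String indexManager, String indexBase) {
        Properties settings = new Properties();
        // Store the index on disk under indexBase (path is a placeholder).
        settings.setProperty("hibernate.search.default.directory_provider", "filesystem");
        settings.setProperty("hibernate.search.default.indexBase", indexBase);
        // "directory-based" commits each change set to the directory;
        // "near-real-time" keeps changes in the writer's buffer and defers flushing.
        settings.setProperty("hibernate.search.default.indexmanager", indexManager);
        return settings;
    }

    public static void main(String[] args) {
        Properties nrt = forIndexManager("near-real-time", "/var/lucene/indexes");
        // Print the settings in properties-file form.
        nrt.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The same keys can go into `hibernate.cfg.xml` or `persistence.xml` instead of being built in code.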

My question is: what alternative solutions do I have in this situation? I would like to keep the indexing process as synchronous as possible.

I have taken a look at the Hibernate Search code and have seen that there are different commit policies and a class called IndexWriterHolder that allows me to commit and flush writes, but I'm not sure if or how those APIs are publicly exposed.

I've also seen that Lucene 4 implements concurrent flushing and I've discovered the max_thread_states flag in Hibernate Search 5 that allows me to specify the number of concurrent writer threads per IndexWriter, but I've never used it before and I'm not sure if concurrent flushing would help in my situation.

Any help is greatly appreciated. Thank you.


1 Answer

0
votes

Great question. The short answer is that this is currently not possible.

The IndexWriterHolder is indeed not public API and would be tricky to expose as it handles a queue in a background thread: invoking it directly rather than by scheduling events into the queue would be racy.

The concurrent flushing capabilities of Lucene are used automatically when it is safe to do so, for example during MassIndexing. I don't expect the max_thread_states property would help you much, but it's worth trying the other tuning options.
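As a rough sketch of what experimenting with those options could look like, the snippet below collects a few index writer settings in one place. The values are placeholders rather than recommendations, the class and method names are hypothetical, and the property names should be checked against the reference documentation for the version in use (max_thread_states in particular only exists from Hibernate Search 5).

```java
import java.util.Properties;

public class WriterTuning {

    // Hypothetical helper: builds a set of tuning properties to experiment with.
    public static Properties tuning(int maxThreadStates, int ramBufferMb) {
        if (maxThreadStates < 1 || ramBufferMb < 1) {
            throw new IllegalArgumentException("both values must be positive");
        }
        Properties settings = new Properties();
        // Number of concurrent writer thread states per IndexWriter (Hibernate Search 5).
        settings.setProperty("hibernate.search.default.indexwriter.max_thread_states",
                String.valueOf(maxThreadStates));
        // RAM (in MB) the writer may buffer before flushing to the directory.
        settings.setProperty("hibernate.search.default.indexwriter.ram_buffer_size",
                String.valueOf(ramBufferMb));
        // Keep index updates synchronous with the transaction, as the question asks.
        settings.setProperty("hibernate.search.default.worker.execution", "sync");
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values: measure indexing time before and after changing them.
        tuning(4, 64).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Changing one value at a time and timing a representative indexing operation is the simplest way to see whether any of these settings matters for a given index.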

I'm one of the Hibernate Search developers and wasn't aware of Tomcat's Parallel Deployments. It sounds like an interesting feature that we could explore supporting. Please open a feature request on JIRA or start a conversation on the forums to help us better understand how this could work.