I am using Lucene.Net 3.0.3 and AzureDirectory 2.0.4937.26631, which I installed from NuGet (the package is called Lucene.Net.Store.Azure).
The project description at azuredirectory.codeplex.com states: "To be more concrete: you can have 1..N worker roles adding documents to an index, and 1..N searcher webroles searching over the catalog in near real time." This implies that it is possible to have multiple worker roles writing to the index in parallel. However, when I try to do this, I get many "Lock obtain timed out: [email protected]." exceptions.
My code follows the example given in the AzureDirectory documentation (azuredirectory.codeplex.com/documentation). Simplified for this question, it is roughly:
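```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store.Azure;
using Version = Lucene.Net.Util.Version;

var dbEntities = LoadDbEntities();        // hypothetical helper: load database entities here
var docFactory = CreateDocumentFactory(); // hypothetical helper: builds Lucene documents from dbEntities
var account = GetStorageAccount();        // hypothetical helper: get the CloudStorageAccount
var directory = new AzureDirectory(account, "<my container name>");

// createEvenIfExists: true recreates the index, false appends to the existing one
using (var writer = new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_30), createEvenIfExists, IndexWriter.MaxFieldLength.UNLIMITED))
{
    foreach (var entity in dbEntities)
    {
        writer.AddDocument(docFactory.CreateDocument(entity));
    }
}
```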
When run sequentially, this code works fine. However, if I run the same code in parallel on multiple threads or workers, I get many "Lock obtain timed out: [email protected]." exceptions:
```
[Lucene.Net.Store.LockObtainFailedException: Lock obtain timed out: [email protected].]
   at Lucene.Net.Store.Lock.Obtain(Int64 lockWaitTimeout) in d:\Lucene.Net\FullRepo\trunk\src\core\Store\Lock.cs:line 83
   at Lucene.Net.Index.IndexWriter.Init(Directory d, Analyzer a, Boolean create, IndexDeletionPolicy deletionPolicy, Int32 maxFieldLength, IndexingChain indexingChain, IndexCommit commit) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 1228
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, Analyzer a, Boolean create, MaxFieldLength mfl) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 1018
```
I understand that a "write.lock" file is created in blob storage, and that the lock is held while the file contains the text "wrote.lock". From my searches, I see that other users have had problems with write.lock not being cleaned up. That doesn't seem to be my problem: the same code works correctly when run sequentially, and the lock file is cleaned up in that case.
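To make the failure mode concrete, here is a minimal in-memory model of an exclusive lock that, like the `Lock.Obtain(Int64 lockWaitTimeout)` call in the stack trace, keeps retrying until a timeout and then gives up. It is only an illustration under stated assumptions: `TryObtain`, `Obtain`, `Release` and the timeout values are hypothetical, and it does not reproduce how AzureDirectory's blob-based lock is actually implemented. It shows why the sequential run succeeds (each writer releases the lock before the next one asks for it) while the parallel run fails (the second writer times out while the first still holds the lock).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical in-memory lock (not AzureDirectory's lock):
// 0 means free, otherwise it holds the id of the owning writer.
int lockHolder = 0;

// Atomically claim the lock if it is currently free.
bool TryObtain(int writerId) =>
    Interlocked.CompareExchange(ref lockHolder, writerId, 0) == 0;

// Release the lock, but only if this writer owns it.
void Release(int writerId) =>
    Interlocked.CompareExchange(ref lockHolder, 0, writerId);

// Poll until the lock is obtained or the timeout elapses. Lucene would throw
// LockObtainFailedException at that point; this sketch just returns false.
bool Obtain(int writerId, int timeoutMs, int pollMs)
{
    var clock = Stopwatch.StartNew();
    while (!TryObtain(writerId))
    {
        if (clock.ElapsedMilliseconds >= timeoutMs)
            return false;
        Thread.Sleep(pollMs);
    }
    return true;
}

// Writer 1 opens its IndexWriter first and holds the lock while it adds documents.
bool first = Obtain(1, timeoutMs: 1000, pollMs: 100);

// Writer 2 (another thread or worker instance) opens a second IndexWriter on the same index.
bool second = Obtain(2, timeoutMs: 300, pollMs: 100);
Console.WriteLine($"writer 1 obtained: {first}, writer 2 obtained: {second}");
// prints "writer 1 obtained: True, writer 2 obtained: False"

// Once writer 1 disposes its IndexWriter the lock is free again (the sequential case).
Release(1);
bool secondRetry = Obtain(2, timeoutMs: 300, pollMs: 100);
Console.WriteLine($"writer 2 after release: {secondRetry}");
// prints "writer 2 after release: True"
```

Disposing the first IndexWriter plays the role of `Release(1)` here, which is why only overlapping writers fail.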
I also see in the AzureDirectory documentation (azuredirectory.codeplex.com/documentation) that "The index can only be updated by one process at a time, so it makes sense to push all Add/Update/Delete operations through an indexing role." However, that doesn't seem to make sense: any role you create should have multiple instances, so there would still be multiple instances writing to the index in parallel. Also, the project description directly states that "you can have 1..N worker roles adding documents to an index." Note that it says "an" index, not shards of an index.
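The documentation's advice amounts to a single-writer pattern: any number of producers submit work, but only one component holds the IndexWriter and applies every change. The sketch below illustrates that pattern inside a single process. `BlockingCollection<string>` stands in for a cross-role queue (for example an Azure Storage queue) and a `List<string>` stands in for the Lucene index; both substitutions are assumptions for illustration, not AzureDirectory APIs. Across roles, the pattern only holds if exactly one instance consumes the queue and owns the writer, which is the tension described in the paragraph above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for a cross-role work queue (assumption: e.g. an Azure Storage queue in production).
var queue = new BlockingCollection<string>(boundedCapacity: 100);

// Stand-in for the Lucene index; only the single consumer below ever touches it.
var index = new List<string>();

// The one and only writer: drains the queue until producers signal completion.
Task indexingTask = Task.Run(() =>
{
    foreach (var doc in queue.GetConsumingEnumerable())
    {
        index.Add(doc); // real code would call writer.AddDocument(...) on the single IndexWriter
    }
});

// Several producers (think: multiple worker-role instances) submit documents in parallel.
Task[] producers = Enumerable.Range(0, 4).Select(p => Task.Run(() =>
{
    for (int i = 0; i < 25; i++)
        queue.Add($"producer{p}-doc{i}");
})).ToArray();

Task.WaitAll(producers);
queue.CompleteAdding(); // no more work: lets the consumer loop finish
indexingTask.Wait();

Console.WriteLine($"documents indexed: {index.Count}");
// prints "documents indexed: 100"
```

Within a single process, Lucene's IndexWriter is thread-safe, so multiple threads can share one writer instead of each opening its own; the write.lock conflict arises when a second IndexWriter instance is opened on the same directory.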
Question:
So, is the project description simply wrong? Or is there actually some way to have multiple IndexWriters adding to an index in parallel? I can't see anything in the API that allows that. If it is possible, please provide a code snippet showing how to use AzureDirectory to "have 1..N worker roles adding documents to an index" in parallel.