0
votes

I am using solr 4.10.2. I tried to perform a shard split on my solr cloud test cluster. It fails all the time if the index type is set to "native" or "simple".

Is that normal? I can perform shard splitting if the index type is set to "single" or "none". They advertise that shard splitting can be done while solr is running and i hardly imagine poking around changing the lock type of a production server...

Here is the test environment:

1 shard, 2 nodes, 1 collection. Initially the collection was empty. I added few documents, verified that they have been replicated. All worked.

Issued the split shard command:

server1:port/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=myhandle

After verifying that the operation had finished, by calling

server1:port/solr/admin/collections?action=REQUESTSTATUS&requestid=myhandle

The status was "complete".

Here is the log:

OverseerCollectionProcessor.processMessage : splitshard , {
  "operation":"splitshard",
  "shard":"shard1",
  "collection":"mycollection",
  "async":"myhandle"}


  1/26/2015, 1:49:02 PM
ERROR
CoreContainer
Error creating core [mycollection_shard1_0_replica1]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
    at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
    ... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/nfs/solr/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:89)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
    ... 11 more

    org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
    at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
    ... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/nfs/solr/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:89)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
    ... 11 more

    1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(),​ indicates a bug -- POSSIBLE RESOURCE LEAK!!!
1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3230)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3203)
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:907)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:984)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:954)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:129)
    at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:182)
    at java.lang.System$2.invokeFinalize(System.java:1213)
    at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
    at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
1

1 Answers

0
votes

I fixed the problem. Here is how:

When i created the solr cloud environment, i used the -Dsolr.data.dir property to map the collection storage to a different file system. This was because i was running VMs with limited storage capacity. Once i removed this property everything started working. I think solr tries to use the same solr.data.dir path for the new cores created by the shard split causing the lock problem.