Intro
I'm trying to play around with SolrCloud, using Zookeeper. I know that SolrCloud has its own built-in Zookeeper, but since that set-up is not recommended, I'm trying to mimic (or, at least, I hope so) an external Zookeeper ensemble with SolrCloud (3 ZK nodes, 2 Solr nodes).
To facilitate this, I created the following docker-compose.yml:
version: '3.8'
services:
  zoo1:
    image: library/zookeeper:3.5.7
    container_name: zoo1
    restart: always
    hostname: zoo1
    ports:
      - 8184:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime &&
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"
  zoo2:
    image: library/zookeeper:3.5.7
    container_name: zoo2
    restart: always
    hostname: zoo2
    ports:
      - 8284:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime &&
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"
  zoo3:
    image: library/zookeeper:3.5.7
    container_name: zoo3
    restart: always
    hostname: zoo3
    ports:
      - 8384:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime &&
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"
  solr1:
    image: library/solr:8.6.3
    container_name: solr1
    ports:
      - "8981:8983"
    environment:
      ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3
  solr2:
    image: library/solr:8.6.3
    container_name: solr2
    ports:
      - "8982:8983"
    environment:
      ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3
networks:
  solr:
    name: solr_zookeeper_cluster
So, using this file, everything starts up nice and easy. I actually have 3 ZK nodes, of which one is the leader, and 2 Solr nodes...
Problem
However (and here's my actual problem) Solr UI acts a bit weird when showing the ZK status.
I always have exactly 2 ZK instances in the zkStatus that have no issue, but exactly one that's "not ok"...
Most of the time, both Solr nodes have issues with the same Zookeeper node, but as soon as I start playing around (as in: stopping the leaders to trigger leader-election and restarting that particular node), it becomes quite randomized....
Screenshot after initial startup:
Screenshot after triggering the leader-election:
Some node log
2020-10-14 09:31:18.597 INFO (main) [ ] o.e.j.s.Server Started @7571ms
2020-10-14 09:32:20.539 INFO (qtp247162961-18) [ ] o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 transient cores
2020-10-14 09:32:20.540 INFO (qtp247162961-18) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1602667940461} status=0 QTime=6
2020-10-14 09:32:20.552 WARN (qtp247162961-17) [ ] o.a.s.h.a.ZookeeperStatusHandler Failed talking to zookeeper 0.0.0.0:2181 => org.apache.solr.common.SolrException: Failed talking to Zookeeper 0.0.0.0:2181
at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:294)
org.apache.solr.common.SolrException: Failed talking to Zookeeper 0.0.0.0:2181
at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:294) ~[?:?]
at org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:238) ~[?:?]
at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkStatus(ZookeeperStatusHandler.java:144) ~[?:?]
at org.apache.solr.handler.admin.ZookeeperStatusHandler.handleRequestBody(ZookeeperStatusHandler.java:84) ~[?:?]
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214) ~[?:?]
at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:857) ~[?:?]
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:821) ~[?:?]
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566) ~[?:?]
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415) ~[?:?]
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) ~[?:?]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) ~[jetty-security-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) ~[jetty-rewrite-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.Server.handle(Server.java:500) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:?]
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:?]
at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:?]
at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:?]
at java.net.Socket.connect(Unknown Source) ~[?:?]
at java.net.Socket.connect(Unknown Source) ~[?:?]
at java.net.Socket.<init>(Unknown Source) ~[?:?]
at java.net.Socket.<init>(Unknown Source) ~[?:?]
at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:285) ~[?:?]
... 46 more
2020-10-14 09:32:20.564 INFO (qtp247162961-22) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1602667940462} status=0 QTime=29
2020-10-14 09:32:20.573 INFO (qtp247162961-17) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/zookeeper/status params={wt=json&_=1602667940521} status=0 QTime=39
2020-10-14 09:32:20.589 INFO (qtp247162961-20) [ ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params action=LIST&wt=json&_=1602667940462 and sendToOCPQueue=true
2020-10-14 09:32:20.589 INFO (qtp247162961-20) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=LIST&wt=json&_=1602667940462} status=0 QTime=0
2020-10-14 09:32:20.612 INFO (qtp247162961-18) [ ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :listaliases with params action=LISTALIASES&wt=json&_=1602667940462 and sendToOCPQueue=true
2020-10-14 09:32:20.615 INFO (qtp247162961-18) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=LISTALIASES&wt=json&_=1602667940462} status=0 QTime=2
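One thing I noticed while reading this log: the address in the warning, 0.0.0.0:2181, is the same address each ZK node uses for its own entry in ZOO_SERVERS. To make that visible, here is a minimal Python sketch that splits a ZOO_SERVERS value into the client address each entry advertises. The helper name parse_zoo_servers is hypothetical (it is not part of Solr or Zookeeper), and whether Solr actually builds its list of hosts from these entries is only my assumption.

```python
from typing import Dict, Tuple


def parse_zoo_servers(value: str) -> Dict[int, Tuple[str, int]]:
    """Map each server id to the (host, client_port) its entry advertises.

    Hypothetical helper for illustration only. It expects entries of the
    form used in the compose file: server.N=host:2888:3888;2181
    """
    result: Dict[int, Tuple[str, int]] = {}
    for entry in value.split():
        key, spec = entry.split("=", 1)            # "server.1", "host:2888:3888;2181"
        server_id = int(key.split(".", 1)[1])      # "server.1" -> 1
        quorum_part, client_part = spec.split(";", 1)
        host = quorum_part.split(":")[0]           # host used for the quorum ports
        client_port = int(client_part.rsplit(":", 1)[-1])  # "2181" or "addr:2181"
        result[server_id] = (host, client_port)
    return result


# The value configured on zoo1 in the compose file above.
ZOO1_SERVERS = (
    "server.1=0.0.0.0:2888:3888;2181 "
    "server.2=zoo2:2888:3888;2181 "
    "server.3=zoo3:2888:3888;2181"
)

for server_id, (host, port) in sorted(parse_zoo_servers(ZOO1_SERVERS).items()):
    print(f"server.{server_id} -> {host}:{port}")
```

For zoo1 the first entry comes out as 0.0.0.0:2181, which no Solr container can use to reach a Zookeeper node, while the other two entries resolve to zoo2 and zoo3. If Solr reads the server list from whichever ZK node it happens to be talking to, that might explain why exactly one node is always reported as "not ok".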