1
votes

Intro

I'm trying to play around with SolrCloud, using Zookeeper. I know that SolrCloud has its own built-in Zookeeper, but since using that set-up is not recommended, I mimic (or, at least, I hope so) the external Zookeeper ensemble - Solr Cloud setup (3 ZK nodes, 2 Solr nodes).

To facilitate this, I created following docker-compose.yml:

version: '3.8'

services:
  zoo1:
    image: library/zookeeper:3.5.7
    container_name: zoo1
    restart: always
    hostname: zoo1
    ports:
      - 8184:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && 
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg  &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"

  zoo2:
    image: library/zookeeper:3.5.7
    container_name: zoo2
    restart: always
    hostname: zoo2
    ports:
      - 8284:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime &&
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg  &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"

  zoo3:
    image: library/zookeeper:3.5.7
    container_name: zoo3
    restart: always
    hostname: zoo3
    ports:
      - 8384:8080
    environment:
      TZ: Europe/Paris
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
    networks:
      - solr
    command: >
      sh -c "ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && 
      echo $TZ > /etc/timezone &&
      sed -i 's/autopurge.purgeInterval=0/autopurge.purgeInterval=1/g' /conf/zoo.cfg  &&
      echo 4lw.commands.whitelist=mntr,conf,ruok >> /conf/zoo.cfg &&
      exec zkServer.sh start-foreground"

  solr1:
    image: library/solr:8.6.3
    container_name: solr1
    ports:
      - "8981:8983"
    environment:
       ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3

  solr2:
    image: library/solr:8.6.3
    container_name: solr2
    ports:
      - "8982:8983"
    environment:
      ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3

networks:
  solr:
    name: solr_zookeeper_cluster

So, using this file, everything starts up nice and easy. I actually have 3 ZK nodes, of which one is the leader, and 2 Solr nodes...

Problem

However (and here's my actual problem) Solr UI acts a bit weird when showing the ZK status. I always have exactly 2 ZK instances in the zkStatus that have no issue, but exactly one that's "not ok"... Most of the time, both Solr nodes have issues with the same Zookeeper node, but as soon as I start playing around (as in: stopping the leaders to trigger leader-election and restarting that particular node), it becomes quite randomized....

Screenshot after initial startup:

SolrNodesSameZkNodeDown

Screenshot after trigging the leader-election

SolrNodesDifferentZkNodeDown

Some node log

2020-10-14 09:31:18.597 INFO  (main) [   ] o.e.j.s.Server Started @7571ms
2020-10-14 09:32:20.539 INFO  (qtp247162961-18) [   ] o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 transient cores
2020-10-14 09:32:20.540 INFO  (qtp247162961-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1602667940461} status=0 QTime=6
2020-10-14 09:32:20.552 WARN  (qtp247162961-17) [   ] o.a.s.h.a.ZookeeperStatusHandler Failed talking to zookeeper 0.0.0.0:2181 => org.apache.solr.common.SolrException: Failed talking to Zookeeper 0.0.0.0:2181
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:294)
org.apache.solr.common.SolrException: Failed talking to Zookeeper 0.0.0.0:2181
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:294) ~[?:?]
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:238) ~[?:?]
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkStatus(ZookeeperStatusHandler.java:144) ~[?:?]
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.handleRequestBody(ZookeeperStatusHandler.java:84) ~[?:?]
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214) ~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:857) ~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:821) ~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566) ~[?:?]
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415) ~[?:?]
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) ~[?:?]
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590) ~[jetty-security-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485) ~[jetty-servlet-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) ~[jetty-rewrite-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.Server.handle(Server.java:500) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) ~[jetty-server-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) ~[jetty-io-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) ~[jetty-util-9.4.27.v20200227.jar:9.4.27.v20200227]
        at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:?]
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:?]
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:?]
        at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:?]
        at java.net.Socket.connect(Unknown Source) ~[?:?]
        at java.net.Socket.connect(Unknown Source) ~[?:?]
        at java.net.Socket.<init>(Unknown Source) ~[?:?]
        at java.net.Socket.<init>(Unknown Source) ~[?:?]
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:285) ~[?:?]
        ... 46 more
2020-10-14 09:32:20.564 INFO  (qtp247162961-22) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1602667940462} status=0 QTime=29
2020-10-14 09:32:20.573 INFO  (qtp247162961-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/zookeeper/status params={wt=json&_=1602667940521} status=0 QTime=39
2020-10-14 09:32:20.589 INFO  (qtp247162961-20) [   ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params action=LIST&wt=json&_=1602667940462 and sendToOCPQueue=true
2020-10-14 09:32:20.589 INFO  (qtp247162961-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=LIST&wt=json&_=1602667940462} status=0 QTime=0
2020-10-14 09:32:20.612 INFO  (qtp247162961-18) [   ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :listaliases with params action=LISTALIASES&wt=json&_=1602667940462 and sendToOCPQueue=true
2020-10-14 09:32:20.615 INFO  (qtp247162961-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=LISTALIASES&wt=json&_=1602667940462} status=0 QTime=2
1

1 Answers

1
votes

You should not use 0.0.0.0 but the hostname as defined through the dockerfile. So at the zoo1 config server1 should be zoo1, at the zoo2 server2 should be zoo2 and at the zoo3 server should be zoo3.