I have initialized a swarm with 1 manager and 1 worker, each on a different host, following the official tutorial. I also use Traefik, following the instructions on dockerswarm.rocks, with a plain overlay network created with:

docker network create --driver=overlay traefik-public

Now I deploy a service of mine, which has to access the Internet.

While this works well when the service is deployed on the manager node, it fails on the worker node.

docker-compose.yml

version: '3.5'
services:
  export-phyc:
    image: my.docker.registry/my/image
    networks:
      - traefik-public
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik-public
        - traefik.constraint-label=traefik-public
        - traefik.http.routers.myservice-http.rule=Host(`my.domain`)
        - traefik.http.routers.myservice-http.entrypoints=http
        - traefik.http.routers.myservice-http.middlewares=https-redirect
        - traefik.http.routers.myservice-https.rule=Host(`my.domain`)
        - traefik.http.routers.myservice-https.entrypoints=https
        - traefik.http.routers.myservice-https.tls=true
        - traefik.http.routers.myservice-https.tls.certresolver=le
        - traefik.http.services.myservice.loadbalancer.server.port=80
networks:
  traefik-public:
    external: true

Both hosts have the same DNS conf:

# cat /etc/resolv.conf
domain openstacklocal
search openstacklocal
nameserver 213.186.xx.xx

Both tasks have the same DNS conf too (but not the same as the hosts):

# docker container exec <my-container-id> cat /etc/resolv.conf
search openstacklocal
nameserver 127.0.0.xx
options ndots:0

And yet, the task on the manager can reach the internet:

# docker container exec <my-container-id> wget google.com
Connecting to google.com (216.58.215.46:80)
Connecting to www.google.com (216.58.206.228:80)
saving to 'index.html'
index.html           100% |********************************| 13848  0:00:00 ETA
'index.html' saved

and the task on the worker cannot:

# docker container exec <my-container-id> wget google.com
wget: bad address 'google.com'
# docker container exec <my-container-id> wget 216.58.204.142
Connecting to 216.58.204.142 (216.58.204.142:80)
wget: can't connect to remote host (216.58.204.142): Operation timed out

I am most confused. How do I get the tasks on my worker node to access the internet?

This problem doesn't seem related to anything you have done with relation to swarm or even docker. Can you do a wget/curl from the host system for example? - Chris Becke
Well, it seems to be a firewall problem, though one related to what Swarm needs, since turning the firewalls off solves the problem. I'm using iptables. I'm investigating. - Bob

1 Answer

So the problem was my firewall (iptables) interfering with the rules set by Docker. I do need my own rules (loaded at every reboot), and Docker also has to set its own rules for internal communication (recreated every time the Docker daemon restarts).

I'm no iptables expert. I had picked up a ruleset that was supposed to work well with Docker Swarm, but one line was missing from it:

-A DOCKER-USER -j RETURN

Example iptables rules:

*filter
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:FILTERS - [0:0]
:DOCKER-USER - [0:0]

-F INPUT
-F DOCKER-USER
-F FILTERS

-A INPUT -i lo -j ACCEPT
-A INPUT -j FILTERS

-A DOCKER-USER -i eno1 -j FILTERS
-A DOCKER-USER -j RETURN

-A FILTERS -m state --state ESTABLISHED,RELATED -j ACCEPT

#allow 80 and 443
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A FILTERS -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT

###### System

# Allow SSH connections
-A FILTERS -p tcp --dport 22 -j ACCEPT

# Docker SWARM cluster connections
-A FILTERS -p tcp --dport 2377 -j ACCEPT
-A FILTERS -p tcp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 7946 -j ACCEPT
-A FILTERS -p udp --dport 4789 -j ACCEPT

###### Rules home

# ...

###### end

-A FILTERS -j REJECT --reject-with icmp-host-prohibited

COMMIT
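
To check that the line is still there after a reboot or a daemon restart, something like the following could be used. This is only a sketch: the function name docker_user_returns and the path /tmp/rules.v4 are made up for illustration, and it assumes the rules were dumped in iptables-save format, where the rule shows up as "-A DOCKER-USER -j RETURN".

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of any tool).
# Succeeds if the last DOCKER-USER rule in the given rules file is "-j RETURN".
# $1: path to a file in iptables-save / iptables-restore format.
docker_user_returns() {
    # Keep only the rules appended to DOCKER-USER, then take the last one.
    last_rule=$(grep -E '^-A DOCKER-USER( |$)' "$1" | tail -n 1)
    [ "$last_rule" = "-A DOCKER-USER -j RETURN" ]
}
```

For example, as root on the worker: run iptables-save > /tmp/rules.v4, then docker_user_returns /tmp/rules.v4 && echo "RETURN rule present". The check only does a string comparison on the dump, so it does not touch the live firewall.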