9
votes

I am using Google Kubernetes Engine to deploy some applications that need to connect to a DB on premises. In order to do that, I have configured a VPN tunnel and created a VPC.

Then, I created a GKE cluster (1 node) that uses that VPC, and I can confirm that the DB is reachable by connecting to the node and pinging the DB server:

~ $ sudo toolbox ping 10.197.100.201
Spawning container root-gcr.io_google-containers_toolbox-20180309-00 on 
/var/lib/toolbox/root-gcr.io_google-containers_toolbox-20180309-00.
Press ^] three times within 1s to kill container.
PING 10.197.100.201 (10.197.100.201): 56 data bytes 
64 bytes from 10.197.100.201: icmp_seq=0 ttl=62 time=45.967 ms
64 bytes from 10.197.100.201: icmp_seq=1 ttl=62 time=44.186 ms

However, if I try to do the same from a Pod, I am not able to connect.

root@one-shot-pod:/# traceroute 10.197.100.201
traceroute to 10.197.100.201 (10.197.100.201), 30 hops max, 60 byte 
packets
 1  10.0.0.1 (10.0.0.1)  0.046 ms  0.009 ms  0.007 ms
 2  * * *
 3  * * *

What am I missing?
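(For reference, the one-shot-pod above is just a throwaway shell; a minimal sketch of how such a test pod can be started, assuming a busybox image, which ships a traceroute applet:)

kubectl run one-shot-pod --rm -it --restart=Never --image=busybox -- sh
# then, inside the pod:
traceroute 10.197.100.201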

2
What kind of VPN is it, what is the CIDR of the SDN, what is the CIDR of the VPN tunnel, and have you configured that pod with hostNetwork: true? – mdaniel
It is a route-based VPN and the subnet being used has a 10.198.100.0/24 CIDR. The alternative pod CIDR is 10.40.0.0/14. If I use hostNetwork: true it works, but I think that is not the right way to do it, right? What do you mean by the CIDR of the SDN? – pVilaca
I am sharing a link[1] to a GKE networking demo for VPN. You might find it helpful for understanding how Kubernetes Engine communicates through a VPN. It demonstrates how a connection can be established between a Kubernetes Engine cluster and a cluster running on-premises. [1] github.com/GoogleCloudPlatform/gke-networking-demos/tree/master/… – Md Zubayer
Have you solved this issue @pVilaca? – jrenk

2 Answers

5
votes

After some investigation, I found the root cause of the problem. The communication wasn't working properly because of something called IP masquerading (https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent), which is used for NAT translation.

GKE ships with some default ranges that are configured not to be masqueraded (on the version I was using, the defaults were 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). Since the destination IP was 10.197.100.201 (part of 10.0.0.0/8) but outside the cluster, the solution was to modify the nonMasqueradeCIDRs: remove 10.0.0.0/8 and use 10.44.0.0/14 (the GKE cluster CIDR) instead.

In order to do that, I used the following configmap:

apiVersion: v1
data:
  config: |-
    nonMasqueradeCIDRs:
      - 10.44.0.0/14
      - 172.16.0.0/12
      - 192.168.0.0/16
    resyncInterval: 60s
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system

After that, to apply the config, you can create the ConfigMap using the following command:

kubectl create configmap ip-masq-agent --from-file <configmap file> --namespace kube-system
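To verify that the agent picked up the new ranges, one option (a sketch, assuming the ip-masq-agent is running and that the chain it manages is named IP-MASQ, which can vary by version) is to list that chain from the node:

sudo iptables -t nat -L IP-MASQ -n
# The RETURN entries should now be 10.44.0.0/14, 172.16.0.0/12 and
# 192.168.0.0/16, so traffic to 10.197.100.201 falls through, gets
# masqueraded to the node IP, and can be routed back over the VPN.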
3
votes

I found a solution in this blog.

The problem is that the default iptables config looks like this:

iptables -A POSTROUTING ! -d 10.0.0.0/8 \
  -m comment --comment "kubenet: outbound traffic" -m addrtype \
  ! --dst-type LOCAL -j MASQUERADE -t nat

It means that traffic from the pods will be NATted to the host IP only if the destination is not in 10.0.0.0/8.

This 10.0.0.0/8 is the problem: it’s too large.

It also includes your 10.197.100.201 IP.
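If you want to confirm which rule your node currently has, one way (a sketch, assuming kubenet and shell access to the node, e.g. via the toolbox shown in the question) is:

sudo iptables -t nat -S POSTROUTING | grep MASQUERADE
# With the kubenet default you should see the "! -d 10.0.0.0/8" rule above,
# which excludes the whole 10.0.0.0/8 range (including 10.197.100.201) from SNAT.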

To fix this you can add the following DaemonSet to your Kubernetes Cluster:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: fix-nat
  labels:
    app: fix-nat
spec:
  selector:
    matchLabels:
      app: fix-nat
  template:
    metadata:
      labels:
        app: fix-nat
    spec:
      hostPID: true
      containers:
        - name: fix-nat
          image: gcr.io/google-containers/startup-script:v1
          imagePullPolicy: Always
          securityContext:
            privileged: true
          env:
          - name: STARTUP_SCRIPT
            value: |
              #! /bin/bash
              while true; do
                iptables-save | grep MASQUERADE | grep -q "NAT-VPN"
                if [ $? -ne 0 ]; then
                  echo "Missing NAT rule for VPN, adding it"
                  iptables -A POSTROUTING -d 10.197.100.0/24 -m comment --comment "NAT-VPN: SNAT for outbound traffic through VPN" -m addrtype ! --dst-type LOCAL -j MASQUERADE -t nat
                fi
                sleep 60
              done

This small script checks every minute, forever, whether the right iptables rule is present and, if not, adds it.

Note that privileged: true is necessary so the pod can change the host's iptables rules.
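To roll it out, you can save the manifest and apply it, then check any node a minute or so later for the rule added by the startup script (a sketch; the file name is arbitrary):

kubectl apply -f fix-nat.yaml
# on a node, once the script has run:
sudo iptables -t nat -S POSTROUTING | grep NAT-VPN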

I had the same problem and this solved the issue.