21
votes

I'm running a Google Kubernetes Engine cluster with the "private cluster" option enabled. I've also defined a master authorized network so that I can access the environment remotely - this works just fine. Now I want to set up a CI/CD pipeline using Google Cloud Build: after a new Docker image is built successfully, it should be automatically deployed to GKE.

When I first fired off the new pipeline, the deployment to GKE failed with an error message like: "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout". Suspecting the master authorized networks option as the root cause of the connection timeout, I added 0.0.0.0/0 to the allowed networks and started the Cloud Build job again - this time everything went well, and after the Docker image was created it was deployed to GKE. Good.

The only problem that remains is that I don't really want to allow the whole Internet to access my Kubernetes master - that's a bad idea, isn't it?

Is there a more elegant solution that keeps the master authorized networks narrow while still allowing deployments via Cloud Build?

The problem here is that the Cloud Build API needs to communicate with your cluster, which is why it worked when you changed the authorized network to 0.0.0.0/0. You'd have to add the range of IPs that the Google APIs use to your master authorized networks, which does not seem like a good idea. Instead, could the build trigger something on your local machine which, in turn, triggers a call from your local machine to the K8s master to update the image? – Patrick W
I'm really trying not to involve my local machine in the build process, as I don't like me or my local machine being a vital part of it. I think the problem's root cause is obvious - I thought there might be a "Google-internal" way to establish a connection to my cluster. I've also tried to narrow down the list of IP ranges used by Cloud Build, but trying to do so still left me with a quite long list... – Mizaru
You can definitely do it internally (you can configure a GCE VM on the same VPC network to use the K8s internal endpoint instead of the external one), but Cloud Build won't have an IP that can be considered internal without opening your cluster up to far more IPs than you'd like. – Patrick W
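
For reference, a VM inside the cluster's VPC can point kubectl at that internal endpoint via the --internal-ip flag of gcloud container clusters get-credentials; the cluster name and zone below are placeholders:

# Run on a GCE VM in the same VPC: write kubeconfig entries that use the
# cluster's private endpoint, then talk to the master over the VPC.
gcloud container clusters get-credentials my-cluster --zone us-east1-b --internal-ip
kubectl get ns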

4 Answers

10
votes

It's currently not possible to add Cloud Build machines to a VPC. Similarly, Cloud Build does not announce the IP ranges of its build machines. So you can't do this today without creating an "SSH bastion instance" or a "proxy instance" on GCE within that VPC.
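
For illustration, a minimal version of that workaround might look like the following; the instance name, zone, and network are assumptions, and the VM is expected to already have kubectl and cluster credentials set up:

# Create a small private VM in the cluster's VPC to act as the bastion
# (no external IP; reached via IAP TCP forwarding instead).
gcloud compute instances create k8s-bastion \
    --zone us-east1-b \
    --machine-type e2-small \
    --network my-vpc --subnet my-subnet \
    --no-address

# Run kubectl on the bastion over an IAP-tunneled SSH session:
gcloud compute ssh k8s-bastion --zone us-east1-b --tunnel-through-iap \
    --command 'kubectl apply -f config.yml'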

I suspect this will change soon. GCB existed before GKE private clusters, and private clusters are still a beta feature.

4
votes

We ended up doing the following:

1) Remove the deployment step from cloudbuild.yaml

2) Install Keel inside the private cluster and give it Pub/Sub Editor privileges in the Cloud Build / registry project

Keel will monitor changes in images and deploy them automatically based on your settings.
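
For illustration, those settings are plain annotations on your workloads. A minimal sketch, with hypothetical names and a force policy chosen as an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical workload name
  annotations:
    keel.sh/policy: force     # redeploy whenever a new image is pushed
    keel.sh/trigger: poll     # poll the registry; GCR Pub/Sub events also work
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:latest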

This has worked out great: we now get SHA-tagged image updates deployed automatically, without adding VMs or setting up any kind of bastion/SSH host.

4
votes

Updated answer (02/22/2021)

Unfortunately, while the method below works, IAP tunnels appear to suffer from rate limiting. If a lot of resources are deployed via kubectl, the tunnel times out after a while. I had to use another trick: dynamically whitelisting the Cloud Build machine's IP address via Terraform and then applying directly, which works every time.
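
As a rough sketch of that idea using plain gcloud instead of Terraform (cluster name, zone, and CIDRs are placeholders; note that --master-authorized-networks replaces the entire list, so any existing ranges must be repeated):

# Look up the build machine's public IP and temporarily add it to the
# master authorized networks before deploying.
BUILD_IP=$(curl -s https://checkip.amazonaws.com)

gcloud container clusters update my-cluster --zone us-east1-b \
    --enable-master-authorized-networks \
    --master-authorized-networks "10.0.0.0/8,${BUILD_IP}/32"

kubectl apply -f config.yml

# Remove the temporary entry again afterwards:
gcloud container clusters update my-cluster --zone us-east1-b \
    --enable-master-authorized-networks \
    --master-authorized-networks "10.0.0.0/8"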

Original answer

It is also possible to create an IAP tunnel inside a Cloud Build step:

- id: kubectl-proxy
  name: gcr.io/cloud-builders/docker
  entrypoint: sh
  args:
  - -c
  - docker run -d --net cloudbuild --name kubectl-proxy
      gcr.io/cloud-builders/gcloud compute start-iap-tunnel
      bastion-instance 8080 --local-host-port 0.0.0.0:8080 --zone us-east1-b &&
    sleep 5

This step starts a background Docker container named kubectl-proxy on the cloudbuild network, which is shared by all other Cloud Build steps. The container establishes an IAP tunnel using the Cloud Build service account's identity. The tunnel connects to a GCE instance with a SOCKS or HTTPS proxy pre-installed on it (an exercise left to the reader).
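
To sketch that exercise, one option is a small SOCKS server such as Dante on the bastion. A Debian-based image, the NIC name, and the port are assumptions here; source filtering is delegated to a VPC firewall rule admitting only 35.235.240.0/20 (the range IAP TCP forwarding connects from) to port 8080:

# On bastion-instance:
sudo apt-get update && sudo apt-get install -y dante-server

# Minimal /etc/danted.conf: a SOCKS proxy listening on port 8080.
cat <<'EOF' | sudo tee /etc/danted.conf
logoutput: syslog
internal: ens4 port = 8080   # NIC is ens4 on recent GCE images, eth0 on older ones
external: ens4
socksmethod: none
clientmethod: none
client pass { from: 0.0.0.0/0 to: 0.0.0.0/0 }
socks pass  { from: 0.0.0.0/0 to: 0.0.0.0/0 }
EOF

sudo systemctl restart danted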

Inside subsequent steps, you can then access the cluster simply as:

- id: setup-k8s
  name: gcr.io/cloud-builders/kubectl
  entrypoint: sh
  args:
  - -c
  - HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl apply -f config.yml

The main advantages of this approach compared to the others suggested above:

  • No need for a "bastion" host with a public IP - the kubectl-proxy host can be entirely private, thus maintaining the privacy of the cluster
  • The tunnel connection relies on the default Google credentials available to Cloud Build, so there's no need to store or pass any long-term credentials like an SSH key
1
vote

Update: I suspect this won't work at production strength for the same reason as @dinvlad's update above, i.e., rate limiting in IAP. I'll leave my original post here because it does solve the network connectivity problem and illustrates the underlying networking mechanism.

Furthermore, even if we don't use it for Cloud Build, my method provides a way to tunnel from my laptop to a private K8s master node. That means I can edit K8s YAML files on my laptop (e.g., in VS Code) and immediately run kubectl from my laptop, rather than having to ship the code to a bastion host and execute kubectl inside it. I find this a big boost to development-time productivity.

Original answer

================

I think I might have an improvement to the great solution provided by @dinvlad above.

I think the solution can be simplified: there is no need to install an HTTP proxy server, though a bastion host is still required.

I offer the following proof of concept (without an HTTP proxy server). This PoC illustrates the underlying networking mechanism without the distraction of Google Cloud Build (GCB). (When I have time in the future, I'll test the full implementation on Google Cloud Build.)

Suppose:

  1. I have a GKE cluster whose master node is private, e.g., with an IP address 10.x.x.x.
  2. I have a bastion Compute Engine instance named my-bastion. It has only a private IP, no external IP. The private IP is within the master authorized networks CIDR of the GKE cluster, so from within my-bastion, kubectl works against the private GKE master node. Because my-bastion has no external IP, my home laptop connects to it through IAP.
  3. My laptop at home, with my home internet's public IP address, has no connectivity to the private GKE master node above.

The goal is for me to execute kubectl on my laptop against that private GKE cluster. From a network architecture perspective, my home laptop's position is like that of the Google Cloud Build server.

Theory: Knowing that gcloud compute ssh (and the associated IAP) is a wrapper for SSH, SSH dynamic port forwarding should achieve that goal for us.

Practice:

## On laptop:
LAPTOP~$ kubectl get ns
^C            <<<=== Without setting anything up, this hangs (no connectivity to GKE).

## Set up SSH Dynamic Port Forwarding (SOCKS proxy) from laptop's port 8443 to my-bastion.
LAPTOP~$ gcloud compute ssh my-bastion --ssh-flag="-ND 8443" --tunnel-through-iap

In another terminal on my laptop:

## Without using the SOCKS proxy, this returns my laptop's home public IP:
LAPTOP~$ curl https://checkip.amazonaws.com
199.xxx.xxx.xxx

## Using the proxy, the same curl command above now returns a different IP address, 
## i.e., the IP of my-bastion. 
## Note: Although my-bastion doesn't have an external IP, I have GCP Cloud NAT
## for its subnet (for purposes unrelated to GKE or tunneling).
## Anyway, the NAT is handy as a demonstration for our curl command here.
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 curl -v --insecure https://checkip.amazonaws.com
* Uses proxy env variable HTTPS_PROXY == 'socks5://127.0.0.1:8443'  <<<=== Confirming it's using the proxy
...
* SOCKS5 communication to checkip.amazonaws.com:443
...
* TLSv1.2 (IN), TLS handshake, Finished (20):             <<<==== successful SSL handshake
...
> GET / HTTP/1.1
> Host: checkip.amazonaws.com
> User-Agent: curl/7.68.0
> Accept: */*
...
< Connection: keep-alive
<
34.xxx.xxx.xxx            <<<=== Returns the GCP Cloud NAT'ed IP address for my-bastion 

Finally, the moment of truth for kubectl:

## On laptop:
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 kubectl --insecure-skip-tls-verify=true get ns
NAME              STATUS   AGE
default           Active   3d10h
kube-system       Active   3d10h