We use Kubernetes CronJobs on GKE (version 1.9) to run several periodic tasks. From the pods, we need to make several calls to external APIs outside our network. Often (but not always), these calls fail with DNS resolution timeouts.
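To put a number on "often but not always", one option is to resolve the API hostname repeatedly from inside a pod and count failures and slow lookups. The sketch below is a hypothetical helper, not something from our setup. It assumes `socket.getaddrinfo` (which goes through the pod's resolver, i.e. kube-dns) is representative of what the API client does. Port 443 and the 1-second "slow" threshold are arbitrary choices. The resolver is injectable so the counting logic can be checked without network access.

```python
import socket
import time
from typing import Callable, Optional


def _system_resolver(host: str) -> object:
    # Uses the pod's /etc/resolv.conf, so queries go through kube-dns.
    return socket.getaddrinfo(host, 443)


def probe_dns(hostname: str, attempts: int = 20,
              resolver: Optional[Callable[[str], object]] = None,
              slow_threshold_s: float = 1.0) -> dict:
    """Resolve `hostname` `attempts` times and tally ok, failed and slow lookups.

    Hypothetical diagnostic helper; thresholds are assumptions.
    """
    if resolver is None:
        resolver = _system_resolver
    ok = failed = slow = 0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolver(hostname)
        except (socket.gaierror, socket.timeout):
            # Resolution failed (e.g. "Temporary failure in name resolution").
            failed += 1
            continue
        ok += 1
        # A successful but slow lookup may indicate retries after a dropped query.
        if time.monotonic() - start >= slow_threshold_s:
            slow += 1
    return {"ok": ok, "failed": failed, "slow": slow}
```

Run from a pod as, for example, `probe_dns("api.example.com")`, where `api.example.com` is a placeholder for the real external hostname.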
My current hypothesis is that the upstream DNS server for the service we are trying to contact is rate-limiting us because we send it many repeated DNS queries. That could happen either because the TTL on those records is too low or because the entries are evicted from the dnsmasq cache, which is too small.
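The reasoning behind this hypothesis (upstream query volume depends on both record TTL and cache capacity) can be illustrated with a toy cache model. This is a hypothetical sketch, not how dnsmasq is implemented. It assumes LRU eviction, a single fixed TTL for every record, and a capacity playing the role of dnsmasq's `--cache-size`. The hostnames and the IP address are made up.

```python
from collections import OrderedDict
from typing import Callable


class TtlLruCache:
    """Toy DNS cache: fixed capacity, LRU eviction, one TTL for all records.

    Hypothetical model for illustration; dnsmasq's real eviction policy may differ.
    """

    def __init__(self, capacity: int, upstream: Callable[[str], str], ttl_s: float):
        self.capacity = capacity
        self.upstream = upstream
        self.ttl_s = ttl_s
        self.entries: OrderedDict = OrderedDict()  # name -> (answer, expires_at)
        self.upstream_queries = 0

    def lookup(self, name: str, now: float) -> str:
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            self.entries.move_to_end(name)  # mark as most recently used
            return cached[0]
        # Cache miss or expired record: this is a query the upstream server sees.
        self.upstream_queries += 1
        answer = self.upstream(name)
        self.entries[name] = (answer, now + self.ttl_s)
        self.entries.move_to_end(name)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used record
        return answer


def count_upstream_queries(capacity: int, ttl_s: float, names: int, rounds: int) -> int:
    """Look up `names` distinct hostnames once per second for `rounds` seconds."""
    cache = TtlLruCache(capacity, upstream=lambda n: "203.0.113.1", ttl_s=ttl_s)
    t = 0.0
    for _ in range(rounds):
        for i in range(names):
            cache.lookup(f"host{i}.example.com", now=t)
        t += 1.0
    return cache.upstream_queries
```

With 10 hostnames looked up once per second for 60 seconds, this model sends 10 upstream queries when the cache holds 150 entries and the TTL is 300 s. The count rises to 600 when the cache holds only 5 entries, because cycling through 10 names thrashes an LRU cache of 5, and to 120 when the TTL drops to 5 s.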
I tried editing the kube-dns deployment to change the cache size and TTL arguments passed to the dnsmasq container, but the changes get reverted because the deployment is managed by GKE. Is there a way to persist these changes so that GKE does not overwrite them? Any other ideas for dealing with DNS issues on GKE, or on Kubernetes in general?