I think I have a pretty simple scenario: I need to auto-scale on Google Kubernetes Engine with a pod that runs one per node and uses all available remaining resources on the node.
"Remaining" resources means that there are certain basic pod services running on each node such logging and metrics, which need their requested resources. But everything left should go to this particular pod, which is in fact the main web service for my cluster.
Also, these remaining resources should be available when the pod's container starts up, rather than through vertical autoscaling with pod restarts. The reason is that the container has certain constraints that make restarts sort of expensive: heavy disk caching, and issues with licensing of some 3rd party software I use. So although certainly the container/pod is restartable, I'd like to avoid except for rolling updates.
The cluster should scale nodes when CPU utilization gets too high (say, 70%). And I don't mean requested CPU utilization of a node's pods, but rather the actual utilization, which is mainly determined by the web service's load.
How should I configure the cluster for this scenario? I've seen there's cluster auto scaling, vertical pod autoscaling, and horizontal pod autoscaling. There's also Deployment vs DaemonSet, although it does not seem that DaemonSet is designed for pods that need to scale. So I think Deployment may be necessary, but in a way that limits one web service pod per node (pod anti affinity??).
How do I put all this together?