We are developing a simulation software which is deployed and scaled between multiple pods using kubernetes. When a user makes a simulation request, a pod is selected which starts doing the job and is considered as busy. When another user makes a simulation request, it should be routed to the next free pod. Currently, a busy pod is often selected (even though there are free ones) as kubernetes does not know which pods are busy/free.
Is it possible to balance requests in such way that a free pod is always selected? (Assuming that each app instance inside a pod exposes an HTTP endpoint which tells it's current busy/free status)