
We use Kubernetes Jobs for a lot of batch computing here and I'd like to instrument each Job with a monitoring sidecar to update a centralized tracking system with the progress of a job.

The only problem is, I can't figure out what the semantics are (or are supposed to be) of multiple containers in a job.

I gave it a shot anyway (with an alpine sidecar that printed "hello" every second; a sketch of the Job spec follows the output below). After my main task completed, the Jobs were considered successful, and kubectl get pods on Kubernetes 1.2.0 showed:

NAME                                         READY     STATUS      RESTARTS   AGE
job-69541b2b2c0189ba82529830fe6064bd-ddt2b   1/2       Completed   0          4m
job-c53e78aee371403fe5d479ef69485a3d-4qtli   1/2       Completed   0          4m
job-df9a48b2fc89c75d50b298a43ca2c8d3-9r0te   1/2       Completed   0          4m
job-e98fb7df5e78fc3ccd5add85f8825471-eghtw   1/2       Completed   0          4m
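
For reference, the Job spec I'm submitting looks roughly like this (a simplified sketch: the main container's real command is omitted, and restartPolicy: Never is an assumption; see the update further down for OnFailure):

apiVersion: batch/v1
kind: Job
metadata:
  name: job-69541b2b2c0189ba82529830fe6064bd
spec:
  template:
    spec:
      restartPolicy: Never                  # assumption; the update below covers OnFailure
      containers:
      - name: pod-template                  # the actual batch task
        image: luigi-reduce:0.2
      - name: sidecar                       # monitoring sidecar: prints "hello" every second
        image: alpine
        command: ["sh", "-c", "while true; do echo hello; sleep 1; done"]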

And if I describe one of those pods:

State:              Terminated
  Reason:           Completed
  Exit Code:        0
  Started:          Thu, 24 Mar 2016 11:59:19 -0700
  Finished:         Thu, 24 Mar 2016 11:59:21 -0700

Then GETting the YAML of the job shows information per container:

  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2016-03-24T18:59:29Z
      message: 'containers with unready status: [pod-template]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    containerStatuses:
    - containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
      image: luigi-reduce:0.2
      imageID: docker://sha256:5a5e15390ef8e89a450dac7f85a9821fb86a33b1b7daeab9f116be252424db70
      lastState: {}
      name: pod-template
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
          exitCode: 0
          finishedAt: 2016-03-24T18:59:30Z
          reason: Completed
          startedAt: 2016-03-24T18:59:29Z
    - containerID: docker://3d2b51436e435e0b887af92c420d175fafbeb8441753e378eb77d009a38b7e1e
      image: alpine
      imageID: docker://sha256:70c557e50ed630deed07cbb0dc4d28aa0f2a485cf7af124cc48f06bce83f784b
      lastState: {}
      name: sidecar
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2016-03-24T18:59:31Z
    hostIP: 10.2.113.74
    phase: Running

So it looks like my sidecar would need to watch the main container (how?) and exit gracefully once it detects that it is alone in the pod. If that's correct, are there best practices/patterns for this? Should the sidecar exit with the return code of the main container, and if so, how does it get that?
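
The only pattern I've come up with so far is a shared emptyDir volume: the main container writes its exit code to a sentinel file when it finishes, and the sidecar polls for that file. Roughly like this (a sketch only; my-real-task is a placeholder, and it assumes both images have a shell):

# template.spec portion of the Job (sketch)
volumes:
- name: shared
  emptyDir: {}
containers:
- name: pod-template
  image: luigi-reduce:0.2
  # wrap the real command so the exit code lands in a sentinel file on completion
  command: ["sh", "-c", "my-real-task; echo $? > /shared/exit-code"]
  volumeMounts:
  - name: shared
    mountPath: /shared
- name: sidecar
  image: alpine
  # poll for the sentinel file, then exit with the same code as the main container
  command: ["sh", "-c", "while [ ! -f /shared/exit-code ]; do sleep 1; done; exit $(cat /shared/exit-code)"]
  volumeMounts:
  - name: shared
    mountPath: /shared

That would at least answer the "how does it get the exit code" part, but it feels like plumbing the platform should be able to handle.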

**Update:** After further experimentation, I've also discovered the following: if there are two containers in a pod, the pod is not considered successful until all containers in it have returned with exit code 0.

Additionally, if restartPolicy: OnFailure is set on the pod spec, then any container in the pod that terminates with a non-zero exit code will be restarted in the same pod. This could be useful for a monitoring sidecar to count the number of retries and delete the job after a certain number, as a workaround for the lack of a max-retries option in Kubernetes Jobs.
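
For anyone else trying this, the only change for that second experiment is in the Job's pod template spec (fragment, with the same containers as the sketch above):

restartPolicy: OnFailure    # any container exiting non-zero is restarted in place, in the same pod;
                            # the pod only counts as succeeded once every container has exited 0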

This is by no means an elegant solution, but I think you could set up a liveness probe on your sidecar that actually probes the main container. Then, when the main container goes down, the probe will fail and kubelet will kill the sidecar. – Tim Allclair
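
(For illustration, that suggestion could look something like the following in the sidecar's container spec, assuming the main container listens on some TCP port; 8080 here is made up.)

# sidecar container definition (sketch; port 8080 is hypothetical)
- name: sidecar
  image: alpine
  command: ["sh", "-c", "while true; do echo hello; sleep 1; done"]
  livenessProbe:
    tcpSocket:
      port: 8080        # the main container's port, reachable because containers in a pod share a network namespace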

1 Answer


You can use the downward API to figure out your own pod name from within the sidecar, and then retrieve your own pod from the apiserver to look up the exit status. Let me know how this goes.
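
For example, something along these lines (a rough sketch: it assumes curl is available in the sidecar image and that the pod's default service account is allowed to read pods):

# sidecar container definition (sketch)
- name: sidecar
  image: alpine                           # assumption: curl is present (stock alpine would need it added)
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name          # downward API: this pod's own name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  command:
  - sh
  - -c
  - |
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    # poll our own pod until the main container reports a terminated state
    while true; do
      curl -sSk -H "Authorization: Bearer $TOKEN" \
        "https://kubernetes.default.svc/api/v1/namespaces/$POD_NAMESPACE/pods/$POD_NAME" \
        | grep -q '"terminated"' && break    # crude check; a real sidecar would parse containerStatuses
      sleep 5
    done

Once it sees the main container's terminated state (and exit code), the sidecar can report to your tracking system and exit 0 so the Job completes.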