1
votes

I tried to use K8s to set up a Spark cluster (I use the standalone deployment mode; I cannot use the k8s deployment mode for some reason).

I didn't set any CPU-related arguments.

For Spark, that means:

Total CPU cores to allow Spark applications to use on the machine (default: all available); only on worker

http://spark.apache.org/docs/latest/spark-standalone.html

For k8s pods, that means:

If you do not specify a CPU limit for a Container, then one of these situations applies:

  • The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.

  • The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.

https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/

...
Addresses:
  InternalIP:  172.16.197.133
  Hostname:    ubuntu
Capacity:
 cpu:     4
 memory:  3922Mi
 pods:    110
Allocatable:
 cpu:     4
 memory:  3822Mi
 pods:    110
...

But my Spark worker only uses 1 core (I have 4 cores on the worker node and the namespace has no resource limits).

That means the Spark worker pod only uses 1 core of the node (it should use all 4).

How can I write the YAML file so that the pod uses all available CPU cores?

Here is my yaml file:

---

apiVersion: v1
kind: Namespace
metadata:
  name: spark-standalone

---

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: spark-slave
  namespace: spark-standalone
  labels:
    k8s-app: spark-slave
spec:
  selector:
    matchLabels:
      k8s-app: spark-slave
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      name: spark-slave
      namespace: spark-standalone
      labels:
        k8s-app: spark-slave
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: Exists
      hostNetwork: true
      containers:
        - name: spark-slave
          image: spark:2.4.3
          command: ["/bin/sh","-c"]
          args: 
            - "
              ${SPARK_HOME}/sbin/start-slave.sh
              spark://$(SPARK_MASTER_IP):$(SPARK_MASTER_PORT)
              --webui-port $(SPARK_SLAVE_WEBUI_PORT)
              &&
              tail -f ${SPARK_HOME}/logs/*
              "
          env:
            - name: SPARK_MASTER_IP
              value: "10.4.20.34"
            - name: SPARK_MASTER_PORT
              value: "7077"
            - name: SPARK_SLAVE_WEBUI_PORT
              value: "8081"

---
2
What other pods do you have on the node? - Jonas
On the worker node, there are no other pods @JonasGreen
Then it sounds like your workload is not executing in parallel. - Jonas

2 Answers

2
votes

Kubernetes - No upper bound

The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running

Unless you configure a CPU limit for your pod, it can use all available CPU resources on the node.

Consider dedicated nodes

If you are running other workloads on the same node, they also consume CPU resources, and may be guaranteed CPU resources if they have configured a CPU request. Consider using a dedicated node for your workload, using NodeSelector and Taints and Tolerations.
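As a minimal sketch of the dedicated-node idea, the DaemonSet pod template could select a labelled node and tolerate a matching taint. The label and taint `dedicated=spark` are hypothetical names chosen for illustration and do not appear in the question; `<node-name>` is a placeholder.

```yaml
# Assumes the node was prepared beforehand with (hypothetical key/value):
#   kubectl label nodes <node-name> dedicated=spark
#   kubectl taint nodes <node-name> dedicated=spark:NoSchedule
# Fragment of the DaemonSet; merge into the existing spec.template.spec.
spec:
  template:
    spec:
      # Schedule only onto nodes carrying the label.
      nodeSelector:
        dedicated: spark
      # Allow scheduling onto nodes that repel other pods via the taint.
      tolerations:
        - key: dedicated
          operator: Equal
          value: spark
          effect: NoSchedule
```

The taint keeps other workloads without the toleration off the node, while the nodeSelector keeps the Spark worker on it.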

Spark - No upper bound

You configure the slave with parameters to start-slave.sh, e.g. --cores X, to limit CPU core usage.

Total CPU cores to allow Spark applications to use on the machine (default: all available); only on worker
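For illustration, the container from the question could pass `--cores` explicitly instead of relying on the default. This is a sketch only: the value 4 is an assumption taken from the node's 4 allocatable CPUs shown in the question.

```yaml
# Fragment of the spark-slave container from the question's DaemonSet.
# "--cores 4" is an assumed value matching the node's 4 CPUs.
containers:
  - name: spark-slave
    image: spark:2.4.3
    command: ["/bin/sh", "-c"]
    args:
      # Folded scalar: the lines below are joined with spaces into one shell command.
      - >-
        ${SPARK_HOME}/sbin/start-slave.sh
        spark://$(SPARK_MASTER_IP):$(SPARK_MASTER_PORT)
        --webui-port $(SPARK_SLAVE_WEBUI_PORT)
        --cores 4
        &&
        tail -f ${SPARK_HOME}/logs/*
```

The Spark standalone documentation also lists the SPARK_WORKER_CORES environment variable for the same setting, which could be added under `env` instead of changing the command.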

Multithreaded workload

In the end, whether a pod can use multiple CPU cores depends on how your application uses threads. Some workloads use only a single thread, so the application must be designed for multithreading and have work that can be parallelized.

0
votes

I hit exactly the same issue with a Spark worker. Knowing that Java sometimes fails to calculate the CPU count correctly, I tried specifying a CPU request or limit in the Pod spec, and the worker then detected its environment correctly. The needed cores were assigned to the Spark worker executor.
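A sketch of what that could look like for the container in the question. The values are assumptions based on the 4-CPU node shown there; the request is set below the node's allocatable CPU so the pod can still be scheduled if system pods have reserved some of it.

```yaml
# Fragment of the spark-slave container; values are illustrative assumptions.
containers:
  - name: spark-slave
    image: spark:2.4.3
    resources:
      requests:
        cpu: "3"   # assumed: leaves headroom for other pods' CPU requests on the node
      limits:
        cpu: "4"   # assumed: equals the node's 4 allocatable CPUs
```

After applying the change, the worker's web UI (port 8081 in the question's setup) should show how many cores the worker registered with.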

Also, I saw this behavior only in k8s. In Docker Swarm, the worker took all available CPU cores.

What's more, the default templates for the Spark worker mention 1 core for the 'cores' parameter. I believe that value may be used when the CPU core count is calculated incorrectly.