3 votes

I have configured a standalone Spark cluster (1 master, 1 slave) connected to a Cassandra cluster, plus a Thrift server that serves as the JDBC connector for a Tableau application. The slave appears in the workers list, yet when I launch any query the worker does not seem to be used as an executor (0 cores used); the whole workload runs on the master's executor. In the Thrift web console I likewise observe only one active executor.

Basically, I expect the workload to be distributed across both executors of the Spark cluster to achieve higher performance.

From master logs:

2019-03-26 15:36:52 INFO Master:54 - I have been elected leader! New state: ALIVE
2019-03-26 15:37:00 INFO Master:54 - Registering worker worker-ip:37678 with 16 cores, 61.8 GB RAM

From worker logs:

2019-03-26 15:37:00 INFO Worker:54 - Successfully registered with master spark://master-hostname:7077

My spark-defaults.conf is:

spark.driver.memory=50g
spark.driver.maxResultSize=4g

spark.sql.thriftServer.incrementalCollect=false
spark.sql.shuffle.partitions=17
spark.sql.autoBroadcastJoinThreshold=10485760
spark.sql.inMemoryColumnarStorage.compressed=true
spark.sql.inMemoryColumnarStorage.batchSize=10000

spark.cores.max=32
spark.executor.cores=16
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=1g

[pic1: screenshot of the workers list]

[pic2: screenshot of the executors tab]

Any help is highly appreciated.

If you don't get enough responses, consider updating the question; I personally find it somewhat hard to read. I would recommend, at minimum, clear paragraphs for: 1. what exactly you do and what you expect to see; 2. what you see instead; 3. what you have tried in order to investigate and troubleshoot the situation. -- Dennis Jaheruddin

Thanks for helping; I edited my question a little, so I hope it's clearer now. -- stebetko

How did you start the Thrift server? -- D3V

I use start-thriftserver.sh: sbin/start-thriftserver.sh --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 -- stebetko

1 Answer

3 votes

When Spark does not execute work on the workers, there are a few primary suspects to eliminate.

  1. Do you see the worker in the Web UI?
  2. Is the firewall allowing you to send the actual workload and get the response back? See this existing answer for more details.
  3. Does the slave have enough free resources to accept the job? I notice you are requiring 16 cores per executor; perhaps that is more than what is available?
  4. Is the capacity needed? Consider submitting multiple jobs in parallel (that require executors, and with sufficiently small resource requirements) to ensure the scheduler is not just 'coincidentally' avoiding the node. Keep going until the work really does not fit on your master node alone.
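Point 4 can be tried with a minimal submission. The sketch below is only illustrative: the master URL is taken from the question, and the SparkPi example jar path assumes a standard Spark distribution layout.

```shell
# Sketch: fire off several small jobs in parallel so the scheduler has to
# spread executors out. --executor-cores 2 is deliberately small enough to
# fit on either node; adjust the jar path to your Spark installation.
for i in 1 2 3 4; do
  spark-submit \
    --master spark://master-hostname:7077 \
    --executor-cores 2 \
    --executor-memory 1g \
    --class org.apache.spark.examples.SparkPi \
    "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100 &
done
wait
```

If some of these jobs land executors on the worker, the node itself is fine and the problem is in the resource requirements of the original job.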

If all these fail, more context may be needed.

  • You don't share any error messages; is there really no error anywhere?
  • What kind of cluster manager are you using (standalone, Hadoop/YARN, Mesos)?
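To gather that context, it can help to confirm from the driver's side which executors actually registered. A sketch, assuming the master URL from the question and an interactive Scala spark-shell:

```shell
# Sketch: open an interactive shell against the same master with modest
# resource requirements, so an executor can fit on either node.
spark-shell --master spark://master-hostname:7077 \
  --executor-cores 2 --executor-memory 1g
# Then, inside the shell, list the hosts of the registered executors:
#   sc.statusTracker.getExecutorInfos.foreach(e => println(e.host))
```

If only the master's hostname is printed, the worker never received an executor, which points back at suspects 2 and 3 above.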