As explained in previous answers, the ideal way to change the verbosity of a Spark cluster is to edit the corresponding log4j.properties. However, on Dataproc Spark runs on YARN, so we have to adjust the global Hadoop configuration and not just /usr/lib/spark/conf.
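For reference, on a standalone Spark install that change is a single line in Spark's own log4j.properties (usually copied from the bundled template):

# $SPARK_HOME/conf/log4j.properties — the non-YARN case, for comparison
log4j.rootCategory=WARN, console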
Several suggestions:
On Dataproc there are several gcloud commands and properties we can pass during cluster creation (see the documentation). Is it possible to change the log4j.properties under /etc/hadoop/conf by specifying
--properties 'log4j:hadoop.root.logger=WARN,console'
Maybe not; from the docs:
The --properties command cannot modify configuration files not shown above.
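By contrast, the flag does work for the documented file prefixes (for instance spark: maps to spark-defaults.conf); a hedged example with a placeholder cluster name:

gcloud dataproc clusters create my-cluster \
    --properties 'spark:spark.executor.memory=4g'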
Another way would be to use a shell script as an initialization action, run on each node during cluster creation, and apply the change with sed:
# change log level on each node to WARN
sudo sed -i -- 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/g' \
  /etc/spark/conf/log4j.properties
sudo sed -i -- 's/hadoop.root.logger=INFO,console/hadoop.root.logger=WARN,console/g' \
  /etc/hadoop/conf/log4j.properties
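To wire this in, upload the script to Cloud Storage and pass it at cluster creation via --initialization-actions; the bucket and script name below are placeholders:

gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/set-log-level.sh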
But is that enough, or do we also need to override hadoop.root.logger through the environment (the HADOOP_ROOT_LOGGER variable) as well?
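If the environment does need overriding too, one sketch would be to append an export to hadoop-env.sh from the same init script, assuming Dataproc sources /etc/hadoop/conf/hadoop-env.sh as stock Hadoop does:

# assumption: Dataproc reads /etc/hadoop/conf/hadoop-env.sh like stock Hadoop
echo 'export HADOOP_ROOT_LOGGER=WARN,console' | sudo tee -a /etc/hadoop/conf/hadoop-env.sh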