I was trying to set up a Dataproc cluster that runs only one job (or a specified maximum number of jobs) at a time, with the rest waiting in a queue.
I found this solution, "How to configure monopolistic FIFO application queue in YARN?", but since I always create a new cluster, I needed to automate it. I added this to the cluster creation request:
"softwareConfig": {
    "properties": {
        "yarn:yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
        "yarn:yarn.scheduler.fair.user-as-default-queue": "false",
        "yarn:yarn.scheduler.fair.allocation.file": "$HADOOP_CONF_DIR/fair-scheduler.xml"
    }
}
along with this line in the initialization action script:
sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
When I fetch the cluster's config, it reports this:
'softwareConfig': {
    'imageVersion': '1.2.27',
    'properties': {
        'capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy': 'fair',
        'core:fs.gs.block.size': '134217728',
        'core:fs.gs.metadata.cache.enable': 'false',
        'distcp:mapreduce.map.java.opts': '-Xmx4096m',
        'distcp:mapreduce.map.memory.mb': '5120',
        'distcp:mapreduce.reduce.java.opts': '-Xmx4096m',
        'distcp:mapreduce.reduce.memory.mb': '5120',
        'hdfs:dfs.datanode.address': '0.0.0.0:9866',
        'hdfs:dfs.datanode.http.address': '0.0.0.0:9864',
        'hdfs:dfs.datanode.https.address': '0.0.0.0:9865',
        'hdfs:dfs.datanode.ipc.address': '0.0.0.0:9867',
        'hdfs:dfs.namenode.http-address': '0.0.0.0:9870',
        'hdfs:dfs.namenode.https-address': '0.0.0.0:9871',
        'hdfs:dfs.namenode.secondary.http-address': '0.0.0.0:9868',
        'hdfs:dfs.namenode.secondary.https-address': '0.0.0.0:9869',
        'mapred-env:HADOOP_JOB_HISTORYSERVER_HEAPSIZE': '3840',
        'mapred:mapreduce.job.maps': '189',
        'mapred:mapreduce.job.reduce.slowstart.completedmaps': '0.95',
        'mapred:mapreduce.job.reduces': '63',
        'mapred:mapreduce.map.cpu.vcores': '1',
        'mapred:mapreduce.map.java.opts': '-Xmx4096m',
        'mapred:mapreduce.map.memory.mb': '5120',
        'mapred:mapreduce.reduce.cpu.vcores': '1',
        'mapred:mapreduce.reduce.java.opts': '-Xmx4096m',
        'mapred:mapreduce.reduce.memory.mb': '5120',
        'mapred:mapreduce.task.io.sort.mb': '256',
        'mapred:yarn.app.mapreduce.am.command-opts': '-Xmx4096m',
        'mapred:yarn.app.mapreduce.am.resource.cpu-vcores': '1',
        'mapred:yarn.app.mapreduce.am.resource.mb': '5120',
        'spark-env:SPARK_DAEMON_MEMORY': '3840m',
        'spark:spark.driver.maxResultSize': '1920m',
        'spark:spark.driver.memory': '3840m',
        'spark:spark.executor.cores': '8',
        'spark:spark.executor.memory': '37237m',
        'spark:spark.yarn.am.memory': '640m',
        'yarn:yarn.nodemanager.resource.memory-mb': '81920',
        'yarn:yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler',
        'yarn:yarn.scheduler.fair.allocation.file': '$HADOOP_CONF_DIR/fair-scheduler.xml',
        'yarn:yarn.scheduler.fair.user-as-default-queue': 'false',
        'yarn:yarn.scheduler.maximum-allocation-mb': '81920',
        'yarn:yarn.scheduler.minimum-allocation-mb': '1024'
    }
},
The file fair-scheduler.xml also contains the expected content (it is all on one line, but I don't think that is the problem).
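To rule out the one-line formatting, the same allocation file could be written in multi-line form from the init action. The following is only a sketch: the function name write_fair_scheduler and its arguments are hypothetical and not part of my actual script. The content is the same single queueMaxAppsDefault element as above.

```shell
# Hypothetical helper (not in the original init action): writes a
# Fair Scheduler allocation file to the path given as the first argument.
# The second argument is the per-queue limit on running apps (default 1).
write_fair_scheduler() {
  target="$1"
  max_apps="${2:-1}"
  # The redirection is performed by the shell that runs this function,
  # so that shell itself needs write access to the target path.
  cat > "$target" <<EOF
<?xml version="1.0"?>
<allocations>
  <queueMaxAppsDefault>${max_apps}</queueMaxAppsDefault>
</allocations>
EOF
}
```

In the init action this would be called as write_fair_scheduler /etc/hadoop/conf/fair-scheduler.xml 1.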
After all this, the cluster still behaves as if the capacity scheduler were in charge, and I have no idea why. Any recommendations would help. Thanks.
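For anyone who wants to reproduce the check, one way to see which scheduler the ResourceManager actually loaded is its REST endpoint /ws/v1/cluster/scheduler. The sketch below assumes the ResourceManager web port is the default 8088 on the master node and that the response carries a "type" field whose value ends in "Scheduler"; the helper name scheduler_type is hypothetical.

```shell
# Hypothetical helper: reads the JSON returned by the ResourceManager's
# /ws/v1/cluster/scheduler endpoint on stdin and prints the first
# scheduler type it finds (for example capacityScheduler or fairScheduler).
scheduler_type() {
  grep -o '"type" *: *"[A-Za-z]*Scheduler"' | head -n 1 | cut -d'"' -f4
}

# Example call on the master node (assumes the default port 8088):
#   curl -s "http://localhost:8088/ws/v1/cluster/scheduler" | scheduler_type
```

This is a plain text match rather than a real JSON parser, so it is only meant as a quick sanity check.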