I created a Google Dataproc cluster with 2 workers, using n1-standard-4 VMs for both the master and the workers.
I want to submit jobs to this cluster and have them run sequentially (as on AWS EMR): while the first job is in the running state, any newly submitted job should go to the pending state, and only after the first job completes should the second start running.
I tried submitting several jobs to the cluster, but they all ran in parallel; no job went to the pending state.
Is there any configuration I can set on the Dataproc cluster so that all jobs run sequentially?
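For context, one workaround I considered is driving the ordering from the client side: as far as I understand, `gcloud dataproc jobs submit` without `--async` blocks until the job finishes, so a plain loop would submit jobs one at a time. A sketch, where `submit_job` is a placeholder for the real gcloud call:

```shell
#!/bin/sh
# Sketch: run jobs strictly one after another by waiting for each
# submission to return before starting the next.
# submit_job is a hypothetical placeholder; in practice it would be e.g.
#   gcloud dataproc jobs submit spark --cluster=my-cluster ...   (no --async)
submit_job() {
    echo "running $1"
    # the real gcloud command would block here until the job completes
    echo "done $1"
}

for job in job-a job-b job-c; do
    submit_job "$job"   # next iteration starts only after this returns
done
```

This serializes jobs without touching YARN, but it only works if everything is submitted from the one script, which is why I am looking for a cluster-side setting instead.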
I updated the following files:
/etc/hadoop/conf/yarn-site.xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
/etc/hadoop/conf/fair-scheduler.xml
<?xml version="1.0" encoding="UTF-8"?>
<allocations>
  <queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
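As I understand it, the usual way to set yarn-site.xml values on Dataproc is at cluster-creation time via `--properties` with the `yarn:` prefix, rather than editing files by hand (the fair-scheduler.xml allocation file itself would still have to be staged onto the master, e.g. with an initialization action). A sketch of what I believe the equivalent creation command looks like; the cluster name and region are placeholders, not from my setup:

```shell
# Sketch: express the yarn-site.xml keys above as Dataproc cluster properties.
# The "yarn:" prefix targets yarn-site.xml at cluster-creation time.
PROPERTIES='yarn:yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler'
PROPERTIES="$PROPERTIES,yarn:yarn.scheduler.fair.user-as-default-queue=false"
PROPERTIES="$PROPERTIES,yarn:yarn.scheduler.fair.allocation.file=/etc/hadoop/conf/fair-scheduler.xml"

# Echoed rather than executed so the sketch is safe to run as-is;
# drop the leading "echo" to actually create the cluster.
echo gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --master-machine-type=n1-standard-4 \
    --worker-machine-type=n1-standard-4 \
    --properties="$PROPERTIES"
```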
After making the above changes on the master node, I restarted the service with systemctl restart hadoop-yarn-resourcemanager. But the jobs still run in parallel.
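For completeness, this is roughly what I run on the master node after editing the config, plus the check I use to see how many applications YARN reports as running at once. The commands are held in variables and echoed so the snippet is inert off-cluster; on the master node I run them directly:

```shell
#!/bin/sh
# Sketch: restart the ResourceManager, then list applications in the
# RUNNING state to see whether more than one job is active at a time.
RESTART_CMD='systemctl restart hadoop-yarn-resourcemanager'
CHECK_CMD='yarn application -list -appStates RUNNING'

echo "$RESTART_CMD"
echo "$CHECK_CMD"
```

If the FairScheduler settings had taken effect, I would expect the check to show at most one application in the RUNNING state, with the rest in ACCEPTED.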