0
votes

I am wondering how Oozie handles conflicts (if there really are any) when I submit the same workflow job twice (just the Oozie sample example) at the same time. I can submit the same job twice successfully, and the Oozie server returns two different job IDs. In the Oozie Web Console, the status of both jobs was RUNNING, then both became SUCCEEDED after some time. My workflow.xml is as follows:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/mapreduce_test/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.oozie.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/${wf:user()}/mapreduce_test/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/${wf:user()}/mapreduce_test/output-data/</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I know that deleting the output directory in the "prepare" element helps make the action repeatable and enables retries after failure, and I also understand the basic action run model.

So, my questions are:

  1. Are the two identical jobs really running concurrently? (I saw both in the RUNNING state in the Oozie Web Console.)
  2. Is there a write conflict? (The two identical jobs point to the same output directory.)
2
You can run multiple workflows, doing different things, on different jobTrackers, and all labeled FooBar. Oozie does not care. - Samson Scharfrichter
But... if you use a Coordinator to run your Workflow, then you can define concurrency rules (e.g. only 1 at a time, FIFO). - Samson Scharfrichter
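To illustrate the second comment: a minimal coordinator.xml sketch (name, dates, and paths are placeholders, not from the question) that caps concurrency at 1 and runs queued materializations FIFO, so a second run of the same workflow waits until the first finishes:

    <coordinator-app name="mr-coord" frequency="${coord:days(1)}"
                     start="2015-01-01T00:00Z" end="2015-12-31T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
        <controls>
            <!-- at most one action instance running at a time -->
            <concurrency>1</concurrency>
            <!-- queued instances run oldest-first -->
            <execution>FIFO</execution>
        </controls>
        <action>
            <workflow>
                <app-path>${nameNode}/user/${coord:user()}/mapreduce_test</app-path>
            </workflow>
        </action>
    </coordinator-app>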

2 Answers

0
votes

Oozie does not detect job duplication or anything like it. It accepts the workflow jobs, schedules them on the cluster for execution, and monitors them until completion or failure.

Are the two identical jobs really running concurrently? (I saw both in the RUNNING state in the Oozie Web Console.)

Yes. Both jobs will run concurrently.

Is there a write conflict? (The two identical jobs point to the same output directory.)

Oozie does not perform any checks for write conflicts. I assume those are handled by the MapReduce or HDFS framework.
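To make the risk concrete: a small local-filesystem sketch (plain files standing in for HDFS; `run_job` is a hypothetical helper, not an Oozie or Hadoop API) of two identical jobs whose "prepare" step deletes the shared output directory before writing. The sketch serializes the two runs to make the effect visible; in reality the steps interleave nondeterministically.

```python
import os
import shutil
import tempfile

def run_job(output_dir, job_id):
    """Simulate one workflow run: the <prepare> step deletes the
    output directory, then the job writes its result there."""
    if os.path.exists(output_dir):      # <prepare><delete .../></prepare>
        shutil.rmtree(output_dir)
    os.makedirs(output_dir)
    with open(os.path.join(output_dir, "part-00000"), "w") as f:
        f.write(f"result of {job_id}\n")

base = tempfile.mkdtemp()
out = os.path.join(base, "output-data")

# Two identical jobs pointed at the same directory: whichever runs its
# prepare step later wipes out whatever the other job already wrote.
run_job(out, "job-A")
run_job(out, "job-B")

with open(os.path.join(out, "part-00000")) as f:
    print(f.read().strip())   # only job-B's output survives
```

With real concurrent MapReduce jobs the outcome is worse than a simple overwrite: one job's prepare/delete can remove the other job's in-progress output mid-run, so the surviving directory may be a mix of both or cause one job to fail.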

0
votes

As per your questions:

  1. Oozie schedules jobs on the cluster and runs them to completion with a status such as SUCCEEDED or FAILED.
  2. Both jobs will run at the same time and will execute the same actions as defined.

To avoid this, you can take the steps below, which may be helpful.

An Oozie job starts when it is triggered via job.properties or coordinator.properties, and the workflow then runs according to the definition and interval specified in job.xml/coordinator.xml.

So whenever a request is submitted, a fresh entry is made in the

COORD_JOBS table (for coordinators) or the WF_JOBS table (for workflows)

in Oozie's metastore DB, which could be Oracle, MySQL, PostgreSQL, or Derby.

So even though a job has already been triggered, the same job can be started again and again, because a new ID is assigned to each submission (coordinator job IDs are assigned incrementally).

One way to avoid duplicate processing of the same job is to add a validation check on the metastore DB side.

Create a trigger on the COORD_JOBS table in the metastore DB that checks for an existing entry with the same job name, with a check like:

  IF (SELECT COUNT(*) FROM COORD_JOBS
      WHERE app_name = NEW.app_name AND status = 'RUNNING') > 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Error: duplicate running job.';
  END IF;

A trigger like this on the COORD_JOBS/WF_JOBS tables will fire every time Oozie tries to insert a new job.

COORD_JOBS can be replaced with WF_JOBS, which stores the details of workflow jobs started via coordinator.properties.
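The trigger idea above can be demonstrated end to end with a toy database. This sketch uses sqlite3 purely for illustration: the real metastore would be Oracle/MySQL/PostgreSQL/Derby, the actual COORD_JOBS schema has many more columns, and modifying Oozie's internal tables with triggers is not officially supported.

```python
import sqlite3

# Simplified stand-in for Oozie's COORD_JOBS metastore table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE COORD_JOBS (id INTEGER PRIMARY KEY, app_name TEXT, status TEXT)"
)

# Reject an INSERT while a job with the same app_name is still RUNNING.
conn.execute("""
    CREATE TRIGGER reject_duplicate_running
    BEFORE INSERT ON COORD_JOBS
    WHEN EXISTS (SELECT 1 FROM COORD_JOBS
                 WHERE app_name = NEW.app_name AND status = 'RUNNING')
    BEGIN
        SELECT RAISE(ABORT, 'Error: a job with this app_name is already RUNNING');
    END
""")

conn.execute(
    "INSERT INTO COORD_JOBS (app_name, status) VALUES ('map-reduce-wf', 'RUNNING')"
)
try:
    # A second identical submission while the first is RUNNING is refused.
    conn.execute(
        "INSERT INTO COORD_JOBS (app_name, status) VALUES ('map-reduce-wf', 'RUNNING')"
    )
except sqlite3.IntegrityError as exc:
    print("second submission rejected:", exc)
```

Once the first job's status is updated to SUCCEEDED or FAILED, the trigger's WHEN clause no longer matches and a new submission with the same app_name goes through.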