
Okay, here is what I am doing. I have a U-SQL script which does the following.

Step 1. INSERT a record into a txn table 'A', say "PROCESSING STARTED", recording the start of Step 2.

Step 2. Extract from a file.

Step 3. Insert into table 'B' using the rowset from Step 2.

Step 4. INSERT a record into txn table 'A', say "PROCESSING FINISHED", recording the successful execution of Step 2.

When I coded the above, I was hoping the steps would execute in the order listed. To my surprise, they did not. When I looked closely at the algebra, I came to understand that the query optimizer had reshuffled all my tasks and runs them as below.

  1. All Extract
  2. All Splits, Aggregates, Partitions
  3. All Writes (note that I am inserting into 2 tables)

So the question I have is: how do I ensure that Step 2 and Step 3 execute only after Step 1? I am not bothered about Step 4 as of now. I could possibly split the script into three jobs run one after another, Job 1 (Step 1), Job 2 (Steps 2 and 3), Job 3 (Step 4), but I was hoping there would be some other option.

Can you please help out?


1 Answer


U-SQL is designed to optimize your query so that it can be scaled out across multiple nodes, resulting in efficient execution. What you are observing is by design: since there is no dependency between Steps 1 and 2 in your code, the optimizer sees an opportunity to parallelize their execution.

One option I can think of for forcing them to execute in a certain sequence is to introduce a dependency in Step 2 on a result from Step 1.
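To illustrate why a dependency changes the ordering, here is a minimal sketch in Python (not U-SQL, and not how the U-SQL optimizer is actually implemented). It models the steps as a dependency graph using the standard-library `graphlib` module (Python 3.9+); the step names and the `first_wave` helper are made up for this example. Without an edge between Step 1 and Step 2, a scheduler is free to start both at once; with the edge, only Step 1 can start first.

```python
from graphlib import TopologicalSorter


def first_wave(dependencies):
    """Return the set of steps that a scheduler could start immediately."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    return set(sorter.get_ready())


# Each key is a step; its value is the set of steps it depends on.
# Hypothetical names standing in for Steps 1 to 3 from the question.
independent = {
    "step1_log_start": set(),
    "step2_extract": set(),                 # no link to Step 1
    "step3_insert_b": {"step2_extract"},
}

dependent = {
    "step1_log_start": set(),
    "step2_extract": {"step1_log_start"},   # Step 2 now needs Step 1's result
    "step3_insert_b": {"step2_extract"},
}

print(first_wave(independent))  # both Step 1 and Step 2 are ready
print(first_wave(dependent))    # only Step 1 is ready
```

Whether a given U-SQL script can express such a dependency between an INSERT and a later EXTRACT depends on what the language allows, so treat this only as a picture of the idea.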

Having said that, if you are looking for a sequential execution pattern, I'm curious as to why you chose U-SQL (which is designed for massively parallelized applications).