U-SQL Query Optimizer behavior

Question

Okay, here is what I am doing. I have a U-SQL script that does the following.

Step 1. INSERT a record into a txn table 'A', say "PROCESSING STARTED", recording the start of Step 2.
Step 2. Extract from a file.
Step 3. Insert into table 'B' using the rowset from Step 2.
Step 4. INSERT a record into a txn table 'A', say "PROCESSING FINISHED", recording the successful execution of Step 2.

When I coded the above, I was hoping the steps would execute in the order listed. To my surprise, they did not. When I looked closely at the Algebra, I came to understand that the query optimizer had shuffled all my tasks and runs them as below.

All Extract
All Splits, Aggregates, Partitions
All Writes (note that I am inserting into 2 tables)

So my question is: how do I ensure that Steps 2 and 3 execute only after Step 1? I am not bothered about Step 4 for now. I could run the steps as separate jobs, as below, but I was hoping there would be some other options.

Job 1 (Step 1)
Job 2 (Step 2, 3)
Job 3 (Step 4)

Can you please help?

Rukmani Gopalan · Accepted Answer · 2016-05-08T20:50:37

U-SQL is designed to optimize your query so that it can be scaled out across multiple nodes, resulting in efficient execution. What you are observing is by design: since your code has no dependency between Steps 1 and 2, the optimizer has an opportunity to parallelize their execution.
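
As a rough sketch of the kind of script described in the question, the four steps might look like the following. The table names dbo.A and dbo.B, their columns, and the file path are hypothetical and assume the tables already exist; they are not taken from the asker's actual script. Nothing in Steps 2 and 3 reads from table A, so the order of the statements in the script does not by itself constrain the order of execution.

```usql
// Step 1: record the start.
// Assumes a hypothetical table dbo.A with a single string column.
INSERT INTO dbo.A
VALUES ("PROCESSING STARTED");

// Step 2: extract from a file (hypothetical path and schema).
@rows =
    EXTRACT Id int,
            Name string
    FROM "/input/data.csv"
    USING Extractors.Csv();

// Step 3: load table B from the extracted rowset.
// Note that neither this statement nor the EXTRACT references dbo.A.
INSERT INTO dbo.B
SELECT Id, Name
FROM @rows;

// Step 4: record the finish.
INSERT INTO dbo.A
VALUES ("PROCESSING FINISHED");
```

Because the only data flow here is from the file to @rows to dbo.B, the two inserts into dbo.A are independent of it and can be scheduled alongside it.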

One option I can think of for you to execute them in a certain sequence is to introduce a dependency on a result from Step 1 in Step 2.

Having said that, if you are looking for a sequential execution pattern, I'm curious as to why you chose U-SQL (which is designed for massively parallelized applications).
