0
votes

I created a managed table in U-SQL and loaded data into it. When I try to read from it, the job shows the status "Preparing" for about 3 hours and is then cancelled by YARN.

I tried the rebuild table command and it behaves the same way.

The table holds audit data. Whenever I process a file from Data Lake, I store its audit details in that table: file name, location, and record count. So far I have processed around 36,000 files. When I try to use the table for the final audit report, the job stays in the preparing phase for 3 hours and is then cancelled by YARN.

1
Where is the data coming from, what kind of data is it, and how are you loading it into the table? – Peter Bons
@srinadhreddy, please add the information requested in the comments to your question. That improves the question's quality considerably and helps prevent downvotes and removal from SO. – ZF007

1 Answer

2
votes

Please provide more information:

  1. How do you load the data into the table?
  2. How are you reading the files?
  3. Are you using the FastFileSetV2dot5 preview feature as suggested in the release notes?

UPDATE:

Based on the statement "processed around 36k files", I assume that you insert each file individually into the table. This is not recommended: every individual INSERT adds a fragment to the table, and with this much fragmentation the preparation phase runs out of time during code generation. Since you already have 36k table fragments, you should drop the table and do a single INSERT from an EXTRACT over the 36k files, specified as a file set and using the fast file set preview feature I mention above. That way you can avoid this problem.

Once you have loaded the data, you need to rebuild the table or partition to avoid later fragmentation.
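
A minimal sketch of the drop, single INSERT, and rebuild described above. The table name `dbo.FileAudit`, the path pattern `/audit/{SourceFile}.csv`, the column names, and the index definition are all hypothetical placeholders, and I am assuming the audit records are available as CSV files. The preview flag is written the way I recall it from the release notes (`FileSetV2Dot5:on`), so please check it against the current notes before relying on it.

```usql
// Preview flag for the fast file set feature (spelling as recalled from the
// release notes; an assumption, verify before use).
SET @@FeaturePreviews = "FileSetV2Dot5:on";

// Drop the fragmented table and recreate it (hypothetical name and schema).
DROP TABLE IF EXISTS dbo.FileAudit;

CREATE TABLE dbo.FileAudit
(
    FileName string,
    Location string,
    RecordCount long,
    INDEX cIX_FileAudit CLUSTERED (FileName ASC)
    DISTRIBUTED BY HASH (FileName)
);

// One EXTRACT over the whole file set instead of one statement per file.
// {SourceFile} is a virtual column matched from the file name.
@audit =
    EXTRACT FileName string,
            Location string,
            RecordCount long,
            SourceFile string
    FROM "/audit/{SourceFile}.csv"
    USING Extractors.Csv();

// A single INSERT for all rows, so the table is not split into
// one fragment per source file.
INSERT INTO dbo.FileAudit
SELECT FileName,
       Location,
       RecordCount
FROM @audit;

// Rebuild after loading to compact the table.
ALTER TABLE dbo.FileAudit REBUILD;
```

If new files keep arriving, batching them into as few INSERT statements as possible and rebuilding afterwards should keep the fragment count low.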

We are working on improving scalability and adding more features around rebuilding fragmented tables, but these will not arrive before the second half of this year at the earliest. It is therefore important that you avoid such fragmentation.