I am processing data in U-SQL but not getting the expected results. Here is what I am doing:
1- Select data from ADL table partitions and assign it to @data1
2- Aggregate the data using GROUP BY and assign it to @data2
3- Truncate the partitions
4- Insert the data produced in step 2 into the same table
5- Use @data2 to generate a unique GUID for every record with a user-defined function, and assign the result to @data2
//UDF Code (C#)
public static Guid GetNewGuid()
{
    // Guid.NewGuid() returns a different value on every call
    return Guid.NewGuid();
}
6- Select a few columns from @data2 and assign them to @data3
Strangely, the GUIDs in @data2 and @data3 are completely different.
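One possible explanation, stated here only as an assumption, is that a rowset variable such as @data2 names an expression rather than holding an already computed result, so a non-deterministic UDF may be evaluated again wherever @data2 is consumed. The C# sketch below mimics that symptom with LINQ deferred execution; it is an analogy, not a model of the U-SQL compiler, and the class name `DeferredGuidDemo` and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Analogy only: LINQ deferred execution stands in for how a U-SQL
// rowset variable *might* be re-evaluated. This is not U-SQL itself.
public static class DeferredGuidDemo
{
    // Same body as the UDF in the question: a new value on every call.
    public static Guid GetNewGuid()
    {
        return Guid.NewGuid();
    }

    // Returns true if the GUIDs seen through "@data2" and "@data3" match.
    public static bool GuidsMatch()
    {
        int[] rows = { 1, 2, 3 };  // hypothetical input rows

        // "@data2": a named query, not a stored result
        IEnumerable<(int Id, Guid Key)> data2 =
            rows.Select(id => (Id: id, Key: GetNewGuid()));

        // "@data3": a projection of a few columns from "@data2"
        IEnumerable<Guid> data3 = data2.Select(r => r.Key);

        // Each enumeration re-runs the Select over rows, calling GetNewGuid again
        List<Guid> fromData2 = data2.Select(r => r.Key).ToList();
        List<Guid> fromData3 = data3.ToList();

        return fromData2.SequenceEqual(fromData3);
    }

    public static void Main()
    {
        // Prints False: the two enumerations generated different GUIDs
        Console.WriteLine(GuidsMatch());
    }
}
```

Nothing in `data2` is stored, so reading it twice (once directly, once through `data3`) calls `GetNewGuid` twice per row.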
However, if in step 5 I join with other datasets and change the schema before generating the GUIDs, I get the same GUIDs in the last step. It looks like some script optimization is happening in the backend that causes this problem.
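For contrast, the sketch below evaluates the GUID-generating step exactly once and stores the results, after which every later projection sees the same values. Whether the join and schema change in step 5 cause U-SQL to do something comparable is an assumption, not something this post establishes; `MaterializedGuidDemo` and the sample rows are hypothetical, and `ToList()` simply plays the role of "compute once, then reuse".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Analogy only: forcing evaluation once (ToList) so later steps reuse
// the same values. Any equivalence to a U-SQL join is an assumption.
public static class MaterializedGuidDemo
{
    // Same body as the UDF in the question: a new value on every call.
    public static Guid GetNewGuid()
    {
        return Guid.NewGuid();
    }

    // Returns true if the GUIDs seen through "@data2" and "@data3" match.
    public static bool GuidsMatch()
    {
        int[] rows = { 1, 2, 3 };  // hypothetical input rows

        // ToList() runs GetNewGuid exactly once per row and stores the results
        List<(int Id, Guid Key)> data2 =
            rows.Select(id => (Id: id, Key: GetNewGuid())).ToList();

        // "@data3" projects from stored values, so nothing is regenerated
        List<Guid> data3 = data2.Select(r => r.Key).ToList();

        return data2.Select(r => r.Key).SequenceEqual(data3);
    }

    public static void Main()
    {
        // Prints True: both steps see the same GUIDs
        Console.WriteLine(GuidsMatch());
    }
}
```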
Could you please let me know what is going wrong in the above workflow? Or, if some sort of optimization is happening in the backend, how can I learn how script optimization works?
Update: In this question, my focus is to learn why something calculated in one step is automatically changed in the next step.