
I am processing a huge amount of small JSON files with Azure Data Lake Analytics and I want to save the result into multiple JSON files (if it is needed) with max size (e.g. 128MB)

Is this possible?

I know that there is an option to write a custom outputter, but it writes row by row only, so I have no information about the overall file size (I guess).

There is a FILE.LENGTH() property in U-SQL that gives me the size of each extracted file. Is it possible to use it to call OUTPUT repeatedly with different files, passing only the files that fit my size limit?

Thank you for your help.


1 Answer


Here is an example of what you can do with FILE.LENGTH():

@yourData =
    EXTRACT
            // ... columns to extract
          , file_size = FILE.LENGTH()   // size of the source file, in bytes
    FROM "/mydata/{*}"                  // input file set pattern
    USING Extractors.Csv();

@res =
    SELECT *
    FROM @yourData
    WHERE file_size < 100000;           // keep only rows from files under your size limit (bytes)
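
To complete the sketch, the filtered rowset can then be written out with a standard OUTPUT statement. The output path here is an assumption for illustration:

```
OUTPUT @res
TO "/mydata/output/filtered.csv"  // hypothetical output path
USING Outputters.Csv();
```

Note that this filters which input files contribute rows; it does not by itself split the result into multiple output files of a bounded size.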