Summarize the problem

I've been seeing particularly slow performance out of Azure Data Factory. Searching for similar questions on Stack Overflow turns up nothing except the advice to contact support. I'm rolling the dice here to see if anyone has seen something similar and knows how to fix it.

In short, every operation I try in ADF results in excruciatingly slow performance. This includes:

  • Extracting a zip in blob storage to blob storage
  • Copying a number of small compressed files into Azure Data Explorer
  • Copying a number of small uncompressed json files into Azure Data Explorer

[Screenshot: Extracting ZIP]

[Screenshot: Copying to ADX]

In both cases the throughput is in the kilobytes-per-second range. The copy/import eventually succeeds, but it can take hours.

Describe what you've tried

I've tried:

  • using different regions
  • creating and using my own Integration Runtime
  • playing with different parameters that could potentially affect performance such as parallel connections etc.
  • Contacting Microsoft support (who sent me here)

Show some code

Not really any code to share. To reproduce, just try extracting a zip from blob storage to blob storage. I get ~400 KB/s.

In summary, any advice would be gratefully received. If I can't get this bit working, I'll have to implement the ingestion manually, which on reflection sounds like more fun than I've been having with ADF.

Can you show us your Compression type and Compression level both in source and sink? I think the number of files and directory depth affect the copy speed. - Joseph Xu
The compression type is zip for the source and no compression for the sink. Thanks for the tip about the depth. The source zip has many 'deep' folders with about 100 files in each of 1000 folders. Not much I can do about that as that is just the way the file is presented. I'll try doing just a straight copy - no decompress step and see if that affects the speed. - user3030107
We look forward to your test results. I think it will be faster. - Joseph Xu
Unfortunately it made no difference. In my test I tried copying a large number of small files without any compression on the source or the sink (binary source and binary sink). The operation took around an hour for 3 GB. I think I'll suspend any further work on ADF for a while and try again later, when hopefully these issues have been solved. Thanks again for your help. - user3030107
What about using a ForEach activity? First, use a GetMetadata activity to get the file list, then loop over that list with ForEach. Inside the ForEach activity, put a Copy activity, so the files are copied in parallel. This may work. - Joseph Xu
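The GetMetadata + ForEach + Copy idea from the last comment can be sketched as a pipeline definition. This is only a rough sketch, not a tested pipeline: it builds the JSON as a Python dict so it can be pasted into the ADF code view. The dataset names, the `fileName` dataset parameter, and the activity names are hypothetical placeholders. Also note that `childItems` lists only the immediate children of a folder, so the 'deep' folder layout described above would need one pass per folder (or a flattened staging folder).

```python
import json


def build_parallel_copy_pipeline(folder_dataset, file_dataset, sink_dataset,
                                 batch_count=20):
    """Build a pipeline dict: list files, then copy each one in parallel.

    All dataset names are hypothetical; the file-level datasets are assumed
    to expose a 'fileName' parameter.
    """
    # Step 1: list the immediate children of the source folder.
    get_file_list = {
        "name": "GetFileList",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {"referenceName": folder_dataset,
                        "type": "DatasetReference"},
            "fieldList": ["childItems"],
        },
    }

    # Step 3 (inner activity): a binary copy of a single file.
    copy_one_file = {
        "name": "CopyOneFile",
        "type": "Copy",
        "inputs": [{"referenceName": file_dataset,
                    "type": "DatasetReference",
                    "parameters": {"fileName": "@item().name"}}],
        "outputs": [{"referenceName": sink_dataset,
                     "type": "DatasetReference",
                     "parameters": {"fileName": "@item().name"}}],
        "typeProperties": {"source": {"type": "BinarySource"},
                           "sink": {"type": "BinarySink"}},
    }

    # Step 2: iterate over the file list; isSequential=False runs the
    # inner copies concurrently, up to batchCount at a time.
    for_each_file = {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [{"activity": "GetFileList",
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "isSequential": False,
            "batchCount": batch_count,
            "items": {"value": "@activity('GetFileList').output.childItems",
                      "type": "Expression"},
            "activities": [copy_one_file],
        },
    }

    return {"name": "ParallelCopyPipeline",
            "properties": {"activities": [get_file_list, for_each_file]}}


pipeline = build_parallel_copy_pipeline("SourceFolder", "SourceFile", "SinkFile")
print(json.dumps(pipeline, indent=2))
```

Whether this actually helps depends on where the time goes; if the per-file overhead is the bottleneck, running more copies at once should raise the aggregate throughput.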

1 Answer

These 'deep' folders will affect copy speed. You should minimize the folder depth and increase the number of copy activities running in parallel. You can refer to this document to troubleshoot copy activity performance, or you can send feedback to Microsoft Azure.
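To see why file count matters more than data volume here, a back-of-the-envelope calculation helps. The sketch below uses the figures from the comments (about 3 GB in about an hour) and assumes, as a guess that was not confirmed in the thread, that the test set had the same shape as the zip: roughly 1000 folders with about 100 files each.

```python
def copy_stats(total_bytes, elapsed_seconds, file_count):
    """Return (throughput in KiB/s, time per file in ms, average file size in KiB)."""
    throughput_kib_s = total_bytes / elapsed_seconds / 1024
    ms_per_file = elapsed_seconds / file_count * 1000
    avg_file_kib = total_bytes / file_count / 1024
    return throughput_kib_s, ms_per_file, avg_file_kib


# Figures from the thread; the file count is an assumption (1000 folders x ~100 files).
throughput, ms_per_file, avg_kib = copy_stats(
    total_bytes=3 * 1024**3,
    elapsed_seconds=3600,
    file_count=1000 * 100,
)
print(f"{throughput:.0f} KiB/s, {ms_per_file:.0f} ms per file, {avg_kib:.0f} KiB per file")
```

Under those assumptions each file takes a few tens of milliseconds, which looks more like fixed per-file overhead than a bandwidth limit. That would be consistent with the advice to reduce folder depth and run more copies in parallel.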