When using control flow, you can use a Get Metadata activity to retrieve the list of files in a blob storage container and pass that list to a ForEach activity. With the Sequential flag set to false, the ForEach processes the files concurrently (in parallel), up to the max batch count, running the activities defined inside the loop against each file.
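For reference, the control-flow pattern I'm describing looks roughly like this as a pipeline fragment (activity, dataset names, and the batch count of 20 are placeholders, not taken from a real pipeline):

```json
{
  "activities": [
    {
      "name": "GetFileList",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": { "referenceName": "BlobFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
      }
    },
    {
      "name": "ProcessEachFile",
      "type": "ForEach",
      "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
        "activities": []
      }
    }
  ]
}
```

The per-file activities would go inside the ForEach's `activities` array; `@item().name` would give each file name.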
However, the following Microsoft article on data flows (https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-column-pattern) indicates:
A mapping data flow will execute better when the Source transformation iterates over multiple files instead of looping via the For Each activity. We recommend using wildcards or file lists in your source transformation. The Data Flow process will execute faster by allowing the looping to occur inside the Spark cluster. For more information, see Wildcarding in Source Transformation.
For example, if you have a list of data files from July 2019 that you wish to process in a folder in Blob Storage, below is a wildcard you can use in your Source transformation.
DateFiles/*_201907*.txt
By using wildcarding, your pipeline will only contain one Data Flow activity. This will perform better than a Lookup against the Blob Store that then iterates across all matched files using a ForEach with an Execute Data Flow activity inside.
Based on this guidance, I have configured a Data Flow activity whose source points at a blob directory, so it processes every file in that directory with no control-flow loops. However, I don't see any option within the data flow to process files concurrently. What I do see is an Optimize tab where you can set the partitioning option.
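Concretely, the source is configured along these lines in data flow script (the stream name and wildcard path are illustrative; I'm assuming the wildcard-path option described in the quoted article):

```
source(allowSchemaDrift: true,
    validateSchema: false,
    wildcardPaths:['DateFiles/*_201907*.txt']) ~> FileSource
```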
Is that partitioning option only for splitting a single large file across multiple threads, or does it also control how many files are processed concurrently from the directory the source points at?
Or is the documentation assuming the ForEach loop is set to "Sequential"? (I can see why the claim would hold in that case, but I have a hard time believing it if the data flow is otherwise processing one file at a time.)