I am setting up a Data Flow in ADF that takes an Azure Table dataset as its source and uses a Derived Column transformation to add a column named "filename", whose value is built dynamically from a field in the source schema.
The output is then sent to a sink linked to a dataset that points to Blob Storage (I have tried both ADLS Gen2 and standard Blob Storage).
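The sink dataset itself is nothing special; as a rough sketch (the names, container and delimiter settings below are placeholders, not my exact configuration), it is a delimited-text dataset on Blob Storage defined along these lines:

{
    "name": "SinkBlobDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}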
However, after executing the pipeline, instead of finding multiple files in my container, I see that folders have been created with names like filename=ABC123.csv, and each of those folders contains other files (which makes me think of Parquet output):
- filename=ABC123.csv
    + _started_UNIQUEID
    + part-00000-tid-UNIQUEID-guids.c000.csv
So I'm clearly missing something, as what I need is single files listed in the dataset container, each with the name I have specified in the pipeline.
This is what the pipeline looks like:
The Optimize tab of the Sink shape looks like this:
Here you can see the settings of the Sink shape:
And this is the script of the data flow (some parts have been edited out):
source(output(
        PartitionKey as string,
        RowKey as string,
        Timestamp as string,
        DeviceId as string,
        SensorValue as double
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    inferDriftedColumnTypes: true) ~> devicetable
devicetable derive(filename = Isin + '.csv') ~> setoutputfilename
setoutputfilename sink(allowSchemaDrift: true,
    validateSchema: false,
    rowUrlColumn:'filename',
    mapColumn(
        RowKey,
        Timestamp,
        DeviceId,
        SensorValue
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> distributetofiles
Any suggestions or tips? (I'm rather new to ADF, so please bear with me.)