6 votes

Currently we do our data loads from an on-premises Hadoop server to SQL DW [via ADF Staged Copy and the on-premises DMG server]. We noticed that the ADF pipelines fail when there are no files in the on-premises Hadoop server location [we do not expect our upstreams to send files every day, so it is a valid scenario to have ZERO files in that location].

Do you have a solution for this kind of scenario?

The error message is given below:

Failed execution Copy activity encountered a user error: ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot find the 'HDFS' file. ,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=System,'.

Thanks, Aravind


3 Answers

2 votes

This requirement can be solved by using the ADFv2 Get Metadata activity to check for file existence and then skipping the Copy activity if the file or folder does not exist:

https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity
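As a rough illustration only (the activity, pipeline, and dataset names such as CheckFolder and HdfsSourceFolder are invented, and the Copy activity body is elided), the v2 pipeline could look something like this:

{
    "name": "CopyIfFilesExist",
    "properties": {
        "activities": [
            {
                "name": "CheckFolder",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "HdfsSourceFolder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "IfFilesPresent",
                "type": "IfCondition",
                "dependsOn": [
                    {
                        "activity": "CheckFolder",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "typeProperties": {
                    "expression": {
                        "value": "@greater(length(activity('CheckFolder').output.childItems), 0)",
                        "type": "Expression"
                    },
                    "ifTrueActivities": [
                        {
                            "name": "CopyFromHdfs",
                            "type": "Copy"
                        }
                    ]
                }
            }
        ]
    }
}

When the folder is empty, the expression evaluates to false, the Copy activity inside ifTrueActivities never runs, and the pipeline still completes successfully.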

0 votes

Do you have an input dataset for your pipeline? See if you can remove the input dataset dependency.

0 votes

Mmmm, this is a tricky one. I'll upvote the question, I think.

A couple of options that I can think of here...

1) I would suggest the best way would be to create a custom activity ahead of the copy to check the source directory first. This could handle the behaviour when there isn't a file present, rather than just throwing an error, and you could code it to return gracefully and not block the downstream ADF activities.

2) Use some PowerShell to inspect the ADF activity for the missing-file error, then simply set the dataset slice to either Skipped or Ready using the Set-AzureRmDataFactorySliceStatus cmdlet to override the status.

For example:

# $ResourceGroup, $ADFName and $Dataset are assumed to be populated already,
# e.g. via Get-AzureRmDataFactory and Get-AzureRmDataFactoryActivityWindow
# (see the sketch below).
Set-AzureRmDataFactorySliceStatus `
    -ResourceGroupName $ResourceGroup `
    -DataFactoryName $ADFName.DataFactoryName `
    -DatasetName $Dataset.OutputDatasets `
    -StartDateTime $Dataset.WindowStart `
    -EndDateTime $Dataset.WindowEnd `
    -Status "Ready" `
    -UpdateType "Individual"
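To make that concrete, here is a minimal sketch of where those variables could come from; the resource group and factory names are made up, and the Failed filter catches any failed window, not just the missing-file case, so refine it for your factory:

$ResourceGroup = "MyResourceGroup"    # hypothetical name
$ADFName = Get-AzureRmDataFactory `
    -ResourceGroupName $ResourceGroup `
    -Name "MyDataFactory"             # hypothetical name

# Find failed activity windows, then reset each output slice to Ready.
# (OutputDatasets can hold more than one name; this assumes just one.)
Get-AzureRmDataFactoryActivityWindow `
    -ResourceGroupName $ResourceGroup `
    -DataFactoryName $ADFName.DataFactoryName |
    Where-Object { $_.WindowState -eq "Failed" } |
    ForEach-Object {
        Set-AzureRmDataFactorySliceStatus `
            -ResourceGroupName $ResourceGroup `
            -DataFactoryName $ADFName.DataFactoryName `
            -DatasetName $_.OutputDatasets `
            -StartDateTime $_.WindowStart `
            -EndDateTime $_.WindowEnd `
            -Status "Ready" `
            -UpdateType "Individual"
    }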

This of course isn't ideal, but the PowerShell (run from Azure Automation, for example) would be quicker to develop than a custom activity.

Hope this helps.