
I will start directly with my question: is it possible to connect to an on-premises dataset in an Azure Data Factory custom activity via the Data Management Gateway?

The use case:

I have a local FTP server with multiple files on it, which should be regularly copied to an Azure Blob storage. The decision which files have to be uploaded should be based on a custom trigger, so I am building a Data Factory custom activity instead of a normal copy activity. The server should only be accessed via the Data Management Gateway.

I created an FtpServer-typed linked service and a FileShare dataset, which I use as the input of my custom activity pipeline.

Inside my custom activity, I now want to connect to the input dataset and get the content of a folder on the FTP server. And here I am stuck: I can't figure out how to connect to the FTP server. Here is the code element I am stuck at:

public IDictionary<string, string> Execute(IEnumerable<LinkedService> linkedServices,
                                           IEnumerable<Dataset> datasets,
                                           Activity activity, IActivityLogger logger)
{
    // Resolve the dataset declared as this activity's input.
    Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);

    FileShareDataset inputTypeProperties = inputDataset.Properties.TypeProperties as FileShareDataset;

    // Resolve the FTP linked service the input dataset refers to.
    FtpServerLinkedService inputLinkedService = linkedServices
        .First(linkedService => linkedService.Name == inputDataset.Properties.LinkedServiceName)
        .Properties.TypeProperties as FtpServerLinkedService;

    // HERE: CONNECTION TO FTP SERVER VIA GATEWAY

    return new Dictionary<string, string>();
}

There are multiple gateway classes in Microsoft.Azure.Management.DataFactories.Models, but I can't find a way to work with them. It looks to me as if the Data Management Gateway is perhaps not supported in a custom activity? If that is correct, are there other ways to go, like creating a copy activity inside the custom activity? Or is the only possible solution a direct connection to the FTP server via WebRequest?


1 Answer


To start directly with an answer: no :-)

The problem you'll have is the context in which the custom activity code is executed. Once you've created your class to do some amount of work, the compiled DLLs are zipped and placed in blob storage. Remember, ADF is only an orchestration tool for invoking other Azure services; it has no compute of its own. It therefore instructs the Azure Batch service to take the DLLs from blob storage and execute the code. This means the application actually runs on one or more virtual machines within the Batch service compute pool. As a result, the execution is essentially disconnected from ADF and unaware of things like a Data Management Gateway (DMG).

That said, I understand the confusion here, because the custom activity can inspect and use values from the configuration of the ADF linked services (as in your code above), but not from a DMG. What you really want is for the custom activity class to programmatically establish a VPN connection between itself (the Batch service VM) and your local network, then execute some code through that tunnel; again, disconnected from ADF. This, of course, would never be allowed within Azure. Setting up a VPN even for a standard VM in a virtual network is involved enough!

So, what to do? My suggestion would be to use a normal ADF copy activity with the DMG to land everything in blob storage first; call it your IN folder. Once the data is there, have a second ADF pipeline use a custom activity to inspect and sort it; call it your CLEAN folder. Finally, pass it on to a downstream service.
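A minimal sketch of that first, gateway-based leg in ADF (v1) JSON might look like this. The pipeline, dataset, and activity names are placeholders for whatever you already have defined, not prescribed values:

```json
{
    "name": "LandFtpFilesInBlob",
    "properties": {
        "activities": [
            {
                "name": "CopyFtpToBlobIn",
                "type": "Copy",
                "inputs": [ { "name": "OnPremFtpDataset" } ],
                "outputs": [ { "name": "BlobInFolderDataset" } ],
                "typeProperties": {
                    "source": { "type": "FileSystemSource" },
                    "sink": { "type": "BlobSink" }
                },
                "scheduler": { "frequency": "Hour", "interval": 1 }
            }
        ]
    }
}
```

The copy activity runs with full knowledge of the gateway (via the linked service behind `OnPremFtpDataset`), and your second pipeline's custom activity then only ever touches blob storage, which the Batch VMs can reach natively.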

Or, get the custom activity to hit the FTP site directly and not dilute the data flow with the DMG.
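If you go the direct route, a rough sketch of the FTP call from inside the custom activity could look like the following. This assumes the Batch VM can reach the FTP server over the internet (no gateway involved), and the host, folder, and credential values are placeholders you would pull from your linked service and dataset properties:

```csharp
using System;
using System.IO;
using System.Net;

public static class FtpHelper
{
    // List the file names in a folder on the FTP server.
    public static string[] ListFolder(string host, string folder, string user, string password)
    {
        // Build e.g. "ftp://myserver/inbound/" from the linked service host
        // and the dataset's folder path.
        var request = (FtpWebRequest)WebRequest.Create($"ftp://{host}/{folder}/");
        request.Method = WebRequestMethods.Ftp.ListDirectory;
        request.Credentials = new NetworkCredential(user, password);

        using (var response = (FtpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // The listing comes back as one file name per line.
            return reader.ReadToEnd()
                         .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        }
    }
}
```

From there you can decide per file whether to download it and write it to blob storage with the storage SDK, which keeps the trigger logic entirely inside the custom activity.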

Hope this helps.