
I have an HDInsight Hadoop cluster (Linux, deployed separately) in an Azure VNet, with client IPs restricted by a network security group (NSG).

The Azure SQL firewall has an option called "Allow access to Azure services", which lets Data Factory access Azure SQL.

A VNet NSG has no such option: you must either specify an IP address range or use a service tag (Internet, VirtualNetwork, AzureLoadBalancer). I thought AzureLoadBalancer would solve the issue, but it does not: HDInsight is still unreachable from Azure Data Factory.
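To make the two NSG options concrete, here is a minimal sketch that builds inbound security rules in the ARM template property shape, one with an explicit IP range and one with the AzureLoadBalancer service tag as tried above. The rule names, the `203.0.113.0/24` range (a documentation range), port 443 and the priorities are hypothetical placeholders, not values from the question.

```python
import json


def nsg_inbound_rule(name, source, port, priority):
    """Build an NSG inbound security rule in ARM template shape.

    `source` may be a CIDR range (e.g. "203.0.113.0/24") or a service tag
    such as "Internet", "VirtualNetwork" or "AzureLoadBalancer".
    """
    return {
        "name": name,
        "properties": {
            "protocol": "Tcp",
            "sourcePortRange": "*",
            "destinationPortRange": str(port),
            "sourceAddressPrefix": source,
            "destinationAddressPrefix": "VirtualNetwork",
            "access": "Allow",
            "priority": priority,  # lower number = evaluated first
            "direction": "Inbound",
        },
    }


rules = [
    # Option 1: an explicit client IP range (hypothetical range).
    nsg_inbound_rule("allow-client-range", "203.0.113.0/24", 443, 300),
    # Option 2: a service tag, as tried in the question.
    nsg_inbound_rule("allow-azure-lb", "AzureLoadBalancer", 443, 310),
]

print(json.dumps(rules, indent=2))
```

Neither form names Data Factory as a source, which matches the behaviour described above.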

I tried to find the port ranges used by Data Factory, without success.

Is there a way to access a secured HDInsight Linux cluster from Azure Data Factory?


2 Answers


ADF can only access resources that are publicly accessible. If your HDInsight cluster is in a VNet, it cannot be accessed publicly, so ADF cannot access or orchestrate it.

Support for ADF in a VNet environment is planned, but it will take some time to land.

Thanks, Harish


With Azure Data Factory V2, this scenario is supported. It requires deploying a self-hosted integration runtime (IR) inside the VNet of the HDInsight cluster. The self-hosted IR lets the Data Factory service dispatch processing requests to a compute service such as HDInsight inside a virtual network. See also the following documentation.
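As a rough sketch of how the self-hosted IR is wired in, the snippet below builds an ADF V2 HDInsight linked service definition whose `connectVia` property references the IR, so requests are routed through the machine inside the VNet. It assumes the IR is already installed on a VM in the cluster's VNet and registered with the factory. All names (`HDInsightInVNet`, `VNetSelfHostedIR`, `ClusterStorageLinkedService`), the cluster URI and the credentials are hypothetical placeholders.

```python
import json


def hdinsight_linked_service(name, cluster_uri, user, password,
                             storage_ls, self_hosted_ir):
    """Return an ADF V2 HDInsight linked service definition that is
    dispatched through a (hypothetical) self-hosted integration runtime."""
    return {
        "name": name,
        "properties": {
            "type": "HDInsight",
            "typeProperties": {
                "clusterUri": cluster_uri,
                "userName": user,
                "password": {"type": "SecureString", "value": password},
                # Storage account linked service used by the cluster.
                "linkedServiceName": {
                    "referenceName": storage_ls,
                    "type": "LinkedServiceReference",
                },
            },
            # Route requests through the self-hosted IR running inside the VNet.
            "connectVia": {
                "referenceName": self_hosted_ir,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


ls = hdinsight_linked_service(
    name="HDInsightInVNet",
    cluster_uri="https://mycluster.azurehdinsight.net",
    user="admin",
    password="<password>",  # placeholder; do not hard-code real secrets
    storage_ls="ClusterStorageLinkedService",
    self_hosted_ir="VNetSelfHostedIR",
)
print(json.dumps(ls, indent=2))
```

The only piece specific to the VNet scenario is `connectVia`; without it, the linked service would use the default Azure integration runtime, which runs outside the VNet.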