
I am trying to use PolyBase in Azure SQL DW to access delimited files in HDFS on a Cloudera cluster (provisioned as IaaS in Azure), but I am running into the following error:

Msg 105019, Level 16, State 1, Line 40 EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: Call From DB55/10.0.0.59 to 10...:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.56.140.31/10.56.140.31:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout: Error [Call From DB55/10.0.0.59 to 10.56.140.31:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.../10...*:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout] occurred while accessing external file.'

I used the following PolyBase T-SQL scripts:

CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL HadoopUser1 WITH IDENTITY = 'user_name', Secret = 'password';

CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10...:8020',
    RESOURCE_MANAGER_LOCATION = '10...:8032',
    CREDENTIAL = HadoopUser1 );

CREATE EXTERNAL FILE FORMAT TextFile WITH ( FORMAT_TYPE = DelimitedText, FORMAT_OPTIONS (FIELD_TERMINATOR = ',') );

CREATE EXTERNAL TABLE dbo.PolyCloudera ( Id INT NOT NULL, Number INT NOT NULL, ZIPCODE INT NOT NULL ) WITH ( LOCATION='/user/hive/warehouse/sample_poly', DATA_SOURCE=MyHadoopCluster, FILE_FORMAT=TextFile );

There is no connectivity problem between the Cloudera cluster and Azure SQL DW (I was able to ingest data from the Cloudera cluster into the SQL DW instance using BCP and JDBC). The Cloudera cluster is not Kerberized yet. The user name and password specified in the database scoped credential exist in the underlying Linux file system on the Cloudera cluster. Any help in this matter would be appreciated.


1 Answer


PolyBase in Azure SQL DW currently supports only Azure Blob Storage and Azure Data Lake Store. It does not support connecting to HDFS.
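As a workaround, you can land the delimited files in Azure Blob Storage and point PolyBase there instead. A minimal sketch, assuming a hypothetical storage account `mystorageacct` and container `mycontainer` (replace the names, key, and path with your own; `TextFile` is the file format already defined in the question):

```sql
-- Credential holding the storage account key (IDENTITY can be any string here)
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', Secret = '<storage_account_key>';

-- External data source over blob storage; note TYPE is still HADOOP
CREATE EXTERNAL DATA SOURCE MyAzureStorage WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageacct.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential );

-- The external table then references this data source instead of HDFS
CREATE EXTERNAL TABLE dbo.PolyBlob (
    Id INT NOT NULL, Number INT NOT NULL, ZIPCODE INT NOT NULL )
WITH ( LOCATION = '/sample_poly',
       DATA_SOURCE = MyAzureStorage,
       FILE_FORMAT = TextFile );
```

Note there is no `RESOURCE_MANAGER_LOCATION` for blob storage; that option applies only to Hadoop data sources with a YARN resource manager.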

Note that the documentation does not mention Azure SQL DW in any of the hdfs examples:

-- PolyBase only: Hadoop cluster as data source
-- (on SQL Server 2016)
CREATE EXTERNAL DATA SOURCE data_source_name
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://NameNode_URI[:port]'
    [, RESOURCE_MANAGER_LOCATION = 'ResourceManager_URI[:port]' ]
    [, CREDENTIAL = credential_name ]
)
[;]