I am trying to use Polybase in Azure SQLDW to access delimited files in HDFS in Cloudera Cluster (provisioned as IaaS in Azure) however I am running into the following error:
Msg 105019, Level 16, State 1, Line 40 EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: Call From DB55/10.0.0.59 to 10...:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.56.140.31/10.56.140.31:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout: Error [Call From DB55/10.0.0.59 to 10.56.140.31:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.../10...*:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout] occurred while accessing external file.'
I used the following Polybase T-SQL scripts:
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1 WITH IDENTITY = 'user_name', Secret = 'password';
CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
TYPE = HADOOP,
LOCATION ='hdfs://10...:8020',
RESOURCE_MANAGER_LOCATION = '10...:8032',
CREDENTIAL = HadoopUser1 );CREATE EXTERNAL FILE FORMAT TextFile WITH ( FORMAT_TYPE = DelimitedText, FORMAT_OPTIONS (FIELD_TERMINATOR = ',') );
CREATE EXTERNAL TABLE dbo.PolyCloudera ( Id INT NOT NULL, Number INT NOT NULL, ZIPCODE INT NOT NULL ) WITH ( LOCATION='/user/hive/warehouse/sample_poly', DATA_SOURCE=MyHadoopCluster, FILE_FORMAT=TextFile );
There is no connectivity problem between the Cloudera Cluster and Azure SQLDW (as I was able to use BCP, JDBC to ingest data from Cloudera Cluster to SQLDW instance) The Cloudera Cluster isn't kerberized yet. The user_name and password specified in database scoped credentials exist in the underlying Linux file system in Cloudera Cluster. Any help in this matter would be appreciated.