
I am trying to create a PolyBase external table in Azure SQL Data Warehouse for a Hive table that is stored as ORC (HDInsight, Data Lake) and partitioned by date. When I query the external table, it fails in SQL Server without any proper error message.

When I don't use partitioning, I am able to access the Hive ORC table through the external table, but with partitioning I get the error below, without any further detail.

Error:

Msg 106000, Level 16, State 1, Line 33 Index: 23, Size: 23

Detail:

Hive Table:

CREATE EXTERNAL TABLE DL_ClickStream_fnl(
.
.
) PARTITIONED BY (TIME_STAMP Date) STORED AS ORC;

Azure DW External Table:

CREATE EXTERNAL TABLE [stg].[dl_clickstream_procd] (
.
.
) WITH (
    DATA_SOURCE = [AzureDataLakeStore_clusters],
    LOCATION = N'clusters/BLMSpark/hive/warehouse/coremetrics.db/dl_clickstream_procd',
    FILE_FORMAT = [ORCFileFormat_DL],
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 2
)

File Format:

CREATE EXTERNAL FILE FORMAT [ORCFileFormat_DL] WITH (FORMAT_TYPE = ORC)

Hmm, doesn't Azure SQL Data Warehouse only support PolyBase connectivity to Azure Blob Storage and Azure Data Lake at this point? Check out the yellow arrow here. SQL Server 2016 and APS can connect to Hadoop. Subtle, huh. – wBob

1 Answer


In Hive, the partitioned column is a virtual column derived from the directory metadata; its value is not stored in the ORC data files themselves. PolyBase reads the files directly and is therefore unable to populate the partitioned column. In your SQL DW external table DDL, remove the partitioned-by column and your load should work.
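As an illustration, the external table definition would keep the WITH options from the question but omit TIME_STAMP from the column list. The column names below are hypothetical stand-ins, since the original DDL elides them:

```sql
-- Sketch only: SESSION_ID and PAGE_URL are hypothetical placeholders for
-- the elided column list; use your real non-partition columns here.
CREATE EXTERNAL TABLE [stg].[dl_clickstream_procd] (
    [SESSION_ID] BIGINT,
    [PAGE_URL]   NVARCHAR(2000)
    -- ... remaining data columns, but NOT the TIME_STAMP partition column,
    -- because that value lives only in the directory names, not in the ORC files
) WITH (
    DATA_SOURCE = [AzureDataLakeStore_clusters],
    LOCATION    = N'clusters/BLMSpark/hive/warehouse/coremetrics.db/dl_clickstream_procd',
    FILE_FORMAT = [ORCFileFormat_DL],
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 2
)
```

If you need the date after loading, it can be reconstructed in a downstream step, since the partition directories (e.g. `TIME_STAMP=2017-01-01`) encode it in their names.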