I am having issues reading ORC table created in Azure HDInsight from Azure SQL Data warehouse. See below sequence I have followed to set up HDInsight instance as well as Azure SQL Data warehouse.
- I created a text file(Sales.txt) with content as below
20150614|1|10.50
20150618|1|100.75
20150924|1|89.75
20160214|2|10456.90
20150922|3|34.70
20151021|3|43.70
20151225|3|65.90
20151231|3|87.50
20160101|4|1349.40
20160512|4|3982.40
20150608|5|398.90
20150929|5|981.80
20151225|5|482.80
20151231|5|289.60
20160214|5|688.50
20160531|5|682.80
I then created a HDInsight cluster in Azure
I created a database
hadoopdb
I created an external table in that database using following query
CREATE EXTERNAL TABLE IF NOT EXISTS hadoopdb.salestxt ( salesdate int, productid int, salesamt float ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' lines terminated by '\n' STORED AS TEXTFILE LOCATION 'wasbs:///<myclustercontainer>@<mystorageaccount>.blob.core.windows.net/Sales.txt';
Loaded data using following query
LOAD DATA INPATH '/Sales.txt' INTO TABLE hadoopdb.salestxt;
Then created an internal ORC table with following script
CREATE TABLE IF NOT EXISTS hadoopdb.sales ( salesdate int, productid int, salesamt float ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC;
Moved data to ORC table with following query
INSERT OVERWRITE TABLE hadoopdb.salesorc SELECT * FROM hadoopdb.salestxt
Executing following query, I do see results in Hive Editor
select * from hadoopdb.sales
Back in Azure SQL data warehouse, created the DATA SOURCE using following query in SSMS,
CREATE EXTERNAL DATA SOURCE ORCAzureStorage WITH ( TYPE = HADOOP, LOCATION = 'wasbs://<myclustercontainer>@<mystorageaccount>.blob.core.windows.net', CREDENTIAL = AzureStorageCredential
);
Created an external file format using following
CREATE EXTERNAL FILE FORMAT ORCdefault WITH ( FORMAT_TYPE = ORC, DATA_COMPRESSION = 'org.apache.hadoop.io.compress.DefaultCodec' );
Created schema
[hdp]
Trying to create an EXTERNAL TABLE with script below
CREATE EXTERNAL TABLE [hdp].FactSales ( salesdate int, productid int, salesamt float ) WITH ( LOCATION ='/Sales.txt', DATA_SOURCE = ORCAzureStorage , FILE_FORMAT = ORCdefault );
However this query fails with following error
Msg 105002, Level 16, State 1, Line 1 EXTERNAL TABLE access failed because the specified path name '/Sales.txt/' does not exist. Enter a valid path and try again.
I have tried various combinations for LOCATION such as '[email protected]/Sales.txt/Sales.txt'
Sales.txt file exists in storage account at location '[email protected]/Sales.txt/Sales.txt'.
Am I missing anything ?
LOCATION ='/Sales.txt/'
– Biz. Nigatu