I am having issues reading an ORC table created in Azure HDInsight from Azure SQL Data Warehouse. Below is the sequence I followed to set up the HDInsight instance as well as the Azure SQL Data Warehouse.

  1. I created a text file (Sales.txt) with the following content

20150614|1|10.50
20150618|1|100.75
20150924|1|89.75
20160214|2|10456.90
20150922|3|34.70
20151021|3|43.70
20151225|3|65.90
20151231|3|87.50
20160101|4|1349.40
20160512|4|3982.40
20150608|5|398.90
20150929|5|981.80
20151225|5|482.80
20151231|5|289.60
20160214|5|688.50
20160531|5|682.80

  2. I then created an HDInsight cluster in Azure

  3. I created a database named hadoopdb (see the statement below)
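
A minimal sketch of that step in HiveQL (the IF NOT EXISTS guard is just a convenience):

    CREATE DATABASE IF NOT EXISTS hadoopdb;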

  4. I created an external table in that database using the following query

    CREATE EXTERNAL TABLE IF NOT EXISTS hadoopdb.salestxt
    (
        salesdate int,
        productid int,
        salesamt float
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\n' STORED AS TEXTFILE
    LOCATION 'wasbs://<myclustercontainer>@<mystorageaccount>.blob.core.windows.net/Sales.txt';
    
  5. Loaded data using the following query

    LOAD DATA INPATH '/Sales.txt' INTO TABLE hadoopdb.salestxt;
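
An optional sanity check at this point would be a simple HiveQL count against the text table:

    SELECT COUNT(*) FROM hadoopdb.salestxt;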

  6. Then created an internal ORC table with the following script

    CREATE TABLE IF NOT EXISTS hadoopdb.sales
    (
        salesdate int,
        productid int,
        salesamt float
    ) 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC;
    
  7. Moved data into the ORC table with the following query

    INSERT OVERWRITE TABLE hadoopdb.sales SELECT * FROM hadoopdb.salestxt;

  8. Executing the following query, I do see results in the Hive editor

    SELECT * FROM hadoopdb.sales;

  9. Back in Azure SQL Data Warehouse, I created the external data source using the following query in SSMS

    CREATE EXTERNAL DATA SOURCE ORCAzureStorage WITH
    (
        TYPE = HADOOP,
        LOCATION = 'wasbs://<myclustercontainer>@<mystorageaccount>.blob.core.windows.net',
        CREDENTIAL = AzureStorageCredential
    );
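
The AzureStorageCredential referenced above must already exist in the data warehouse. A minimal sketch of that prerequisite in T-SQL, assuming the storage account access key is used as the secret (placeholder value below):

    -- Required once per database before a scoped credential can be created
    CREATE MASTER KEY;

    -- For blob storage access the IDENTITY string is not used for authentication;
    -- the SECRET is the storage account access key
    CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
    WITH IDENTITY = 'user', SECRET = '<mystorageaccountkey>';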

  10. Created an external file format using the following

    CREATE EXTERNAL FILE FORMAT ORCdefault
    WITH 
    (   
        FORMAT_TYPE = ORC,
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.DefaultCodec'
    );
    
  11. Created the schema [hdp] (see the statement below)
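
A minimal sketch of that step in T-SQL:

    CREATE SCHEMA [hdp];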

  12. Trying to create an EXTERNAL TABLE with the script below

    CREATE EXTERNAL TABLE [hdp].FactSales 
    (
        salesdate int,
        productid int,
        salesamt float  
    )
    WITH 
    (
        LOCATION    ='/Sales.txt',
        DATA_SOURCE = ORCAzureStorage , 
        FILE_FORMAT = ORCdefault                
    );
    

However, this query fails with the following error:

Msg 105002, Level 16, State 1, Line 1 EXTERNAL TABLE access failed because the specified path name '/Sales.txt/' does not exist. Enter a valid path and try again.

I have tried various combinations for LOCATION such as '<myclustercontainer>@<mystorageaccount>.blob.core.windows.net/Sales.txt/Sales.txt'.

The Sales.txt file exists in the storage account at '<myclustercontainer>@<mystorageaccount>.blob.core.windows.net/Sales.txt/Sales.txt'.

Am I missing anything?

You are missing a "/" at the end of your external table LOCATION; make it LOCATION = '/Sales.txt/'. – Biz. Nigatu
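
For clarity, the comment's suggestion applied to the failing statement would look like this (only the trailing slash in LOCATION changes):

    CREATE EXTERNAL TABLE [hdp].FactSales
    (
        salesdate int,
        productid int,
        salesamt float
    )
    WITH
    (
        LOCATION    = '/Sales.txt/',
        DATA_SOURCE = ORCAzureStorage,
        FILE_FORMAT = ORCdefault
    );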

1 Answer


Azure SQL DW does not currently support querying HDFS. You will need to move the data to a Windows Azure Storage Blob. There is a tutorial for this here.

Thanks! Casey