Why does an Azure SQL Data Warehouse Polybase Query to Azure Data Lake Gen 2 return many rows for a single file source, but zero rows for the parent folder source?
I created:
- Master Key (CREATE MASTER KEY;)
- Credential (CREATE DATABASE SCOPED CREDENTIAL) - uses the ADLS Gen 2 account key
- External data source (CREATE EXTERNAL DATA SOURCE)
- File format (CREATE EXTERNAL FILE FORMAT)
- External table (CREATE EXTERNAL TABLE)
Everything works fine when my external table points to a specific file, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/time001.txt',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Many rows returned, therefore I am confident all items mentioned above are configured correctly.
The Time folder in Azure Data Lake Gen 2 contains many files, not just time001.txt. When I change my external table to point at a folder, and not an individual file, the query returns zero rows, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Zero rows returned
I tried:
- LOCATION = '/Time/',
- LOCATION = '/Time',
- LOCATION = 'Time/',
- LOCATION = 'Time',
But always zero rows. I also followed the instructions at https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
I tested all the files within the folder and individually each returns many rows of data.
I queried all the files from Blob storage and not ADLS Gen2 and the "Folder" query returns all rows as expected.
How do I query all files in a folder "as one" from Azure Data Lake Gen2 storage using Azure SQL Data Warehouse and Polybase?