I am working with parquet files stored on Amazon S3. These files need to be extracted, and their data needs to be loaded into Azure Data Warehouse.
My plan is:
Amazon S3 -> Use SAP BODS to move parquet files to Azure Blob -> Create external tables on those parquet files -> Staging -> Fact/Dim tables
Now the problem is that one of the parquet files has a column stored as array<string>. I am able to create an external table on it using the varchar data type for that column, but any SQL query (e.g. a SELECT) against that external table throws the error below:
Msg 106000, Level 16, State 1, Line 3
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: optional group status (LIST) {
repeated group bag {
optional binary array_element (UTF8);}
} is not primitive
I have tried different data types for that column, but I am still unable to run a SELECT query on that external table.
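One workaround that might be worth trying, as a minimal sketch rather than a confirmed fix: the error suggests the reader expects a primitive type while the status column is a LIST group, so the array could be flattened into a delimited string before the external table is created. The sketch below assumes a Python step with pyarrow is available somewhere before the files land in Azure Blob. The function names, the file paths and the "|" delimiter are all hypothetical; only the column name status comes from the error message.

```python
from typing import Optional, Sequence


def join_array(values: Optional[Sequence[Optional[str]]], sep: str = "|") -> Optional[str]:
    """Collapse one array<string> cell into a single delimited string."""
    if values is None:
        return None  # a NULL array stays NULL
    # Skip NULL elements inside the array; keep the rest in order.
    return sep.join(v for v in values if v is not None)


def flatten_list_column(src_path: str, dst_path: str, column: str, sep: str = "|") -> None:
    """Rewrite a parquet file so that `column` becomes a plain string column."""
    # Imported here so join_array stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pq.read_table(src_path)
    idx = table.schema.get_field_index(column)
    flattened = pa.array(
        [join_array(cell, sep) for cell in table.column(column).to_pylist()],
        type=pa.string(),
    )
    # Replace the LIST column in place with the flattened string column.
    table = table.set_column(idx, column, flattened)
    pq.write_table(table, dst_path)


# Hypothetical usage (paths are placeholders):
# flatten_list_column("input.parquet", "flattened.parquet", "status")
```

With the column rewritten this way, a varchar column in the external table would be reading a primitive UTF8 value instead of a nested group. The delimiter has to be a character that never occurs in the array elements, and the varchar length has to be large enough for the joined string.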
Please let me know if there are any other options.
Thanks