3
votes

I'm trying to query a Hive view with Redshift Spectrum but it gives me this error:

SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Assert
Details: 
 -----------------------------------------------
  error:  Assert
  code:      1000
  context:   loc->length() > 5 && loc->substr(0, 5) == "s3://" - 
  query:     12103470
  location:  scan_range_manager.cpp:272
  process:   padbmaster [pid=1769]
  -----------------------------------------------;

Is is possible to query Hive views from Redshift Spectrum? I'm using Hive Metastore (not Glue Data Catalog).

I wanted to have a view to restrict access to the original table, with a limited set of columns and partitions. And also because my original table (Parquet data) has some Map fields so I wanted to do something like that to make it easier to query from Redshift as Map fields are a bit complicated to deal with in Redshift:

CREATE view my_view AS
SELECT event_time, event_properties['user-id'] as user_id, event_properties['product-id'] as product_id, year, month, day
FROM my_events
WHERE event_type = 'my-event'  -- partition

I can query the table my_events from Spectrum but it's a mess because properties is a Map field, not a Struct so I need to kind of explode it into several rows in Redshift.

Thanks

1

1 Answers

1
votes

Looking at the error it seems Spectrum always looks for a S3 path when external tables and views are queried. This is valid for external tables because those will always have a location but views will never have an explicit S3 location.

Error type    -> Assert
Error context -> context: loc->length() > 5 && loc->substr(0, 5) == "s3://"

In case of a hive view, loc->length() will return 0, and the whole statement will return False and result in assertion error.

Confirmation for this could be the second clause:

loc->substr(0, 5) == "s3://"

It is expecting the location to be a S3 path and if we count number of chars in "s3://" it is 5, which also confirms the first clause :

loc->length() > 5

Looks like Spectrum does not support Hive Views (or in general any object without an explicit S3 path)