0
votes

I have several avro files stored in an S3 bucket, and have them crawled by AWS Glue using AvroSerde, glue indexes files perfectly.

When I use AWS athena to query the table like this: select * from mytable I get all results as expected and the "values"column is displayed as [23.4,345.6] for example.

However when I try to select the "values" column (or try to use any array function on it) I get no results and the query tab in Athena shows a "FAiled" icon, however I get no error whatsoever.

Example query that fails: select values from mytable

The output in Athena of describe mytable;

timestamp               bigint                  from deserializer   
values                  array<float>            from deserializer   
account                 string                                      
year                    string                                      
month                   string                                      
day                     string                                      
hour                    string                                      
device                  string                                      
type                    string                                      

# Partition Information      
# col_name              data_type               comment             

account                 string                                      
year                    string                                      
month                   string                                      
day                     string                                      
hour                    string                                      
device                  string                                      
type                    string   

The table definition in AWS Glue:

Name         mytable
Database     new_formats_test
Classification  avro
Location     s3://mytable/
Input format    org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
Output format   org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Serde serialization lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
Serde parameters    
avro.schema.literal {"type":"record","name":"ArchiveDataEntry","namespace":"com.epihunter.data.archive","fields":[{"name":"timestamp","type":"long"},{"name":"values","type":{"type":"array","items":"float"}}]}serialization.format 1
Table properties    
sizeKey 18525162objectCount 1UPDATED_BY_CRAWLER Scan S3 new formats
avro.schema.literal {"type":"record","name":"ArchiveDataEntry","namespace":"com.epihunter.data.archive","fields":[{"name":"timestamp","type":"long"},{"name":"values","type":{"type":"array","items":"float"}}]}
1

1 Answers

0
votes

To answer my own question: values is a reserved keyword which caused the failure. Either using double quotes like "values" or obviously renaming the column fixes the issue