Reading array from avro file using AWS athena give no results and unknown error

Question

I have several avro files stored in an S3 bucket, and have them crawled by AWS Glue using AvroSerde, glue indexes files perfectly.

When I use AWS athena to query the table like this: select * from mytable I get all results as expected and the "values"column is displayed as [23.4,345.6] for example.

However when I try to select the "values" column (or try to use any array function on it) I get no results and the query tab in Athena shows a "FAiled" icon, however I get no error whatsoever.

Example query that fails: select values from mytable

The output in Athena of describe mytable;

timestamp               bigint                  from deserializer   
values                  array<float>            from deserializer   
account                 string                                      
year                    string                                      
month                   string                                      
day                     string                                      
hour                    string                                      
device                  string                                      
type                    string                                      

# Partition Information      
# col_name              data_type               comment             

account                 string                                      
year                    string                                      
month                   string                                      
day                     string                                      
hour                    string                                      
device                  string                                      
type                    string

The table definition in AWS Glue:

Name         mytable
Database     new_formats_test
Classification  avro
Location     s3://mytable/
Input format    org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
Output format   org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Serde serialization lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
Serde parameters    
avro.schema.literal {"type":"record","name":"ArchiveDataEntry","namespace":"com.epihunter.data.archive","fields":[{"name":"timestamp","type":"long"},{"name":"values","type":{"type":"array","items":"float"}}]}serialization.format 1
Table properties    
sizeKey 18525162objectCount 1UPDATED_BY_CRAWLER Scan S3 new formats
avro.schema.literal {"type":"record","name":"ArchiveDataEntry","namespace":"com.epihunter.data.archive","fields":[{"name":"timestamp","type":"long"},{"name":"values","type":{"type":"array","items":"float"}}]}

Niki Driessen Niki Driessen · Accepted Answer · 2018-08-13T11:21:28

To answer my own question: values is a reserved keyword which caused the failure. Either using double quotes like "values" or obviously renaming the column fixes the issue

Reading array from avro file using AWS athena give no results and unknown error

1 Answers