I have an event stream from Firehose to S3 which then gets crawled by Glue to periodically create new partitions and update the schema for a table to be queried in Athena.
Suddenly our queries started to fail with
HIVE_BAD_DATA: Error parsing field value for field 1: For input string: "11642224428"
So we suspect that one of the fields that Glue initially inferred as INT
now has to be changed to BIGINT
. For some reason, the crawler didn't do that automatically, so we need to fix it.
The obvious attempt was to edit the table schema from struct<...,field:int,...>
to struct<...,field:bigint,...>
, but doing so results in:
HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'my_column' in table 'my_table' is declared as type 'struct<...,field:bigint,...>', but partition 'partition_0=2019/partition_1=12/partition_2=31/partition_3=23' declared column 'my_column' as type 'struct<...,field:int,...>'.
So our conclusion is that we would have to update the schema for all partitions by hand as well, but could not find how to do so without scripting it ourselves.
How can we proceed? Can we somehow tell the Crawler to do this for us, or is there some AWS API that could help us finish this cleanly?
If not, would iterating over all partitions doing the schema update solve our problem?