For convenience - 2 crawlers is the way to go.
For cost - a hacky solution would be:
1. Get the JSON table's CREATE TABLE DDL from Athena with the SHOW CREATE TABLE <json_table>; command.
2. In the CREATE TABLE DDL, replace the table name and swap the SerDe (and, for Parquet, the STORED AS input/output formats) from JSON to Parquet. You don't need the other table properties from the original CREATE TABLE DDL except LOCATION.
3. Execute the new CREATE TABLE DDL in Athena.
For example:
SHOW CREATE TABLE json_table;
Original DDL:
CREATE EXTERNAL TABLE `json_table`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
...
LOCATION
's3://bucket_name/table_data'
...
New DDL:
CREATE EXTERNAL TABLE `parquet_table`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://bucket_name/table_data'
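
If you want to script the Athena side, the new DDL can be submitted with boto3's start_query_execution. A minimal sketch - the database name and results bucket below are assumptions, use your own:

import boto3

athena = boto3.client("athena")

# The rewritten DDL from above; bucket/table names are assumptions.
new_ddl = """CREATE EXTERNAL TABLE `parquet_table`(
  `id` int, `name` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://bucket_name/table_data'"""

# DDL runs like any other Athena query; OutputLocation is where Athena
# writes query metadata and is an assumption here.
athena.start_query_execution(
    QueryString=new_ddl,
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)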
You can also do it the same way with the Glue API methods: get_table() > replace > create_table().
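
A minimal boto3 sketch of that path (the database and table names are assumptions):

import boto3

glue = boto3.client("glue")

# Fetch the definition the crawler created for the JSON table.
table = glue.get_table(DatabaseName="my_db", Name="json_table")["Table"]
storage = table["StorageDescriptor"]

# Swap the SerDe and the input/output formats from JSON to Parquet;
# LOCATION stays as-is because it lives inside the StorageDescriptor.
storage["SerdeInfo"] = {
    "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
}
storage["InputFormat"] = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
storage["OutputFormat"] = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

# create_table() accepts only a subset of the keys get_table() returns,
# so build a fresh TableInput rather than passing the response back.
glue.create_table(
    DatabaseName="my_db",
    TableInput={
        "Name": "parquet_table",
        "StorageDescriptor": storage,
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": table.get("PartitionKeys", []),
        "Parameters": {"classification": "parquet"},
    },
)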
Notice - if you want to run it periodically, you will need to wrap it in a script and schedule it with another scheduler (crontab etc.) so it runs after the first crawler.
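
For example, a crontab entry like this (the script path and both times are assumptions) runs the copy an hour after a nightly crawler run:

# Nightly at 02:00, after the (assumed) 01:00 crawler schedule
0 2 * * * /usr/bin/python3 /opt/scripts/copy_json_table_to_parquet.py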