I'm trying to save a DataFrame with a date-type column to Parquet format, to be used later in Athena. As far as I understand, Parquet has a native DATE type, but the only type I can really use is datetime64[ns] with the pyarrow engine (the same issue is discussed here: https://github.com/pandas-dev/pandas/issues/20089). The issue is that I'd like to have a date type rather than datetime in the Athena schema. Any suggestions?
Change the column type of the DataFrame first and then dump it to Parquet.
– Shrey
If I keep the type as date, the Parquet schema saves it as null.
– kismsu
In my project I have kept it as a string in MM/DD/YYYY format.
– Shrey
I know I can do that, but it would be nice to avoid type casting down the line.
– kismsu
Have you tried the latest version of Arrow? Looking at Arrow's pandas integration documentation, it seems that datetime.date can now be round-tripped, and there appears to be support for storing date columns in Parquet.
– Micah Kornfield