0
votes

I'm trying to save a DataFrame with a date-type column to Parquet format to be used later in Athena. As far as I understand, Parquet has a native DATE type, but the only type I can really use is datetime64[ns] with the pyarrow engine (the same issue is discussed here: https://github.com/pandas-dev/pandas/issues/20089). The problem is that I'd like to have a date type rather than a datetime in the Athena schema. Any suggestions?

1
Change the column type of the DataFrame first and then dump it to Parquet. - Shrey
If I keep the type as date, the Parquet schema saves it as null. - kismsu
In my project I have kept it as a string in MM/DD/YYYY format. - Shrey
I know I can do that, but it would be nice to avoid type casting down the line. - kismsu
Have you tried the latest version of Arrow? Looking at Arrow's pandas integration documentation, it seems that datetime.date can now be round-tripped, and there appears to be support for storing date columns in Parquet. - Micah Kornfield

1 Answer

2
votes

As mentioned in the comments, I believe Apache Arrow 0.15.1 now supports round-tripping dates between pandas and Parquet.