This option exists in Spark, and I saw that pyarrow's write_table() accepts **kwargs, but after following them through the .pyx source, I couldn't trace them to anything related to statistics such as min/max.

Is this supported, and if so, how is it achieved?

1 Answer

pyarrow already writes min/max statistics to Parquet files by default. There is no option for this in pyarrow because the underlying parquet-cpp library always writes them. At the time of writing, only min and max are written; other statistics can neither be supplied by the caller nor computed on the fly by parquet-cpp. If you need them, open an issue in the (Py)Arrow issue tracker and consider contributing the missing code.