Is there any way to append a new column to an existing parquet file?
I'm currently working on a Kaggle competition, and I've converted all the data to parquet files.
Here's my situation: I read a parquet file into a PySpark DataFrame, did some feature extraction, and appended the new columns to the DataFrame with
pyspark.sql.DataFrame.withColumn().
After that, I want to save the new columns in the source parquet file.
I know Spark SQL comes with Parquet schema merging (schema evolution), but the examples in the documentation only show the case of merging simple key-value columns.
The parquet "append" save mode doesn't do the trick either; it only appends new rows to the parquet file. Is there any way to append a new column to an existing parquet file, instead of regenerating the whole table? Or do I have to write the new columns to a separate parquet file and join the two at runtime?