I have a dataframe which I am inserting into an existing partitioned Hive table using Spark SQL (with dynamic partitioning). Once the dataframe has been written, I would like to know which partitions my write just created in Hive.
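For reference, the insert looks roughly like this (the table and column names are placeholders, not my real schema):

```sql
-- enable Hive dynamic partitioning before the insert
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- partition column (dt) comes last in the SELECT, as dynamic partitioning requires
INSERT INTO TABLE db.events PARTITION (dt)
SELECT user_id, payload, dt FROM staging_view;
```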
I could query the dataframe for its distinct partition values, but that takes a very long time because it re-executes the dataframe's entire lineage.
I could persist the dataframe before writing to Hive, so that the write and the distinct-partition-column query both run on the cached dataframe. But my dataframe is extremely large and I don't want to spend more time persisting it.
I know all the partition information is stored in the Hive metastore. Are there any metastore APIs in Spark that could retrieve only the new partitions that were created?
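One workaround I considered is snapshotting the table's partition list before and after the write and diffing the two sets. A minimal sketch of the diff logic, assuming the snapshots come from `SHOW PARTITIONS` (the table name `db.events` and the result column name `partition` are assumptions, not from my actual job):

```python
def new_partitions(before, after):
    """Return the partitions present after the write but not before it."""
    return sorted(set(after) - set(before))

# In the Spark job, the two snapshots would come from the metastore, e.g.:
#   before = [r.partition for r in spark.sql("SHOW PARTITIONS db.events").collect()]
#   ... run the dynamic-partition insert ...
#   after  = [r.partition for r in spark.sql("SHOW PARTITIONS db.events").collect()]

# Illustrative snapshots (made-up partition values):
before = ["dt=2024-01-01", "dt=2024-01-02"]
after = ["dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03"]
print(new_partitions(before, after))  # prints ['dt=2024-01-03']
```

This avoids touching the dataframe's lineage at all, since both snapshots are pure metastore lookups, but it does cost two extra metastore round trips and assumes no concurrent writer adds partitions in between.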