1 vote

I have a Hive table which is partitioned by the column inserttime.

I have a PySpark DataFrame which has the same columns as the table, except for the partition column.

The following works well when the table is not partitioned:

df.insertInto('tablename',overwrite=True)

But I cannot figure out how to insert into a particular partition from PySpark.

I tried the following:

 df.insertInto('tablename',overwrite=True,partition(inserttime='20170818-0831'))

but it failed with:

SyntaxError: non-keyword arg after keyword arg

I am using PySpark 1.6.

2 Answers

1 vote

One option is to register the DataFrame as a temporary table and write into a static partition with HiveQL:

# Register the DataFrame so it can be referenced from HiveQL.
df.registerTempTable('tab_name')
# Overwrite a single static partition; string partition values must be quoted.
hiveContext.sql("insert overwrite table target_tab partition(inserttime='20170818-0831') select * from tab_name")

Another option is to add the static partition value as the last column of the DataFrame and use insertInto() in dynamic-partition mode, as sketched below.
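
For example, a minimal sketch of that route, assuming the hiveContext, df, table name, and partition value from the question (setConf and functions.lit are standard PySpark 1.6 API):

from pyspark.sql.functions import lit

# Enable dynamic partitioning in Hive before inserting.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

# insertInto matches columns by position and expects the partition
# column last, so append inserttime as the final column.
df.withColumn("inserttime", lit("20170818-0831")) \
    .insertInto("tablename", overwrite=True)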

1 vote

You can use:

df.write.mode("overwrite").partitionBy("inserttime").saveAsTable("TableName")

Or you can overwrite the files at the partition's location directly:

df.write.mode("overwrite").save("location/inserttime=20170818-0831")

Note that this writes straight to the filesystem path and bypasses the metastore, so the partition must already be registered with the table (for example via ALTER TABLE ... ADD PARTITION) for Hive to pick up the data.

Hope this helps.