1 vote

I have a Hive table which is partitioned by the column inserttime.

I have a PySpark DataFrame which has the same columns as the table, except for the partition column.

The following works well when the table is not partitioned:

df.insertInto('tablename',overwrite=True)

But I cannot figure out how to insert into a particular partition from PySpark.

I tried the following:

 df.insertInto('tablename',overwrite=True,partition(inserttime='20170818-0831'))

but it failed with:

SyntaxError: non-keyword arg after keyword arg

I am using PySpark 1.6.

2 Answers

1 vote

One option is to register the DataFrame as a temporary table and write into a static partition with HiveQL:

# Register the DataFrame so it can be referenced from HiveQL.
df.registerTempTable('tab_name')
# Overwrite a single static partition; string partition values must be quoted.
hiveContext.sql("insert overwrite table target_tab partition(inserttime='20170818-0831') select * from tab_name")

Another option is to add the static partition value as the last column of the DataFrame and use insertInto() in dynamic-partition mode, as sketched below.
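
For example, a minimal sketch of that route, assuming the hiveContext, df, table name, and partition value from the question (setConf and functions.lit are standard PySpark 1.6 API):

from pyspark.sql.functions import lit

# Enable dynamic partitioning in Hive before inserting.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

# insertInto matches columns by position and expects the partition
# column last, so append inserttime as the final column.
df.withColumn("inserttime", lit("20170818-0831")) \
    .insertInto("tablename", overwrite=True)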

1 vote

You can use:

df.write.mode("overwrite").partitionBy("inserttime").saveAsTable("TableName")

Or you can overwrite the files at the partition's location directly:

df.write.mode("overwrite").save("location/inserttime=20170818-0831")

Note that this writes straight to the filesystem path and bypasses the metastore, so the partition must already be registered with the table (for example via ALTER TABLE ... ADD PARTITION) for Hive to pick up the data.

Hope this helps.