Spark Partitioning Hive Table

Question

I am trying to partition the hive table with distinct timestamps. I have a table with timestamps in it but when I execute the hive partition query, it says that it is not a valid partition column. Here's the table:

+---+-----------------------+
|id |rc_timestamp           |
+---+-----------------------+
|1  |2017-06-12 17:18:39.824|
|2  |2018-06-12 17:18:39.824|
|3  |2019-06-12 17:18:39.824|
+---+-----------------------+

spark.sql("SET hive.exec.dynamic.partition.mode=nonrestrict")

val tempTable  = spark.sql("SELECT * FROM partition_table")

val df = tempTable.select("rc_timestamp")

val a = x.toString().replaceAll("[\\[\\]]","")

df.collect().foreach(a => {
  spark.sql(s"ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION 
  (rc_timestamp = '$a')").show()
)}

Here's the error which I'm getting:

org.apache.spark.sql.AnalysisException: rc_timestamp is not a valid partition column 
in table mydb.partition_table.;

spark.sql("CREATE TABLE part_table (id INT, rc_timestamp timestamp)") — Incognito
Please create a table with partition. Refer this to create a partitioned table cwiki.apache.org/confluence/display/Hive/… — Constantine
I already have a table with millions of records in the original table so I was trying to alter the table to add partitions. Isn't this the valid syntax though: "ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION (rc_timestamp = '$a')" — Incognito
Its a valid syntax. But you cant use it on a non partitioned table — Constantine

Ram Ghadiyaram Ram Ghadiyaram · Accepted Answer · 2019-06-13T16:04:10

First thing is check your syntaxes using this InsertSuite test case specially this

AFAIK you need msck repair or refresh table

spark.sql(s"refresh table tableNameWhereYouAddedPartitions")

what it does is it will refresh the existing partitions.

you can go with spark.sql('MSCK REPAIR TABLE table_name')

There is something called recoverPartitions (Only works with a partitioned table, and not a view). This is aliased version of msck repair table. you can go ahead and try this..

see this ddl.scala seems like its equalent by documentation.
example usage :

spark.catalog.recoverPartitions(tableName)

Note: The RECOVER PARTITIONS clause automatically recognizes any data files present in these new directories, the same as the REFRESH statement does.

Spark Partitioning Hive Table

2 Answers