3
votes

I am trying to partition the hive table with distinct timestamps. I have a table with timestamps in it but when I execute the hive partition query, it says that it is not a valid partition column. Here's the table:

+---+-----------------------+
|id |rc_timestamp           |
+---+-----------------------+
|1  |2017-06-12 17:18:39.824|
|2  |2018-06-12 17:18:39.824|
|3  |2019-06-12 17:18:39.824|
+---+-----------------------+
spark.sql("SET hive.exec.dynamic.partition.mode=nonrestrict")

val tempTable  = spark.sql("SELECT * FROM partition_table")

val df = tempTable.select("rc_timestamp")

val a = x.toString().replaceAll("[\\[\\]]","")

df.collect().foreach(a => {
  spark.sql(s"ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION 
  (rc_timestamp = '$a')").show()
)}

Here's the error which I'm getting:

org.apache.spark.sql.AnalysisException: rc_timestamp is not a valid partition column 
in table mydb.partition_table.;
2
Please could you post the table definition ?Constantine
spark.sql("CREATE TABLE part_table (id INT, rc_timestamp timestamp)")Incognito
Please create a table with partition. Refer this to create a partitioned table cwiki.apache.org/confluence/display/Hive/…Constantine
I already have a table with millions of records in the original table so I was trying to alter the table to add partitions. Isn't this the valid syntax though: "ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION (rc_timestamp = '$a')"Incognito
Its a valid syntax. But you cant use it on a non partitioned tableConstantine

2 Answers

2
votes

First thing is check your syntaxes using this InsertSuite test case specially this

AFAIK you need msck repair or refresh table

spark.sql(s"refresh table tableNameWhereYouAddedPartitions")

what it does is it will refresh the existing partitions.

you can go with spark.sql('MSCK REPAIR TABLE table_name')

There is something called recoverPartitions (Only works with a partitioned table, and not a view). This is aliased version of msck repair table. you can go ahead and try this..

see this ddl.scala seems like its equalent by documentation.
example usage :

spark.catalog.recoverPartitions(tableName) 

Note: The RECOVER PARTITIONS clause automatically recognizes any data files present in these new directories, the same as the REFRESH statement does.

1
votes

You cannot change the partitioning scheme on a Hive table. This would have to rewrite the complete dataset since partitions are mapped to folders in HDFS/S3/FileSystem.

If you want to change partition scheme, the only options is to create a new table and give partitioning information in the create table command. After that you have to insert data into new table from the old table. You can also use the CTAS command for the same.

ALTER TABLE mydb.partition_table ADD IF NOT EXISTS PARTITION <(rc_timestamp = '$a')> - command only adds new partitions in the metastore for an existing partitioned Hive table. For example, let say you have a table T1 which is partitioned on column year. If you want to make metastore aware about "year=2018", then this command is used.