
I want to create a Hive table with partitions.

The schema for the table is:

import org.apache.spark.sql.types._
val schema = StructType(Seq(StructField("name", StringType, true), StructField("age", IntegerType, true)))

I can do this with Spark SQL using:

val query = "CREATE TABLE some_new_table (name string, age integer) USING org.apache.spark.sql.parquet OPTIONS (path '<some_path>') PARTITIONED BY (age)"

spark.sql(query)

When I try to do this with the Spark API (in Scala), the table gets filled with data. I only want to create an empty table and define the partitions. This is what I am doing; what am I doing wrong?

import org.apache.spark.sql.Row

val df = spark.createDataFrame(sc.emptyRDD[Row], schema)

val options = Map("path" -> "<some_path>", "partitionBy" -> "age")

df.sqlContext.createExternalTable("some_new_table", "org.apache.spark.sql.parquet", schema, options)

I am using Spark 2.1.1.


1 Answer


If you skip partitioning, you can try saveAsTable:

spark.createDataFrame(sc.emptyRDD[Row], schema)
  .write
  .format("parquet")
  // .partitionBy("age") // Spark-style partitioning; not visible to Hive (see note below)
  .saveAsTable("some_new_table")
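
If that works, the empty table is registered in the session catalog; a quick sanity check with the standard Spark catalog API (table name taken from the question):

// List registered tables and confirm the schema of the new (empty) table.
spark.catalog.listTables().show()
spark.table("some_new_table").printSchema()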

Spark partitioning and Hive partitioning are not compatible, so if you want the table to be accessible from Hive you have to create it with SQL: https://issues.apache.org/jira/browse/SPARK-14927
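
For completeness, a minimal sketch of that SQL route, reusing the CREATE TABLE statement and the '<some_path>' placeholder from the question (replace it with a real location):

// Hive-readable partitioned table created via SQL;
// '<some_path>' is the question's placeholder, not a real path.
spark.sql(
  """CREATE TABLE some_new_table (name string, age integer)
    |USING org.apache.spark.sql.parquet
    |OPTIONS (path '<some_path>')
    |PARTITIONED BY (age)""".stripMargin)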