4 votes

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.catalog.Catalog

There is an options parameter, but I didn't find any sample that uses it to pass the partition columns.
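For context, a minimal sketch of that call (the table name, schema, and path are just placeholders); the options map only carries storage options such as the path, and there is no argument for partition columns:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().getOrCreate()

    // event_date is the intended partition column (illustrative only).
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("event_date", StringType)
    ))

    // Catalog.createTable accepts options (e.g. the data path),
    // but exposes no parameter for partition columns.
    spark.catalog.createTable(
      "my_table",
      "parquet",
      schema,
      Map("path" -> "/data/my_table")
    )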

I checked the Spark sources. It looks like in Spark 2.4 and earlier it is still impossible to create partitioned tables using org.apache.spark.sql.catalog.Catalog. - Dmitry Y.
Thanks @DmitryY. I also checked and found only the options parameter ... In the meantime I switched to raw SQL with spark.sql (see the sketch after these comments). - Guy Cohen
I created SPARK-31001 to request that this ability be added. - Nick Chammas
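For anyone hitting the same limitation, a rough sketch of the raw-SQL workaround mentioned above (the table name, columns, and path are placeholders):

    // Datasource-table DDL: partition columns are declared in the schema
    // and referenced by name in PARTITIONED BY.
    spark.sql("""
      CREATE TABLE my_table (id STRING, value DOUBLE, event_date STRING)
      USING parquet
      PARTITIONED BY (event_date)
      LOCATION '/data/my_table'
    """)

    // Register partition directories that already exist under the location.
    spark.sql("MSCK REPAIR TABLE my_table")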

1 Answer

3 votes

I believe you don't need to specify partition columns if you don't provide a schema; in that case Spark infers both the schema and the partitioning from the location automatically. However, with the current implementation it's not possible to provide both a schema and the partitioning. Fortunately, all of the underlying implementation code is open, so I ended up with the following method for creating external Hive tables.

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.catalyst.TableIdentifier
    import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
    import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource}
    import org.apache.spark.sql.types.StructType

    // Builds a CatalogTable descriptor directly and executes a CreateTable
    // command, bypassing the public Catalog API so that both a schema and
    // partition columns can be supplied.
    private def createExternalTable(tableName: String, location: String,
        schema: StructType, partitionCols: Seq[String], source: String): Unit = {
      val tableIdent = TableIdentifier(tableName)
      // EXTERNAL table: the data stays at the given path.
      val storage = DataSource.buildStorageFormatFromOptions(Map("path" -> location))
      val tableDesc = CatalogTable(
        identifier = tableIdent,
        tableType = CatalogTableType.EXTERNAL,
        storage = storage,
        schema = schema,
        partitionColumnNames = partitionCols,
        provider = Some(source)
      )
      val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
      // Execute the command; toRdd forces the plan to run.
      spark.sessionState.executePlan(plan).toRdd
    }
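
For completeness, a hypothetical call (assuming a SparkSession named spark; the table name, schema, and path are placeholders) might look like this:

    import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

    // Partition columns must be part of the schema for a datasource table.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("value", DoubleType),
      StructField("event_date", StringType)
    ))

    createExternalTable(
      tableName = "my_table",
      location = "/data/my_table",
      schema = schema,
      partitionCols = Seq("event_date"),
      source = "parquet"
    )

    // Register partition directories that already exist under the location.
    spark.sql("MSCK REPAIR TABLE my_table")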