0 votes

I have a bronze-level Delta Lake table (events_bronze) at the location "/mnt/events-bronze", to which data is streamed from Kafka. Now I want to stream from this table and, using "foreachBatch", upsert into a silver table (events_silver). This can be achieved using the bronze table as a source. However, during the initial run events_silver doesn't exist yet, so I keep getting an error saying the Delta table doesn't exist, which is expected. How do I go about creating events_silver with the same structure as events_bronze? I couldn't find a DDL to do that.

import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame

// Merge each micro-batch from the bronze stream into the silver table by id.
def upsertToDelta(microBatchOutputDF: DataFrame, batchId: Long): Unit = {
  DeltaTable.forPath(spark, "/mnt/events-silver").as("silver")
    .merge(
      microBatchOutputDF.as("bronze"),
      "silver.id = bronze.id")
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
}

events_bronze
  .writeStream
  .trigger(Trigger.ProcessingTime("120 seconds"))
  .format("delta")
  .foreachBatch(upsertToDelta _)
  .outputMode("update")
  .start()

During the initial run, the problem is that there is no Delta Lake table at the path "/mnt/events-silver". I'm not sure how to create it with the same structure as "/mnt/events-bronze" for the first run.

Hey @Vikas, did the solution below work for you? – Manish

4 Answers

4 votes

Before starting the streaming write/merge, check whether the table already exists. If not, create one from an empty DataFrame with the schema of events_bronze:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.{Row, SaveMode}

val exists = DeltaTable.isDeltaTable("/mnt/events-silver")

if (!exists) {
  // Build the silver schema from the existing bronze table so they match.
  val bronzeSchema = spark.read.format("delta").load("/mnt/events-bronze").schema
  val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], bronzeSchema)
  emptyDF
    .write
    .format("delta")
    .mode(SaveMode.Overwrite)
    .save("/mnt/events-silver")
}

The table (Delta Lake metadata) will be created only once, at the start, and only if it doesn't already exist. On job restarts it will already be present, so table creation is skipped.
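
Putting it together: run the existence check above before starting the stream, so the first micro-batch already has a merge target. A minimal sketch; the readStream line is an assumption about how events_bronze in the question is defined:

import org.apache.spark.sql.streaming.Trigger

// The create-if-missing check above must run first; then the stream starts
// and upsertToDelta from the question finds its merge target on batch 0.
val events_bronze = spark.readStream.format("delta").load("/mnt/events-bronze")

events_bronze
  .writeStream
  .trigger(Trigger.ProcessingTime("120 seconds"))
  .foreachBatch(upsertToDelta _)
  .outputMode("update")
  .start()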

1 vote

Here's a PySpark example:

from pyspark.sql.types import StructType, StructField, StringType, TimestampType
from delta.tables import DeltaTable

basePath = 'abfss://stage2@your_storage_account_name.dfs.core.windows.net'
schema = StructType([
    StructField('SignalType', StringType()),
    StructField('StartTime', TimestampType())
])

# Write an empty DataFrame once to create the Delta table if it's missing.
if not DeltaTable.isDeltaTable(spark, basePath + '/tutorial_01/test1'):
    emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)
    emptyDF.write.format('delta').mode('overwrite').save(basePath + '/tutorial_01/test1')
1 vote

As of Delta Lake release 1.0.0, the method DeltaTable.createIfNotExists() was added (an Evolving API).

In your example, DeltaTable.forPath(spark, "/mnt/events-silver") can be replaced with:

DeltaTable.createIfNotExists(spark)
  .location("/mnt/events-silver")
  .addColumns(microBatchOutputDF.schema)
  .execute()
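
With that, the merge function from the question no longer needs the table to pre-exist. A sketch, assuming Delta Lake 1.0.0+ and the question's id column:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame

def upsertToDelta(microBatchOutputDF: DataFrame, batchId: Long): Unit = {
  // createIfNotExists returns the DeltaTable, creating it on the first
  // batch with the micro-batch's schema and reusing it afterwards.
  DeltaTable.createIfNotExists(spark)
    .location("/mnt/events-silver")
    .addColumns(microBatchOutputDF.schema)
    .execute()
    .as("silver")
    .merge(microBatchOutputDF.as("bronze"), "silver.id = bronze.id")
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
}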
0 votes

You can check the table using Spark SQL. First run the statement below, which will give the table definition of the bronze table:

spark.sql("show create table events_bronze").show

After getting the DDL, just change the location to the silver table's path and run that statement in Spark SQL.

Note: Use "CREATE TABLE IF NOT EXISTS ..." so it doesn't fail on concurrent runs.
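
The adjusted statement would look roughly like this. A sketch run through Spark SQL; the column list is hypothetical, since the real one comes from the SHOW CREATE TABLE output on events_bronze:

// Hypothetical columns for illustration; copy the real ones from the
// SHOW CREATE TABLE output and change only the table name and location.
spark.sql("""
  CREATE TABLE IF NOT EXISTS events_silver (
    id STRING,
    eventTime TIMESTAMP
  )
  USING DELTA
  LOCATION '/mnt/events-silver'
""")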