2 votes

I am trying to delete a Delta Lake table that was created using writeStream. I tried DROP TABLE but it fails.

# table created as
df.writeStream.outputMode("append").format("delta").start("/mnt/mytable")

# attempt to drop the table (this fails)
spark.sql("drop table '/mnt/mytable'")

2 Answers

5 votes
DROP TABLE IF EXISTS <unmanaged-table>   -- deletes only the metadata
dbutils.fs.rm("<your-s3-path>", True)    # deletes the data

DROP TABLE <managed-table>               -- deletes both the metadata and the data

You need to specify the path to delete the data of an unmanaged table because, with an unmanaged table, Spark SQL only manages the metadata and you control the data location. With managed tables, Spark SQL manages both the metadata and the data, and the data is stored in the Databricks File System (DBFS) in your account. Thus, to delete an unmanaged table's data, you also need to remove the files at the table's path.
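Applied to the question's path-based table, a minimal sketch might look like the following (assuming a Databricks notebook where spark and dbutils are available; my_database.mytable is a hypothetical metastore name used only for illustration):

# stop any active streaming queries so no new files are written while deleting
for q in spark.streams.active:
    q.stop()

# a table created only via start("/mnt/mytable") is path-based, so removing
# the files is enough; there is no metastore entry to drop
dbutils.fs.rm("/mnt/mytable", True)

# if the table was also registered in the metastore under a name, drop that entry too
spark.sql("DROP TABLE IF EXISTS my_database.mytable")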

0 votes

Make sure you get your schema right, because even if you drop the table, the data will still reside in the path that's defined in your DDL. So if you re-run, it will infer the past schema. In that case you may want to drop the files, or first have a look at them using %fs ls /mnt/data/blah/blah/blah and then, if you know what you are doing, remove them with %fs rm -r /mnt/data/that/blah/path/here .
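To illustrate the point about the old schema persisting, here is a minimal sketch (assuming the question's /mnt/mytable path and a Databricks notebook where dbutils is available):

# even after DROP TABLE, the Delta files under the path still exist, so a
# read by path picks up the previous data and schema
old_df = spark.read.format("delta").load("/mnt/mytable")
old_df.printSchema()   # still shows the old schema

# remove the leftover files before re-running with a new schema
dbutils.fs.rm("/mnt/mytable", True)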