Where does Delta Lake store the table metadata? I am using Spark 2.6 (not Databricks) on my standalone machine. I assumed that if I restarted Spark, a table created with Delta Lake would be dropped (I'm trying this from a Jupyter notebook), but that is not the case.
There are two types of tables in Apache Spark: external tables and managed tables. A table created with the LOCATION keyword in the CREATE TABLE statement is an external table. Otherwise, it is a managed table, and its data lives under the directory specified by the Spark SQL conf spark.sql.warehouse.dir. The default value is the spark-warehouse directory in the current working directory.
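As a minimal sketch of the two kinds of table, the helper below issues one managed and one external CREATE TABLE through spark.sql. It assumes Spark 3.x with Delta Lake 0.7 or later configured on the session (spark.sql.extensions set to io.delta.sql.DeltaSparkSessionExtension and spark.sql.catalog.spark_catalog set to org.apache.spark.sql.delta.catalog.DeltaCatalog); the function name create_example_tables, the table names and the path are hypothetical, chosen only for illustration.

```python
def create_example_tables(spark, external_path):
    """Create one managed and one external Delta table (hypothetical names).

    `spark` is assumed to be a SparkSession configured for Delta Lake;
    any object with a compatible .sql(str) method works.
    """
    statements = [
        # Managed table: no LOCATION, so Spark places the data under
        # spark.sql.warehouse.dir (by default ./spark-warehouse/events_managed).
        "CREATE TABLE IF NOT EXISTS events_managed (id INT, name STRING) USING delta",
        # External table: LOCATION points at a directory that Spark does not own.
        "CREATE TABLE IF NOT EXISTS events_external (id INT, name STRING) "
        f"USING delta LOCATION '{external_path}'",
    ]
    for stmt in statements:
        spark.sql(stmt)
    return statements
```

Calling create_example_tables(spark, "/tmp/delta/events_external") registers both tables in the metastore; only the first one keeps its data inside the warehouse directory.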
Besides the data, Spark also needs to store the table metadata in the Hive Metastore, so that it knows where the data is when a user queries the table by name. The Hive Metastore is usually backed by a relational database. If the user doesn't configure a database for the Hive Metastore, Spark uses an embedded database called Derby to store the table metadata on the local file system.
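To see where a default local setup keeps things, the stdlib-only sketch below checks a working directory for the spark-warehouse directory and for the embedded Derby files, metastore_db and derby.log. It assumes Hive support is enabled and that neither a hive-site.xml nor a custom spark.sql.warehouse.dir points elsewhere; under those assumptions both paths are relative to the directory the Spark driver (for example, the Jupyter kernel) was started in, which is why the tables survive a restart. The function name default_local_catalog_paths is hypothetical.

```python
from pathlib import Path


def default_local_catalog_paths(work_dir="."):
    """Report where a default local Spark setup keeps table data and metadata.

    Hypothetical helper. Assumes Hive support is enabled with no hive-site.xml
    or spark.sql.warehouse.dir override, so everything sits in `work_dir`.
    """
    base = Path(work_dir).resolve()
    paths = {
        "warehouse": base / "spark-warehouse",  # data of managed tables
        "metastore": base / "metastore_db",     # embedded Derby database (table metadata)
        "derby_log": base / "derby.log",        # Derby's log file
    }
    # Map each entry to (path, whether it currently exists on disk).
    return {name: (path, path.exists()) for name, path in paths.items()}


if __name__ == "__main__":
    for name, (path, exists) in default_local_catalog_paths().items():
        print(f"{name:10} {'found' if exists else 'missing':8} {path}")
```

Because the Derby path is relative, starting the notebook from a different directory makes Spark create a fresh, empty metastore_db there, and the earlier tables then appear to be missing.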
The DROP TABLE command behaves differently depending on the table type. For a managed table, DROP TABLE removes the table from the Hive Metastore and deletes the data. For an external table, DROP TABLE removes the table from the Hive Metastore but keeps the data on the file system. Hence, the user has to delete the data files of an external table manually.
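For an external table, the two cleanup steps can be sketched as follows: drop the metastore entry with spark.sql, then remove the directory from the LOCATION clause with shutil.rmtree. This assumes the location is a local path (HDFS or object-store paths need their own tooling), and the function name drop_external_table is hypothetical. For a Delta table the directory holds both the Parquet data files and the _delta_log transaction log, so the whole folder is removed.

```python
import shutil
from pathlib import Path


def drop_external_table(spark, table_name, location):
    """Drop an external table, then delete its data directory by hand.

    Hypothetical helper. `spark` is assumed to be a SparkSession (any object
    with .sql(str) works); `location` must be the local path from LOCATION.
    """
    # For an external table this removes only the metastore entry.
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    # The data files (and, for Delta, the _delta_log directory) remain on disk.
    path = Path(location)
    if path.is_dir():
        shutil.rmtree(path)
    # True once nothing is left at the old location.
    return not path.exists()
```

Running the same helper on a managed table would be unnecessary, since DROP TABLE already deletes its data under the warehouse directory.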
… location in the create table statement). If so, you need to run drop table to delete the table from the metastore (drop table doesn't delete the folder used by an external table), and also delete the table folder manually. - zsxwing
… spark-warehouse in your current work directory. - zsxwing