I would like to drop Databricks SQL DB tables if they were created more than 30 days ago. How do I get a table's creation datetime from Databricks?
Thanks
Given a tableName, the easiest way to get the creation time is as follows:
import org.apache.spark.sql.catalyst.TableIdentifier

// createTime is a Unix timestamp in milliseconds
val createdAtMillis = spark.sessionState.catalog
  .getTempViewOrPermanentTableMetadata(TableIdentifier(tableName))
  .createTime
getTempViewOrPermanentTableMetadata() returns a CatalogTable that contains information such as:
CatalogTable(
Database: default
Table: dimension_npi
Owner: root
Created Time: Fri Jan 10 23:37:18 UTC 2020
Last Access: Thu Jan 01 00:00:00 UTC 1970
Created By: Spark 2.4.4
Type: MANAGED
Provider: parquet
Num Buckets: 8
Bucket Columns: [`npi`]
Sort Columns: [`npi`]
Table Properties: [transient_lastDdlTime=1578699438]
Location: dbfs:/user/hive/warehouse/dimension_npi
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Schema: root
|-- npi: integer (nullable = true)
...
)
You can list all tables in a database using spark.sessionState.catalog.listTables(database).
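Putting it together, here is a minimal sketch of the original goal: list the tables in a database, read each one's createTime from the catalog, and drop the ones older than 30 days. The database name "my_db" is a placeholder, and you should review the candidate list before actually issuing the DROP statements:

import java.util.concurrent.TimeUnit

val database = "my_db" // placeholder database name
val cutoffMillis = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(30)

val catalog = spark.sessionState.catalog
catalog.listTables(database)
  .map(id => catalog.getTempViewOrPermanentTableMetadata(id))
  .filter(_.createTime < cutoffMillis) // created more than 30 days ago
  .foreach { meta =>
    spark.sql(s"DROP TABLE IF EXISTS ${meta.identifier.quotedString}")
  }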
There are alternative ways of accomplishing the same thing, but they take more effort and risk breaking when Spark behavior changes: querying table metadata with SQL and/or traversing the storage locations of tables and inspecting file timestamps. That's why it's best to go through the catalog APIs.
Hope this helps.
Assuming your DB table is Delta:
You can use DESCRIBE HISTORY <database>.<table> to retrieve all transactions made to that table, including timestamps. According to the Databricks documentation, history is retained for only 30 days by default. Depending on how you plan to implement your solution, that may just work.