
I have a delta table with 4 versions.

DESCRIBE HISTORY cfm shows 4 versions: 0, 1, 2, 3.

I want to delete version 3 or 2. How can I achieve this?

I tried:

from delta.tables import *
from pyspark.sql.functions import *

deltaTable = DeltaTable.forPath(spark, "path of cfm files")

deltaTable.delete("'version' = '3'") 

This does not delete version 3. https://docs.delta.io/0.4.0/delta-update.html says:

"delete removes the data from the latest version of the Delta table but does not remove it from the physical storage until the old versions are explicitly vacuumed"

If I have to run the vacuum command, how do I apply it to the latest versions and not the older ones?
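A side note on why the delete call above is a no-op: in a Spark SQL predicate, single quotes create string literals, so "'version' = '3'" compares the constant string 'version' with the constant string '3', which is false for every row. delete() filters rows by a data predicate; it cannot target table versions. A minimal sketch of the per-row evaluation (plain Python, no Spark needed):

```python
# What the engine effectively evaluates for each row when given the
# predicate "'version' = '3'": a comparison of two string literals.
# It never references a column, so it is False for every row and the
# delete removes nothing.
predicate_matches = ('version' == '3')
print(predicate_matches)  # False -> no rows deleted
```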


1 Answer


You need to use the vacuum command to perform this operation. However, the default retention period for vacuum is 7 days, and it will error out if you try to vacuum anything newer than 7 days.

We can work around this by setting a Spark configuration that bypasses the default retention period check.

Solution below:

from delta.tables import *

# Bypass the 7-day retention safety check so recent history can be vacuumed
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

deltaTable = DeltaTable.forPath(spark, deltaPath)
deltaTable.vacuum(24)  # retain only the last 24 hours of history

*deltaPath -- the path to your Delta table

*24 -- the retention threshold in hours: versioning is retained for the last 24 hours, and any version created more than 24 hours in the past becomes eligible for deletion. Note that vacuum never removes files needed by the current (latest) version of the table.
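To make the retention arithmetic concrete, here is a rough sketch (plain Python, with hypothetical commit timestamps standing in for DESCRIBE HISTORY output) of which versions fall outside a 24-hour horizon and so have their files become eligible for removal:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical "current time" and per-version commit timestamps;
# in practice these come from DESCRIBE HISTORY on the table.
RETENTION_HOURS = 24
now = datetime(2021, 6, 2, 12, 0, tzinfo=timezone.utc)
cutoff = now - timedelta(hours=RETENTION_HOURS)

history = {  # version -> commit timestamp (all hypothetical)
    0: datetime(2021, 5, 30, 9, 0, tzinfo=timezone.utc),
    1: datetime(2021, 5, 31, 9, 0, tzinfo=timezone.utc),
    2: datetime(2021, 6, 1, 9, 0, tzinfo=timezone.utc),
    3: datetime(2021, 6, 2, 9, 0, tzinfo=timezone.utc),
}

# Versions committed before the cutoff are outside the retention window.
eligible = [v for v, ts in history.items() if ts < cutoff]
print(eligible)  # -> [0, 1, 2]; version 3 is within 24 hours and is kept
```

Under these example timestamps only version 3 survives the 24-hour window, which matches the behavior described above: older history disappears, the most recent state does not.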

Hope this answers your question.