I am using the OSS version of Delta Lake along with Spark 3.0.1. My current use case requires me to discover all the current partitions of a given Delta table.
My data is stored in './data/raw' and is partitioned by the column sensorId (the path is relative to my Python script).
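For context, this is roughly how the table is written (a minimal sketch; the DataFrame contents here are made up, only the path and partition column match my setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WriteExample").getOrCreate()

# Toy rows standing in for my real sensor data
readings_df = spark.createDataFrame(
    [("sensor-1", 21.5), ("sensor-2", 19.8)],
    ["sensorId", "value"],
)

# Write as a Delta table partitioned by sensorId under the relative path
(readings_df.write
    .format("delta")
    .partitionBy("sensorId")
    .mode("append")
    .save("./data/raw"))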
I am trying to use the SHOW PARTITIONS syntax as described in the documentation, but I am getting errors. This is what my code looks like:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TestScript").getOrCreate()

df = spark.sql("SHOW PARTITIONS delta.`./data/raw`")
df.show()
The spark-submit command looks as follows:
spark-submit --packages io.delta:delta-core_2.12:0.8.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" test_script.py
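For reference, I believe these are the same Delta settings that could instead be set on the SparkSession builder; this sketch is only to show what I mean and uses the same package version as above:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("TestScript")
    # Same settings as the --packages/--conf flags above
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)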
And I get the following error:
pyspark.sql.utils.AnalysisException: Database 'delta' not found;
My other question related to this is whether SHOW PARTITIONS will return all of the partitions or whether it puts a limit on the result. If there is a limit, what is the best way to discover/get all the partitions of a Delta table?
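To make the second question concrete, the only workaround I can think of is to read the table and take the distinct values of the partition column, roughly like this sketch (same path and column as above); I'd like to know if there is a better way:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ListPartitions").getOrCreate()

# Naive fallback: load the Delta table and collect the distinct partition values
partition_values = (
    spark.read.format("delta")
    .load("./data/raw")
    .select("sensorId")
    .distinct()
    .collect()
)
print([row.sensorId for row in partition_values])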