
I want to use Delta Lake on a Hadoop cluster with PySpark. I haven't found any installation guide for Delta Lake apart from the one below.

pyspark --packages io.delta:delta-core_2.11:0.1.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

I have 2 questions:

  • What is the latest version of Delta Lake (< 0.7) compatible with Apache Spark 2.4.3? I know it should be the Scala 2.11 build.
  • How do I install the Delta Lake package on a Hadoop cluster?

Thanks in advance.

For compatibility, take a look at docs.delta.io/latest/releases.html. It seems that versions < 0.7.0 are compatible with Spark 2.4.4+, and 0.7.0 with Spark 3.0. – Rayan Ral

1 Answer


The latest version of Delta Lake that supports Spark 2.4.3 is 0.6.1 (GitHub branch). Use --packages io.delta:delta-core_2.11:0.6.1 and it should work out of the box.
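
As a quick sanity check, here is a minimal sketch of using Delta 0.6.1 from PySpark once the package is on the classpath; the application name and HDFS path are placeholders for your cluster. Note that the spark.sql.catalog.spark_catalog=...DeltaCatalog setting shown in the question is for Delta 0.7.0+ on Spark 3.x and is not needed here.

# Launch the shell with the package on the classpath:
#   pyspark --packages io.delta:delta-core_2.11:0.6.1

from pyspark.sql import SparkSession

# Reuse the shell's session, or build one when running via spark-submit
spark = SparkSession.builder.appName("delta-smoke-test").getOrCreate()

# Placeholder HDFS path; change it to a location writable on your cluster
path = "hdfs:///tmp/delta-table"

# Write a small DataFrame as a Delta table
spark.range(0, 5).write.format("delta").mode("overwrite").save(path)

# Read it back to confirm the Delta reader works
spark.read.format("delta").load(path).show()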