I want to use Delta Lake on a Hadoop cluster with PySpark. I haven't found any installation guide for Delta Lake apart from the snippet below:
pyspark --packages io.delta:delta-core_2.11:0.1.0 \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
I have two questions:
- What is the latest Delta Lake version (< 0.7) compatible with Apache Spark 2.4.3? I know it should be the Scala 2.11 build (delta-core_2.11).
- How do I install the Delta Lake package on a Hadoop cluster? (My current attempt is sketched below.)
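For the second question, what I am currently attempting is roughly the following, assuming the cluster nodes can reach Maven Central (the version and script name are placeholders until question 1 is answered):

spark-submit \
--packages io.delta:delta-core_2.11:<version> \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
my_delta_job.py

If the cluster has no internet access, I assume the jar can be downloaded manually and shipped with --jars instead of --packages (e.g. --jars /path/to/delta-core_2.11-<version>.jar), but I am not sure whether that is the recommended approach.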
Thanks in advance.