I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right on the project home page, https://delta.io/, there is a diagram showing Delta Lake running on "your existing data lake" without any mention of Spark. Elsewhere, the documentation suggests that Delta Lake does indeed run on top of Spark. So my question is: can it be run independently of Spark? Can I, for example, set up Delta Lake with S3 buckets for storage in Parquet format, with schema validation etc., without using Spark in my architecture?
2 Answers
You might keep an eye on delta-rs, a native Rust implementation of Delta Lake that does not depend on Spark: https://github.com/delta-io/delta-rs
The project is at an early stage and currently read-only, but it is worth watching as it evolves.
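To illustrate why a Spark-free reader is possible at all: a Delta table is a directory of Parquet files plus a `_delta_log` folder of JSON commit files, and each commit lists `add` and `remove` actions. Below is a rough sketch in plain Python that replays such a log to find the files that are currently live. It is not how delta-rs or Spark implement it; it ignores checkpoints, schema metadata and concurrent writers, and the helper names and file names (`active_files`, `write_demo_table`, `part-0000.parquet`) are made up for the example.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files.

    Simplified: checkpoints and all actions other than add/remove are ignored.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named with zero-padded version numbers,
    # so sorting by name gives commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_demo_table(root):
    """Create a tiny fake log (hypothetical file names, no real Parquet data)."""
    log_dir = Path(root) / "_delta_log"
    log_dir.mkdir(parents=True)
    commits = [
        # Version 0: two files are added.
        [{"add": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0001.parquet"}}],
        # Version 1: one file is replaced by a new one.
        [{"remove": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0002.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        name = f"{version:020d}.json"
        (log_dir / name).write_text("\n".join(json.dumps(a) for a in actions))
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_demo_table(tmp)
        print(active_files(tmp))
```

A real reader would then open the listed Parquet files with any Parquet library. Pointing this at S3 would additionally need an object-store client, which is left out here.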