I have read a lot about Databricks Delta Lake. From what I understand, it adds ACID transactions to your data storage and accelerates query performance with the Delta Engine. If so, why do we need other data lakes that do not support ACID transactions? Delta Lake claims to combine the worlds of data lakes and data warehouses. We know that it cannot yet replace a traditional data warehouse because of the operations it currently supports. But should it replace data lakes? Why would we need two copies of the data, one in a data lake and one in Delta Lake?
0
votes
Delta Lake is a type of data lake. Do you mean a specific data lake product when you say "data lake"?
- zsxwing
Hi, yes. I mean: will Delta Lake replace other data lakes that lack capabilities such as ACID transactions, like Amazon S3, Azure Blob Storage, etc.?
- user13128577
Some people call cloud storage services such as Amazon S3 and Azure Blob Storage data lakes. In my opinion, though, they are storage layers, closer to what a file system is in the single-machine world. Delta Lake is actually built on top of them and uses them to store its raw data files and metadata. Questions like this usually get opinion-based answers and are discouraged on Stack Overflow. It's better to ask on the project's mailing list, such as groups.google.com/forum/#!forum/delta-users
- zsxwing
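To make the "built on top of storage" point concrete, here is a toy sketch of the general idea: plain files in a directory, plus a small log of commits that decides which files readers are allowed to see. This is not Delta Lake's actual protocol or API. The `_log` directory name, the `commit` and `snapshot` helpers, and the JSON layout are all made up for illustration, and a local temp directory stands in for an object store.

```python
import json
import os
import tempfile


def commit(table_dir, version, added_files):
    """Publish a commit that lists the data files it adds (hypothetical helper).

    Mode "x" makes open() fail if the commit file already exists, so two
    writers cannot both claim the same version number.
    """
    log_dir = os.path.join(table_dir, "_log")  # hypothetical log location
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "x") as f:
        json.dump({"add": added_files}, f)


def snapshot(table_dir):
    """Return the data files readers should see: only those named in commits."""
    log_dir = os.path.join(table_dir, "_log")
    files = []
    if os.path.isdir(log_dir):
        # Zero-padded names sort in version order.
        for name in sorted(os.listdir(log_dir)):
            with open(os.path.join(log_dir, name)) as f:
                files.extend(json.load(f)["add"])
    return files


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        # A data file that is written and then committed.
        with open(os.path.join(table_dir, "part-0.csv"), "w") as f:
            f.write("id\n1\n")
        commit(table_dir, 0, ["part-0.csv"])

        # A data file left behind by a job that never committed.
        with open(os.path.join(table_dir, "part-1.csv"), "w") as f:
            f.write("id\n2\n")

        visible = snapshot(table_dir)

        # A second writer trying to reuse version 0 is rejected.
        try:
            commit(table_dir, 0, ["part-1.csv"])
            conflict = False
        except FileExistsError:
            conflict = True
        return visible, conflict


if __name__ == "__main__":
    print(demo())
```

The storage layer on its own would show both `part-0.csv` and `part-1.csv` to anyone listing the directory. The log is what lets a reader ignore the uncommitted file and lets a writer detect a conflicting commit, which is the kind of guarantee a table layer adds on top of raw storage.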
2 Answers
1
votes
Delta Lake is a product (like Redshift) rather than a concept/approach/theory (like dimensional modelling). As with any product in any walk of life, some of the claims made for the product will be true and some will be marketing spin. Whether the claimed benefits for a product actually make it superior to an alternative product will change from use case to use case.
Asking why there are other data lake solutions besides Delta Lake is a bit like asking why there is more than one DBMS in the world.