I have several questions related to Spark + Delta.
1) Databricks proposes 3 layers (bronze, silver, gold), but which layer is recommended for Machine Learning, and why? I suppose the intention is to have the data clean and ready in the gold layer (a rough sketch of what I mean is the first snippet after this list).
2) If we abstract the concepts of these 3 layers, can we think of the bronze layer as a data lake, the silver layer as databases, and the gold layer as a data warehouse? I mean in terms of functionality.
3) Is the Delta architecture a commercial term, an evolution of the Kappa architecture, or a new trending architecture like Lambda and Kappa? What are the differences between (Delta + Lambda architecture) and the Kappa architecture?
4) In many cases Delta + Spark scales much further than most databases, usually at a much lower cost, and if we tune things right we can get almost 2x faster query results (the second snippet after this list shows the kind of tuning I mean). I know it is pretty complicated to compare the current trending data warehouses with a Feature/Agg Data Store, but I would like to know how I can make this comparison.
5) I used to use Kafka, Kinesis, or Event Hubs for streaming processing, and my question is: what kinds of problems can happen if we replace these tools with a Delta Lake table (see the third snippet after this list)? I already know that everything depends on many things, but I would like to have a general picture of that.
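
To make question 1 concrete, here is a minimal sketch of what I mean by training directly on the gold layer. The path, column names, and label are made up for illustration; only the Delta reader and the MLlib calls are real APIs.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("gold-layer-ml").getOrCreate()

# Hypothetical gold table: already cleaned, joined, and aggregated upstream.
gold = spark.read.format("delta").load("/mnt/lake/gold/customer_features")

# Assemble hypothetical feature columns into the vector MLlib expects.
assembler = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"],
    outputCol="features",
)
train = assembler.transform(gold).select("features", "label")

# Train directly on the gold layer, with no extra cleaning step.
model = LinearRegression(labelCol="label").fit(train)
```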
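For the tuning I mention in question 4, this is the kind of thing I have in mind: file compaction plus Z-ordering so that selective queries can skip files. The path and column are hypothetical; the optimize()/executeZOrderBy() API is from delta-spark (Delta Lake 2.0+).

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("delta-tuning").getOrCreate()

# Hypothetical table path.
table = DeltaTable.forPath(spark, "/mnt/lake/gold/sales")

# Compact small files and cluster the data on a common filter column, so
# queries filtering on customer_id can skip most files via data skipping.
table.optimize().executeZOrderBy("customer_id")

# The same operation expressed in SQL:
# spark.sql("OPTIMIZE delta.`/mnt/lake/gold/sales` ZORDER BY (customer_id)")
```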
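And for question 5, this is how I imagine the replacement would look: a Delta table acting as both streaming source and sink via Structured Streaming. The paths are hypothetical; readStream/writeStream with format("delta") is the standard API.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-as-stream").getOrCreate()

# Treat an append-only Delta table as the "topic": each new commit becomes
# a micro-batch. Latency is bounded by trigger interval and commit
# frequency, not by per-record delivery as with Kafka/Kinesis/Event Hubs.
events = spark.readStream.format("delta").load("/mnt/lake/bronze/events")

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/events_silver")
    .outputMode("append")
    .start("/mnt/lake/silver/events")
)
```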