
I am using Delta Lake OSS version 0.8.0.

Let's assume we calculated aggregates and cubes from the raw data and saved the results in a gold table using Delta Lake.

My question is: is there a well-known way to access this gold table data and deliver it to, for example, a web dashboard?

In my understanding, you need a running Spark session to query a Delta table.
So one possible solution would be to write a web API that executes these Spark queries.
Alternatively, you could write the gold results to a database like Postgres and access them there, but that seems like just duplicating the data.

Is there a known best-practice solution?


1 Answer


The real answer depends on your requirements regarding latency, number of requests per second, amount of data, deployment options (cloud/on-prem, where the data is located - HDFS/S3/...), etc. Possible approaches are:

  1. Run Spark in local mode inside your application - it may require a lot of memory, etc. (first sketch below)
  2. Run the Thrift JDBC/ODBC server as a separate process and access the data via JDBC/ODBC (second sketch below)
  3. Read the data directly using the Delta Standalone Reader library for the JVM, or via the delta-rs library that works with Rust/Python/Ruby (third sketch below)
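
For option 1, here is a minimal Python sketch, assuming Spark 3.0/3.1 (the versions compatible with Delta Lake 0.8.0) and a hypothetical gold table path `/data/gold/aggregates`:

```python
from pyspark.sql import SparkSession

# Spark embedded in the web API process; local[*] uses all local cores.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("dashboard-api")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical path; collect only a small slice to serve to the dashboard.
df = spark.read.format("delta").load("/data/gold/aggregates")
rows = df.limit(100).collect()
```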
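For option 2, a sketch of querying over the HiveServer2 protocol with PyHive, assuming the Thrift server was started with the Delta Lake package and SQL extension configs on its classpath (e.g. via `sbin/start-thriftserver.sh --packages io.delta:delta-core_2.12:0.8.0`) and listens on the default port 10000; the host name is a placeholder:

```python
from pyhive import hive

# Connect to the Spark Thrift JDBC/ODBC server (HiveServer2 protocol).
conn = hive.connect(host="thrift-server-host", port=10000)
cursor = conn.cursor()

# Query the Delta table by path; a table registered in the metastore
# could be referenced by name instead.
cursor.execute("SELECT * FROM delta.`/data/gold/aggregates` LIMIT 100")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```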
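For option 3, a sketch using the Python bindings of delta-rs (`pip install deltalake`), which read the table files directly from storage without any Spark or JVM process; the path is again a placeholder:

```python
from deltalake import DeltaTable

# Open the Delta table straight from storage (local path, s3://, etc.).
dt = DeltaTable("/data/gold/aggregates")

# Load into Arrow, then pandas; fine for gold tables that fit in memory.
df = dt.to_pyarrow_table().to_pandas()
print(df.head())
```

For a dashboard serving many small requests, this last route is typically the lightest-weight option, since there is no cluster or JVM to keep warm.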