I need some suggestions for one of the use cases I am working on.
Use Case:
We have around 5-10 tables in Cloud SQL; some are lookup tables and others are transactional. We need to get this data into BigQuery and build 3-4 tables (flattened, nested, or denormalized) from them, which will be used for reporting in Data Studio, Looker, etc.
Data should be processed incrementally, and changes in Cloud SQL can happen every 5 minutes, which means the data should be available for BigQuery reporting in near real time. If possible, we should also maintain the history of data changes.
Solution:
We keep 3 layers in BigQuery: data from Cloud SQL lands in the first layer; after flattening, we save it to the second layer (which maintains history); and then we process it into the third layer (reporting). The problem with this approach is that it was designed for daily processing and cannot keep up with changes arriving from Cloud SQL every 5 minutes.
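For context on where the current pipeline struggles: the second (history) layer could, in principle, be refreshed incrementally with a close-out MERGE followed by an insert of new row versions, so that each 5-minute batch only touches changed rows instead of reprocessing everything. A minimal sketch of generating those two statements is below; all table and column names are hypothetical, and the flattening is assumed to have already happened in the first layer.

```python
def build_scd2_sql(staging, history, key_cols, value_cols):
    """Build the two BigQuery statements for an incremental history
    (SCD Type 2) refresh: first close out current rows whose values
    changed, then insert the new versions plus any brand-new keys.

    All table and column names are caller-supplied placeholders.
    """
    on = " AND ".join(f"T.{k} = S.{k}" for k in key_cols)
    # NULL-safe comparison is omitted for brevity; columns that allow
    # NULLs would need COALESCE (or similar) handling here.
    changed = " OR ".join(f"T.{c} != S.{c}" for c in value_cols)
    all_cols = key_cols + value_cols

    # Step 1: mark the currently-open version of each changed key as closed.
    close_out = (
        f"MERGE `{history}` T\n"
        f"USING `{staging}` S\n"
        f"ON {on} AND T.valid_to IS NULL\n"
        f"WHEN MATCHED AND ({changed}) THEN\n"
        f"  UPDATE SET valid_to = CURRENT_TIMESTAMP()"
    )
    # Step 2: insert a fresh open version for every key that no longer
    # has an open row (changed keys just closed above, plus new keys).
    insert_new = (
        f"INSERT INTO `{history}` ({', '.join(all_cols)}, valid_from, valid_to)\n"
        f"SELECT {', '.join('S.' + c for c in all_cols)}, CURRENT_TIMESTAMP(), NULL\n"
        f"FROM `{staging}` S\n"
        f"WHERE NOT EXISTS (\n"
        f"  SELECT 1 FROM `{history}` T WHERE {on} AND T.valid_to IS NULL\n"
        f")"
    )
    return [close_out, insert_new]
```

These statements could then be executed every 5 minutes from whatever orchestrator runs the surrounding logic (Dataflow, a VM, etc.); the SQL generation itself carries no BigQuery dependency.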
Any suggestion is welcome.
Note: We would need Dataflow, a VM, or some other way to run the logic processing. This means we cannot rely on scheduled queries alone, since not all of the logic can be expressed in a query, but a mix of views and code can be used.