
Here's what I'm trying to accomplish:

  1. A visitor lands on my website
  2. JavaScript collects some information and sends a hit
  3. The hit is processed and inserted into BigQuery

And here's how I plan to solve it:

  1. The hit is sent to Cloud Functions HTTP trigger (using Ajax)
  2. Cloud Functions sends a message to Pub/Sub
  3. Pub/Sub sends data to another Cloud Function using a Pub/Sub trigger
  4. The second Cloud Function processes the hit into a BigQuery row and inserts it into BigQuery
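For reference, here's a rough sketch of how I imagine the two functions in Python (the project, topic, table name and row fields are just placeholders, nothing is settled yet; in practice the two functions would be deployed separately):

```python
# Sketch only: in reality these two functions live in separate deployments.
import base64
import json

from google.cloud import bigquery, pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "hits")  # placeholder project/topic

def collect_hit(request):
    """HTTP-triggered function: acknowledge the browser quickly, defer the work."""
    payload = request.get_json(silent=True) or {}
    publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
    return ("", 204)

bq = bigquery.Client()
table_id = "my-project.analytics.hits"  # placeholder dataset/table

def process_hit(event, context):
    """Pub/Sub-triggered function: turn the message into a row and stream it in."""
    hit = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = bq.insert_rows_json(table_id, [hit])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```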

Is there a simpler way to solve this?

Some other details to take into account:

  • There are around 1 million hits a day
  • I don't want to use Cloud Dataflow because it inflates the cost
  • I probably can't skip Pub/Sub, because some hits are sent as the visitor is leaving the site and the request might not have enough time to finish processing
I really don't think you'll need the Pub/Sub middle step. The amount of time to insert into Pub/Sub is about the same as writing it to BQ. – jimmartens
@jimmartens I guess the only way to find out is to set both up and run some tests. Just found this guide using a similar setup, though: medium.com/@ridwanfajar/… – Silver Ringvee
What are your requirements? What do you want to optimize? Is velocity (the delay between the hit and the write into BQ) a concern? Is the cost too high? Is scalability required (do you target 10M hits in 12 months)? Today, your architecture is the most scalable, robust and resilient! – guillaume blaquiere
Data should be available in BQ in a few minutes at most. Hits need to make it to the DB even if they happen on a click of a link that takes the user to a new page. The cost would get too high with Dataflow. 1M hits a day, so 365M hits a year. – Silver Ringvee

1 Answer


You can use BigQuery streaming inserts; this option is inexpensive and you avoid hitting the load job quota of 1,000 load jobs per table per day.
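A minimal sketch of a streaming insert with the Python client library (the table name and row fields are only examples):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.hits"  # example table

rows = [{"user_id": "abc123", "page": "/pricing", "ts": "2020-01-01T12:00:00Z"}]

# Streaming inserts are not counted against the load job quota, and the rows
# become queryable within a few seconds.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert errors:", errors)
```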

Another option, if you don't mind the data taking longer to become available, is to store all the info in a Cloud Storage bucket and then load it with a transfer, which you can schedule so the data is loaded daily. This solution is aimed at a batch setup: you accumulate all the info in one place and then transfer it to the final destination. If you only want streaming, the solution you mentioned is fine.
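For the batch route, here is a sketch of loading a day's worth of newline-delimited JSON files from a bucket with a single load job (the bucket, prefix and schema autodetection are assumptions; a scheduled BigQuery Data Transfer from Cloud Storage achieves the same thing without code):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.hits"  # example table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # or pass an explicit schema
)

# All of the day's files are loaded together, so this counts as a single
# load job against the 1,000-per-table-per-day quota.
load_job = client.load_table_from_uri(
    "gs://my-bucket/hits/2020-01-01/*.json",  # example bucket/prefix
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the job to finish
```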

It's up to you to choose the option that best fits your specific usage.