0
votes

I am considering using Google BigQuery to store real-time call records, around 3 million rows inserted per day and never updated.

I have signed up for a trial account and run some tests.

I have a few concerns before I can go ahead with development:

  1. When streaming data via PHP, it sometimes takes around 10-20 minutes for rows to appear in my tables. This is a show stopper for us because network support engineers need this data in real time to troubleshoot quality issues.

  2. Partitions: we can store data in partitions divided per day, but one day's partition is about 2.5 GB, which pushes my query costs into the range of thousands per month. Is there any other way to bring the cost down? We could store data partitioned per hour, but there is no such support available.
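For context, a rough back-of-the-envelope sketch of BigQuery's on-demand query pricing for the numbers above. The $5-per-TB list price and the query volume (10 users, 50 report queries each per day) are assumptions for illustration, not figures from the question:

```python
# Assumed on-demand list price: $5 per TB scanned (check current pricing).
PRICE_PER_TB = 5.0

partition_gb = 2.5           # one day's partition, from the question
queries_per_day = 10 * 50    # assumption: 10 users, 50 report queries each

# Worst case: every query scans the full day's partition.
scanned_tb_per_day = partition_gb * queries_per_day / 1024
monthly_cost = scanned_tb_per_day * PRICE_PER_TB * 30
print(f"{scanned_tb_per_day:.2f} TB/day -> ${monthly_cost:.2f}/month")
# -> 1.22 TB/day -> $183.11/month
```

Under these assumptions the bill stays well under $200/month, which is consistent with the commenters' suspicion below that a "thousands per month" projection over-counts the bytes actually scanned.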

If not BigQuery, what other solutions on the market can deliver similar performance and solve these problems?

Are you sure your cost projections are correct? Thousands per month usually means you process between 200 TB and 2,000 TB. Did you really project numbers in that range? - Pentium10
And if I calculate 2.5 GB per day, in a year you only reach about 1 TB, nowhere near the 2 PB upper limit for a $10k bill on queries alone. - Pentium10
And did you run only a count() query, or does the data not show up in queries at all? (The first can happen, as the count is updated only after data reaches long-term storage.) - Pentium10
@user2682204 I'm suspicious about the billing calculations too. Have you tried talking to the Cloud salespeople? They can help walk through costs, capabilities, etc. - Elliott Brossard
Hi Pentium10 and Elliott. We get around 3 million call records per day, and support engineers and the quality team continuously generate reports on call quality to see how calls are connecting. The data scanned grows every hour, and by the end of the day we are looking at a table of around 3 GB; each query scans around 100 MB, and with 10 users generating reports I end up scanning around 20 TB of data a day. Since records update every second, the cache is not used. - user2682204

1 Answer

0
votes

You have the "Streaming insert" option, which makes records searchable within a few seconds (it has its price).
See: streaming-data-into-bigquery
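A minimal sketch of the request body that streaming inserts use (the `tabledata.insertAll` REST endpoint). The field names `kind`, `rows`, `insertId`, and `json` follow the documented API; the table schema (`call_id`, `duration_s`) is a made-up example, and the actual authenticated HTTP POST is left out:

```python
import json
import uuid

# Hypothetical call records to stream; field names are illustrative only.
rows = [
    {"call_id": "abc-123", "duration_s": 42},
    {"call_id": "abc-124", "duration_s": 17},
]

# Body for POST .../datasets/{dataset}/tables/{table}/insertAll.
# insertId gives BigQuery a key for best-effort de-duplication on retries.
body = {
    "kind": "bigquery#tableDataInsertAllRequest",
    "rows": [{"insertId": str(uuid.uuid4()), "json": row} for row in rows],
}
payload = json.dumps(body)
```

Rows streamed this way are typically queryable within seconds, rather than waiting on a load job.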

Check table-decorators to limit how much data a query scans.
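Table decorators (legacy SQL) let a query scan only a time window of the table instead of the whole thing, which directly cuts billed bytes. A sketch that builds a range-decorated table reference for the last hour; the dataset/table name is a placeholder, and the decorator timestamps are milliseconds since the epoch in the `[table@start-end]` form:

```python
import time

def last_hour_decorator(table, now_ms=None):
    """Build a legacy-SQL range decorator covering the last hour.

    Decorator timestamps are in milliseconds since the Unix epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    start_ms = now_ms - 3600 * 1000
    return f"[{table}@{start_ms}-{now_ms}]"

# Example usage: SELECT COUNT(*) FROM [mydataset.calls@<start>-<end>]
ref = last_hour_decorator("mydataset.calls", now_ms=1_500_000_000_000)
print(ref)  # [mydataset.calls@1499996400000-1500000000000]
```

For the reporting workload in the question, querying only the most recent hour this way would scan a small fraction of the day's 2.5 GB partition per query.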