22
votes

Background

We'd like to store our immutable events in a (preferably) managed service. Average size of one event is less than 1 Kb and we have between 1-5 events per second. The main reason for storing these events is to be able to replay them (perhaps using table scanning) once we create future services that might be interested in these events. Since we're in the Google Cloud we're obviously looking at Google's services as first choice.

I suspect that Bigtable would be a good fit for this but according to the price calculator it'll cost us more than 1400 USD per month (which to us is a big deal):

enter image description here

Looking at something like BigQuery renders a price of 3 USD per month (if I'm not missing something essential):

enter image description here

Even though a schema-less database would be better suited for us we would be fine with essentially storing our events as a blob with some metadata.

Questions

Could we use BigQuery for this instead of Bigtable to reduce costs? For example BigQuery has something called streaming inserts which to me seems like something we could use. Is there anything that'll bite us in the short or long term that I might not be aware of if going down this route?

6
You are not missing essential, BQ is extremely "cheap".Pentium10
BigQuery is optimized for long-term storage and analytics, BigTable for heavy usage by an online appIgor Artamonov
Not sure but could be limits in terms of operations. E.g think you can only do like 1k append to a table per day (that was one BQ api limit I hit a while ago). Although I think the streaming api is more forgiving. Just might be another dimension to consider.andrewm4894

6 Answers

11
votes

Bigtable is great for large (>= 1TB) mutable data sets. It has low latency under load and is managed by Google. In your case, I think you're on the right track with BigQuery.

9
votes

FYI

Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions. Also, it is not a good solution for small amounts of data (< 1 TB).

Consider these cases: - If you need full SQL support for an online transaction processing (OLTP) system, consider Google Cloud SQL.

If you need interactive querying in an online analytical processing (OLAP) system, consider Google BigQuery.

If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Google Cloud Storage.

If you need to store highly structured objects, or if you require support for ACID transactions and SQL-like queries, consider Cloud Datastore.

7
votes

The overall cost boils down to how often you will 'query' the data. If it's an backup and you don't replay events too often, it'll be dirt cheap. However, if you need to replay it once daily, you start triggering the 5$/TB scanned too easily. We were surprised too how cheap inserts and storage were, but this is ofc because Google expects you to run expensive queries at some point in time on them. You'll have to design around a few things though. E.g. AFAIK streaming inserts have no guarantue's of being written to the table and you have to poll frequently on tail of list to see if it was really written. Tailing can be done efficiently with time range table decorator, though (not paying for scanning whole dataset).

If you don't care about order, you can even list a table for free. No need to run a 'query' then.

5
votes

Google Cloud - GCP database options decision flowchart

This flowchart may help in deciding between different Google cloud storage offerings (Disclaimer! copied this image from Google cloud's page)

If your usecase is a live database(let's say, backend of a website), BigTable is what you need (Still it's not really an OLTP system though) . If it is more of an data analytics/ datawarehouse kind of purpose, then BigQuery is what you need.

Think of OLTP vs OLAP; Or if you are familiar with Cassandra and Hadoop, BigTable roughly equates to Cassandra, BigQuery roughly equates to Hadoop (Agreed, not a fair comparison, but you get the idea)

https://cloud.google.com/images/storage-options/flowchart.svg

Please keep in mind that Bigtable is not a relational database, it's a noSQL solution without any SQL features like JOIN etc. If you want an RDBMS OLTP, you might need to look at cloudSQL (mysql/ postgres) or spanner.

Cloud spanner is relatively young, but is powerful and promising. At least, google marketing claims that it's features are best of both worlds (Traditional RDBMS and noSQL)

enter image description here

Cost Aspect

Cost aspect is already covered nicely here https://stackoverflow.com/a/34845073/6785908

I know this is very late answer, but adding it anyway incase it may help somebody else in future.

3
votes

Hard to summarize better than it is already done by Google.

I think you need to figure out how you are going to use (replay) your data (events) and this can help you in making final decision.

So far, BigQuery looks like a best choice for you

1
votes

Bigtable is a distributed (run on clusters) database for applications that manage massive data. Its designed for massive unstructured data, scales horizontally and made of column families. It stores data in key value pairs as opposed to relational or structured databases.

BigQuery is a datawarehouse application. That means it provides connection to several data sources or streams such that they can be extracted, transformed and loaded into bigQuery table for further analysis. Unlike Bigtable, It does store data in structured tables and supports SQL queries.

Use cases; If you want to do analytics or business intelligence by deriving insights from collected data on from different sources (applications, research, surveys, feedback, logs etc...) of your organisation , you may want to pull all this information into one location. This location will most likely be a Bigquery data warehouse.

If you have an application that collects Big data, in other words massive information (High data volume) per time at higher speeds (High velocity) and in unstructured inconsistent forms with different data types as audio, text, video, images, etc... ( Variety and veracity), then your probable choice of database application for this app would be Bigtable.