0
votes

I checked the time series paper from Google Cloud, https://cloud.google.com/bigtable/docs/schema-design-time-series, and also the schema design of OpenTSDB, which is built on HBase, a system very similar to Bigtable.

The OpenTSDB schema uses a lot of tricks to encode data points and row keys into wide rows so that each data point takes less space. The Google paper, however, just suggests using narrow rows.

My question is: can I get any real benefit from the OpenTSDB schema design for time series storage with Bigtable? And is it true that Bigtable's compression removes enough redundancy that the OpenTSDB schema makes very little difference?


2 Answers

3
votes

Schema design is typically very specific to your application's needs. There are general recommendations, but you might be better served by a radically different design for your specific application.

Many of the suggestions in the StumbleUpon deck and MapR's video (below) are excellent design ideas that were not included in the Time Series paper. To answer your questions:

  1. Can I get some real benefit from the OpenTSDB schema design for time series storage with Bigtable?

Yes - the design ideas from OpenTSDB are good ideas and are compatible with the Cloud Bigtable paper.

  2. Is it true that the compression of Bigtable can help me remove redundancy so that the OpenTSDB schema makes very little difference?

Cloud Bigtable's compression makes a big difference, but a compact schema can still help: smaller data often compresses to less than bigger data, even when the bigger data is highly redundant.
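One rough way to check the claim above on your own data is to encode the same points twice, once as self-describing records and once in a compact OpenTSDB-like form, and compare sizes before and after compression. The sketch below is only a measurement harness under stated assumptions: zlib is a stand-in and is not the compression Cloud Bigtable uses, the metric name, tags, and both encodings are made up for illustration, and real storage overhead per cell is ignored.

```python
import struct
import zlib

METRIC = b"sys.cpu.user"         # hypothetical metric name
TAGS = b"host=web01,dc=us-east"  # hypothetical tags


def encode_verbose(points):
    """One self-describing text record per point: full key plus value."""
    return b"".join(
        b"%s#%s#%d=%d\n" % (METRIC, TAGS, ts, value) for ts, value in points
    )


def encode_compact(points, bucket=3600):
    """Key written once per bucket, then a 2-byte offset and 1-byte value per point.

    Assumes all points fall in one bucket and every value fits in 0..255.
    """
    base = points[0][0] - points[0][0] % bucket
    header = METRIC + b"#" + TAGS + b"#" + struct.pack(">I", base)
    body = b"".join(struct.pack(">HB", ts - base, value) for ts, value in points)
    return header + body


def measure(points):
    """Return {encoding name: (raw size, zlib-compressed size)} in bytes."""
    blobs = {"verbose": encode_verbose(points), "compact": encode_compact(points)}
    return {name: (len(blob), len(zlib.compress(blob))) for name, blob in blobs.items()}


if __name__ == "__main__":
    base = 1_499_997_600  # an arbitrary timestamp aligned to the hour
    points = [(base + i, (i * 37) % 101) for i in range(3600)]  # synthetic data
    for name, (raw, zipped) in measure(points).items():
        print(f"{name}: raw={raw} bytes, zlib={zipped} bytes")
```

The interesting number is how much of the raw size gap survives compression. Synthetic data like this is far more regular than real metrics, so treat the result as a hint and rerun it with a sample of your own points.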

Schema Design

The Google Time Series paper contains the engineering team's recommendations and has the benefit of many years of experience designing with Bigtable.

Of course, you should start with HBase and Schema Design and Designing your Schema for Cloud Bigtable. Ian Varley's master's thesis, No Relation: The Mixed Blessings of Non-Relational Databases, is also worth reading.

Time Series Design

Cloudera has a good chapter on Schema case studies which talks about Time Series.

OpenTSDB design

MapR's HBase Key Design with OpenTSDB video is short and worth watching. Looking into OpenTSDB there is an interesting deck from StumbleUpon.

0
votes

In the whitepaper - Cloud Bigtable Schema Design for Time Series Data - we recommend narrow rows for three reasons.

The first reason isn't specific to Cloud Bigtable. We recommend narrow rows by default, with one event per row, because this makes your queries easy to implement and consequently your applications easier to develop, test and maintain. We recommend wide rows only as an optimization where it doesn't obfuscate your queries and improves some measurable aspect of your application.
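To make the query-simplicity argument concrete, here is a minimal sketch that models both layouts with plain Python dicts instead of a real Bigtable client. The key format (`metric#timestamp`), the one-hour bucket, and all function names are assumptions for illustration, not the layout of the whitepaper or of OpenTSDB.

```python
from bisect import bisect_left


def narrow_key(metric, ts):
    # Zero-padded so that lexicographic key order matches time order.
    return f"{metric}#{ts:010d}"


def write_narrow(table, metric, ts, value):
    """Narrow layout: one event per row."""
    table[narrow_key(metric, ts)] = {"v": value}


def write_wide(table, metric, ts, value, bucket=3600):
    """Wide layout: one row per time bucket, one column per offset in it."""
    row = table.setdefault(narrow_key(metric, ts - ts % bucket), {})
    row[ts % bucket] = value


def query_narrow(table, metric, start, end):
    """Range [start, end) is one contiguous scan over sorted row keys."""
    keys = sorted(table)
    lo = bisect_left(keys, narrow_key(metric, start))
    hi = bisect_left(keys, narrow_key(metric, end))
    return [(int(k.rsplit("#", 1)[1]), table[k]["v"]) for k in keys[lo:hi]]


def query_wide(table, metric, start, end, bucket=3600):
    """Visit every bucket row overlapping the range, then filter its columns."""
    out = []
    for base in range(start - start % bucket, end, bucket):
        row = table.get(narrow_key(metric, base), {})
        for offset in sorted(row):
            ts = base + offset
            if start <= ts < end:
                out.append((ts, row[offset]))
    return out
```

Both queries return the same points, but the wide version has to know the bucket size, compute bucket boundaries, and filter columns at the edges, which is the extra logic the narrow layout avoids.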

The second reason is specific to Cloud Bigtable. We recommend narrow rows because if you use wide rows, especially rows containing potentially unbounded numbers of events, you can easily, and sometimes unexpectedly, run into the maximum recommended row size for Cloud Bigtable of 100 MB, which can lead to performance issues.

The third reason is the observation that Apache HBase and Cloud Bigtable are different implementations of the HBase interface. Optimizations that perform well for Apache HBase might not perform well for Cloud Bigtable, and vice versa. The whitepaper encapsulates the lessons learned internally over the years running Bigtable at Google, where it's generally found that narrow rows outperform wide rows.
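A quick back-of-the-envelope calculation shows how an unbounded wide row can reach that limit. The 50 bytes per cell and 10 events per second used in the usage note are made-up illustrative figures, and 100 MB is taken as 100,000,000 bytes, so substitute your own measurements.

```python
ROW_LIMIT_BYTES = 100 * 1000 * 1000  # 100 MB, assuming decimal megabytes


def cells_until_limit(bytes_per_cell):
    """How many cells fit in one row before it reaches the recommended maximum."""
    return ROW_LIMIT_BYTES // bytes_per_cell


def hours_until_limit(bytes_per_cell, events_per_second):
    """How long an ever-growing row takes to reach the limit at a steady rate."""
    return cells_until_limit(bytes_per_cell) / events_per_second / 3600


def bucket_row_bytes(bucket_seconds, events_per_second, bytes_per_cell):
    """Size of a wide row that is bounded to one time bucket."""
    return bucket_seconds * events_per_second * bytes_per_cell
```

Under those assumed figures, `cells_until_limit(50)` is 2,000,000 cells, which a single unbounded row would reach in a little under 56 hours at 10 events per second, while `bucket_row_bytes(3600, 10, 50)` gives 1,800,000 bytes for a row bounded to one hour. Bounding the row by time is what keeps a wide design away from the limit.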

Great question, deep and pertinent, thank you for asking it.