When organising an InfluxDB database, which of these two approaches would be most preferred?

Question

I am trying to decide how measurements should be organised in an InfluxDB the database (which I believe they call schema design and data layout) but I think this may be a more general database type question.

Let's say as a simple example that I am measuring two quantites, temperature and humidity (imaginative, I know!), in two locations, living room and outside.

InfluxDB has the syntax for inserting data points:

measurement, tag_key=tag_value field_key=field_value

and so there are two obvious (at least to me) options. Briefly, the first option would insert a datapoint like this:

INSERT temperature,location=outside value=15
INSERT humidity,location=outside value=50

whereas the second option would do it this way:

INSERT sensor_measurements,location=outside temperature=15,humidity=50

My questions are more high level:

Is there a preferred/accepted way to go about this?
Will I run into problems with either of these if I try to scale it up to more quantities/locations/data types?
Does either of the methods offer an advantage if I later on try to graph these things in Grafana, for example, or if I try to implement later some of the many InfluxQL functions?
Does anyone have any general advice about this to offer?

My own thoughts:

Option 1 seems to me to be more like what is implied by the InfluxDB description "measurement". Both temperature and humidity are separate quantities. But it seems a little clunky to just call it "value".

Option 2 appears to have the advantage that both the humidity and the temperature share exactly the same timestamp. This would come in useful, for example, if I wanted to import the data into some other software and do a correlation between the two quantites, and would mean I wouldn't have to do any interpolation or binning to get them to match up.

I am not sure if it is a bad idea with Option 2 to just have a general measurement called sensor_measurements, and will be hard to maintain later.

In detail:

Option 1

Have a separate "measurement" for each of temperature and humidity, use the location as a "tag", and just name the "field" as value:

At time t1, insert the data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

At time t2, insert some different data:

INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28

I can then get access to the living room temperature by querying the following:

> SELECT value FROM temperature WHERE location='living_room'

name: temperature
time                value
----                -----
1590416682017481091 28
1590416723963187592 29

I can also use the group by function to do something like this:

SELECT value FROM temperature GROUP BY "location"

Option 2

Have a combined "measurement" called sensor_measurements, for example, use a "tag" for location, and then have separate "fields" for each of temperature and humidity:

At time t1, insert the data:

INSERT sensor_measurements,location=outside temperature=15,humidity=50
INSERT sensor_measurements,location=living_room temperature=28,humidity=65

At time t2, insert some different data:

INSERT sensor_measurements,location=outside temperature=14,humidity=56
INSERT sensor_measurements,location=living_room temperature=29,humidity=63

I can now get access to the living room temperature by querying the following:

> SELECT temperature FROM sensor_measurements WHERE location='living_room'

name: sensor_measurements
time                temperature
----                -----------
1590416731530452068 28
1590416757055629103 29

I can now use the group by function to do something like this:

SELECT temperature FROM sensor_measurements GROUP BY "location"

Jan Garaj Jan Garaj · Accepted Answer · 2020-05-30T17:30:01

I would use option 2 from offered options, because less records = less resources = better query response time (in theory). Generally, both approaches look good.

But I will use more generic 3rd option in real world. Single generic metrics measurement with tags metric,location and field value:

INSERT metrics,metric=temperature,location=outside value=15
INSERT metrics,metric=humidity,location=living_room value=50
INSERT metrics,metric=temperature,location=living_room value=28
INSERT metrics,metric=humidity,location=living_room value=65

That gives me opportunity to create single generic Grafana dashboard, where user will have option to select visualized metric/location via dashboard variable (generated directly from the InfluxDB, e.g. SHOW TAG VALUES WITH KEY = "metric"). Any new inserted metrics (e.g. `illuminance , pressure, wind-speed, wind-direction, ...) or location can be immediately visualized in this generic dashboard. Eventually, some metrics may have also additional tags. That is good and I will be able to use ad-hoc Grafana variable, so users will be able specify any number of key/value filters on the fly. Grafana doc.

When organising an InfluxDB database, which of these two approaches would be most preferred?

Option 1

Option 2

1 Answers