I am trying to decide how measurements should be organised in an InfluxDB the database (which I believe they call schema design and data layout) but I think this may be a more general database type question.
Let's say as a simple example that I am measuring two quantites, temperature and humidity (imaginative, I know!), in two locations, living room and outside.
InfluxDB has the syntax for inserting data points:
measurement, tag_key=tag_value field_key=field_value
and so there are two obvious (at least to me) options. Briefly, the first option would insert a datapoint like this:
INSERT temperature,location=outside value=15
INSERT humidity,location=outside value=50
whereas the second option would do it this way:
INSERT sensor_measurements,location=outside temperature=15,humidity=50
My questions are more high level:
- Is there a preferred/accepted way to go about this?
- Will I run into problems with either of these if I try to scale it up to more quantities/locations/data types?
- Does either of the methods offer an advantage if I later on try to graph these things in Grafana, for example, or if I try to implement later some of the many InfluxQL functions?
- Does anyone have any general advice about this to offer?
My own thoughts:
Option 1 seems to me to be more like what is implied by the InfluxDB description "measurement". Both temperature and humidity are separate quantities. But it seems a little clunky to just call it "value".
Option 2 appears to have the advantage that both the humidity and the temperature share exactly the same timestamp. This would come in useful, for example, if I wanted to import the data into some other software and do a correlation between the two quantites, and would mean I wouldn't have to do any interpolation or binning to get them to match up.
I am not sure if it is a bad idea with Option 2 to just have a general measurement called sensor_measurements, and will be hard to maintain later.
In detail:
Option 1
- Have a separate "measurement" for each of temperature and humidity, use the location as a "tag", and just name the "field" as value:
At time t1, insert the data:
INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28
At time t2, insert some different data:
INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28
I can then get access to the living room temperature by querying the following:
> SELECT value FROM temperature WHERE location='living_room'
name: temperature
time value
---- -----
1590416682017481091 28
1590416723963187592 29
I can also use the group by function to do something like this:
SELECT value FROM temperature GROUP BY "location"
Option 2
- Have a combined "measurement" called sensor_measurements, for example, use a "tag" for location, and then have separate "fields" for each of temperature and humidity:
At time t1, insert the data:
INSERT sensor_measurements,location=outside temperature=15,humidity=50
INSERT sensor_measurements,location=living_room temperature=28,humidity=65
At time t2, insert some different data:
INSERT sensor_measurements,location=outside temperature=14,humidity=56
INSERT sensor_measurements,location=living_room temperature=29,humidity=63
I can now get access to the living room temperature by querying the following:
> SELECT temperature FROM sensor_measurements WHERE location='living_room'
name: sensor_measurements
time temperature
---- -----------
1590416731530452068 28
1590416757055629103 29
I can now use the group by function to do something like this:
SELECT temperature FROM sensor_measurements GROUP BY "location"