2
votes

I am reading some legacy code and found that all the inserts into InfluxDB are done this way (simplified code here):

influxDB.write(Point.measurement(myeasurement)
  .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
  .addField("myfield", 123)
  .tag("rnd", String.valueOf(Math.random() * 100000))
  .build());

As you can guess, the value of the tag "rnd" is different for each inserted value, which means that we can have 100k different tag values. Actually, for now we have fewer values than that, so we should end up with one distinct tag value per value...

I am not an expert in InfluxDB, but my understanding is that tags are used by InfluxDB to group related values together, like partitions or shards in other tools. 100k tag values seems like a lot...

Is that as horrible as I think it is? Or is there any possibility that this kind of insert may be useful for something?

EDIT: I just realized that Math.random() returns a double, so the * 100000 is just useless. So is the String.valueOf(). Actually, there is one series in the database per value; I can't imagine how that could be a good thing :(

1
To make a long story short: me neither. If it were a UUID, I could see some (exotic) use cases. But with a simple random value, the chance of collision is just too high. - Markus W Mahlberg

1 Answer

1
votes

It is bad and unnecessary.
Unnecessary because each point that you write to InfluxDB is already uniquely identified by its measurement, its timestamp and its set of tag values.
Bad because each distinct set of tag values creates a separate series, and InfluxDB keeps an index over the series. Having a unique tag value for each data point will grow your system resource requirements and slow down the database. That only stops mattering if you have very few data points, but then you don't really need a time series database, or you just don't care.
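To make the series growth concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not InfluxDB's storage engine: the class `SeriesModel`, the measurement name `mymeasurement` and the key format are assumptions made up for illustration. It only mimics the rule that a series is defined by the measurement plus the tag set, and that a point with the same series and timestamp replaces the earlier one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Toy in-memory model of how points map to series; NOT InfluxDB's real storage engine. */
public class SeriesModel {
    // series key -> (timestamp -> field value)
    private final Map<String, TreeMap<Long, Integer>> series = new HashMap<>();

    /** Builds a key from the measurement and the tag set, sorted by tag name (hypothetical format). */
    public static String seriesKey(String measurement, Map<String, String> tags) {
        StringBuilder key = new StringBuilder(measurement);
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            key.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return key.toString();
    }

    /** Stores one point; a point with the same series key and timestamp replaces the old one. */
    public void write(String measurement, Map<String, String> tags, long timestamp, int value) {
        series.computeIfAbsent(seriesKey(measurement, tags), k -> new TreeMap<>())
              .put(timestamp, value);
    }

    /** Number of distinct series, i.e. the number of entries an index would have to track. */
    public int seriesCount() {
        return series.size();
    }

    /** Total number of stored points across all series. */
    public int pointCount() {
        return series.values().stream().mapToInt(Map::size).sum();
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        SeriesModel withRandomTag = new SeriesModel();
        SeriesModel withoutTag = new SeriesModel();
        for (int i = 0; i < 1000; i++) {
            long timestamp = 1_000L + i; // distinct timestamps, one per point
            String rnd = String.valueOf(random.nextDouble() * 100000);
            withRandomTag.write("mymeasurement", Map.of("rnd", rnd), timestamp, 123);
            withoutTag.write("mymeasurement", Map.of(), timestamp, 123);
        }
        System.out.println("series with rnd tag:    " + withRandomTag.seriesCount());
        System.out.println("series without rnd tag: " + withoutTag.seriesCount());
        System.out.println("points without rnd tag: " + withoutTag.pointCount());
    }
}
```

In this model the random tag produces roughly one series per point, while dropping it keeps every point in a single series, and the points still stay distinct as long as their timestamps differ.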

As the OP said, tags are used for grouping and filtering.
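As a rough illustration of what grouping and filtering by a tag means, the following plain-Java sketch uses a hypothetical low-cardinality tag called `host`. The class `TagGrouping`, the `Point` type and the sample values are all assumptions for the example; the comments only point at the kind of InfluxQL clause (`GROUP BY`, `WHERE`) each method loosely corresponds to.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrates grouping and filtering by a low-cardinality tag; plain Java, no InfluxDB involved. */
public class TagGrouping {
    /** A simplified point: one tag ("host", hypothetical) and one numeric field. */
    public static final class Point {
        final String host;
        final double value;

        public Point(String host, double value) {
            this.host = host;
            this.value = value;
        }
    }

    /** Loosely analogous to: SELECT MEAN(value) ... GROUP BY host */
    public static Map<String, Double> meanByHost(List<Point> points) {
        return points.stream().collect(
            Collectors.groupingBy(p -> p.host, TreeMap::new,
                                  Collectors.averagingDouble(p -> p.value)));
    }

    /** Loosely analogous to: ... WHERE host = 'some-host' */
    public static List<Point> filterByHost(List<Point> points, String host) {
        return points.stream()
                     .filter(p -> p.host.equals(host))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Point> points = List.of(
            new Point("server-a", 10.0),
            new Point("server-a", 20.0),
            new Point("server-b", 5.0));
        System.out.println(meanByHost(points));
        System.out.println(filterByHost(points, "server-b").size());
    }
}
```

A tag like this is useful because many points share each value. A tag whose value is unique per point gives every group exactly one member, so there is nothing meaningful to group or filter on.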
Here are some good reads on the topic:
https://docs.influxdata.com/influxdb/v1.7/concepts/tsi-details/
https://www.influxdata.com/blog/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/

According to the documentation, the upper bound [for series or unique tag values] is usually somewhere between 1 and 4 million series, depending on the machine used, which is easily a day's worth of high-resolution data.
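A back-of-the-envelope check of that last claim, as a small Java sketch. The write rate of 50 points per second is a made-up assumption, and `SeriesBudget` is a hypothetical helper; the only inputs taken from above are the 1 and 4 million series bounds and the fact that each point creates its own series.

```java
/** Estimates how long it takes to hit a series limit when every point creates a new series. */
public class SeriesBudget {
    /** Hours until seriesLimit is reached at a constant write rate of pointsPerSecond. */
    public static double hoursToReach(long seriesLimit, double pointsPerSecond) {
        return seriesLimit / pointsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double rate = 50.0; // hypothetical write rate in points per second
        System.out.printf("1M series after %.1f hours%n", hoursToReach(1_000_000L, rate));
        System.out.printf("4M series after %.1f hours%n", hoursToReach(4_000_000L, rate));
    }
}
```

At that assumed rate the lower bound is reached in under 6 hours and the upper bound in a little over 22 hours, so both fall within a single day.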