"Transactional safety" in influxDB

Question

We have a scenario where we want to frequently change the tag of a (single) measurement value. Our goal is to create a database which is storing prognosis values. But it should never loose data and track changes to already written data, like changes or overwriting.

Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change. Furthermore the version '0' should always contain the latest value.

name: temperature
-----------------
time                    write_ts (val) current_mA (val) version (tag)     machine (tag)
2015-10-21T19:28:08Z    1445506564     25               0                 injection_molding_1

So let's assume I have an updated prognosis value for this example value.

So, I do:

SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_mesurement
//then
INSERT new_measurement with version = 0

Now my question:

If I loose the connection in between for whatever reason in between the SELECT, INSERT, DROP:

I would get double records.

(Or if I do SELECT, DROP, INSERT: I loose data)

Is there any method to prevent that?

Jason Jason · Accepted Answer · 2019-12-19T05:18:24

Transactions don't exist in InfluxDB

InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.

In a relational database that supports transactions, you are protecting yourself against UPDATE and similar operations. Data comes in, existing data gets changed, you need to reliably read these updates.

The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.

How you can use InfluxDB

In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.

As for version tracking and reliably getting updates:

1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.

2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?

3) Logging. Do you need to over-write (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate. The latter is a more "InfluxDB-ish" approach.

4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:

SELECT LAST(write_ts), current_mA, machine FROM temperature

You could also try grouping the machine values together:

SELECT LAST(*) FROM temperature GROUP BY machine

So what happens instead of transactions?

In InfluxDB, inserting a point with the same tag keys and timestamp over-writes any existing data with the same field keys, and adds new field keys. So when duplicate entries are written, the last write "wins".

So instead of the traditional SELECT, UPDATE approach, it's more like SELECT A, then calculate on A, and put the results in B, possibly with a new timestamp INSERT B.

Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage means that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.

Hope that clear up the differences.