
I have a setup that collects metrics from Telegraf into InfluxDB. Grafana then uses InfluxDB as a data source to display graphs.

My goal is to reduce disk usage, so I want to downsample old data (older than 3 days) while keeping newer data (younger than 3 days) as is (raw).

I tried InfluxDB's Retention Policies (RP) and Continuous Queries (CQ) as described in the guide: https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention

         influxdb ("telegraf")
+----------------------------+
|                            |
| +-----------------------+  |
| |  table disk_raw       |  |
| |  CURRENT RP (RAW)     +---------+
| |  (deleted after 3d)   |  |      |
| +-----------------------+  |      |CQ (average 30 min of datapoints into 1)
| +-----------------------+  |      |
| |  table_disk_ds        |  |      |
| |  LONGTERM RP          +<--------+
| |(downsampled, kept 90d)|  |
| +-----------------------+  |
|                            +<----+
+----------------------------+     |
                                   |
                                   |
      grafana                      | grafana query
+----------------------------+     |
|                            |     |
|  +----------------------+  |     |
|  |    data graph        |  +-----+
|  +----------------------+  |
|                            |
+----------------------------+
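For reference, the RP/CQ setup sketched in the diagram above could be created roughly like this. This is an illustrative sketch only: the database, measurement, field and policy names ("telegraf", "disk", "disk_ds", "used_percent", "raw", "longterm") are assumptions adapted from the linked guide, not the actual config.

```sql
-- Raw data kept 3 days; made DEFAULT so Telegraf writes into it
-- without naming a retention policy:
CREATE RETENTION POLICY "raw" ON "telegraf" DURATION 3d REPLICATION 1 DEFAULT

-- Downsampled data kept 90 days:
CREATE RETENTION POLICY "longterm" ON "telegraf" DURATION 90d REPLICATION 1

-- Average every 30 minutes of raw points into one long-term point:
CREATE CONTINUOUS QUERY "cq_disk_30m" ON "telegraf" BEGIN
  SELECT mean("used_percent") AS "used_percent"
  INTO "longterm"."disk_ds"
  FROM "disk"
  GROUP BY time(30m), *
END
```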

The problem is that this solution gives you two tables, one for raw data and one for downsampled data, with the CQ constantly writing into the downsampled one. That does not work well for me because:

  • I am using Grafana to query InfluxDB, and each graph reads from a single table. I want one graph covering both old and new data.
  • Using 2 databases increases disk usage

Is there any way to downsample old records in the very same table?

Configuration example: https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention

grafana query

SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)

You want to have your cake and eat it, without any good reason. First, you can easily have multiple queries in a Grafana panel, and the disk-usage overhead of keeping the data in one database versus two is negligible. – Markus W Mahlberg
Correction @MarkusWMahlberg: the OP wants to have his cake and eat it with good reason, stated clearly in the question. It is a reasonable request to want to view recent data and older downsampled data within the same graph. Influx prides itself on conceptual simplicity, and it's understandable that a user may consider the removal of measurements/records to be something that happens to the original data set, analogous to writing down a log in a notepad and then crossing some lines out. – hazymat
So, use two queries. Problem solved. – Markus W Mahlberg

1 Answer


EDIT2: Here's a workaround implemented with template variables in Grafana:
https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
This seems like a really good solution.

ORIGINAL ANSWER: Looking at the example from the InfluxDB page you linked:

CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
  SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
  INTO "a_year"."orders"
  FROM "orders"
  GROUP BY time(30m)
END

If you specify the same source and target table, namely orders, in both the INTO and FROM clauses, then the data will be written to the same table.

However, that does not solve your issue.
You would still need two queries to get the data from both retention policies. If you run a generic select * from disk_raw ..., Influx will use the default retention policy and return data only from there.

The way you usually go about this is by running two queries and concatenating the results, in a single request, something like:

SELECT * FROM "rp_short"."diskraw"; SELECT * FROM "rp_long"."diskraw"

EDIT: Here is a discussion of why it's not possible to do what you (and a lot of other people) want, along with some ways to work around it: https://github.com/influxdata/influxdb/issues/2625
Briefly, one way is to handle the downsampling and the high-resolution data manually (i.e. not with a CQ) and keep everything in the same retention policy. Another is to use a proxy that rewrites the query depending on its time range, so that it reads the correct data.
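To make the first workaround concrete, here is a minimal sketch of manual downsampling driven by an external scheduler instead of a CQ, keeping both the raw and the downsampled series in the same (default) retention policy. The measurement names, field name, and time window are assumptions for illustration, not values from the question:

```sql
-- 1) Aggregate one closed historical window into a downsampled measurement
--    that lives in the same retention policy as the raw data:
SELECT mean("used_percent") AS "used_percent"
INTO "disk_ds"
FROM "disk"
WHERE time >= '2019-01-01T00:00:00Z' AND time < '2019-01-02T00:00:00Z'
GROUP BY time(30m), *

-- 2) Once the aggregates are written, drop the raw points for that window:
DELETE FROM "disk"
WHERE time >= '2019-01-01T00:00:00Z' AND time < '2019-01-02T00:00:00Z'
```

Since everything stays in one retention policy, nothing expires at different times by accident; the trade-off is that you now own the scheduling, the window bookkeeping, and the delete step yourself.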