2
votes

I'm having trouble understanding how to add tags to data series as I do SELECT INTO queries. I have an Influxdb of the NYTimes COVID dataset where I've used the cases and deaths fields as fields and the state and county information as tags.

I can aggregate data from neighboring counties in a query like this:

SELECT sum("cases") AS "cases" FROM "ny_covid"."autogen"."value" WHERE ("state"='Pennsylvania') AND ("county"='Philadelphia' OR "county"='Delaware') GROUP BY time(1d) FILL(null)

This works perfectly. But I want to save this aggregated data into a new database for doing other queries. Which I can do like this:

SELECT sum("cases") AS "cases" INTO "new_covid"."autogen"."value" FROM "ny_covid"."autogen"."value" WHERE ("state"='Pennsylvania') AND ("county"='Philadelphia' OR "county"='Delaware') GROUP BY time(1d) FILL(null)

My question is, how do I add a tag like location=Philly to the data I've just inserted into the new_covid database? Because, I'd like to do a few other location level aggregations and it seems like the tag is the way to keep these values distinct.

SELECT sum("cases") AS "cases" INTO "new_covid"."autogen"."value" FROM "ny_covid"."autogen"."value" WHERE ("state"='Pennsylvania') AND ("county"='Dauphin' OR "county"='Lancaster') GROUP BY time(1d) FILL(null)

All of the searching I've done has just been about using the tags in queries or preserving them when copying across databases. But I haven't been able to find anything about attaching tags in SELECT INTO type statements.

1
Do you really need to create new tag? Why you can't reuse existing tags state,county?Jan Garaj
@JanGaraj Because I would like to add the tag for the sum of 6 counties as a new area in the new database. That way I can pull them out of the new_covid database by tag and distinguish it from the 7 county sum around Pittsburg or 8 around Harrisburg. I also don't know how I would transfer the tags from the previous DB into the new one anyway. This is also a general question: "How do you add tags to data inserted with a SELECT INTO statement"?JudoWill
ok, then you need to do it on the app/own code level: select data; add new tag to data; insert data;. Use your favorite language with InfluxDB support (e.g. python) and code it.Jan Garaj
Is it really the answer that there is no way to add tags while doing SELECT INTO statements? The only way to add tags is to pull data out of one database and then re-insert it again. Conversely, the only way to distinguish aggregations in SELECT INTO statements is to send them to distinct fields?JudoWill
Nope, you have also other options. Such as: join InfluxDB, persuade team that this is important feature, so it will be implemented/fork source code and implement it on your own. InfluxQL is not SQL and I don't think that this feature is widely used in the timeseries world. Use better DB, which will fit your needs. Maybe InfluxDB 2.0 (still beta) will be better - it supports multi-data store world - you can connect also relational DB = maybe you will be able to use it as a "temp" storage for data transformation, where you can use good old SQL.Jan Garaj

1 Answers

1
votes

You can easily get the required functionality by using TICKscript language which is complimentary part of InfluxDB within the TICK stack. However that requires to use Kapacitor and Chronograf in addition to InfluxDB itself.

The Kapacitor will performs tasks written on TICKscript language to manipulate the data. The Chronograf provides a Web UI to configure InfluxDB and Kapacitor.

Find the TICKscript task example for InfluxQL queries from your question:

var b1 = batch
    |query('''
    SELECT sum("cases") AS "cases" 
    FROM "ny_covid"."autogen"."value" 
    WHERE ("state"='Pennsylvania') AND ("county"='Philadelphia' OR "county"='Delaware') )
    ''')
        .offset(1d)
        .period(1d)
        .cron('@midnight')
        .groupBy(time(1d))
    // Write the transformed data to InfluxDB
    |influxDBOut()
        .database('new_covid')
        .retentionPolicy('autogen')
        .measurement('value')
        .tag('location', 'Philly')

var b2 = batch
    |query('''
    SELECT sum("cases") AS "cases" 
    FROM "ny_covid"."autogen"."value" 
    WHERE ("state"='Pennsylvania') AND ("county"='Dauphin' OR "county"='Lancaster')
    ''')
        .offset(1d)
        .period(1d)
        .cron('@midnight')
        .groupBy(time(1d))
    // Write the transformed data to InfluxDB
    |influxDBOut()
        .database('new_covid')
        .retentionPolicy('autogen')
        .measurement('value')
        .tag('location', 'Dauphy')

Also that is not one way to get such result. As well you could examine the stream approach instead of batch used above.

All you need to run this code is create new task for kapacitor:

Chronograf-> Alerting tab -> Manage tasks -> "+Write TICKscript" button in the TICKscripts section.

To get the full TICKstack environment you may use offitial sandbox on dockers