2
votes

I have an InfluxDb time series database which I am storing a metric called api_calls. This metric has two pertinent measurements, call_invoked and call_failed. Each measurement also has a tag with a vendor id associated with it.

Every minute, I have a service that collates the number of times I invoked call_invoked (fired when I start the API call) and call_failed (fired when I get an error message from the API call) and stores them into an InfluxDb. So basically, every minute I get a record (per tag, as I understand it) in the InfluxDb "table" (for lack of a better term) that has these two values associated with it.

I have been scratching my head trying to figure out how to show the top ten vendor id's with the largest percentage of API failures.

How can I do this? I have a strong SQL background but everything I've tried has either not worked at all or definitely not worked correctly.

1
Have you tried building a CQ using TOP()? Something like CREATE CONTINUOUS QUERY top_10_offenders ON your_db_name BEGIN SELECT TOP("call_fail_sum", "vendor_id", "call_failed", 10) [,other tag keys] INTO dest_meas_name FROM meas_name GROUP BY time(1m) END (refs: docs.influxdata.com/influxdb/v1.2/query_language/functions/#‌​top docs.influxdata.com/influxdb/v1.2/query_language/…) - Jason
...I haven't. I'll try it. - Jeremy Holovacs
I'm curious. If you have the time, please share results, when you have them. - Jason
It didn't seem to work. The CQ sent me some very strange data that didn't seem to map to anything. It could just be my ignorance on InfluxDb syntax, but I think it's not supported. - Jeremy Holovacs
Can you post a few sample "insert" lines to get a feel for what your schema looks like? - Jason

1 Answers

3
votes

I've made some guesses about your schema, and built up a solution using the CLI. I'm using InfluxDB v1.0.2 here.

Note that TOP() only became a working function around January 2016.

If your version of InfluxDB is older than this, the following won't work.

First, some sample data:

CREATE DATABASE foo
USE foo
PRECISION rfc3339

INSERT api_calls,vendor_id=1 call_invoked=3i,call_failed=3i 1483228860000000000
INSERT api_calls,vendor_id=2 call_invoked=3i,call_failed=8i 1483228860000000000
INSERT api_calls,vendor_id=3 call_invoked=3i,call_failed=5i 1483228860000000000
INSERT api_calls,vendor_id=4 call_invoked=3i,call_failed=9i 1483228860000000000
INSERT api_calls,vendor_id=5 call_invoked=3i,call_failed=10i 1483228860000000000
INSERT api_calls,vendor_id=6 call_invoked=3i,call_failed=1i 1483228860000000000
INSERT api_calls,vendor_id=7 call_invoked=3i,call_failed=6i 1483228860000000000
INSERT api_calls,vendor_id=8 call_invoked=3i,call_failed=9i 1483228860000000000
INSERT api_calls,vendor_id=9 call_invoked=3i,call_failed=6i 1483228860000000000
INSERT api_calls,vendor_id=10 call_invoked=3i,call_failed=5i 1483228860000000000

Then run the following query. Note that the WHERE time > x AND time < y clause could be replaced with something like WHERE time > now() - 1h.

SELECT TOP(call_failed,3), vendor_id FROM api_calls WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)

Which gives a result of:

name: api_calls
---------------
time                    top     vendor_id
2017-01-01T00:00:00Z
2017-01-01T00:01:00Z    10      5
2017-01-01T00:01:00Z    9       4
2017-01-01T00:01:00Z    9       8
2017-01-01T00:02:00Z
2017-01-01T00:03:00Z
2017-01-01T00:04:00Z

Note that every minute that has no data returns one row, but where there is data, there are 3 rows returned.

If we tell the CLI we want to peek at the JSON, we can type format json, and when we repeat that same query we get this:

{"results":[{"series":[{"name":"api_calls","columns":["time","top","vendor_id"],"values":[["2017-01-01T00:00:00Z",null,null],["2017-01-01T00:01:00Z",10,"5"],["2017-01-01T00:01:00Z",9,"4"],["2017-01-01T00:01:00Z",9,"8"],["2017-01-01T00:02:00Z",null,null],["2017-01-01T00:03:00Z",null,null],["2017-01-01T00:04:00Z",null,null]]}]}]}

Does that help?

Edit - Re-worked for percentage.

Apologies - I noticed you were looking for percentage, not Top X count.

In InfluxDB, that's two problems: the first is how to generate the percentages.

A note about nesting operators in functions: InfluxDB does not yet generally guarantee that nesting functions or operators within function arguments will work. Some do, many don't. Of importance here: TOP() only allows field keys or tags as arguments, and not operations on tags (e.g. call_failed / total_calls). You'll need to perform an extra query to generate the percentages, and you'll have to store them. You can either calculate them on the "raw", minute-by-minute values:

SELECT (call_failed / (call_failed + call_invoked)) AS pct_fail INTO api_calls FROM api_calls GROUP BY vendor_id

(GROUP BY time is implicit: re-run that query from the CLI without the INTO clause to see how).

Or you can summarize each e.g. hour:

SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_hourly FROM api_calls GROUP BY time(1h), vendor_id

That can be done as a one-off for existing data. For any newly arriving data, a continuous query can be used:

CREATE CONTINUOUS QUERY fail_pct_calc ON foo BEGIN SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_summary FROM api_calls GROUP BY time(1h), vendor_id END

(continuous queries require a GROUP BY time() clause).

There's no technical requirement to push Continuous Query results into a new measurement - you could SELECT the calculation results back INTO api_calls, for example. But putting the raw data and the summary results in the same measurement leads to query results that have a lot of nulls. It's often tidier in InfluxDB to push the results to a new measurement.

From there, use TOP() as before:

Edited - fixed 'FROM' measurement

SELECT TOP(pct_fail,3), vendor_id FROM api_calls_summary WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)