0
votes

I have a simple time series like this:

time                    id         area
2019-09-25T17:21:00Z    1          us
2019-09-25T17:22:00Z    1          uk
2019-09-25T17:23:00Z    2          canada
2019-09-25T17:24:00Z    3          us
2019-09-25T17:25:00Z    1          canada

I want to get count the areas for last point of each distinct id, so the result should be something like this:

area      count        
us        1
canada    2

Since the last point for id 1 is canada, i want to ignore all previous points for id 1.

How do I query for only the most recent point for each distinct id? Is this possible?

Edit: Here is the actual data I am working with.

name: click3
time                area   id       value
----                ----   --       -----
1569480689926885700 travel session1 1
1569480693527591500 travel session2 1
1569480699951799900 vtc    session3 1
1569480706416720700 health session1 1
1569480713265800900 claim  session4 1
1569480719882312600 health session3 1

area and id are indeed tags. When I do a simple select LAST(value) with a GROUP BY, I get the following:

> select last(value) as value, area, id from click3 group by id
name: click3
tags: id=session1
time                value area   id
----                ----- ----   --
1569480706416720700 1     health session1

name: click3
tags: id=session2
time                value area   id
----                ----- ----   --
1569480693527591500 1     travel session2

name: click3
tags: id=session3
time                value area   id
----                ----- ----   --
1569480719882312600 1     health session3

name: click3
tags: id=session4
time                value area  id
----                ----- ----  --
1569480713265800900 1     claim session4

which is correct - the last point for each unique session id. When I select * from this query as a subquery, the result is

> select * from (select last(value) as value, area, id from click3 group by id)
name: click3
time                area   id       id_1     value
----                ----   --       ----     -----
1569480693527591500 travel session2 session2 1
1569480706416720700 health session1 session1 1
1569480713265800900 claim  session4 session4 1
1569480719882312600 health session3 session3 1

When I add an aggregate like COUNT(*) or SUM(value), I see the number 4 as expected:

> select count(*) from (select last(value) as value, area, id from click3 group by id)
name: click3
time count_value
---- -----------
0    4

But if I then add a GROUP BY area to this query, I expect to see travel have a value of 1, health have a value of 2, and claim to have a value of 1. For some reason, it seems like the full original set of data points is used and not reduced set from the subquery, so I end up with this:

> select count(*) from (select last(value) as value, area from click3 group by id) group by area
name: click3
tags: area=claim
time count_value
---- -----------
0    1

name: click3
tags: area=health
time count_value
---- -----------
0    2

name: click3
tags: area=travel
time count_value
---- -----------
0    2

name: click3
tags: area=vtc
time count_value
---- -----------
0    1

I think I must be significantly misunderstanding how influxdb works. What am I missing?

1

1 Answers

0
votes

Assuming id and area are tags, something like this should work

select count(*) from (select last(*) from your_measurement group by id) group by area You can substitute * for just a single field. The nested query gets the last datapoint per id, from those results the outer query does a count per area. Depending on your exact use case the query may be slightly different.