I have a simple time series like this:
time id area
2019-09-25T17:21:00Z 1 us
2019-09-25T17:22:00Z 1 uk
2019-09-25T17:23:00Z 2 canada
2019-09-25T17:24:00Z 3 us
2019-09-25T17:25:00Z 1 canada
I want to get count the areas for last point of each distinct id, so the result should be something like this:
area count
us 1
canada 2
Since the last point for id 1 is canada, i want to ignore all previous points for id 1.
How do I query for only the most recent point for each distinct id? Is this possible?
Edit: Here is the actual data I am working with.
name: click3
time area id value
---- ---- -- -----
1569480689926885700 travel session1 1
1569480693527591500 travel session2 1
1569480699951799900 vtc session3 1
1569480706416720700 health session1 1
1569480713265800900 claim session4 1
1569480719882312600 health session3 1
area
and id
are indeed tags. When I do a simple select LAST(value)
with a GROUP BY
, I get the following:
> select last(value) as value, area, id from click3 group by id
name: click3
tags: id=session1
time value area id
---- ----- ---- --
1569480706416720700 1 health session1
name: click3
tags: id=session2
time value area id
---- ----- ---- --
1569480693527591500 1 travel session2
name: click3
tags: id=session3
time value area id
---- ----- ---- --
1569480719882312600 1 health session3
name: click3
tags: id=session4
time value area id
---- ----- ---- --
1569480713265800900 1 claim session4
which is correct - the last point for each unique session id. When I select * from
this query as a subquery, the result is
> select * from (select last(value) as value, area, id from click3 group by id)
name: click3
time area id id_1 value
---- ---- -- ---- -----
1569480693527591500 travel session2 session2 1
1569480706416720700 health session1 session1 1
1569480713265800900 claim session4 session4 1
1569480719882312600 health session3 session3 1
When I add an aggregate like COUNT(*) or SUM(value), I see the number 4 as expected:
> select count(*) from (select last(value) as value, area, id from click3 group by id)
name: click3
time count_value
---- -----------
0 4
But if I then add a GROUP BY area
to this query, I expect to see travel have a value of 1, health have a value of 2, and claim to have a value of 1. For some reason, it seems like the full original set of data points is used and not reduced set from the subquery, so I end up with this:
> select count(*) from (select last(value) as value, area from click3 group by id) group by area
name: click3
tags: area=claim
time count_value
---- -----------
0 1
name: click3
tags: area=health
time count_value
---- -----------
0 2
name: click3
tags: area=travel
time count_value
---- -----------
0 2
name: click3
tags: area=vtc
time count_value
---- -----------
0 1
I think I must be significantly misunderstanding how influxdb works. What am I missing?