I am trying to count distinct sessionIds from a measurement. sessionId
being a tag, I count the distinct entries in a "parent" query, since distinct()
doesn't works on tags.
In the subquery, I use a group by sessionId limit 1
to still benefit from the index (if there is a more efficient technique, I have ears wide open but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1
already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds have multiple occurrences in the final result. Not always, since there is 17549 rows in total, but some are.
This is the sign that the limit 1
is somewhat working, but some sessionId still get multiple entries when re-selected. Maybe some kind of undefined behaviour?