TL;DR
How to model data into fields vs tags incase you want to perform both group by and count(distinct())
So currently this is my influxdb data model:
api_requests (database)
- requests_stats (measurement)
- api_path (tag)
- app_version (tag)
- host (tag)
- platform (tag)
- account_id (field)
- user_id (field)
- function_name (field)
- network (field)
- network_type (field)
- time_to_execute (field)
So now I want to find out the number of distinct accounts (active accounts). So I can run the following query:
SELECT count(distinct("account_id")) AS "active_accounts"
FROM "api_requests"."autogen"."requests_stats"
This works fine as account id is a field.
Now suppose I want to perform a group by operation on account_id, for example to find the number of requests received per account:
SELECT count("function_name") AS "request_count"
FROM "api_requests"."autogen"."requests_stats"
GROUP BY "account_id"
I cannot do this as group by is recommended on tags.
How would one manage this kind of scenerio?
One of the solution is to store the value in both field and value but that would be data redundancy.
The other and the most optimal way would be for count(distinct()) to work on tags. Is this possible? This was actually a feature request in their github repo.
Or can something be done about the data model to achieve the same?