1
votes

I have a system that serves as an API for third parties, and I need to monitor the response time for each third party. However, the Prometheus documentation says:

Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

So this suggests I shouldn't use a label for the third party, because the set of users is unbounded (600 for now, and still growing). If I instead expose one metric per user, will that avoid the performance problems I might encounter in the future?

Instead of label filtering:

http_requests_total{id="3rdParty1"}
http_requests_total{id="3rdParty2"}

Should I use a separate metric per user?

http_3rdParty1_requests_total
http_3rdParty2_requests_total
...

2 Answers

3
votes

The core question is how many time series you have. It's the same number whether you put the users into the metric name or into a label; the only difference is that putting them into the metric name makes the data much harder to work with.
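As a rough illustration of that point, here is a minimal Python sketch that counts distinct series identities under both designs. It is a toy model, not Prometheus code: the client names are hypothetical, and a series is represented simply as a metric name plus a tuple of label pairs.

```python
# Hypothetical client identifiers; 600 matches the count in the question.
parties = [f"3rdParty{i}" for i in range(1, 601)]

# Design A: one metric name, with the client carried in an "id" label.
# A series is modelled as (metric name, tuple of label pairs).
series_with_label = {("http_requests_total", (("id", p),)) for p in parties}

# Design B: the client baked into the metric name, with no labels.
series_with_name = {(f"http_{p}_requests_total", ()) for p in parties}

print(len(series_with_label))  # 600
print(len(series_with_name))   # 600
```

Either way there is one series per client, so moving the client into the name does nothing for cardinality.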

With a cardinality of 600, breaking the data out per user is unlikely to be wise, and you should also look at a logs-based monitoring system, such as the ELK stack, for that sort of analysis.

0
votes

Use "labels per user". Do not put multiple unrelated concerns into the metric name.

If you use http_3rdParty1_requests_total as the metric name, you are putting two values into one text field: the client name and the metric name are concatenated together.

If you designed a SQL database that way, e.g. with "customer last name + bank branch name" stored in one text field, we would think that you are making a rookie mistake and tell you to store two values in two fields, each with a meaningful name, and not one field with two values smushed into it. This isn't much different.

The metric name is really just another label with a special name, i.e. internally it is __name__="http_requests_total".
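To make the "name is just a label" idea concrete, here is a small Python sketch under stated assumptions. The `series` and `matches` helpers are hypothetical and only model equality matchers; they are not part of any Prometheus library.

```python
def series(name, **labels):
    """Model a series identity as a dict, with the name stored under __name__."""
    return {"__name__": name, **labels}

def matches(s, selector):
    """True if every label in the selector equals the series' value."""
    return all(s.get(key) == value for key, value in selector.items())

s = series("http_requests_total", id="3rdParty1")

# Selecting by metric name is the same operation as selecting by any other label.
print(matches(s, {"__name__": "http_requests_total", "id": "3rdParty1"}))  # True

# A name with the client baked in identifies a different series altogether.
print(matches(s, {"__name__": "http_3rdParty1_requests_total"}))  # False
```

In PromQL terms, the first selector corresponds to writing http_requests_total{id="3rdParty1"}, which is equivalent to {__name__="http_requests_total", id="3rdParty1"}.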

You don't get around cardinality by putting data in the name rather than in its own label. That won't change the cardinality at all. With over 600 unique values, you might have problems either way.

But storing two different values in two fields, not one, is still the right way to do it, and will save you trouble later when writing queries. For example, with a separate label such as user="3rdParty1" you can craft queries such as: How many users were active in the last 24 hours? Show me graphs of HTTP request volume per user. Show me users that had 10 or more errors in the last hour. Show me all metrics for this user.
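The difference in query effort can be sketched in plain Python. This is an illustrative model with made-up sample values, and the `code` label is an assumption that does not appear in the question. With a user label, per-user totals are a simple group-by (in PromQL, roughly sum by (user) (http_requests_total)); with the user in the name, you first have to parse it back out with a regular expression.

```python
import re
from collections import defaultdict

# Hypothetical counter values for two clients, in both designs.
labelled = [
    ({"__name__": "http_requests_total", "user": "3rdParty1", "code": "200"}, 40),
    ({"__name__": "http_requests_total", "user": "3rdParty1", "code": "500"}, 2),
    ({"__name__": "http_requests_total", "user": "3rdParty2", "code": "200"}, 7),
]
name_encoded = [
    ({"__name__": "http_3rdParty1_requests_total", "code": "200"}, 40),
    ({"__name__": "http_3rdParty1_requests_total", "code": "500"}, 2),
    ({"__name__": "http_3rdParty2_requests_total", "code": "200"}, 7),
]

def total_by_user_label(samples):
    """Group by the user label directly."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["user"]] += value
    return dict(totals)

# The user has to be recovered from the metric name by pattern matching.
NAME_PATTERN = re.compile(r"^http_(.+)_requests_total$")

def total_by_user_from_name(samples):
    """Group by the user parsed out of the metric name."""
    totals = defaultdict(int)
    for labels, value in samples:
        match = NAME_PATTERN.match(labels["__name__"])
        if match:
            totals[match.group(1)] += value
    return dict(totals)

print(total_by_user_label(labelled))          # {'3rdParty1': 42, '3rdParty2': 7}
print(total_by_user_from_name(name_encoded))  # {'3rdParty1': 42, '3rdParty2': 7}
```

Both give the same answer, but the second depends on a naming convention that every metric and every query has to repeat.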

See:

The correct way to handle this is to use a label to distinguish the different pools, rather than encoding them inside the metric name

https://www.robustperception.io/whats-in-a-__name__

This is however not the way to handle things in Prometheus whose labels provide a more powerful data model.

https://www.robustperception.io/target-labels-not-metric-name-prefixes

You might try putting the path in the metric name, such as is common in Graphite ... Accordingly, this is an antipattern you should avoid. Instead, to handle this common use case, Prometheus has labels.

https://www.oreilly.com/library/view/prometheus-up/9781492034131/ch05.html