6
votes

I have cloud-init.log logs being sent to CloudWatch and I want to create a metric filter to extract the reported time it takes Cloud Init to run.

A sample log entry looks like:

Jun 24 12:06:51 ip-x-x-x-x [CLOUDINIT] util.py[DEBUG]: cloud-init mode 'modules' took 295.097 seconds (294.83)

And the value I would like to extract is: 295.097

It seems pretty straight forward because took [number] seconds is unique to just this line. This guide on metric filter syntax seems to only show examples for extracting values from JSON logs and this official example list doesn't cover it.

Based on the documentation, I figured something like:

[..., "took", seconds]

would work, but I haven't had much luck.

Any help would be really appreciated!

3

3 Answers

7
votes

What you tried is almost correct, slight syntax change.

Instead of

[..., "took", seconds]

It should be

[..., took="took", seconds, secondsword, secondsparentheses]

The last two words don't matter what they are, I tried to pick useful names to say what they'd represent.

With this, seconds variable will now contain 295.097 which you can use in your metric.

The ... can be used at the beginning or at the end, but not both it seems, which is why we have to call out the secondsword and secondsparentheses afterwards.

1
votes

To extract values, I think you need to format your logs in either json or a space delimited format. For your example above it's difficult because you're trying to extract a value from a message.

It's ugly, but this works (testing in the Metric Filter editor gives $value=295.097):

[month, day, time, ip, process, module, name, mode, modules, type=took, value, therest]

You probably would want to change your log format to simplify things.

1
votes

Generalizing this answer a bit.
The Metric filter syntax for space delimited log lines must be anchored either at the start of the line or the end of the line, and you can use ... at the other end to indicate you don't care how many columns there are in that direction.
Given a log line with the following space separated values
1 2 3 4 5 6
you can capture the value in column four like this
[,,,four,...]
Indicating that you want the value in the 4th column from the start
or
[...,four,,]
Indicating that you want the value in the 3rd column from the end.
Test these patterns, will indicate $four=4

The accepted answer is correct, but those last two column names do not need to be defined. They just need empty placeholders, so
[..., took="took", seconds, secondsword, secondsparentheses]
becomes just
[...,took=took,seconds,,]
Technically you don't even need took=took if you know that every log line is this same syntax. However, adding took=took ensures that you only match log lines that contain that word in the 4th column from the end of the line and it's probably safest to always match at least one known item in the log line like this.