21
votes

I'm trying to extract an ephemeral field with the parse command. Unfortunately, the log format is such that a glob expression is not enough, so I need to use a regex. The regex itself is fine, but I just can't get the command to extract anything.

I'm trying with:

parse @endpoint /^([a-zA-Z_]+)[\/|?]*.*/ as @clean_endpoint

The first group is what I'm after here, and I did try different kinds of quotes etc. It might be just a stupid formatting error, but I just can't find it.

Pretty much the only documentation mentioning the parse command is here, and the example there uses glob expressions. I couldn't find any examples by googling either.

Has anyone bumped into this and solved it?

2 Answers

9
votes

Try another approach, like

parse @message /(?<@endpt>(\/[a-zA-Z0-9_]+){1,})/
| stats count_distinct(@endpt) by @endpt

or, alternatively, consider the solution

fields @timestamp
| parse @message /(?<@endpt_post>POST (\/[a-zA-Z0-9_]+){1,})/
| parse @message /(?<@endpt_get>GET (\/[a-zA-Z0-9_]+){1,})/
| stats count() by @endpt_post, @endpt_get
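
To see what these named groups would actually capture, here is a minimal Python sketch that runs equivalent patterns against two made-up log messages. Assumptions: Python spells named groups as (?P<name>...) and does not allow "@" in group names, so the groups are called endpt and endpt_post here; the sample messages and the extract helper are hypothetical, and the regex engine behind CloudWatch Logs Insights may differ in details.

```python
import re

# Python spelling of the patterns above: (?P<name>...) instead of (?<name>...),
# and no "@" in the group names. "/" needs no escaping in a Python pattern.
ANY_PATH = re.compile(r"(?P<endpt>(/[a-zA-Z0-9_]+){1,})")
POST_PATH = re.compile(r"(?P<endpt_post>POST (/[a-zA-Z0-9_]+){1,})")


def extract(pattern, message):
    """Return the named groups of the first match, or {} when nothing matches."""
    match = pattern.search(message)
    return match.groupdict() if match else {}


# Hypothetical log messages, for illustration only.
samples = [
    "POST /api/v1/users 201",
    "GET /health 200",
]

for message in samples:
    print(extract(ANY_PATH, message), extract(POST_PATH, message))
```

Note that the POST pattern keeps the verb inside the named group, so the captured value includes "POST " in front of the path; move the verb outside the group if only the path is wanted.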

Good luck!

7
votes

Not sure if you found the answer to this, but when using regex with parse, you can't name the ephemeral fields like you do with glob.

When using glob expressions, you name the new field with "as ___" at the end of your statement. When trying this with a regex it doesn't work.

parse @message ((glob expression here)) as ephem_field

When using regex, the new fields should be named within the expression itself as a named group.

parse @message /(?<clean_endpoint>^([a-zA-Z_]+)[\/|?]*.*)/
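
Where the named group sits decides what ends up in the field. The following Python sketch compares a group wrapped around the whole expression with a group wrapped around only the leading token. Assumptions: Python writes named groups as (?P<name>...), the sample endpoint string and the clean_endpoint helper are made up for illustration, and CloudWatch's regex engine may behave differently in edge cases.

```python
import re

# Named group around the entire expression: the field receives the whole match.
WHOLE = re.compile(r"(?P<clean_endpoint>^([a-zA-Z_]+)[/|?]*.*)")

# Named group around only the leading token: the field receives just that part.
FIRST = re.compile(r"^(?P<clean_endpoint>[a-zA-Z_]+)[/|?]*.*")


def clean_endpoint(pattern, endpoint):
    """Return the clean_endpoint group, or None when the pattern does not match."""
    match = pattern.search(endpoint)
    return match.group("clean_endpoint") if match else None


# Hypothetical endpoint value, for illustration only.
sample = "users/42?verbose=true"
print(clean_endpoint(WHOLE, sample))
print(clean_endpoint(FIRST, sample))
```

If only the first part of the endpoint is wanted, as in the question, the second placement is the one to try in the parse command.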

Although regex allows you to name a group using either single quotes ('name') or angle brackets (<name>), I have noticed that AWS CloudWatch Insights will only accept angle brackets when naming groups. When trying with single quotes, I got errors saying it was unable to understand the query.

I'm unsure what regex flavor AWS is using, but I did find I had to escape some characters that would be allowed unescaped in other tools.