I am trying to extract several fields from a single message field, using a grok pattern in Logstash, so that I can view them in Kibana.

My log events look like this:

[2021-01-06 12:10:40] ApiLogger.INFO: API log data: {"endpoint":"/rest/thre_en/V1/temp-carts/13cEIQqUb6cUfxB/tryer-inform","http_method":"GET","payload":[],"user_id":0,"user_type":4,"http_response_code":200,"response":"{\"pay_methods\":[{\"code\":\"frane\",\"title\":\"R2 Partial redeem\"}],\"totals\":{\"grand_total\":0,\"base_grand_total\":0}}

The full log contains more key-value pairs than shown here. Basically, I need the following information:

  1. timestamp (I am able to get this)
  2. log level (I am able to get this, but I want only INFO, not the whole ApiLogger.INFO)
  3. endpoint
  4. http_method
  5. user_id
  6. user_type
  7. http_response_code
  8. response

I am not able to get items 3-8. I tested it, and I think the problem is the colon (:). This is what I tried in the Grok Debugger: %{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):

I tried URI and other patterns, but they did not work, maybe because of the colon.

It looks as if the opening quote after "response": has no matching closing quote. Is that correct? Did you forget to include it? - Wiktor Stribiżew
It does have one, but I did not share the entire content because it is very long. Yes, I forgot to include that. - Mintu
I could only test using your sample input. If there are more fields in the payload, try adding more .*? in between: %{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)" - Wiktor Stribiżew
Why not use the json filter instead of parsing by hand? You can extract the JSON part with a grok pattern like this: %{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: API log data: %{GREEDYDATA:json_field} and then call the json filter on json_field. - baudsp
Yes, %{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field} and then parse json_field with the JSON filter. - Wiktor Stribiżew

1 Answer

You can use

%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}

Then, you can parse json_field with the JSON filter.
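
As a rough sketch of how the two steps could fit together in a pipeline config: the target name api below is a hypothetical choice, and the example assumes the real events carry complete JSON, unlike the truncated sample in the question.

```
filter {
  grok {
    # Split the log header from the JSON payload; the payload lands in json_field.
    match => { "message" => "%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}" }
  }
  json {
    # Parse the payload and store its keys under [api] (hypothetical target name).
    source => "json_field"
    target => "api"
  }
}
```

With this layout the values should be reachable as [api][endpoint], [api][http_method], [api][user_id] and so on. Events whose json_field is not valid JSON are tagged _jsonparsefailure by default. The response value is itself a JSON-encoded string, so it stays a plain string unless it is parsed again, for example with a second json filter whose source is "[api][response]".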

If you want to play around with regex, remember that the regex engine parses a string from left to right by default. If you want to capture several fields with one regular expression, make sure the regex engine can "walk" all the way from one field to the next. If you know which patterns, or which kinds of chars, occur between two fields, great. If not, you can only rely on the .* (%{GREEDYDATA}) or .*? (%{DATA}) patterns.

So, as an exercise, you might have a look at

%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"

Note the ++ in [0-9]++ and the .*? patterns between the fields. The ++ possessive quantifier makes sure the engine does not backtrack into the quantified pattern if the subsequent patterns fail to match. [0-9]++ grabs a sequence of digits and does not give any of them back, so if the subsequent patterns fail, that attempt fails instead of being retried with fewer digits. .*? matches zero or more chars other than line break chars, as few as possible. The last .* is greedy, so it matches as many chars other than line break chars as possible, which lets the response capture run up to the last double quote on the line.
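
If you try this pattern in a Logstash config rather than in the debugger, keep in mind that it contains double quotes, so one option is to wrap it in single quotes. A minimal sketch, assuming the default settings (config.support_escapes disabled, so backslashes reach the regex engine unchanged):

```
filter {
  grok {
    # The pattern is single-quoted because it contains literal double quotes.
    match => { "message" => '%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"' }
  }
}
```

Because this version uses %{JAVACLASS:loglevel}, the loglevel field will hold ApiLogger.INFO; replacing that part with ApiLogger.%{LOGLEVEL:loglevel}, as in the first pattern, would keep only INFO.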

See the regex demo.