2
votes

I am studying Logstash and how to use its filters and grok patterns, and I have one doubt that I need to clarify.

Let's say our logs contain a timestamp field like:

[01/Sep/2015:06:22:11 -0400]

Using grok, I can define a pattern to capture this as an HTTPDATE, like so:

\[%{HTTPDATE:timestamp}\]

In the grok debugger, I can see that it has identified the date, time, and other components:

{
  "timestamp": [
    [
      "01/Sep/2015:06:22:11 -0400"
    ]
  ],
  "MONTHDAY": [
    [
      "01"
    ]
  ],
  "MONTH": [
    [
      "Sep"
    ]
  ],
  "YEAR": [
    [
      "2015"
    ]
  ],
  "TIME": [
    [
      "06:22:11"
    ]
  ],
  "HOUR": [
    [
      "06"
    ]
  ],
  "MINUTE": [
    [
      "22"
    ]
  ],
  "SECOND": [
    [
      "11"
    ]
  ],
  "INT": [
    [
      "-0400"
    ]
  ]
}
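For context, here is roughly how that pattern might sit inside a Logstash pipeline (a minimal sketch; I'm assuming the raw line arrives in the default "message" field):

```
filter {
  grok {
    # Extract the bracketed timestamp into a string field named "timestamp"
    match => { "message" => "\[%{HTTPDATE:timestamp}\]" }
  }
}
```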

Now, I was looking at a tutorial on the Logstash website where they use a separate date filter to store this in a date field, like this:

date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    locale => "en"
}

What this does is store another field with a differently formatted date. My question is: why store two fields representing the same date, just in different formats? Can we not use the date field from the first stage the same way we can use the date field from the second stage?


1 Answer

3
votes

grok{} is used to turn an unstructured string into structured data. After it runs, you have a string field called "timestamp". If that's all you need, you're done!

But, what if you wanted to use that value as a date rather than a string? That's where the date{} filter comes in. You give date{} the string field and the format of the string, and it will make you a date object that you can then store in elasticsearch.
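That hand-off can be sketched like this (a minimal example; the format string uses the Joda-Time pattern syntax the date filter expects, with HH for the hour and yyyy for the year):

```
filter {
  date {
    # Parse the string field produced by grok into a real date object.
    # The result is written to @timestamp by default.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```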

You can then use elasticsearch date-related queries ("how many records since 5 minutes ago?") which would be impossible if all you had was a string.

By default, date{} sets the @timestamp field, which is what kibana will want to use for the x-axis of your histograms, so setting it to the time when the event was generated (as opposed to whenever it got processed by logstash) is a "good thing".

Once the date{} filter has updated @timestamp with the parsed value, it might make sense to remove the original timestamp field. You can do this with a remove_field param on the date filter, which only runs if the parse succeeded.
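Putting both pieces together, a sketch of the cleanup looks like this (remove_field only fires when the date parse succeeds, so a bad timestamp is never silently lost):

```
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    # Drop the now-redundant string field once @timestamp is set
    remove_field => [ "timestamp" ]
  }
}
```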

Hope that helps.