2 votes

I have the following infrastructure:

The ELK stack running in Docker, with Elasticsearch, Logstash, and Kibana each in its own container. On a virtual machine running CentOS I installed the nginx web server and Filebeat to collect its logs, and I enabled the nginx module in Filebeat:

> filebeat modules enable nginx

Before starting Filebeat I set it up with Elasticsearch and installed its dashboards in Kibana.

The Filebeat config file (unnecessary comments removed):

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.kibana:
  host: "172.17.0.1:5601"

output.elasticsearch:
  hosts: ["172.17.0.1:9200"]

Then, to set it up in Elasticsearch and Kibana:

> filebeat setup -e --dashboards

This works fine. In fact, if I keep it this way everything works perfectly: I can browse the collected logs in Kibana and use the nginx dashboards installed by the command above.
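For context, the field parsing in this setup is not done by Filebeat itself: the nginx module relies on Elasticsearch ingest pipelines that `filebeat setup` installs. Assuming a standard 6.5.4 install, they can be listed from Kibana Dev Tools:

```
GET _ingest/pipeline/filebeat-*
```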

However, I want to pass the logs through Logstash. My Logstash configuration uses the following pipeline:

- pipeline.id: filebeat
  path.config: "config/filebeat.conf"

filebeat.conf:

input {
  beats {
    port => 5044
  }
}


#filter {
#  mutate {
#    add_tag => ["filebeat"]
#  }
#}


output {
  elasticsearch {
    hosts => ["elasticsearch0:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }

  stdout { }
}

When the logs go through Logstash, the resulting document is just:

{
        "offset" => 6655,
      "@version" => "1",
    "@timestamp" => 2019-02-20T13:34:06.886Z,
       "message" => "10.0.2.2 - - [20/Feb/2019:08:33:58 -0500] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36\" \"-\"",
          "beat" => {
         "version" => "6.5.4",
            "name" => "localhost.localdomain",
        "hostname" => "localhost.localdomain"
    },
        "source" => "/var/log/nginx/access.log",
          "host" => {
                   "os" => {
             "version" => "7 (Core)",
            "codename" => "Core",
              "family" => "redhat",
            "platform" => "centos"
        },
                 "name" => "localhost.localdomain",
                   "id" => "18e7cb2506624fb6ae2dc3891d5d7172",
        "containerized" => true,
         "architecture" => "x86_64"
    },
       "fileset" => {
          "name" => "access",
        "module" => "nginx"
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
         "input" => {
        "type" => "log"
    },
    "prospector" => {
        "type" => "log"
    }
}

A lot of fields are missing from the object; there should be much more structured information.

UPDATE: This is what I'm expecting instead:

{
  "_index": "filebeat-6.5.4-2019.02.20",
  "_type": "doc",
  "_id": "ssJPC2kBLsya0HU-3uwW",
  "_version": 1,
  "_score": null,
  "_source": {
    "offset": 9639,
    "nginx": {
      "access": {
        "referrer": "-",
        "response_code": "404",
        "remote_ip": "10.0.2.2",
        "method": "GET",
        "user_name": "-",
        "http_version": "1.1",
        "body_sent": {
          "bytes": "3650"
        },
        "remote_ip_list": [
          "10.0.2.2"
        ],
        "url": "/access",
        "user_agent": {
          "patch": "3578",
          "original": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36",
          "major": "71",
          "minor": "0",
          "os": "Ubuntu",
          "name": "Chromium",
          "os_name": "Ubuntu",
          "device": "Other"
        }
      }
    },
    "prospector": {
      "type": "log"
    },
    "read_timestamp": "2019-02-20T14:29:36.393Z",
    "source": "/var/log/nginx/access.log",
    "fileset": {
      "module": "nginx",
      "name": "access"
    },
    "input": {
      "type": "log"
    },
    "@timestamp": "2019-02-20T14:29:32.000Z",
    "host": {
      "os": {
        "codename": "Core",
        "family": "redhat",
        "version": "7 (Core)",
        "platform": "centos"
      },
      "containerized": true,
      "name": "localhost.localdomain",
      "id": "18e7cb2506624fb6ae2dc3891d5d7172",
      "architecture": "x86_64"
    },
    "beat": {
      "hostname": "localhost.localdomain",
      "name": "localhost.localdomain",
      "version": "6.5.4"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-02-20T14:29:32.000Z"
    ]
  },
  "sort": [
    1550672972000
  ]
}
It doesn't look like you are parsing the log message. There's an example in the logstash documentation on this: elastic.co/guide/en/logstash/6.6/… - baudsp
Thanks, that is helpful. Although I thought that since Filebeat sends the full object when shipping directly to Elasticsearch, passing it through Logstash should do the same. And I had this working on my local PC; now I can't get it to work. I wasn't doing any filtering in Logstash then either, and it still worked, which is why this is so strange to me. @baudsp - 500 Server error
I don't know. I have very little experience with Filebeat; I didn't even know it could do some parsing of its own. - baudsp
@baudsp If you have time, try using Filebeat directly with Elasticsearch and Kibana. Install the dashboards and indexes like this: filebeat setup --dashboards. Enable some module, even the system module: filebeat modules enable system, and then run it. Try opening one of the system dashboards in Kibana; it's pretty nice. - 500 Server error
@baudsp Your first comment almost worked. It is parsing the data, though not enough to work with the predefined Kibana dashboards, but I can build them from there. Post it as an answer and I'll accept it if you want! - 500 Server error

2 Answers

5 votes

The answer provided by @baudsp was mostly correct, but incomplete. I had exactly the same problem and exactly the same filter mentioned in the documentation (and in @baudsp's answer), but documents in Elasticsearch still did not contain any of the expected fields.

I finally found the problem: because I had Filebeat configured to send nginx logs via the Nginx module and not the Log input, the data coming from Filebeat didn't quite match what the example Logstash filter was expecting.

The conditional in the example is if [fileset][module] == "nginx", which is correct if Filebeat is sending data from a Log input. However, since the log data comes from the Nginx module, the fileset property doesn't contain a module property.

To make the filter work with data coming from the Nginx module, the conditional needs to look for something else. I found [event][module] to work in place of [fileset][module].

The working filter:

filter {
  if [event][module] == "nginx" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
        remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
      }
    }
    else if [fileset][name] == "error" {
      grok {
        match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
        remove_field => "message"
      }
      mutate {
        rename => { "@timestamp" => "read_timestamp" }
      }
      date {
        match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
        remove_field => "[nginx][error][time]"
      }
    }
  }
}

Now, documents in Elasticsearch have all of the expected fields. [Screenshot of an nginx access log entry in Elasticsearch]

Note: You'll have the same problem with other Filebeat modules, too. Just use [event][module] in place of [fileset][module].
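As an alternative to maintaining grok patterns in Logstash, the module's parsing normally lives in an Elasticsearch ingest pipeline, and later Filebeat/Logstash versions (the 7.x documentation) show forwarding events to that pipeline from the Logstash output instead. A sketch of that approach, assuming a Filebeat version new enough to populate `[@metadata][pipeline]` (this does not apply to all 6.x releases):

```
output {
  elasticsearch {
    hosts => ["elasticsearch0:9200"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    # Hand the event to the module's ingest pipeline instead of parsing in Logstash
    pipeline => "%{[@metadata][pipeline]}"
  }
}
```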

1 vote

From your logstash configuration, it doesn't look like you are parsing the log message.

There's an example in the logstash documentation on how to parse nginx logs:

Nginx Logs

The Logstash pipeline configuration in this example shows how to ship and parse access and error logs collected by the nginx Filebeat module.

  input {
    beats {
      port => 5044
      host => "0.0.0.0"
    }
  }
  filter {
    if [fileset][module] == "nginx" {
      if [fileset][name] == "access" {
        grok {
          match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
          remove_field => "message"
        }
        mutate {
          add_field => { "read_timestamp" => "%{@timestamp}" }
        }
        date {
          match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
          remove_field => "[nginx][access][time]"
        }
        useragent {
          source => "[nginx][access][agent]"
          target => "[nginx][access][user_agent]"
          remove_field => "[nginx][access][agent]"
        }
        geoip {
          source => "[nginx][access][remote_ip]"
          target => "[nginx][access][geoip]"
        }
      }
      else if [fileset][name] == "error" {
        grok {
          match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
          remove_field => "message"
        }
        mutate {
          rename => { "@timestamp" => "read_timestamp" }
        }
        date {
          match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
          remove_field => "[nginx][error][time]"
        }
      }
    }
  }

I know this doesn't address why Filebeat doesn't send the full object to Logstash, but it should give you a start on parsing the nginx logs in Logstash.
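While tuning the filter, it can help to see exactly what fields it produced, including the @metadata fields the index name is built from, without round-tripping through Elasticsearch. The standard stdout output can print the full event:

```
output {
  # Print the complete parsed event, including @metadata, to the console
  stdout { codec => rubydebug { metadata => true } }
}
```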