1
votes

My goal is to collect logs from different servers using Filebeat and aggregate/visualize them using ElasticSearch and Kibana. For the time being, I am excluding Logstash from the scene.

So far I have been able to configure Filebeat to push logs in real time, and I can confirm through the Kibana interface that the logs are indeed reaching Elasticsearch.


Problem:

The problem is that Filebeat (or Elasticsearch) automatically adds extra empty fields/properties to the index.

Some of the fields I can see on the Kibana interface:

aws.cloudtrail.user_identity.session_context.creation_date
azure.auditlogs.properties.activity_datetime
azure.enqueued_time
azure.signinlogs.properties.created_at
cef.extensions.agentReceiptTime
cef.extensions.deviceCustomDate1
cef.extensions.deviceCustomDate2
cef.extensions.deviceReceiptTime
cef.extensions.endTime
cef.extensions.fileCreateTime
cef.extensions.fileModificationTime
cef.extensions.flexDate1
...

They are all empty fields.

When I check the mapping for that index using GET /[index]/_mapping, I can see ~3000 fields that I didn't really add. I am not sure how these fields were added and how to remove them.


Reproduction:

Filebeat and ElasticSearch docker images I use:

elasticsearch:7.8.0
elastic/filebeat:7.8.0

On top of the base images I add only basic configuration files, as simple as:

# filebeat.yml

filebeat.inputs:
- type: log
  paths:
    - /path_to/my_log_file/metrics.log

output.elasticsearch:
  hosts: ["http://192.168.0.1:9200"]

# elasticsearch.yml

cluster.name: "docker-cluster"
network.host: 0.0.0.0

node.name: node-1

discovery.seed_hosts: ["127.0.0.1"]

cluster.initial_master_nodes: ["node-1"]

A typical log message would look like this:

2020-07-01 08:40:07,432 - CPUUtilization.Percent:50.0|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807
2020-07-01 08:40:07,437 - DiskAvailable.Gigabytes:43.607460021972656|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807

Thank you

4 Answers

3
votes

Enter the Elastic Common Schema (ECS), a godsend!

When Filebeat starts, it installs an index template containing all the fields from the common schema. That is why you see so many fields in your index mapping, but it's not really an issue.
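To check how many fields the template contributed, one option is to count the leaf fields in the GET /[index]/_mapping response. The following is a minimal sketch, not an official tool: `count_leaf_fields` is a hypothetical helper, and `sample_mapping` is a heavily trimmed, made-up stand-in for a real response.

```python
def count_leaf_fields(properties):
    """Count leaf fields in the "properties" section of a mapping."""
    total = 0
    for definition in properties.values():
        if "properties" in definition:
            # Object field: recurse into its sub-fields.
            total += count_leaf_fields(definition["properties"])
        else:
            # Concrete field (date, keyword, ...). Multi-fields are ignored.
            total += 1
    return total


# Hypothetical, trimmed stand-in for a GET /[index]/_mapping response.
sample_mapping = {
    "filebeat-7.8.0": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                "azure": {"properties": {"enqueued_time": {"type": "date"}}},
                "cef": {"properties": {"extensions": {"properties": {
                    "endTime": {"type": "date"},
                    "flexDate1": {"type": "date"},
                }}}},
            }
        }
    }
}

for index_name, body in sample_mapping.items():
    print(index_name, count_leaf_fields(body["mappings"]["properties"]))
```

Run against the real response, the count should land near the ~3000 fields mentioned in the question.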

Then, in the Kibana interface, you see all those "empty" fields in the Table view (Discover tab). But if you switch to the JSON view, you'll see that those fields are not actually inside the document. Filebeat doesn't add them to your documents. You see them in the Table view because Kibana requests them (using docvalue_fields). Just click on Inspect and look at the request that Kibana sends to Elasticsearch.
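The difference between "requested by Kibana" and "stored in the document" can be illustrated with a small sketch. It assumes you have copied a document's `_source` from the JSON view and the docvalue_fields list from the Inspect panel; `flatten`, `missing_from_source`, and the sample values below are hypothetical.

```python
def flatten(source, prefix=""):
    """Return the dotted names of all leaf fields in a _source object."""
    names = set()
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            names |= flatten(value, prefix=f"{path}.")
        else:
            names.add(path)
    return names


def missing_from_source(source, requested_fields):
    """List requested fields that the document does not actually contain."""
    present = flatten(source)
    return [name for name in requested_fields if name not in present]


# Made-up example values, shaped like what the JSON view and Inspect show.
source = {
    "@timestamp": "2020-07-01T08:40:07.432Z",
    "message": "CPUUtilization.Percent:50.0|#Level:Host",
    "agent": {"type": "filebeat"},
}
requested = ["@timestamp", "azure.enqueued_time", "cef.extensions.endTime"]

print(missing_from_source(source, requested))
```

Any name this prints is a field that appears in the mapping and in Kibana's request but was never written into the document.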

So there's nothing to worry about, really.

Coming to your actual message: if you were to parse CPUUtilization.Percent:50.0, you could store the value in a standard ECS field, for example "system.cpu.total.pct": 50, and watch those values evolve over time in the Metrics app in Kibana. The same applies to DiskAvailable.Gigabytes:43.607460021972656.
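As a rough illustration of what "parsing" the message would involve, here is a minimal Python sketch that splits one of the sample lines into named parts. The regular expression and the output key names (`logged_at`, `metric`, `value`, and so on) are assumptions made for this example, not ECS field names. Without Logstash, the real extraction would more likely live in a Filebeat processor such as dissect or in an Elasticsearch ingest pipeline.

```python
import re

# Pattern inferred from the two sample lines in the question (an assumption;
# other log lines may not follow this exact layout).
LINE_PATTERN = re.compile(
    r"^(?P<logged_at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<metric>[\w.]+):(?P<value>[-+]?\d+(?:\.\d+)?)"
    r"\|#Level:(?P<level>\w+)"
    r"\|#hostname:(?P<hostname>[^,]+),"
    r"timestamp:(?P<timestamp>\d+)$"
)


def parse_metric_line(line):
    """Split one metrics.log line into a dict, or return None if no match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    parsed = match.groupdict()
    parsed["value"] = float(parsed["value"])
    parsed["timestamp"] = int(parsed["timestamp"])
    return parsed


line = (
    "2020-07-01 08:40:07,432 - CPUUtilization.Percent:50.0"
    "|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807"
)
print(parse_metric_line(line))
```

Once the metric name and value are separate fields, mapping them onto whichever target fields you choose is a simple rename.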

1
votes

Filebeat automatically maps all input to the Elastic Common Schema (ECS) before exporting it. Please have a look here for more details.

0
votes

If you define the Elasticsearch index in filebeat.yml, you can go to Stack Management > Index Management and edit the Filebeat index. In the settings, look for the default fields:

"query": {
  "default_field": [
    "message",
    "tags",
    "agent.ephemeral_id",
    "agent.id",
    "agent.name",
    "agent.type",
    ...
  ]
}

You can remove all the unnecessary fields from here. Note that this setting controls which fields are searched by default; it does not remove anything from the mapping itself.

0
votes

You can define your own template like this:

curl -XPUT 'localhost:9200/_template/filebeat-7.8.0' -H 'Content-Type: application/json' -d'
{
    "order" : 1,
    "index_patterns" : [
      "filebeat-7.8.0-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "filebeat",
          "rollover_alias" : "filebeat-7.8.0"
        },
        "mapping" : {
          "total_fields" : {
            "limit" : "10000"
          }
        },
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_meta" : {
        "beat" : "filebeat",
        "version" : "7.8.0"
      },
      "date_detection" : false,
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "type": {
          "type": "keyword"
        },
        "namespace": {
          "type": "keyword"
        },
        "app": {
          "type": "keyword"
        },
        "k8s-app": {
          "type": "keyword"
        },
        "node": {
          "type": "keyword"
        },
        "pod": {
          "type": "keyword"
        },
        "stream": {
          "type": "keyword"
        }
      }
    },
    "aliases" : { }
}'

Explanation:

As explained by @Val, Filebeat installs an index template named filebeat-7.8.0 when it starts. However, if a template with that name already exists, Filebeat uses it as-is rather than installing its own, because setup.template.overwrite defaults to false (https://www.elastic.co/guide/en/beats/filebeat/current/configuration-template.html). With your own template in place, those fields do not end up in the mapping of new indices, and Kibana won't show them either.