My goal is to collect logs from different servers with Filebeat and aggregate/visualize them with Elasticsearch and Kibana. I am leaving Logstash out of the picture for now.
So far I have configured Filebeat to ship logs in real time, and I can confirm through the Kibana interface that the logs are indeed being pushed to Elasticsearch.
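For anyone reproducing this, a direct document count against the index is another easy way to double-check ingestion (the host and index name below are placeholders for my actual values):

# sanity check: count documents in the Filebeat index
curl -s "http://192.168.0.1:9200/[index]/_count"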
Problem:
The problem is that Filebeat (or Elasticsearch) automatically adds thousands of extra, empty fields/properties to the index.
Some of the fields I can see in the Kibana interface:
aws.cloudtrail.user_identity.session_context.creation_date
azure.auditlogs.properties.activity_datetime
azure.enqueued_time
azure.signinlogs.properties.created_at
cef.extensions.agentReceiptTime
cef.extensions.deviceCustomDate1
cef.extensions.deviceCustomDate2
cef.extensions.deviceReceiptTime
cef.extensions.endTime
cef.extensions.fileCreateTime
cef.extensions.fileModificationTime
cef.extensions.flexDate1
...
They are all empty fields.
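This can be confirmed with an exists query on any of them, which matches no documents (index name is a placeholder):

# confirm a field is empty: no document has a value for it
curl -s "http://192.168.0.1:9200/[index]/_count" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"exists": {"field": "aws.cloudtrail.user_identity.session_context.creation_date"}}}'
# => {"count": 0, ...}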
When I check the mapping for that index with GET /[index]/_mapping, I see ~3000 fields that I never added. I am not sure how these fields got there or how to remove them.
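To put a number on it, the leaf field definitions in the mapping can be counted (jq here is just one way to approximate the count), and the installed index templates listed in case one of them is responsible:

# approximate count of leaf field definitions in the mapping
curl -s "http://192.168.0.1:9200/[index]/_mapping" \
  | jq '[.. | objects | select(has("type"))] | length'

# list index templates that might be stamping these mappings
curl -s "http://192.168.0.1:9200/_cat/templates?v"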
Reproduction:
The Filebeat and Elasticsearch Docker images I use:
elasticsearch:7.8.0
elastic/filebeat:7.8.0
On top of the base images I only add configuration files as simple as these:
# filebeat.yml
filebeat.inputs:
- type: log
  paths:
    - /path_to/my_log_file/metrics.log

output.elasticsearch:
  hosts: ["http://192.168.0.1:9200"]
# elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
node.name: node-1
discovery.seed_hosts: ["127.0.0.1"]
cluster.initial_master_nodes: ["node-1"]
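For completeness, the containers are started along these lines; the mount destinations are the images' default config locations, and the log path is illustrative:

# Elasticsearch, with the config above mounted over the default one
docker run -d --name elasticsearch -p 9200:9200 \
  -v "$PWD/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml" \
  elasticsearch:7.8.0

# Filebeat, with its config and the log directory mounted read-only;
# -strict.perms=false relaxes the ownership check on the bind-mounted config
docker run -d --name filebeat \
  -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  -v "/path_to/my_log_file:/path_to/my_log_file:ro" \
  elastic/filebeat:7.8.0 -e -strict.perms=false

(filebeat test config and filebeat test output can also be run inside the Filebeat container to verify that the config parses and that Elasticsearch is reachable.)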
A typical log message looks like this:
2020-07-01 08:40:07,432 - CPUUtilization.Percent:50.0|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807
2020-07-01 08:40:07,437 - DiskAvailable.Gigabytes:43.607460021972656|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807
Thank you