My goal is to collect logs from different servers with Filebeat and aggregate/visualize them with Elasticsearch and Kibana. I am leaving Logstash out of the picture for now.
So far I have configured Filebeat to ship logs in real time, and I can confirm through the Kibana interface that the logs are indeed being pushed to Elasticsearch.
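For anyone reproducing this, a direct document count against the index is another easy way to double-check ingestion (the host and index name below are placeholders for my actual values):

# sanity check: count documents in the Filebeat index
curl -s "http://192.168.0.1:9200/[index]/_count"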
Problem:
The problem is that Filebeat (or Elasticsearch) automatically adds thousands of extra, empty fields/properties to the index.
Some of the fields I can see in the Kibana interface:
aws.cloudtrail.user_identity.session_context.creation_date
azure.auditlogs.properties.activity_datetime
azure.enqueued_time
azure.signinlogs.properties.created_at
cef.extensions.agentReceiptTime
cef.extensions.deviceCustomDate1
cef.extensions.deviceCustomDate2
cef.extensions.deviceReceiptTime
cef.extensions.endTime
cef.extensions.fileCreateTime
cef.extensions.fileModificationTime
cef.extensions.flexDate1
...
They are all empty fields.
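This can be confirmed with an exists query on any of them, which matches no documents (index name is a placeholder):

# confirm a field is empty: no document has a value for it
curl -s "http://192.168.0.1:9200/[index]/_count" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"exists": {"field": "aws.cloudtrail.user_identity.session_context.creation_date"}}}'
# => {"count": 0, ...}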
When I check the mapping for that index with GET /[index]/_mapping, I see ~3000 fields that I never added. I am not sure how these fields got there or how to remove them.
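To put a number on it, the leaf field definitions in the mapping can be counted (jq here is just one way to approximate the count), and the installed index templates listed in case one of them is responsible:

# approximate count of leaf field definitions in the mapping
curl -s "http://192.168.0.1:9200/[index]/_mapping" \
  | jq '[.. | objects | select(has("type"))] | length'

# list index templates that might be stamping these mappings
curl -s "http://192.168.0.1:9200/_cat/templates?v"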
Reproduction:
The Filebeat and Elasticsearch Docker images I use:
elasticsearch:7.8.0
elastic/filebeat:7.8.0
On top of the base images I only add configuration files as simple as these:
# filebeat.yml
filebeat.inputs:
- type: log
  paths:
    - /path_to/my_log_file/metrics.log

output.elasticsearch:
  hosts: ["http://192.168.0.1:9200"]
# elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
node.name: node-1
discovery.seed_hosts: ["127.0.0.1"]
cluster.initial_master_nodes: ["node-1"]
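For completeness, the containers are started along these lines; the mount destinations are the images' default config locations, and the log path is illustrative:

# Elasticsearch, with the config above mounted over the default one
docker run -d --name elasticsearch -p 9200:9200 \
  -v "$PWD/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml" \
  elasticsearch:7.8.0

# Filebeat, with its config and the log directory mounted read-only;
# -strict.perms=false relaxes the ownership check on the bind-mounted config
docker run -d --name filebeat \
  -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  -v "/path_to/my_log_file:/path_to/my_log_file:ro" \
  elastic/filebeat:7.8.0 -e -strict.perms=false

(filebeat test config and filebeat test output can also be run inside the Filebeat container to verify that the config parses and that Elasticsearch is reachable.)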
A typical log message looks like this:
2020-07-01 08:40:07,432 - CPUUtilization.Percent:50.0|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807
2020-07-01 08:40:07,437 - DiskAvailable.Gigabytes:43.607460021972656|#Level:Host|#hostname:a78f2ab3da65,timestamp:1593592807
Thank you