0 votes

I am collecting log data using Filebeat 7.x, but I am facing a problem: the log volume is very large (100 GB per day).

Now I am wondering how to collect only the error-level logs from the source files. What is the best way to do this?

I am using Filebeat to send logs to Elasticsearch, which runs in a Kubernetes cluster. My concern is: must I use Kafka and Logstash to define such a filtering rule?

Please find below the Filebeat config file being used:

    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"

      output.elasticsearch:
        host: '${NODE_NAME}'
        hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'

2 Answers

1 vote

I would recommend configuring the flow as:

Filebeat → Kafka → Logstash → Elasticsearch → Kibana

  1. Filebeat reads logs from your servers and pushes them to the configured Kafka topics (see the Filebeat sketch after this list).

  2. Logstash then subscribes to those Kafka topics, parses, filters, and formats the events (including or excluding fields as required), and sends the processed log data to an Elasticsearch index (see the Logstash sketch below).

  3. Visualize your data via Kibana dashboards.
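
For step 1, a minimal sketch of the Filebeat side, assuming a Kafka broker reachable at kafka:9092 and a topic named filebeat-logs (both placeholders). Filebeat allows only one active output, so the existing output.elasticsearch section would be replaced by output.kafka:

    # filebeat.yml - ship events to a Kafka topic instead of directly to Elasticsearch
    output.kafka:
      # placeholder broker address; list every broker in your cluster
      hosts: ["kafka:9092"]
      # placeholder topic name; Logstash must subscribe to the same topic
      topic: "filebeat-logs"
      required_acks: 1
      compression: gzip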
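
For steps 2 and 3, a minimal Logstash pipeline sketch under the same assumptions; the broker address, topic name, and index name are placeholders, and the filter simply assumes that error lines contain the word "error". The real parsing and filtering rules depend on your log format:

    # logstash.conf - consume from Kafka, keep only error-level events, index into Elasticsearch
    input {
      kafka {
        bootstrap_servers => "kafka:9092"
        topics => ["filebeat-logs"]
        codec => "json"
      }
    }

    filter {
      # drop anything that does not look like an error line (an assumption about the log format)
      if [message] !~ /(?i)error/ {
        drop { }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://elasticsearch-master:9200"]
        index => "filebeat-errors-%{+YYYY.MM.dd}"
      }
    }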

0 votes

Kafka is not a "must", if that is the question. If you are limited by the amount of data you can keep in your Elasticsearch cluster, the parsing logic can live in your Logstash configuration to parse and filter down to the logs you need. Filebeat would send the data to Logstash, and Logstash would send it to Elasticsearch after parsing and filtering.
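
A minimal sketch of that simpler Filebeat → Logstash → Elasticsearch path, assuming Logstash is reachable at logstash:5044 (a placeholder address). On the Filebeat side, the output.elasticsearch section would be swapped for output.logstash:

    # filebeat.yml - send to Logstash instead of Elasticsearch (host is a placeholder)
    output.logstash:
      hosts: ["logstash:5044"]

On the Logstash side, the pipeline would use a beats input instead of the kafka input shown in the first answer, with the same kind of filter and elasticsearch output sections:

    # logstash.conf - receive events from Filebeat over the Beats protocol
    input {
      beats {
        port => 5044
      }
    }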