I use the ELK stack to parse CSV files with Logstash and send them to Elasticsearch.
Unfortunately, I have a problem:
When I drop files into the directory watched by the "input" of my Logstash pipeline, the records are duplicated, or even tripled, without my asking for anything...
Indeed:
This is what my pipeline looks like:
input {
  file {
    path => "/home/XXX/report/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["Name", "Status", "Category", "Type", "EndPoint", "Group", "Policy", "Scanned At", "Reported At", "Affected Application"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "malwarebytes-report"
  }
  stdout {}
}
When I send my first file containing 28 records into "/home/XXX/report/", this is what Elasticsearch says:
[root@lrtstfpe1 confd]# curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open malwarebytes-report PO4g6rKRTb6yuMDb7i-6sg 5 1 28 0 25.3kb 25.3kb
So far so good, but when I send my second file of 150 records ...:
[root@lrtstfpe1 confd]# curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open malwarebytes-report PO4g6rKRTb6yuMDb7i-6sg 5 1 328 0 263.3kb 263.3kb
The 150 records have been doubled and added to the first 28...
What is going on?
I have been stuck on this problem for several days; I really need your help.
UPDATE:
It was suggested that I look in /etc/logstash/conf.d to see whether there are any other config files there.
The problem is that I have only one pipeline in this folder... So:
I completely uninstalled the ELK stack (rpm -e elasticsearch kibana logstash filebeat) as well as every leftover ELK file (rm -rf /var/lib/ELK /var/log/ELK /etc/default/ELK /usr/share/ELK ...), so there is nothing left anywhere.
Then I reinstalled everything:
rpm -ivh elasticsearch-6.2.3.rpm
rpm -ivh kibana-6.2.3-x86_64.rpm
rpm -ivh logstash-6.2.3.rpm
And start the services: service ELK restart
Then, in terms of configurations:
/etc/elasticsearch.yml is completely default.
/etc/kibana.yml is completely default.
/etc/logstash.yml is completely default.
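For reference, on a stock 6.x RPM install the service-managed Logstash instance loads every file matched by path.config in /etc/logstash/pipelines.yml. Assuming that file is unchanged from the shipped default, it looks like this:

```
# /etc/logstash/pipelines.yml (stock default on a 6.x package install)
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
```

This means a pipeline named pip.conf placed in /etc/logstash/conf.d would be picked up automatically by the service, independently of any manual bin/logstash run.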
Then, I put my one and ONLY pipeline named "pip.conf" in /etc/logstash/conf.d/
Its configuration:
input {
  file {
    path => "/home/report/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["Name","Status","Category","Type","EndPoint","Group","Policy","Scanned At","Reported At","Affected Application"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "malwarebytes-report"
  }
  stdout {}
}
And finally, I launch my pipeline:
I go into /usr/share/logstash and execute:
bin/logstash -f /etc/logstash/conf.d/pip.conf
After a few seconds, my pipeline is listening, and now I put file1.csv and file2.csv into /home/report/.
file1.csv contains 28 records and file2.csv contains 150 records.
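One thing worth ruling out at this step (an assumption, not something shown above): if the logstash system service is also running, it loads every pipeline in /etc/logstash/conf.d on its own, so each CSV would be processed once by the service and once by the manual bin/logstash run. A quick check for a second instance:

```shell
# List any running Logstash processes; if the manual run is the only
# one, nothing else should appear here.
pgrep -f logstash || echo "no logstash process found"
```

If this shows a process besides the manually launched one, stopping the service (e.g. service logstash stop) before the manual run would avoid double processing.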
But now, when I check my index with curl -XGET 'localhost:9200/_cat/indices?v&pretty',
my index "malwarebytes-report" contains 357 records... (150x2 + 28x2...)
I understand nothing at all...
sincedb_path is set to /dev/null, so Logstash might re-read all files every time. Can you try to set sincedb_path to something meaningful so it can remember what it has already read? Are you running Logstash several times? - Val
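Following up on that suggestion, a minimal sketch of the file input with a persistent sincedb file (the path /var/lib/logstash/sincedb_report is an arbitrary example; any file writable by the Logstash user works):

```
input {
  file {
    path => "/home/report/*.csv"
    start_position => "beginning"
    # Persist read positions so already-processed files are skipped
    # after a restart ("/dev/null" discards them on every run).
    sincedb_path => "/var/lib/logstash/sincedb_report"
  }
}
```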