We have a few external applications running in the cloud (IBM Bluemix) that write their application syslogs to the Bluemix Logmet service, which internally uses the ELK stack.
On a periodic basis, we would like to download the logs from the cloud and upload them into a local Elasticsearch/Kibana instance. Storing logs in the cloud service incurs a cost, plus an additional cost if we want to search them with Kibana; a local Elasticsearch instance can delete/flush old logs that we no longer need.
The downloaded logs look like this:
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG","type":"syslog","event_uuid":"d902dddb-afb7-4f55-b472-211f1d370837","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:28.636Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"dcd9f975-3be3-4451-a9db-6bed1d906ae8","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:28.636Z"}
I have created an index in our local Elasticsearch as follows:
curl -XPUT 'localhost:9200/commslog?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "logs" : {
      "properties" : {
        "instance_id_str" : { "type" : "text" },
        "source_id_str" : { "type" : "text" },
        "app_name_str" : { "type" : "text" },
        "message" : { "type" : "text" },
        "type" : { "type" : "text" },
        "event_uuid" : { "type" : "text" },
        "ALCH_TENANT_ID" : { "type" : "text" },
        "logmet_cluster" : { "type" : "text" },
        "org_name_str" : { "type" : "text" },
        "@timestamp" : { "type" : "date" },
        "message_type_str" : { "type" : "text" },
        "@version" : { "type" : "text" },
        "space_name_str" : { "type" : "text" },
        "application_id_str" : { "type" : "text" },
        "ALCH_ACCOUNT_ID_str" : { "type" : "text" },
        "org_id_str" : { "type" : "text" },
        "timestamp" : { "type" : "date" }
      }
    }
  }
}'
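To confirm the mapping was applied, we read it back with the standard get-mapping API (just a sanity check, not part of the upload itself):

# Verify the index and its mapping
curl -XGET 'localhost:9200/commslog/_mapping?pretty'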
Now, to bulk upload the file, I used the command:
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'
The above command throws the error:
Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]
The solution is to follow the rules for bulk upload described here:
https://discuss.elastic.co/t/bulk-insert-file-having-many-json-entries-into-elasticsearch/46470/2
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
So I manually changed a few of the log statements by adding an action line before each record:
{ "index" : { "_index" : "commslog", "_type" : "logs" } }
This works!
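After that change, the bulk file alternates an action line and a document line, e.g. for the first record:

{ "index" : { "_index" : "commslog", "_type" : "logs" } }
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}

(The bulk API also requires the file to end with a newline.)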
Another option was to call the curl command, providing the _index and _type in the path:
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'
but without the action lines this too throws the same error.
The problem is that we cannot do this by hand for the thousands of log records we get. Is there a way to upload the files as we download them from Bluemix, without adding the action lines?
NOTE: We are not using Logstash at the moment, but:
- Is it possible to use Logstash and just use grok to transform the logs and add the necessary entries?
- How can we bulk upload documents via Logstash?
- Is Logstash the ideal solution, or can we just write a program to transform the files and do the upload (a rough sketch of what we have in mind is below)?
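For reference, this is roughly the kind of transform we could script ourselves with plain shell tools (a sketch only; the output filename commslogs_bulk.json is just an example):

# Prepend the bulk action line before every downloaded log record
awk '{ print "{ \"index\" : { \"_index\" : \"commslog\", \"_type\" : \"logs\" } }"; print }' commslogs.json > commslogs_bulk.json

# Upload the transformed file with the same bulk call as before
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs_bulk.json'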
Thanks