We have a few external applications running in the cloud (IBM Bluemix) that write their application syslogs to the Bluemix Logmet service, which internally uses the ELK stack.
On a periodic basis, we would like to download the logs from the cloud and upload them into a local Elasticsearch/Kibana instance. Storing logs in the cloud service incurs a cost, plus an additional cost if we want to search them with Kibana; a local Elasticsearch instance can delete/flush old logs that we no longer need.
The downloaded logs look like this:
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG","type":"syslog","event_uuid":"d902dddb-afb7-4f55-b472-211f1d370837","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:28.636Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"dcd9f975-3be3-4451-a9db-6bed1d906ae8","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:28.636Z"}
I have created an index in our local Elasticsearch as follows:
curl -XPUT 'localhost:9200/commslog?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "logs" : {
      "properties" : {
        "instance_id_str" : { "type" : "text" },
        "source_id_str" : { "type" : "text" },
        "app_name_str" : { "type" : "text" },
        "message" : { "type" : "text" },
        "type" : { "type" : "text" },
        "event_uuid" : { "type" : "text" },
        "ALCH_TENANT_ID" : { "type" : "text" },
        "logmet_cluster" : { "type" : "text" },
        "org_name_str" : { "type" : "text" },
        "@timestamp" : { "type" : "date" },
        "message_type_str" : { "type" : "text" },
        "@version" : { "type" : "text" },
        "space_name_str" : { "type" : "text" },
        "application_id_str" : { "type" : "text" },
        "ALCH_ACCOUNT_ID_str" : { "type" : "text" },
        "org_id_str" : { "type" : "text" },
        "timestamp" : { "type" : "date" }
      }
    }
  }
}'
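To confirm the mapping was applied, we read it back with the standard get-mapping API (just a sanity check, not part of the upload itself):

# Verify the index and its mapping
curl -XGET 'localhost:9200/commslog/_mapping?pretty'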
Now, to bulk upload the file, I used the command:
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'
The above command throws the error:
Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]
The solution is to follow the rules for bulk upload described here:
https://discuss.elastic.co/t/bulk-insert-file-having-many-json-entries-into-elasticsearch/46470/2
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
So I manually changed a few of the log statements by adding an action line before each record:
{ "index" : { "_index" : "commslog", "_type" : "logs" } }
This works!
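After that change, the bulk file alternates an action line and a document line, e.g. for the first record:

{ "index" : { "_index" : "commslog", "_type" : "logs" } }
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}

(The bulk API also requires the file to end with a newline.)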
Another option was to call the curl command, providing the _index and _type in the path:
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'
but without the action lines this too throws the same error.
The problem is that we cannot do this by hand for the thousands of log records we get. Is there a way to upload the files as we download them from Bluemix, without adding the action lines?
NOTE: We are not using Logstash at the moment, but:
- Is it possible to use Logstash and just use grok to transform the logs and add the necessary entries?
- How can we bulk upload documents via Logstash?
- Is Logstash the ideal solution, or can we just write a program to transform the files and do the upload (a rough sketch of what we have in mind is below)?
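For reference, this is roughly the kind of transform we could script ourselves with plain shell tools (a sketch only; the output filename commslogs_bulk.json is just an example):

# Prepend the bulk action line before every downloaded log record
awk '{ print "{ \"index\" : { \"_index\" : \"commslog\", \"_type\" : \"logs\" } }"; print }' commslogs.json > commslogs_bulk.json

# Upload the transformed file with the same bulk call as before
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs_bulk.json'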
Thanks