I am trying to move data from S3 (the contents of a .csv file) into an Elasticsearch cluster using Logstash with a custom template. However, the index shows only docs.count=1, with the rest of the records counted under docs.deleted, when I check with the following query in Kibana:
GET /_cat/indices?v
My first question is:
- Why is only one record (the last one) indexed, while the others are counted as deleted?
When I query this index in Kibana with the query below:
GET /my_file_index/_search
{
  "query": {
    "match_all": {}
  }
}
I get only one record, with the comma-separated data in the "message" field. So my second question is:
- How can I get the data split out under column names, as in the CSV? I have specified all the column mappings in the template file that is fed into Logstash.
I also tried setting the columns option in the Logstash csv filter, but with no luck.
columns => ["col1", "col2",...]
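For reference, here is a minimal Python sketch of what the csv filter is expected to do once `columns` takes effect: each value in "message" is assigned to the column name at the same position. The helper name `parse_row` and the comma-joined sample row are assumptions made for illustration, and the `columnN` fallback for unnamed positions reflects my understanding of the filter's default naming, not something verified against this setup.

```python
import csv
import io


def parse_row(message, columns, separator=","):
    """Split one CSV line and attach each value to a column name by position."""
    values = next(csv.reader(io.StringIO(message), delimiter=separator))
    event = {"message": message}
    for position, value in enumerate(values):
        # Use the supplied name when there is one, otherwise a positional name.
        if position < len(columns):
            name = columns[position]
        else:
            name = f"column{position + 1}"
        event[name] = value
    return event


# Illustrative row, comma-joined from the sample data in this question.
event = parse_row("KKK-LMN,2,1815,6/25/2018", ["AAA", "BBB", "CCC"])
print(event)
```

If the indexed documents still contain only "message", as in the output further down, that suggests the filter block was not applied to those events at all.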
Any help would be appreciated.
EDIT-1: Below is my logstash.conf file:
input {
  s3 {
    access_key_id => "xxx"
    secret_access_key => "xxxx"
    region => "eu-xxx-1"
    bucket => "xxxx"
    prefix => "abc/stocks_03-jul-2018.csv"
  }
}
filter {
  csv {
    separator => ","
    columns => ["AAA","BBB","CCC"]
  }
}
output {
  amazon_es {
    index => "my_r_index"
    document_type => "my_r_index"
    hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
    region => "eu-xxxx-1"
    aws_access_key_id => 'xxxxx'
    aws_secret_access_key => 'xxxxxx+xxxxx'
    document_id => "%{id}"
    template => "templates/template_2.json"
    template_name => "my_r_index"
  }
}
Note: Logstash version: 6.3.1; Elasticsearch version: 6.2
EDIT-2: Adding the template_2.json file along with a sample CSV header:
1. Mapping file:
"template" : "my_r_index",
"settings" : {
"index" : {
"number_of_shards" : 50,
"number_of_replicas" : 1
"index.codec" : "best_compression",
"index.refresh_interval" : "60s"
"mappings" : {
"_default_" : {
"_all" : { "enabled" : false },
"properties" : {
"type" : "keyword"
"type" : "integer"
"type" : "integer"
"type" : "date"
"type" : "keyword"
"type" : "keyword"
"ABC" : {
"type" : "integer"
"PQR" : {
"type" : "keyword"
"KKK" : {
"type" : "keyword"
"type" : "text",
"index" : "false"
"SOMEID" : {
"type" : "double",
"index" : "false"
"DDD" : {
"type" : "double",
"index" : "false"
"EEE" : {
"type" : "double",
"index" : "false"
"FFF" : {
"type" : "double",
"index" : "false"
"GGG" : {
"type" : "text",
"index" : "false"
"LLL" : {
"type" : "double",
"index" : "false"
"MMM" : {
"type" : "double",
"index" : "false"
"NNN" : {
"type" : "double",
"index" : "false"
"OOO" : {
"type" : "double",
"index" : "false"
"PPP" : {
"type" : "text",
"index" : "false"
"QQQ" : {
"type" : "integer",
"index" : "false"
"RRR" : {
"type" : "double",
"index" : "false"
"SSS" : {
"type" : "double",
"index" : "false"
"TTT" : {
"type" : "double",
"index" : "false"
"UUU" : {
"type" : "double",
"index" : "false"
"VVV" : {
"type" : "text",
"index" : "false"
"WWW" : {
"type" : "double",
"index" : "false"
"XXX" : {
"type" : "double",
"index" : "false"
"YYY" : {
"type" : "double",
"index" : "false"
"ZZZ" : {
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "integer",
"index" : "false"
"type" : "double",
"index" : "false"
"CLIECE" : {
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "text",
"index" : "false"
"RRF" : {
"type" : "double",
"index" : "false"
"SRF" : {
"type" : "double",
"index" : "false"
"CNRF" : {
"type" : "double",
"index" : "false"
"CTRF" : {
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"CONSLE" : {
"type" : "double",
"index" : "false"
"CLIDLE" : {
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"type" : "double",
"index" : "false"
"SMIUM" : {
"type" : "double",
"index" : "false"
"CONIUM" : {
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"type" : "text",
"index" : "false"
"COT" : {
"type" : "double",
"index" : "false"
"type" : "double",
"index" : "false"
"IME" : {
"type" : "keyword"
"KKE" : {
"type" : "keyword"
2. My Excel (CSV) content:
Header: The actual header is quite long because there are many columns; please assume the other column names continue in the same way as the ones below.
First row: Again, the column values are as below, and some columns have blank values. The real template file (the mapping file above) covers all the columns.
KKK-LMN 2 1815 6/25/2018
PPL-ORL 2 1815 6/25/2018
SLB-ORD 2 1815 6/25/2018
3. Kibana query output
Query :
GET /my_r_index/_search
{
  "query": {
    "match_all": {}
  }
}
Output :
{
  "_index": "my_r_index",
  "_type": "my_r_index",
  "_id": "IjjIZWUBduulDsi0vYot",
  "_score": 1,
  "_source": {
    "@version": "1",
    "message": "XXX-XXX-XXX-USD,2,3190,2018-07-03,UNITED STATES,USD,300,60,Put,2042-12-19,,,,.009108041,q,,,,.269171754,q,,,,,.024127966,q,,,,68.414017367,q,,,,.298398645,q,,,,.502677959,q,,,,,0.040880692400344164,q,,,,,,,159.361792143,,,,.631296636,q,,,,.154877384,q,,42.93,N,Y,\n",
    "@timestamp": "2018-08-23T07:56:06.515Z"
  }
}
...Other similar records as above.
Sample output after using autodetect_column_names => true:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "hAF1aWUBS_wbCH7ZG4tW",
        "_score": 1,
        "_source": {
          "2": "2",
          "1815": "1815",
          "message": """
          "6/25/2018": "6/25/2018",
          "@timestamp": "2018-08-24T01:03:26.436Z",
          "@version": "1"
        }
      },
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "kP11aWUBctDorPcGHICS",
        "_score": 1,
        "_source": {
          "2": "2",
          "1815": "1815",
          "message": """
          "6/25/2018": "4/22/2018",
          "@timestamp": "2018-08-24T01:03:26.436Z",
          "@version": "1"
        }
      },
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "j_11aWUBctDorPcGHICS",
        "_score": 1,
        "_source": {
          "2": "SERVICE",
          "1815": "CLIENT",
          "message": """
          "6/25/2018": "VALUATIONDATE",
          "@timestamp": "2018-08-24T01:03:26.411Z",
          "@version": "1"
        }
      }
    ]
  }
}
..., right? The problem is that you are not extracting a column named id from your CSV, and that is what you are using in document_id => "%{id}". Hence all rows are deleted except the last one. – Val
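The effect described in this comment can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: `sprintf` and `ToyIndex` are hypothetical stand-ins (not Logstash or Elasticsearch APIs), and the rows are comma-joined versions of the sample data from the question. It relies on two behaviours: a `%{field}` reference is left as literal text when the event has no such field, and indexing a document under an existing id replaces the old version and counts it as deleted.

```python
import re


def sprintf(template, event):
    """Replace %{name} with the event's field value; leave it untouched if missing."""
    def substitute(match):
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)
    return re.sub(r"%\{([^}]+)\}", substitute, template)


class ToyIndex:
    """Tiny stand-in for an index that tracks live and superseded documents."""

    def __init__(self):
        self.live = {}
        self.deleted = 0

    def index(self, doc_id, source):
        # Re-using an id replaces the previous version, which then counts as deleted.
        if doc_id in self.live:
            self.deleted += 1
        self.live[doc_id] = source


# Events with no "id" field, as when the CSV has no column named id.
rows = [
    {"message": "KKK-LMN,2,1815,6/25/2018"},
    {"message": "PPL-ORL,2,1815,6/25/2018"},
    {"message": "SLB-ORD,2,1815,6/25/2018"},
]

idx = ToyIndex()
for event in rows:
    idx.index(sprintf("%{id}", event), event)

docs_count = len(idx.live)
docs_deleted = idx.deleted
last_message = idx.live["%{id}"]["message"]
print(docs_count, docs_deleted, last_message)
```

Every event resolves to the same literal id, so only the last row survives. Pointing document_id at a column that actually exists and is unique per row, or omitting document_id so that ids are generated automatically, avoids the collision.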