
I am trying to update Elasticsearch indices with data stored in a SQL database, so that every row added to the database is automatically added to Elasticsearch.

I tried to set the primary key of the table as the _id field in Elasticsearch, so that every time the schedule runs Logstash (once a minute), documents that are already in Elasticsearch don't get re-added.

This is my Logstash .conf file:

input {
    jdbc {
        jdbc_connection_string => "JDBC-Connection-String"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "JDBC-Connection-User"
        jdbc_driver_library => "JDBC-Driver-Path"
        statement => "SELECT MyCol1, MyCol2 FROM MyTable"
        use_column_value => true
        tracking_column => "MyCol1"
        tracking_column_type => "numeric"
        clean_run => true
        schedule => "*/1 * * * *"
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "MyIndex"
        document_id => "%{MyCol1}"
    }

    stdout { }
}

After Logstash finishes, I find only one document in Elasticsearch, with "_id": "%{MyCol1}". Why can't Logstash pick up the id value properly? P.S. MyCol1 is the primary key of MyTable.


1 Answer


A few things to keep in mind (a corrected configuration sketch follows this list):

  1. The value referenced in document_id must be a column returned by the query.
  2. Field names are case sensitive, and the jdbc input lowercases column names by default, so use the exact field name (mycol1, not MyCol1).
  3. Set clean_run => false, otherwise the tracked value is reset on every run.
  4. Use :sql_last_value in the statement, together with tracking_column, so that only new records are selected on each run.
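Putting those together, here is a minimal sketch of a corrected pipeline. It assumes the placeholder names from the question (MyTable, MyCol1, MyCol2) and an all-lowercase index name (myindex), since Elasticsearch index names must be lowercase:

input {
    jdbc {
        jdbc_connection_string => "JDBC-Connection-String"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "JDBC-Connection-User"
        jdbc_driver_library => "JDBC-Driver-Path"
        # Only fetch rows newer than the last tracked value
        statement => "SELECT MyCol1, MyCol2 FROM MyTable WHERE MyCol1 > :sql_last_value"
        use_column_value => true
        # The jdbc input lowercases column names by default
        tracking_column => "mycol1"
        tracking_column_type => "numeric"
        # Keep the tracked value between runs instead of resetting it
        clean_run => false
        schedule => "*/1 * * * *"
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "myindex"
        # Reference the lowercased field name so the sprintf resolves
        document_id => "%{mycol1}"
    }
    stdout { }
}

With clean_run => false, the jdbc input persists the last seen mycol1 value between runs (in the file pointed to by last_run_metadata_path), so each scheduled run only picks up rows added since the previous one.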