
I am trying to update Elasticsearch indices with data stored in a SQL database, so that every row added to the database is automatically added to Elasticsearch.

I tried to set the primary key of the table as the _id field in Elasticsearch, so that every time the schedule runs Logstash (once a minute), documents that are already in Elasticsearch don't get re-added.

This is my Logstash .conf file:

input {
    jdbc {
        jdbc_connection_string => "JDBC-Connection-String"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "JDBC-Connection-User"
        jdbc_driver_library => "JDBC-Driver-Path"
        statement => "SELECT MyCol1, MyCol2 FROM MyTable"
        use_column_value => true
        tracking_column => "MyCol1"
        tracking_column_type => "numeric"
        clean_run => true
        schedule => "*/1 * * * *"
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "MyIndex"
        document_id => "%{MyCol1}"
    }

    stdout { }
}

After Logstash finishes, I find only one document in Elasticsearch, with "_id": "%{MyCol1}". Why can't Logstash pick up the id value properly? P.S. MyCol1 is the primary key of MyTable.


1 Answer


A few things to keep in mind (a corrected configuration sketch follows this list):

  1. The value referenced in document_id must be a column returned by the query.
  2. Field names are case sensitive, and the jdbc input lowercases column names by default, so use the exact field name (mycol1, not MyCol1).
  3. Set clean_run => false, otherwise the tracked value is reset on every run.
  4. Use :sql_last_value in the statement, together with tracking_column, so that only new records are selected on each run.
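Putting those together, here is a minimal sketch of a corrected pipeline. It assumes the placeholder names from the question (MyTable, MyCol1, MyCol2) and an all-lowercase index name (myindex), since Elasticsearch index names must be lowercase:

input {
    jdbc {
        jdbc_connection_string => "JDBC-Connection-String"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "JDBC-Connection-User"
        jdbc_driver_library => "JDBC-Driver-Path"
        # Only fetch rows newer than the last tracked value
        statement => "SELECT MyCol1, MyCol2 FROM MyTable WHERE MyCol1 > :sql_last_value"
        use_column_value => true
        # The jdbc input lowercases column names by default
        tracking_column => "mycol1"
        tracking_column_type => "numeric"
        # Keep the tracked value between runs instead of resetting it
        clean_run => false
        schedule => "*/1 * * * *"
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "myindex"
        # Reference the lowercased field name so the sprintf resolves
        document_id => "%{mycol1}"
    }
    stdout { }
}

With clean_run => false, the jdbc input persists the last seen mycol1 value between runs (in the file pointed to by last_run_metadata_path), so each scheduled run only picks up rows added since the previous one.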