I am trying to import all of our orders data into Elasticsearch using the Logstash JDBC input. The SQL query joins across 7 tables, the tables have one-to-many relationships, and no index can be used after the join. The joined result is around 3M rows.
I want to import the order data day by day, since each day has far fewer records, instead of running one query against all orders to date and paginating it. When I query the whole data set, the database gets stuck just counting the number of records.
How can I run the import day after day, configuring the SQL date parameters dynamically, without stopping Logstash?
This is my current Logstash pipeline configuration:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://dbhostname:3306/mydatabase?zeroDateTimeBehavior=CONVERT_TO_NULL"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/username/mysql-connector-java-8.0.11.jar"
    # Connector/J 8.x driver class (com.mysql.jdbc.Driver is the deprecated 5.x name)
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "select * from table1 t1, table2 t2, table3 t3, table4 t4, table5 t5, table6 t6 where some_condition and some_other_condition and t1.created_at between 'date1_start' and 'date1_end'"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "orderdata_prod"
    document_type => "data"
  }
}
I want to dynamically change the statement in the configuration with new date values on each run. How can I achieve that? Or is there an alternative way to import such data?
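From the jdbc input plugin docs, the `schedule` option plus the persisted `:sql_last_value` parameter (tracked via `tracking_column` and stored at `last_run_metadata_path`) look like they could replace my hard-coded date range. This is an untested sketch of how I imagine wiring it up, reusing the placeholder query and condition names from my config above:

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://dbhostname:3306/mydatabase?zeroDateTimeBehavior=CONVERT_TO_NULL"
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/home/username/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # run once per day at 01:00; each run continues where the last one stopped
    schedule => "0 1 * * *"
    # track the newest created_at seen so far and persist it between runs
    use_column_value => true
    tracking_column => "created_at"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/home/username/.logstash_jdbc_last_run"
    # :sql_last_value is substituted with the persisted tracking value at query time
    statement => "select * from table1 t1, table2 t2, table3 t3, table4 t4, table5 t5, table6 t6 where some_condition and some_other_condition and t1.created_at > :sql_last_value"
  }
}
```

Would something like this work for the day-by-day import, or is there a better pattern?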