1
votes

I have log_table with columns state,region,district,timestamp in sql

server.

ID     state     region     district     timestamp
--     -----     ------     --------     ---------
1       GJ        RE056       DI137      2014-02-05 09:00:00.257
2       CA        RE027       DI154      2014-02-04 14:00:00.183 
3       GJ        RE056       DI137      2014-12-09 16:00:00.257

I would like to load these records to another table in mysql with unique records and the existing data should not insert into new table in mysql while loading the data from sql server to mysql, only daily updated records should load in to the new table with out duplicate records. Help me in this issue how to do using pentaho data integration.

2
The question is not clear enough. Try putting some sample data to demonstrate what you want to achieve or something which you have tried. - Rishu Shrivastava
Dont you have a datetime column? How can we identify a record is daily updated? Please elaborate more on this. I can help you - Marlon Abeykoon
@ Marlon Abeykoon thanks for u r reply above i mentioned column timestamp it's data type is datetime - SRI

2 Answers

0
votes

I assume the timestamp column stands for last_updated_timestamp of a row from source database (sql server).

If your goal is to run the transformation daily and you expect that only new or updated records from source database will be loaded to target (mysql) database, you need to store the timestamp into target database (e.g. table log_target) and the transformation steps could be:

  • Table input (target db): Get MAX timestamp from table log_target.
    • SELECT COALESCE(MAX(timestamp), '1970-01-01 00:00:00') AS max FROM log_target
  • Table input (source db): Select updated data from log_table
    • Step settings: Insert data step (to obtain data from previous step); Replace variables in script? (true)
    • SELECT * FROM log_table WHERE (timestamp > ?)
  • Process your data
  • Table output or Insert/Update (target db): Store output data to log_target table. Don't forget to store timestamp value.
0
votes

Table Input -> sort rows (state,region,district,timestamp descending) -> Unique rows ->Insert / Update.

Sort rows timestamp descending will help you keep last modified record after remove duplicate.