0
votes

In Pentaho Kettle, say one wants to copy data between two identical tables, A and B. If there is a column that is auto incremented is there a way to preserve the value of that column when copying the data from A to B? Is the behaviour of the auto incremented columns database specific or has Kettle found a general solution for this?

2

2 Answers

1
votes

If you have two tables, identically defined with regard to their columns, and there's an auto-incrementing column in both tables, and then you populate one of the tables with data, you will be incrementing the number in table 1. Now, in order to insert those rows into the mirror table, you would have to extract them from the first table in the same order that they were inserted, so that they can be inserted into the mirror table in the identical order. Then, and only then, would the auto-incrementing numbers jibe. That said, this would be considered fragile design.

The alternative would be to make the tables identical with regard to column datatypes, but not have an auto-incrementing column in the mirror table. Just make that column in the mirror table an integer, without any auto-increment capability.

1
votes

Tim's answer is reasonable. I would normally do the same: Make table B identical to table A except for making table B's PK not an auto-incrementing column.

(Attempting to always insert the data in the same order would be a bad idea. It would certainly be fragile, as Tim mentioned. But really, it would be worse than that. First, you could not do bulk inserts. You would need to commit each row individually. Also, you normally cannot be certain that the next value used will be the next integer. It's the DBMSs responsibility to pick the next value. There are lots of situations where the next value inserted will not be the next higher integer.)

But an important detail is that DBMSs handle auto-incrementing fields differently. In many cases the behavior of the auto-incrementing field is to provide a value if the inserted value is NULL but to accept an explicit value if one is provided. In other cases the database will reject attempts to insert a value into an auto-incrementing field.

So if you are facing the former case then it's really not a problem to leave table A and table B as completely identical. Your ETL job will insert values into table B, and the auto-incrementing nature of the PK column will just be ignored.