0
votes

I'm using Pentaho Kettle 8.0 and I've created a transformation to migrate data between postgresql databases. This transformation reads information about orders (parent) and its items (child) and inserts or updates the target database. But I'm having problems with orders that have no items or that the transformation fails to insert the items. What I need is, every order must have at least 1 item. I've designed the transformation to lookup the order data and insert/update the target table and then lookup the items. If there is an error during these steps, how can I rollback/delete the parents?

The target tables are like this:

Orders - Order_ID, Value, Qty, Customer_ID
OrderItems - Item_ID, Value, Qty, Order_ID

1
As a general rule, prepare the correct data stream, and when done insert it. This is less dev headache and runs quicker. However, in your case what do mean by "rollback"? Restore a deleted parent?AlainD
What I mean by rollback is actually delete the parent that has no child, as I can't have an order without items. I need to insert the parent and then insert the child. But if there's any error inserting the child, I need to delete the parent.Fábio Lara
Why can you have an error inserting the child ? The issue with dealing with errors is that if you have errors correcting errors, you are usually totally hopeless.AlainD
It's because if there's an error, there will be orders with no items, and where I work this can't happen, otherwise it causes errors on our SFA/CRM software.Fábio Lara
The question is why can an error happens, not what if an error happens. The reason of this question is because is it usually easier to catch errors before they happen.AlainD

1 Answers

1
votes

I suggest you do it in two steps. First you do exactly what you do : inserting parent and child, without any concern about insertion errors. Once it is finished an other transformation clean up any parent without child.

If you need to do it in one step (for example, if the system is in production), I would produce the orders and items flows. Then, for each order lookup if there is one (or more) item and filter those orders before to write to the database. Something like this:

enter image description here

You may also count the number of items by orders, before to filter out the orders without any items.