0 votes

Okay... I am aware of DataMapper, batch processing, and streaming support in file inbound endpoints. What I want to know is the design pattern for an integration when:

  • You have multiple files (CSV or XML) to process, e.g. one file named products.csv contains all the details about products; another file, images.csv, has URLs to images of each product listed in products.csv; a third file, say prices.csv, has the price details of each product.
  • All the files are linked to each other by a primary-key-like field, e.g. product SKU or product ID. So each line in products.csv has one or more matching lines in images.csv and one line in prices.csv.
  • You need to process all the files and save them to a DB, or consolidate them into a single XML or JSON document. The idea is to build a VO or an entity where the product has a list of images and has a price, and all the 'has a' relations are navigable from the product object/entity.
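The target entity described above can be sketched as a plain Java VO (the class and field names here are hypothetical, not from the question):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical VO: the product owns its image list and its price,
// so all 'has a' relations are navigable from the product object.
public class Product {
    private final String sku;
    private final String name;
    private final List<String> imageUrls = new ArrayList<>();
    private BigDecimal price;

    public Product(String sku, String name) {
        this.sku = sku;
        this.name = name;
    }

    public String getSku() { return sku; }
    public String getName() { return name; }
    public List<String> getImageUrls() { return imageUrls; }
    public BigDecimal getPrice() { return price; }
    public void setPrice(BigDecimal price) { this.price = price; }
}
```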

How do you folks propose to design this using Mule ESB? I know the design for a single CSV: using batch flows, you read the file in with a streaming file connector, use a streaming DataMapper to extract and transform the data into VOs, and put them in the DB. This is straightforward, and adding a batch commit at the DB insert level improves performance as well. But what do you do when you have multiple files, as in my scenario?

Do you have any control over the order in which these 3 files are created? How often will Mule need to load them? – David Dossot
Yes, the order can be managed. Dependent files would probably arrive together, and it would be up to me to choose their reading order. The frequency is still debatable; all files in a set would need to be processed at least once every 2 hours or so. – Nazgul

1 Answer

2 votes

This has been asked on Stack Overflow several times, with different wording. Usually the answer is to have a file inbound endpoint pick up one of the files and then load the other files further down the flow with the Mule Requester module.

See: https://github.com/mulesoft/mule-module-requester

In your case, the main file (products.csv) would be processed as an input stream, while the images and prices lookup files would be loaded into memory (in Maps keyed by SKU, for example) so you can access them quickly when processing the main stream.
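The join step can be sketched in plain Java, leaving the Mule wiring aside. This is a minimal sketch under assumed file layouts (`sku,url` lines in images.csv, `sku,price` lines in prices.csv, `sku,name` lines in products.csv); the class and method names are hypothetical:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProductJoin {

    // Build a SKU -> image-URL-list lookup from images.csv lines ("sku,url").
    static Map<String, List<String>> indexImages(List<String> imageLines) {
        Map<String, List<String>> bySku = new HashMap<>();
        for (String line : imageLines) {
            String[] cols = line.split(",", 2);
            bySku.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols[1]);
        }
        return bySku;
    }

    // Build a SKU -> price lookup from prices.csv lines ("sku,price").
    static Map<String, BigDecimal> indexPrices(List<String> priceLines) {
        Map<String, BigDecimal> bySku = new HashMap<>();
        for (String line : priceLines) {
            String[] cols = line.split(",", 2);
            bySku.put(cols[0], new BigDecimal(cols[1]));
        }
        return bySku;
    }

    // Enrich each streamed product line ("sku,name") with its images and price.
    static List<String> join(List<String> productLines,
                             Map<String, List<String>> images,
                             Map<String, BigDecimal> prices) {
        List<String> out = new ArrayList<>();
        for (String line : productLines) {
            String[] cols = line.split(",", 2);
            String sku = cols[0];
            out.add(sku + ": " + cols[1]
                    + ", images=" + images.getOrDefault(sku, List.of())
                    + ", price=" + prices.get(sku));
        }
        return out;
    }
}
```

The key point is that only the main file is streamed; the two dependent files are small enough to index up front, so each product line is enriched with constant-time lookups.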