The scenario is this:
- An S3 bucket full of CSV files, each with hundreds of formatted lines.
- N Mule servers, clustered or not (both options are available).
- One identical Mule flow installed on all Mule servers.
- The flow's behavior is simple: it polls S3 to lazily fetch available files, retrieves each file's contents, transforms the CSV lines into SQL statements, and inserts them into a DB.
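For reference, the transform step (CSV lines → SQL inserts) can be sketched outside Mule like this; the table name and the use of a DB-API-style parameterized statement are assumptions for illustration:

```python
import csv
import io

def csv_to_sql(csv_text, table="records"):
    """Turn CSV rows (with a header line) into one parameterized
    INSERT statement plus the row tuples for executemany().

    Parameterized statements keep file contents out of the SQL text.
    The table name "records" is hypothetical.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    placeholders = ", ".join(["%s"] * len(header))
    stmt = f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})"
    rows = [tuple(row) for row in reader]
    return stmt, rows

stmt, rows = csv_to_sql("id,name\n1,alice\n2,bob")
```

The flow would then hand `stmt` and `rows` to a batched DB insert instead of building one statement per line.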
Problems:
- The flows on all the Mule servers successfully poll S3, retrieve the files, process them, and insert into the DB. So the files, and the records inside them, are processed several times.
Wish List:
- Load is balanced across all active servers.
- The flows installed on the different Mule servers stay identical (we don't want to modify each flow to fetch different files).
- Files, and the records inside them, are not processed twice.
Failed Approach:
- We tried a processed/unprocessed mechanism shared by all Mule servers, in clustered mode: we used Mule 3.5's Object Store to keep a list of the files that have already been processed, visible to all servers. The problem is that this does not balance anything; all the workload ends up on one server while the rest sit idle almost all the time.
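A shared "processed" list only dedupes safely if the check and the store happen as one atomic step; done as two steps, two servers can both see "not processed" and both run the file. The usual fix is an atomic put-if-absent claim per file, so exactly one server wins each file. A minimal sketch of that pattern, with a locked in-memory set standing in for the cluster-shared Object Store:

```python
import threading

class ClaimStore:
    """Stands in for a cluster-shared store with atomic put-if-absent."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, key):
        # Atomic check-and-store: only the first caller for a
        # given key gets True; every later caller gets False.
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True

store = ClaimStore()
# "a.csv" shows up twice, as if two servers polled it concurrently.
polled = ["a.csv", "b.csv", "a.csv"]
winners = [f for f in polled if store.try_claim(f)]
```

This solves the duplicate-processing half of the problem, but by itself it still lets whichever server polls fastest claim most files, which matches the imbalance described above.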
Questions:
- What would be the best architecture design if we want load balancing?
- Maybe we need a dedicated Mule app to handle the S3 file listing/download, and have that app divide the workload equally among the Mule servers?
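One common shape for that last idea: a single dispatcher lists the bucket once and pushes each key onto a shared queue, while every server runs the same consumer flow. Competing consumers on one queue give load balancing and exactly-once handoff at the same time. A toy sketch with threads and an in-memory queue (in a real deployment the queue would be something like JMS or SQS, and each worker thread here stands in for a Mule server):

```python
import queue
import threading
from collections import defaultdict

work = queue.Queue()
processed = defaultdict(list)  # server name -> keys it handled

def dispatcher(keys):
    # Exactly one place lists the bucket and enqueues each key once.
    for key in keys:
        work.put(key)

def worker(name):
    # Every server runs this same consumer; the queue hands each
    # key to exactly one of the competing consumers.
    while True:
        try:
            key = work.get(timeout=0.2)
        except queue.Empty:
            return
        processed[name].append(key)
        work.task_done()

dispatcher([f"file-{i}.csv" for i in range(100)])
threads = [threading.Thread(target=worker, args=(f"server-{n}",))
           for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(len(keys) for keys in processed.values())
```

Every key is processed exactly once across all workers, and the flows stay identical on every server, which matches all three wish-list items.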