
I am trying to set up a simple application using Spring Integration. The goal is simply to use a file inbound channel adapter to monitor a directory for new files and process them as they are added. For simplicity, processing a file at the moment just logs some output (the name of the file being processed). I do, however, want to process files in a multithreaded fashion. So, let's say 10 files are picked up: they should be processed in parallel, and only once these are completed do we move on to the next 10 files.

I tried two different approaches for this. Both seem to behave similarly, and I want to understand the differences between using a poller and using a dispatcher for something like this.

Approach #1 - Using a poller

<int-file:inbound-channel-adapter id="filesIn" directory="in">
        <int:poller fixed-rate="1" task-executor="executor" />
</int-file:inbound-channel-adapter>

<int:service-activator ref="moveToStage" method="move" input-channel="filesIn" />

<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="DISCARD" />
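The Approach #1 fragment omits the surrounding `<beans>` element and namespace declarations. A self-contained context file might look like the sketch below, assuming the standard Spring beans, Spring Integration, integration-file and task namespaces. The bean class `com.example.MoveToStage` is hypothetical and only stands in for whatever class provides the `move` method.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xmlns:int-file="http://www.springframework.org/schema/integration/file"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/integration
           http://www.springframework.org/schema/integration/spring-integration.xsd
           http://www.springframework.org/schema/integration/file
           http://www.springframework.org/schema/integration/file/spring-integration-file.xsd
           http://www.springframework.org/schema/task
           http://www.springframework.org/schema/task/spring-task.xsd">

    <!-- Polls the "in" directory every 1 ms; each poll is run on the "executor" pool -->
    <int-file:inbound-channel-adapter id="filesIn" directory="in">
        <int:poller fixed-rate="1" task-executor="executor" />
    </int-file:inbound-channel-adapter>

    <!-- Hypothetical class with a move(java.io.File) method -->
    <bean id="moveToStage" class="com.example.MoveToStage" />

    <int:service-activator ref="moveToStage" method="move" input-channel="filesIn" />

    <!-- 5 threads and no queue: polls submitted while all threads are busy are dropped -->
    <task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="DISCARD" />

</beans>
```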

So, as I understand it, the idea here is that we are constantly polling the directory, and as soon as a file is received it's sent to the filesIn channel, until the pool limit is reached. Then, while the pool is occupied, no additional files are sent, even though I'm assuming the polling still continues in the background. This seems to work, but I am not sure whether max-messages-per-poll could help here to decrease the polling frequency, by setting it close to the pool size.
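The variant being considered would change only the poller. In the sketch below, the `fixed-rate="1000"` and `max-messages-per-poll="5"` values are illustrative assumptions, not taken from the original config. The comment reflects how the answer below describes this setup.

```xml
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <!-- Illustrative values: poll once per second, up to 5 files per poll.
         Each poll runs on one executor thread, and because filesIn is a direct channel,
         that thread handles the files of its poll one after another. -->
    <int:poller fixed-rate="1000" max-messages-per-poll="5" task-executor="executor" />
</int-file:inbound-channel-adapter>
```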

Approach #2 - Using a dispatcher

<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <int:poller fixed-rate="5000" max-messages-per-poll="3" />
</int-file:inbound-channel-adapter>

<int:bridge input-channel="filesIn" output-channel="filesReady" />

<int:channel id="filesReady">
    <int:dispatcher task-executor="executor"/>
</int:channel>

<int:service-activator ref="moveToStage" method="move" input-channel="filesReady" />

<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="CALLER_RUNS" />

Okay, so here the poller is not using the executor, so I am assuming it polls sequentially. Each poll should pick up to 3 files and send them (through the bridge) to the filesReady channel, which uses its dispatcher to pass the files on to the service activator. Because the dispatcher uses the executor, each send returns control immediately, which allows the filesIn channel to send more files.
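Since the bridge only forwards messages from the adapter's channel to filesReady, it can likely be dropped by pointing the adapter straight at filesReady through its `channel` attribute (the `id` then just names the adapter). A sketch, assuming the rest of Approach #2 stays the same; as far as threading goes it should behave like the bridged version.

```xml
<!-- The adapter sends directly to the executor-backed channel; no bridge needed -->
<int-file:inbound-channel-adapter id="filesIn" channel="filesReady" directory="in">
    <int:poller fixed-rate="5000" max-messages-per-poll="3" />
</int-file:inbound-channel-adapter>

<int:channel id="filesReady">
    <int:dispatcher task-executor="executor"/>
</int:channel>

<int:service-activator ref="moveToStage" method="move" input-channel="filesReady" />

<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="CALLER_RUNS" />
```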

I guess my question is: am I understanding both approaches correctly, and is one better than the other?

Thanks


1 Answer


Yes, your understanding is correct.

Generally, I would say that polling every millisecond (and discarding the poll when the queue is full) is a waste of resources (CPU and I/O).

Also, increasing max-messages-per-poll (mmpp) in the first case won't help, because the poll is done on the executor thread: the scheduler hands off the poll to the executor, and that one thread handles all of the mmpp messages, processing them one after another since filesIn is a direct channel.

In the second case, since the scheduler thread hands off during the poll (each message goes to the executor as it is sent to filesReady) rather than before it, the mmpp will work as expected.
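To make the hand-offs concrete, here is Approach #2 annotated with the thread expected to run each step, with the service activator's input channel set to filesReady (the channel the dispatcher is defined on). The comments follow the description in this answer plus the standard `ThreadPoolExecutor` CALLER_RUNS behavior; treat them as a reading of the config rather than observed output.

```xml
<!-- Scheduler thread, every 5 s: each poll hands out at most 3 files -->
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <int:poller fixed-rate="5000" max-messages-per-poll="3" />
</int-file:inbound-channel-adapter>

<!-- Still on the scheduler thread: the bridge just forwards each file message -->
<int:bridge input-channel="filesIn" output-channel="filesReady" />

<!-- Hand-off point: each send submits the message to "executor" and returns,
     so the poll loop can move on to its next file -->
<int:channel id="filesReady">
    <int:dispatcher task-executor="executor"/>
</int:channel>

<!-- Normally runs on one of the 5 executor threads -->
<int:service-activator ref="moveToStage" method="move" input-channel="filesReady" />

<!-- CALLER_RUNS: if all 5 threads are busy, the submitting (scheduler) thread
     processes the file itself, which blocks the poll loop until that file is done -->
<task:executor id="executor" pool-size="5" queue-capacity="0" rejection-policy="CALLER_RUNS" />
```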

So, in general, your second implementation is best (as long as you can live with an average 2.5-second delay when new files arrive, i.e. half of the 5-second fixed-rate).
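If that delay is too long, the poll interval is the knob to turn. The sketch below uses an illustrative `fixed-rate="1000"` (an assumption, not from the original). Assuming files arrive at random times, a file waits on average half the interval before the next poll and at most one full interval, ignoring the time the directory scan itself takes.

```xml
<int-file:inbound-channel-adapter id="filesIn" directory="in">
    <!-- Illustrative: 1000 ms interval, so about 0.5 s average wait (1 s worst case)
         compared with about 2.5 s average (5 s worst case) for fixed-rate="5000" -->
    <int:poller fixed-rate="1000" max-messages-per-poll="3" />
</int-file:inbound-channel-adapter>
```

A shorter interval means more directory scans when nothing has arrived, so this trades I/O for latency, the same trade-off the answer raises about polling every millisecond.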