I am building an application in Play Framework that has to do some intensive file parsing. This involves parsing multiple files, preferably in parallel.
A user uploads an archive, which gets unzipped, and the files are stored on the drive.
In that archive there is a file (let's call it main.csv) that has multiple columns. One such column is the name of another file from the archive (like subPage1.csv). This column can be empty, so that not all rows from main.csv have subpages.
Now, I start an Akka Actor to parse the main.csv file. In this actor, using @Inject, I have another ActorRef:
public class MainParser extends AbstractActor {
    @Inject
    @Named("subPageParser")
    private ActorRef subPageParser;

    @Override
    public Receive createReceive() {
        ...
        // Only rows that name a subpage are forwarded to the sub-page parser.
        if (column[3] != null) {
            subPageParser.tell(column[3], getSelf());
        }
    }
}
SubPageParser Props:
public static Props getProps(JPAApi jpaApi) {
    return new RoundRobinPool(3).props(Props.create((Class<?>) SubPageParser.class, jpaApi));
}
Now, my question is this: considering that a subPage may take 5 seconds to parse, will I be using a single instance of SubPageParser, or will there be multiple instances that do the processing in parallel?
Also, consider another scenario, where the names are stored in the DB, and I use something like this:
List<String> names = dao.getNames();
for (String name : names) {
    // No sender is needed here, so use ActorRef.noSender() instead of a bare null.
    subPageParser.tell(name, ActorRef.noSender());
}
In this case, considering that the subPageParser ActorRef is obtained using Guice @Inject as before, will the processing happen in parallel?
If I am doing processing in parallel, how do I control the number of Actors that are being spawned? If I have 1000 subPages, I don't want 1000 Actors. Also, their lifetime may be an issue.
NOTE: I have an ActorsModule like this, so that I can use @Inject and not Props:
public class ActorsModule extends AbstractModule implements AkkaGuiceSupport {
    @Override
    protected void configure() {
        bindActor(MainParser.class, "mainparser");
        Function<Props, Props> props = p -> SubPageParser.getProps();
        bindActor(SubPageParser.class, "subPageParser", props);
    }
}
UPDATE: I have modified the code to use a RoundRobinPool. However, this does not work as intended. I specified 3 as the number of instances, but I get a new object for each parse request in the if.
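To make the intended behaviour concrete, here is a minimal plain-Java sketch of what a pool of 3 is expected to do. It is not Akka code and it does not run anything in parallel. It only shows a fixed set of workers being created once and each file name being handed to the next worker in turn. The names RoundRobinSketch, Worker and route are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is not Akka code.
final class RoundRobinSketch {

    // Hypothetical stand-in for one SubPageParser instance.
    static final class Worker {
        final int id;
        final List<String> handled = new ArrayList<>();

        Worker(int id) {
            this.id = id;
        }
    }

    // Creates poolSize workers once, then hands each name to the next worker in turn.
    static List<Worker> route(List<String> names, int poolSize) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            workers.add(new Worker(i));
        }
        int next = 0;
        for (String name : names) {
            workers.get(next).handled.add(name);
            next = (next + 1) % poolSize;
        }
        return workers;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            names.add("subPage" + i + ".csv");
        }
        // 1000 names, but still only 3 worker objects.
        List<Worker> workers = route(names, 3);
        for (Worker w : workers) {
            System.out.println("worker " + w.id + " handled " + w.handled.size() + " files");
        }
    }
}
```

With 1000 names and a pool size of 3, the sketch still ends up with exactly 3 Worker objects. That is the behaviour I expected from the pool, rather than a new SubPageParser object per request.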