I am building an application in Play Framework that has to do some intensive file parsing. This involves parsing multiple files, preferably in parallel. A user uploads an archive that gets unzipped, and the files are stored on the drive. In that archive there is a file (let's call it main.csv) that has multiple columns. One such column holds the name of another file from the archive (like subPage1.csv). This column can be empty, so not every row in main.csv has a subpage.

Now, I start an Akka Actor to parse the main.csv file. In this actor, using @Inject, I have another ActorRef:

public class MainParser extends AbstractActor {
    @Inject
    @Named("subPageParser")
    private ActorRef subPageParser;

    public Receive createReceive() {
        ...
        if (column[3] != null) {
            subPageParser.tell(column[3], getSelf());
        }
    } 
}

SubPageParser Props:

public static Props getProps(JPAApi jpaApi) {
    return new RoundRobinPool(3).props(Props.create((Class<?>) SubPageParser.class, jpaApi));
}

Now, my question is this: considering that a subPage may take 5 seconds to parse, will I be using a single instance of SubPageParser, or will there be multiple instances that do the processing in parallel?

Also, consider another scenario, where the names are stored in the DB, and I use something like this:

List<String> names = dao.getNames();
for (String name: names) {
    subPageParser.tell(name, ActorRef.noSender());
}

In this case, considering that the subPageParser ActorRef is obtained using Guice @Inject as before, will I do parallel processing?

If I am doing processing in parallel, how do I control the number of Actors that are being spawned? If I have 1000 subPages, I don't want 1000 Actors. Also, their lifetime may be an issue.

NOTE: I have an ActorsModule like this, so that I can use @Inject and not Props:

public class ActorsModule extends AbstractModule implements AkkaGuiceSupport {
    @Override
    protected void configure() {
        bindActor(MainParser.class, "mainparser");

        Function<Props, Props> props = p -> SubPageParser.getProps();
        bindActor(SubPageParser.class, "subPageParser", props);
    }
}

UPDATE: I have modified the code to use a RoundRobinPool. However, this does not work as intended. I specified 3 as the number of instances, but I get a new object for each parse request in the if.

1 Answer

Injecting an actor like you did will lead to one SubPageParser per MainParser. While you might send 1000 messages to it (using tell), they will get processed one by one while the others are waiting in the mailbox to be processed.
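
To make the "one by one" behaviour concrete, here is a rough plain-Java model rather than real Akka code. It assumes a single-threaded executor is a fair stand-in for one actor with one mailbox: submitting a task returns immediately, as tell does, but the tasks run strictly one after another in arrival order. The class and method names (MailboxModel, processSequentially) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MailboxModel {
    // Hands every message to a single worker thread and returns the
    // order in which the messages were actually handled.
    static List<String> processSequentially(List<String> messages) {
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        // Assumption: one worker thread plus its task queue models one actor plus its mailbox.
        ExecutorService singleWorker = Executors.newSingleThreadExecutor();
        for (String msg : messages) {
            // Returns immediately, like tell(); the task waits in the queue.
            singleWorker.execute(() -> handled.add(msg));
        }
        singleWorker.shutdown();
        try {
            singleWorker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            names.add("subPage" + i + ".csv");
        }
        System.out.println(processSequentially(names));
    }
}
```

If each message took 5 seconds to handle, 1000 messages sent this way would take roughly 5000 seconds in total, because nothing overlaps.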

With regard to your design, be aware that injecting an actor like that creates another top-level actor rather than creating the SubPageParser as a child actor, which would allow the parent actor to control and monitor it. Play Framework has support for injecting child actors, as described in its documentation: https://www.playframework.com/documentation/2.6.x/JavaAkka#Dependency-injecting-child-actors

While you could get Akka to use a certain number of child actors to distribute the load, I think you should question why you are using actors in the first place. Most problems can be solved with simple Futures. For example, you can configure a custom thread pool to run your Futures on, and have them do the work at whatever level of parallelism you want: https://www.playframework.com/documentation/2.6.x/ThreadPools#Using-other-thread-pools
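
As a minimal sketch of the Futures approach, the following uses only the JDK (CompletableFuture on a fixed-size thread pool) rather than Play's own thread pool configuration. The names SubPageFutures, parseSubPage and parseAll are hypothetical, and parseSubPage merely sleeps to stand in for the real parsing. The pool size is what caps the number of files parsed at the same time, no matter how many names are submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SubPageFutures {
    // Counters that only exist to make the concurrency bound observable.
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger maxRunning = new AtomicInteger();

    // Hypothetical stand-in for parsing one sub-page file.
    static String parseSubPage(String fileName) {
        int now = running.incrementAndGet();
        maxRunning.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // simulate slow parsing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
        }
        return "parsed:" + fileName;
    }

    // Parses all files with at most `parallelism` running at once and
    // returns the results in the same order as the input names.
    static List<String> parseAll(List<String> fileNames, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String name : fileNames) {
                futures.add(CompletableFuture.supplyAsync(() -> parseSubPage(name), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join()); // wait for each parse to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            names.add("subPage" + i + ".csv");
        }
        List<String> results = parseAll(names, 3);
        System.out.println(results.size() + " files parsed, at most "
                + maxRunning.get() + " at a time");
    }
}
```

In a real Play application you would probably not create a pool per call as this sketch does, but reuse one dedicated pool configured as described on the linked ThreadPools page.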