Scala actors inefficiency issue

Question

Let me start out by saying that I'm new to Scala; however, I find the Actor based concurrency model interesting, and I tried to give it a shot for a relatively simple application. The issue that I'm running into is that, although I'm able to get the application to work, the result is far less efficient (in terms of real time, CPU time, and memory usage) than an equivalent Java based solution that uses threads that pull messages off an ArrayBlockingQueue. I'd like to understand why. I suspect that it's likely my lack of Scala knowledge, and that I'm causing all the inefficiency, but after several attempts to rework the application without success, I decided to reach out to the community for help.

My problem is this: I have a gzipped file with many lines in the format of:

SomeID comma_separated_list_of_values

For example:

1234 12,45,82

I'd like to parse each line and get an overall count of the number of occurrences of each value in the comma separated list.

This file may be pretty large (several GB compressed), but the number of unique values per file is pretty small (at most 500). I figured this would be a pretty good opportunity to try to write an Actor-based concurrent Scala application. My solution involves a main driver that creates a pool of parser Actors. The main driver then reads lines from stdin, passes the line off to an Actor that parses the line and keeps a local count of the values. When the main driver has read the last line, it passes a message to each actor indicating that all lines have been read. When the actor receive the 'done' message, they pass their counts to an aggregator that sums the counts from all actors. Once the counts from all parsers have been aggregated, the main driver prints out the statistics.

The problem: The main issue that I'm encountering is the incredible amount of inefficiency of this application. It uses far more CPU and far more memory than an "equivalent" Java application that uses threads and an ArrayBlockingQueue. To put this in perspective, here are some stats that I gathered for a 10 million line test input file:

Scala 1 Actor (parser):

    real    9m22.297s
    user    235m31.070s
    sys     21m51.420s

Java 1 Thread (parser):

    real    1m48.275s
    user    1m58.630s
    sys     0m33.540s

Scala 5 Actors:

    real    2m25.267s
    user    63m0.730s
    sys     3m17.950s

Java 5 Threads:

    real    0m24.961s
    user    1m52.650s
    sys     0m20.920s

In addition, top reports that the Scala application has about 10x the resident memory size. So we're talking about orders of magnitude more CPU and memory here for orders of magnitude worse performance, and I just can't figure out what is causing this. Is it a GC issue, or am I somehow creating far more copies of objects than I realize?

Additional details that may or may not be of importance:

The scala application is wrapped by a Java class so that I could deliver a self-contained executable JAR file (I don't have the Scala jars on every machine that I might want to run this app).
The application is being invoked as follows: gunzip -c gzFilename | java -jar StatParser.jar

Here is the code:

Main Driver:

import scala.actors.Actor._
import scala.collection.{ immutable, mutable }
import scala.io.Source

class StatCollector (numParsers : Int ) {
    private val parsers = new mutable.ArrayBuffer[StatParser]()
    private val aggregator = new StatAggregator()

    def generateParsers {
        for ( i <- 1 to numParsers ) {
            val parser = new StatParser( i, aggregator )
            parser.start
            parsers += parser
        }
    }


    def readStdin {
        var nextParserIdx = 0
        var lineNo = 1
        for ( line <- Source.stdin.getLines() ) {
            parsers( nextParserIdx ) ! line
            nextParserIdx += 1
            if ( nextParserIdx >= numParsers ) {
                nextParserIdx = 0
            }
            lineNo += 1
        }
    }

    def informParsers {
        for ( parser <- parsers ) {
            parser ! true
        }
    }

    def printCounts {
        val countMap = aggregator.getCounts()
        println( "ID,Count" )
        /*
        for ( key <- countMap.keySet ) {
            println( key + "," + countMap.getOrElse( key, 0 ) )
            //println( "Campaign '" + key + "': " + countMap.getOrElse( key, 0 ) )
        }
        */
        countMap.toList.sorted foreach {
            case (key, value) =>
                println( key + "," + value )
        }
    }

    def processFromStdIn {
        aggregator.start

        generateParsers

        readStdin
        process
    }

    def process {

        informParsers

        var completedParserCount = aggregator.getNumParsersAggregated
        while ( completedParserCount < numParsers ) {
            Thread.sleep( 250 )
            completedParserCount = aggregator.getNumParsersAggregated
        }

        printCounts
    }
}

The Parser Actor:

import scala.actors.Actor
import collection.mutable.HashMap
import scala.util.matching

class StatParser( val id: Int, val aggregator: StatAggregator ) extends Actor {

    private var countMap = new HashMap[String, Int]()
    private val sep1 = "\t"
    private val sep2 = ","


    def getCounts(): HashMap[String, Int] = {
        return countMap
    }

    def act() {
        loop {
            react {
                case line: String =>
                    {
                        val idx = line.indexOf( sep1 )
                        var currentCount = 0
                        if ( idx > 0 ) {
                            val tokens = line.substring( idx + 1 ).split( sep2 )
                            for ( token <- tokens ) {
                                if ( !token.equals( "" ) ) {
                                    currentCount = countMap.getOrElse( token, 0 )
                                    countMap( token ) = ( 1 + currentCount )
                                }
                            }

                        }
                    }
                case doneProcessing: Boolean =>
                    {
                        if ( doneProcessing ) {
                            // Send my stats to Aggregator
                            aggregator ! this
                        }
                    }
            }
        }
    }
}

The Aggregator Actor:

import scala.actors.Actor
import collection.mutable.HashMap

class StatAggregator extends Actor {
    private var countMap = new HashMap[String, Int]()
    private var parsersAggregated = 0

    def act() {
        loop {
            react {
                case parser: StatParser =>
                    {
                        val cm = parser.getCounts()
                        for ( key <- cm.keySet ) {
                            val currentCount = countMap.getOrElse( key, 0 )
                            val incAmt = cm.getOrElse( key, 0 )
                            countMap( key ) = ( currentCount + incAmt )
                        }
                        parsersAggregated += 1
                    }
            }
        }
    }

    def getNumParsersAggregated: Int = {
        return parsersAggregated
    }

    def getCounts(): HashMap[String, Int] = {
        return countMap
    }
}

Any help that could be offered in understanding what is going on here would be greatly appreciated.

Thanks in advance!

---- Edit ---

Since many people responded and asked for the Java code, here is the simple Java app that I created for comparison purposes. I realize that this is not great Java code, but when I saw the performance of the Scala application, I just whipped up something quick to see how a Java Thread-based implementation would perform as a base-line:

Parsing Thread:

import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class JStatParser extends Thread
{
    private ArrayBlockingQueue<String> queue;
    private Map<String, Integer> countMap;
    private boolean done;

    public JStatParser( ArrayBlockingQueue<String> q )
    {
        super( );
        queue = q;
        countMap = new Hashtable<String, Integer>( );
        done = false;
    }

    public Map<String, Integer> getCountMap( )
    {
        return countMap;
    }

    public void alldone( )
    {
        done = true;
    }

    @Override
    public void run( )
    {
        String line = null;
        while( !done || queue.size( ) > 0 )
        {
            try
            {
                // line = queue.take( );
                line = queue.poll( 100, TimeUnit.MILLISECONDS );
                if( line != null )
                {
                    int idx = line.indexOf( "\t" ) + 1;
                    for( String token : line.substring( idx ).split( "," ) )
                    {
                        if( !token.equals( "" ) )
                        {
                            if( countMap.containsKey( token ) )
                            {
                                Integer currentCount = countMap.get( token );
                                currentCount++;
                                countMap.put( token, currentCount );
                            }
                            else
                            {
                                countMap.put( token, new Integer( 1 ) );
                            }
                        }
                    }
                }
            }
            catch( InterruptedException e )
            {
                // TODO Auto-generated catch block
                System.err.println( "Failed to get something off the queue: "
                        + e.getMessage( ) );
                e.printStackTrace( );
            }
        }
    }
}

Driver:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;

public class JPS
{
    public static void main( String[] args )
    {
        if( args.length <= 0 || args.length > 2 || args[0].equals( "-?" ) )
        {
            System.err.println( "Usage: JPS [filename]" );
            System.exit( -1 );
        }

        int numParsers = Integer.parseInt( args[0] );
        ArrayBlockingQueue<String> q = new ArrayBlockingQueue<String>( 1000 );
        List<JStatParser> parsers = new ArrayList<JStatParser>( );

        BufferedReader reader = null;

        try
        {
            if( args.length == 2 )
            {
                reader = new BufferedReader( new FileReader( args[1] ) );
            }
            else
            {
                reader = new BufferedReader( new InputStreamReader( System.in ) );
            }

            for( int i = 0; i < numParsers; i++ )
            {
                JStatParser parser = new JStatParser( q );
                parser.start( );
                parsers.add( parser );
            }

            String line = null;
            while( (line = reader.readLine( )) != null )
            {
                try
                {
                    q.put( line );
                }
                catch( InterruptedException e )
                {
                    // TODO Auto-generated catch block
                    System.err.println( "Failed to add line to q: "
                            + e.getMessage( ) );
                    e.printStackTrace( );
                }
            }

            // At this point, we've put everything on the queue, now we just
            // need to wait for it to be processed.
            while( q.size( ) > 0 )
            {
                try
                {
                    Thread.sleep( 250 );
                }
                catch( InterruptedException e )
                {
                }
            }

            Map<String,Integer> countMap = new Hashtable<String,Integer>( );
            for( JStatParser jsp : parsers )
            {
                jsp.alldone( );
                Map<String,Integer> cm = jsp.getCountMap( );
                for( String key : cm.keySet( ) )
                {
                    if( countMap.containsKey( key ))
                    {
                        Integer currentCount = countMap.get(  key );
                        currentCount += cm.get( key );
                        countMap.put( key, currentCount );
                    }
                    else
                    {
                        countMap.put(  key, cm.get( key ) );
                    }
                }
            }

            System.out.println( "ID,Count" );
            for( String key : new TreeSet<String>(countMap.keySet( ))  )
            {
                System.out.println( key + "," + countMap.get( key ) );
            }

            for( JStatParser parser : parsers )
            {
                try
                {
                    parser.join( 100 );
                }
                catch( InterruptedException e )
                {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }

            System.exit(  0  );
        }
        catch( IOException e )
        {
            System.err.println( "Caught exception: " + e.getMessage( ) );
            e.printStackTrace( );
        }
    }
}

Have you tried writing Scala code that pulls from an ArrayBlockingQueue? It is true that sending messages to actors has higher overhead than threads waiting on an ArrayBlockingQueue, but you may observing not that but a difference in the efficiency of your Java and Scala code. The Scala doesn't look terrible efficiency-wise, but I'm not sure what you're doing with the Java. For example, Java hash maps are faster than Scala ones--the difference could be entirely in the speed of immutable Scala map operations vs. mutable Java map operations. — Rex Kerr
Well, first of all, there are several odd things here. The idea of actors is not to share mutable data. I would change the mutable hashmaps for plain scala.Map first, and then send out the maps from the parsers to the aggregator, and remove those methods getCount -- they clearly undermine the actor basics. Then I guess performance wise, there are two flaws: Probably the granularity is too high (one message per line), and second have no control over the load balance of each actor. They should just all use the same mailbox. Sounds more like a job for parallel collections (map/reduce). — 0__
I've updated my original question with the code from the Java application that I'm comparing against. I have not tried a Scala implementation that uses a Java ArrayBlockingQueue. The purpose of this exercise was to attempt to get some experience using Scala's Actor-based concurrency model, rather than implement a Java application in Scala. I do agree that it may be worthwhile to do this now, just to see if the performance would be about the same as the Java application. — FuriousGeorge
@FuriousGeorge Don't put that much code into a question but store it somewhere else. (as a gist or in a pastebin) — ziggystar

Daniel C. Sobral Daniel C. Sobral · Accepted Answer · 2012-07-30T17:25:31

I'm not sure this is a good test case for actors. For one thing, there's almost no interaction between actors. This is a simple map/reduce, which calls for parallelism, not concurrency.

The overhead on the actors is also pretty heavy, and I don't know how many actual threads are being allocated. Depending on how many processors you have, you might have less threads than on the Java program -- which seems to be the case, given that the speed-up is 4x instead of 5x.

And the way you wrote the actors is optimized for idle actors, the kind of situation where you have hundreds or thousands or actors, but only few of them doing actual work at any time. If you wrote the actors with while/receive instead of loop/react, they'd perform better.

Now, actors would make it easy to distribute the application over many computers, except that you violated one of the tenets of actors: you are calling methods on the actor object. You should never do that with actors and, in fact, Akka prevents you from doing so. A more actor-ish way of doing this would be for the aggregator to ask each actor for their key sets, compute their union, and then, for each key, ask all actors to send their count for that key.

I'm not sure, however, that the actor overhead is what you are seeing. You provided no information about the Java implementation, but I daresay you use mutable maps, and maybe even a single concurrent mutable map -- a very different implementation than what you are doing in Scala.

There's also no information on how the file is read (such a big file might have buffering issues), or how it is parsed in Java. Since most of the work is reading and parsing the file, not counting the tokens, differences in implementation there can easily overcome any other issue.

Finally, about resident memory size, Scala has a 9 MB library (in addition to what JVM brings), which might be what you are seeing. Of course, if you are using a single concurrent map in Java vs 6 immutable maps in Scala, that will certainly make a big difference in memory usage patterns.

Scala actors inefficiency issue

2 Answers