3 votes

Is there a way to run multiple Spark jobs in parallel using the same Spark context in different threads?

I tried using Vert.x 3, but it looks like each job is being queued up and launched sequentially.

How can I make them run simultaneously with the same Spark context?

Here is my sample code:

    vertx.executeBlocking(future -> {
        DataFrame dataframe = sqlContext.sql(sql);
        Row[] result = dataframe.collect();

        System.out.println("Query result for " + sql);
        LOG.info("Query result for " + sql);

        if (result == null) {
            LOG.info("No result!");
        } else {
            for (Row row : result) {
                LOG.info(":::" + row.toString());
            }
        }
        future.complete(true);
    }, res -> {
        if (res.succeeded()) {
            LOG.info("Query finished");
        } else {
            LOG.info("Query failed " + res.cause().getMessage());
            res.cause().printStackTrace();
        }
    });
I am not familiar with Vertx, and I would simply use Scala futures, but otherwise it looks like a reasonable approach. My guess is that each job is taking all the resources on your cluster. Have you tried reducing the number of partitions? - zero323
Yes, with futures they will be scheduled in async mode, but they will still compete for resources. If you only need to run SQL queries, you can try the Thrift server, which is multi-user. - axlpado - Agile Lab
@zero323 Yeah, you're right, each job takes all the resources on my cluster since I'm running on a standalone cluster. I will probably set up my Spark cluster on YARN, which may give me better resource scheduling for my jobs. - Adetiloye Philip Kehinde
@axlpado-AgileLab: I've never tried the Thrift server; just checking the documentation, it mentions a Thrift JDBC/ODBC server. I'm wondering if it works without the JDBC stuff, because I'm not connecting to a database. Will check it out. - Adetiloye Philip Kehinde
@AdetiloyePhilipKehinde: I have a similar problem. Were you able to get better results by using a cluster on YARN or Mesos? - wdz

1 Answer

1 vote

How about using AsyncRDDActions? I just tested, and two collectAsync calls do indeed run in parallel.
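A minimal sketch of that approach, assuming Spark 1.x with a `SQLContext` named `sqlContext` as in your code (`sql1` and `sql2` are placeholder query strings, not from your snippet). The key point is that both async actions are submitted before either result is awaited, so the Spark scheduler can run the two jobs concurrently, provided the cluster has free resources:

```java
import java.util.List;

import org.apache.spark.api.java.JavaFutureAction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// Build both DataFrames first; sql() itself is lazy and triggers no job.
DataFrame df1 = sqlContext.sql(sql1);
DataFrame df2 = sqlContext.sql(sql2);

// collectAsync() submits each job immediately and returns a future,
// instead of blocking like collect() does.
JavaFutureAction<List<Row>> f1 = df1.toJavaRDD().collectAsync();
JavaFutureAction<List<Row>> f2 = df2.toJavaRDD().collectAsync();

// Both jobs are now in flight; block only when the results are needed.
List<Row> result1 = f1.get();
List<Row> result2 = f2.get();
```

Note that whether the jobs actually overlap still depends on available executor cores; if one job grabs the whole cluster (as on your standalone setup), the second will wait for resources even though it was submitted in parallel.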