
First, I should admit that I am new to Bluemix and Spark. I just want to try my hand at the Bluemix Spark service.

I want to perform a batch operation over, say, a billion records in a text file, and then process these records with my own set of Java APIs.

This is where I want to use the Spark service to enable faster processing of the dataset.

Here are my questions:

  1. Can I call Java code from Python? As I understand it, only Python boilerplate is supported at present. There are also a few pieces of JNI beneath my Java API.

  2. Can I perform the batch operation with the Bluemix Spark service, or is it just for interactive purposes?

  3. Can I create something like a pipeline (where the output of one stage goes to the next) with Bluemix, or do I need to code it myself?

I would appreciate any help with the above questions.

I look forward to some expert advice here.

Thanks.

Thanks for correcting the language. (Gaurav)

1 Answer

The IBM Analytics for Apache Spark service is now available. It allows you to submit Java code or a batch program with spark-submit, alongside the notebook interface for both Python and Scala.

Earlier, the beta was limited to the interactive notebook interface.
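
On the pipeline question, here is a minimal sketch of the staging idea in plain Java. It is an illustration only: the class name `RecordPipeline` and the three stages are hypothetical, `PROCESS` is a stand-in for a call into your own Java API, and a local `java.util.stream` Stream stands in for Spark so that it runs without a cluster. In a real batch job the same per-record function would be applied through Spark's dataset operations and the jar launched with spark-submit; that wiring is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example: none of these names come from Bluemix or Spark.
class RecordPipeline {

    // Stage 1: normalize a raw text line.
    static final Function<String, String> CLEAN = String::trim;

    // Stage 2: placeholder for a call into your own Java API
    // (which may use JNI underneath).
    static final Function<String, String> PROCESS =
            line -> line.toUpperCase(Locale.ROOT);

    // Stage 3: format the processed record for output.
    static final Function<String, String> FORMAT =
            line -> line.replace(',', ':');

    // Chain the stages so the output of one feeds the next.
    static final Function<String, String> PIPELINE =
            CLEAN.andThen(PROCESS).andThen(FORMAT);

    // Runs the whole pipeline over every non-blank record.
    // Assumption: in a Spark job this per-record function would be
    // applied to the distributed dataset instead of a local Stream.
    static List<String> run(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .map(PIPELINE)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(" a,1 ", "", "b,2");
        System.out.println(run(records)); // prints [A:1, B:2]
    }
}
```

Keeping each stage as a separate function means a stage can be swapped or tested on its own before the job is moved onto the cluster.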

Regards, Anup