
I am currently trying to build a big-data processing web application with Apache Spark, which I have successfully installed on my HDInsight cluster. In the past I have written MapReduce programs in C# that connect to my cluster simply by supplying my account name, storage key, and so on, and I have been able to run those applications that way. From what I have found on the web, it seems the only way to submit an Apache Spark job is to connect to the cluster over RDP, but I see no easy way to incorporate that into a web app (I am new to clusters and big data). Is it possible to connect to my cluster in a similar manner to the way I do when I run MapReduce jobs?

I was also wondering whether I could do this from within a MapReduce job, since I would then already be running in the context of my cluster. Would that be possible in any way?


1 Answer


If you install Spark via script actions, the Spark-specific ports are not open outside of the cluster. You can use Spark through virtual networks (VNets), though: if you set up a VNet between your endpoint and the cluster, you can use the native Spark protocols for remote job submission and querying. Job submission through Oozie is also possible.
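If the VNet route works for you, remote submission is an ordinary `spark-submit` call against the cluster's master. A minimal sketch follows; the head-node host name, port, and jar paths are assumptions and depend on your cluster:

```shell
# Sketch of a remote job submission over a VNet link.
# All host names and paths below are placeholders for your own cluster.
MASTER="spark://headnode0.mycluster.internal:7077"   # assumed head-node address
APP_JAR="wasb:///jars/myapp.jar"                     # assumed application jar
CMD="spark-submit --master $MASTER --class com.example.MyApp --deploy-mode cluster $APP_JAR"

# Inspect the command, then run it from a machine on the VNet.
echo "$CMD"
```

A web app could shell out to a command like this from a VM that sits on the same VNet as the cluster.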

You could also investigate the newly announced preview Spark clusters, which support C# job submission.
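For a web app, a REST-based job server is usually the easiest fit, since it avoids RDP entirely. Below is a minimal sketch of submitting a Spark batch through a Livy-style REST endpoint; the cluster URL, the `/livy/batches` path, and the jar/class names are all assumptions to adapt to your cluster:

```python
import json
from urllib import request  # stdlib HTTP client

# Hypothetical cluster details -- replace with your own.
CLUSTER = "https://mycluster.azurehdinsight.net"   # assumed endpoint
LIVY_BATCHES = CLUSTER + "/livy/batches"           # assumed Livy REST path


def build_batch_payload(jar_path, class_name, args=None):
    """Build the JSON body for a Livy-style batch submission."""
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = args
    return payload


def submit_batch(url, payload, auth_header):
    """POST the batch to the job server; returns the response as a dict."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", auth_header)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_batch_payload(
    "wasb:///jars/myapp.jar", "com.example.MyApp", ["--input", "wasb:///data"])
print(payload["className"])  # prints com.example.MyApp
# To actually submit: submit_batch(LIVY_BATCHES, payload, "Basic <credentials>")
```

This mirrors the storage-key style of authentication you used for MapReduce: the credentials go in an HTTP header rather than an RDP session, so it drops straight into a web application backend.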