
I have been using Spark on an EMR cluster for a few weeks without problems - the setup used AMI 3.8.0 with Spark 1.3.1, and I passed '-x' as an argument to the Spark install (without this, Spark didn't seem to get installed).

I want to upgrade to a more recent version of Spark, and today I spun up a cluster with the emr-4.1.0 AMI, which includes Spark 1.5.0. When the cluster is up it claims to have installed Spark successfully (at least on the AWS cluster management page), but when I ssh in as 'hadoop@[IP address]' I don't see anything in the 'hadoop' directory, where Spark was installed in the previous version. I've tried other applications with the same result, and I've also tried to ssh in as 'ec2-user', but Spark isn't installed there either. When I spin up the cluster with the emr-4.1.0 AMI there is no option to pass the '-x' argument to Spark, and I'm wondering if there is something I'm missing.
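
For reference, the checks I ran looked roughly like this (the IP is a placeholder):

    # connect to the master node
    ssh hadoop@[IP address]

    # on AMI 3.8.0 with the '-x' install, Spark showed up here:
    ls ~/spark

    # on emr-4.1.0 the home directory contains no Spark files at all
    ls ~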

Does anyone know what I'm doing wrong here?

Many thanks.


1 Answer


This was actually solved, rather trivially.

In the previous AMI, all of the paths to Spark and the other applications were soft links placed in the hadoop user's home directory. In the newer AMI these symlinks have been removed, but the applications are still installed and can be launched directly from the command line, e.g. with 'spark-shell'.
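
As a quick sanity check, something like the following confirms this (on the emr-4.x releases the Spark install typically lives under /usr/lib/spark, but verify the path on your own cluster):

    # the launchers are on the PATH, so this works from any directory
    which spark-shell
    spark-shell

    # the install itself moved out of /home/hadoop; on emr-4.x it is
    # typically found here instead
    ls /usr/lib/spark
    ls /etc/spark/conf    # configuration files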