I've written a script in Spark/Scala to process a large graph, and I can compile and run it in IntelliJ 14 inside the Spark source-code project (downloaded version 1.2.1). What I'm trying to do now is build an uber jar: a single executable jar I can upload to EC2 and run. I'm aware of the Maven plugins that are supposed to create the fat jar for a project, but I can't figure out how to make them do that here; both plugins produce an 'uber' jar for each module rather than one jar for the whole build.
To be clear: I have tried both the Maven Assembly and Maven Shade plugins, and each time the build creates ten of these jars (named 'jar-with-dependencies' or 'uber' respectively) rather than one main jar: one for core_2.10, another for streaming_2.10, another for graphx_2.10, and so on.
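Roughly speaking, the output looks like the listing below. The exact names are from memory and depend on which plugin ran, but the point is that there is one fat jar per module rather than one overall:

core/target/spark-core_2.10-1.2.1-jar-with-dependencies.jar
streaming/target/spark-streaming_2.10-1.2.1-jar-with-dependencies.jar
graphx/target/spark-graphx_2.10-1.2.1-jar-with-dependencies.jar
...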
I have tried altering the plugins' settings and configuration. For example, I added this to the Shade plugin:
<configuration>
  <shadedArtifactAttached>false</shadedArtifactAttached>
  <artifactSet>
    <includes>
      <include>org.spark-project.spark:unused</include>
    </includes>
  </artifactSet>
</configuration>
<executions>
  <execution>
    <phase>package</phase>
    <goals>
      <goal>shade</goal>
    </goals>
  </execution>
</executions>
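For completeness, the shape of the full plugin declaration I've been experimenting with is below. This is a sketch, not the exact Spark pom: the plugin version is a guess, and the ManifestResourceTransformer (which writes Main-Class into the shaded jar's manifest) is my own addition pointing at my driver class:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <configuration>
    <shadedArtifactAttached>false</shadedArtifactAttached>
    <transformers>
      <!-- write Main-Class into the shaded jar's manifest -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <mainClass>org.apache.spark.examples.graphx.PageRankGraphX</mainClass>
      </transformer>
    </transformers>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>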
I've also tried the alternative, the Maven Assembly plugin:
<configuration>
  <descriptorRefs>
    <descriptorRef>jar-with-dependencies</descriptorRef>
  </descriptorRefs>
  <archive>
    <manifest>
      <mainClass>org.apache.spark.examples.graphx.PageRankGraphX</mainClass>
    </manifest>
  </archive>
</configuration>
<executions>
  <execution>
    <id>make-assembly</id>
    <phase>package</phase>
    <goals>
      <goal>single</goal>
    </goals>
  </execution>
</executions>
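When I talk about running the resulting jar (see the edit below), I mean something along these lines; the path is approximate, since the exact name depends on the module's artifactId and version:

java -jar examples/target/spark-examples_2.10-1.2.1-jar-with-dependencies.jar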
I would also point out that I've tried a number of variations on the plugin settings found online, but none has worked. Something is clearly wrong with the project set-up, but this isn't my own project: it's a source-code installation of Apache Spark, so I have no idea why it should be so hard to build.
I am creating the build from the command line with
mvn package -DskipTests
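(For what it's worth, I understand plain Maven can restrict a build to one module plus the modules it depends on, using the standard -pl/-am flags, e.g.

mvn -pl examples -am package -DskipTests

but with -am that still builds and packages each upstream module separately, so it doesn't obviously help.)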
I would appreciate help and suggestions.
Edit:
Further investigation shows that many of the Spark module dependencies of the final module (the examples module) are set to 'provided' scope in its pom (spark-graphx, spark-streaming, spark-mllib, etc.). However, running the jar built for that module fails to find classes in exactly those modules (i.e. those dependencies). Perhaps someone with more experience knows what this means.
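For illustration, the declarations I mean have this shape; the coordinates are my reading of the pom, and my understanding of Maven's 'provided' scope is that such a dependency is on the compile classpath but deliberately left out of the packaged jar and the runtime classpath, on the assumption that something else (presumably the Spark assembly) supplies it at run time:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-graphx_2.10</artifactId>
  <version>${project.version}</version>
  <!-- 'provided': available at compile time, but not packaged and not on
       the runtime classpath, which would match the missing classes -->
  <scope>provided</scope>
</dependency>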