2 votes

In the Maven repository http://mvnrepository.com/artifact/org.apache.spark, Apache Spark version 1.4.1 is available in two flavours:

spark-*_2.10 & spark-*_2.11

These appear to be Scala versions. Which of these is preferred if I am deploying Apache Spark with a Java distribution?
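For illustration, a dependency on one of these flavours would be declared roughly like this (spark-core is just an example module; the _2.10 / _2.11 suffix is the part I am asking about):

    <!-- Sketch: the suffix on the artifactId is the Scala binary version
         the artifact was built against. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>  <!-- or spark-core_2.11 -->
      <version>1.4.1</version>
    </dependency>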


2 Answers

3 votes

Scala is not binary compatible between major releases (for example, 2.10 and 2.11). If you have Scala code that you will be using with Spark, and that code is compiled against a particular major version of Scala (say 2.10), then you will need to use the compatible build of Spark. For example, if you are writing Spark 1.4.1 code in Scala and compiling it with the 2.11.4 compiler, then you should use the spark-*_2.11 artifacts of Spark 1.4.1.

If you are not using Scala code, then there should be no functional difference between the _2.10 and _2.11 builds of Spark 1.4.1 (if there is, it is most likely a bug). The only difference should be the version of the Scala compiler used to compile Spark and the corresponding libraries.
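For example, if your own code is built against Scala 2.11, a rough sketch of matching Maven dependencies would look like this (the 2.11.7 patch version and the spark-core module are just illustrative choices):

    <!-- Sketch: the Scala binary version in the Spark artifactId (_2.11)
         matches the Scala library your own code is compiled against. -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.4.1</version>
    </dependency>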

1 vote

I don't think it matters much if you are using Java, as the bytecode should be close enough. The current default for Spark is Scala 2.10, but you might get some minor gains if you choose 2.11. Ultimately, though, I don't think it matters.

As zero323 mentions, there are some areas that might not be fully supported in 2.11, so, as I stated above, 2.10 is the default for now and is probably the safest route.