1 vote

I am using Spark 1.4 code and we now plan to move to Spark 2.0. When I check the documentation below, only a few features are listed as backward compatible. Does that mean I have to change most of my code?

One of the largest changes in Spark 2.0 is the new updated APIs:

  • Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.
  • SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.
  • A new, streamlined configuration API for SparkSession
  • Simpler, more performant accumulator API
  • A new, improved Aggregator API for typed aggregation in Datasets
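
For what it's worth, here is a minimal Scala sketch of the new entry point described in the first two bullets (the app name and file path are only illustrative):

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    object EntryPointSketch {
      def main(args: Array[String]): Unit = {
        // SparkSession replaces SQLContext/HiveContext as the entry point.
        val spark = SparkSession.builder()
          .appName("migration-sketch")
          .enableHiveSupport()   // only needed if you used HiveContext before
          .getOrCreate()

        // In Scala, DataFrame is now just a type alias for Dataset[Row],
        // so untyped DataFrame code keeps compiling.
        val df: Dataset[Row] = spark.read.json("people.json")
        df.filter("age > 21").show()

        // The old entry point is still reachable for backward compatibility.
        val sqlContext = spark.sqlContext

        spark.stop()
      }
    }

Because DataFrame is just an alias for Dataset[Row], the main mechanical change seems to be creating a SparkSession instead of a SQLContext or HiveContext.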
"does that mean I have to change most of my code" -- Well, you just read the documentation, it seems, so yes. – OneCricketeer
@T.Gawęda I'm not sure that I get the OP's question. There is a migration guide for Spark: spark.apache.org/docs/latest/… – eliasah
@eliasah I understand it as "Will I be forced to rewrite much of my code to get it working on Spark 2?". The migration guide is a very good reference, post an answer with it :) The answer of course will be "it depends", but more precise :P – T. Gawęda
Thanks @T.Gawęda. I have posted a quick answer. – eliasah

1 Answer

5 votes

As stated in the comments, Spark has a migration guide to follow. You can check it here.

There are not many changes between 1.6 and 2.0 except what's cited in that document.

And to answer the question, I'd also say "it depends".

E.g. recently I had to migrate a machine learning application from 1.6.3 to 2.0.2, and the only changes I had to make were the ones listed in the MLlib migration guide.
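
As a rough illustration (the data, column names and parameters below are made up), a DataFrame-based spark.ml pipeline written for 1.6 typically compiles on 2.0 as-is once the entry point is a SparkSession:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
    import org.apache.spark.sql.SparkSession

    object MlMigrationSketch {
      def main(args: Array[String]): Unit = {
        // In 1.6 this would have been a SQLContext; that is the main change here.
        val spark = SparkSession.builder().appName("ml-sketch").getOrCreate()
        import spark.implicits._

        // Toy training data, only to keep the sketch self-contained.
        val training = Seq(
          (0L, "spark rocks", 1.0),
          (1L, "hadoop is slow", 0.0)
        ).toDF("id", "text", "label")

        val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
        val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
        val lr = new LogisticRegression().setMaxIter(10)

        // The DataFrame-based Pipeline/Estimator API is essentially unchanged
        // between 1.6 and 2.0.
        val model = new Pipeline()
          .setStages(Array(tokenizer, hashingTF, lr))
          .fit(training)

        model.transform(training).select("id", "prediction").show()
        spark.stop()
      }
    }

The RDD-based spark.mllib package, on the other hand, went into maintenance mode in 2.0, so that is where the migration guide matters most.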