I am using Spark 1.4 code and we now plan to move to Spark 2.0. When I check the documentation below, only a few features are described as backward compatible. Does that mean I have to change most of my code?
One of the largest changes in Spark 2.0 is the new, updated set of APIs:
- Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of compile-time type safety, DataFrame remains the main programming interface.
- SparkSession: a new entry point that replaces the old SQLContext and HiveContext for the DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility (see the migration sketch after this list).
- A new, streamlined configuration API for SparkSession
- A simpler, more performant accumulator API (sketched below)
- A new, improved Aggregator API for typed aggregation in Datasets (sketched below)
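In practice, most Spark 1.x DataFrame code keeps working; the main mechanical change is creating a SparkSession instead of a SQLContext or HiveContext. Here is a minimal migration sketch (the app name, master, and config value are placeholders, not requirements):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object MigrationSketch {
  def main(args: Array[String]): Unit = {
    // Spark 2.0: one builder replaces SQLContext/HiveContext construction,
    // and the streamlined config API hangs off the same builder.
    val spark = SparkSession.builder()
      .appName("my-app")                            // placeholder name
      .master("local[*]")                           // placeholder master
      .config("spark.sql.shuffle.partitions", "8")  // placeholder config value
      .enableHiveSupport()                          // only if you previously used HiveContext
      .getOrCreate()

    import spark.implicits._

    // DataFrame is now just an alias: type DataFrame = Dataset[Row]
    val df: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val sameThing: Dataset[Row] = df // compiles because of the alias

    df.show()
    spark.stop()
  }
}
```

Existing code that expects a SQLContext can still be fed one via spark.sqlContext, which is kept for backward compatibility, so the migration can usually be done incrementally.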
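The new accumulator API is AccumulatorV2. For plain counters, the built-in longAccumulator replaces the deprecated sc.accumulator(0L); a small sketch (the accumulator name and sample data are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("acc-sketch").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Spark 2.0: sc.longAccumulator replaces the old sc.accumulator(0L);
// the name makes the accumulator visible in the web UI.
val badRecords = sc.longAccumulator("badRecords")

sc.parallelize(Seq("1", "2", "oops", "4")).foreach { s =>
  // Count records that fail to parse, without collecting them to the driver.
  if (scala.util.Try(s.toInt).isFailure) badRecords.add(1L)
}

println(s"bad records: ${badRecords.value}")
```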
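For typed aggregation, the improved org.apache.spark.sql.expressions.Aggregator lets you define a reusable aggregation over a Dataset. A minimal average sketch (the Employee case class and sample rows are hypothetical):

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

case class Employee(name: String, salary: Double) // hypothetical input type

// Typed Aggregator: input Employee, buffer (sum, count), output Double.
object AverageSalary extends Aggregator[Employee, (Double, Long), Double] {
  def zero: (Double, Long) = (0.0, 0L)
  def reduce(b: (Double, Long), e: Employee): (Double, Long) =
    (b._1 + e.salary, b._2 + 1)
  def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)
  def finish(b: (Double, Long)): Double = b._1 / b._2
  def bufferEncoder: Encoder[(Double, Long)] =
    Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

val spark = SparkSession.builder().appName("agg-sketch").master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Employee("a", 100.0), Employee("b", 300.0)).toDS()
val avg: Double = ds.select(AverageSalary.toColumn.name("avg_salary")).head()
```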