Starting out with Spark 2.0.1, I have some questions. I have read a lot of documentation, but so far I could not find sufficient answers:
- What is the difference between

      df.select("foo")
      df.select($"foo")
- Do I understand correctly that

      myDataSet.map(foo => foo.someVal)

  is type-safe and will not convert into an RDD but stay in the Dataset representation, with no additional overhead (performance-wise, for 2.0.0)?
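For the second question, this is roughly the typed map I have in mind, with a made-up case class `Foo` (reusing the `spark` session and `import spark.implicits._` from the sketch above):

```scala
// Hypothetical element type for the Dataset.
case class Foo(someVal: Int, otherVal: String)

val myDataSet = Seq(Foo(1, "a"), Foo(2, "b")).toDS()

// The lambda is checked at compile time against Foo,
// and the result is a Dataset[Int], not an RDD or a DataFrame.
val someVals = myDataSet.map(foo => foo.someVal)
```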
- All the other commands, e.g. select, are just syntactic sugar. They are not type-safe, and a map could be used instead. How could I make

      df.select("foo")

  type-safe without a map statement?
- Why should I use a UDF / UDAF instead of a map (assuming map stays in the Dataset representation)?
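To illustrate the last two points, here is a sketch of the three routes I am comparing, reusing the hypothetical `myDataSet: Dataset[Foo]` from above: the plain string-based `select`, a UDF applied to a column, and a typed `map`:

```scala
import org.apache.spark.sql.functions.udf

// Untyped: returns a DataFrame (Dataset[Row]); a typo in "someVal" only fails at runtime.
val untyped = myDataSet.select("someVal")

// UDF route: also column-based / untyped, but usable inside select expressions.
val plusOne = udf((x: Int) => x + 1)
val viaUdf = myDataSet.select(plusOne($"someVal").as("someValPlusOne"))

// Typed map route: checked at compile time, result is a Dataset[Int].
val viaMap = myDataSet.map(foo => foo.someVal + 1)
```

Essentially I am asking whether something like the first line can be made type-safe without falling back to the map variant, and when the UDF route is preferable to the map route.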