When I learn spark SQL, I have a question in my mind:
As said, the SQL execution result is SchemaRDD, but what happens behind the scene? How many transformations or actions in the optimized execution plan, which should be equivalent to plain RDD hand-written codes invoked?
If we write codes by hand instead of SQL, it may generate some intermediate RDDs, e.g. a series of map(), filter() operations upon the source RDD. But the SQL version would not generate intermediate RDDs, correct?
Depending on the SQL content, the generated VM byte codes also involves partitioning, shuffling, correct? But without intermediate RDDs, how could spark schedule and execute them on worker machines?
In fact, I still can not understand the relationship between the spark SQL and spark core. How they interact with each other?