I am new to Spark and was going through DataFrames and Datasets. I was trying to understand the difference between them, but I am confused.
I started here and found that the abstraction of RDD happened in the following order.
RDD (Spark 1.0) —> DataFrame (Spark 1.3) —> Dataset (Spark 1.6)
Q.1 On the link here, it says a DataFrame is an alias for Dataset[Row], i.e. a Dataset of type Row. If DataFrame was the abstraction of RDD that came first, does that mean Dataset already existed from Spark 1.3, or was DataFrame redefined as Dataset[Row] when Spark 1.6 was developed?
Q.2 On the link here, it says,
"A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row."
If a DataFrame is actually Dataset[Row], why is a DataFrame called untyped? Isn't the type here supposed to be Row [defined here]?
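To make the question concrete, here is a small sketch of what I understand "typed" vs. "untyped" to mean; this assumes a local SparkSession and a made-up Person case class, and it needs Spark on the classpath to actually run:

```scala
import org.apache.spark.sql.{Row, SparkSession}

object TypedVsUntyped {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, just for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    case class Person(name: String, age: Int)

    // Typed view: Dataset[Person] — field access is checked at compile time.
    val ds = Seq(Person("Ann", 30)).toDS()
    ds.map(p => p.age + 1)      // compiles: the compiler knows p is a Person
    // ds.map(p => p.salary)    // would NOT compile: Person has no `salary` field

    // Untyped view: DataFrame = Dataset[Row] — column names are resolved at runtime.
    val df = ds.toDF()
    df.select("age")            // fine
    df.select("salary")         // compiles, but fails only at runtime (AnalysisException)

    spark.stop()
  }
}
```

So my reading is that "untyped" refers to the columns inside a Row not being known to the compiler, even though the element type is technically Row — is that right?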
Q.3 Also, if DataFrame is Dataset[Row], why define DataFrame separately at all? It would follow that every operation on a Dataset should also be callable on a DataFrame. If that statement is not true, or only partially true, please say so in your answer.
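What I mean by "every Dataset operation should be callable": since (as far as I can tell from the Spark source) DataFrame is literally a type alias, something like the following sketch should type-check, with Row as the element type. Again, this assumes a running SparkSession:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object AliasDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-demo").getOrCreate()
    import spark.implicits._

    // In Spark's org.apache.spark.sql package object, the alias is simply:
    //   type DataFrame = Dataset[Row]
    val df: DataFrame = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

    // A typed Dataset operation (map) called on a DataFrame: the element is a Row,
    // so fields are pulled out by name at runtime via getAs.
    val names: Dataset[String] = df.map(row => row.getAs[String]("name"))
    names.show()

    spark.stop()
  }
}
```

If this is correct, then the only real difference would be convenience and the element type — which is part of what I am trying to confirm.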
If these questions feel too broad, please let me know and I will edit them as required.