I am trying out different ways to create Datasets, and the code below works:
val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))
import spark.implicits._
val rddLName = spark.sparkContext.parallelize(lname)
case class Test1(name: String, age: Int, place: String)
val ds1 = lname.toDS()
val ds2 = rddLName.toDS()
val ds3 = spark.createDataset(rddLName).as("Test1")
val ds4 = rddLName.toDF().as("Test1")
a) How do I use as[U](implicit : Encoder[U]) to create a Dataset? I tried the code below, and it gives me the following error. Could you point me to a reference?
Error:(41, 62) Unable to find encoder for type Test1. An implicit Encoder[Test1] is needed to store Test1 instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._
val rddNew = lname map{case (x,y,z) => Test1(x,y,z)}
val ds5 = spark.sparkContext.parallelize(lname).toDF().as[Test1]
ds5.show()
The code below is also not supported:
val ds5 = spark.sparkContext.parallelize(rddNew).toDF()
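For reference, this error is commonly reported when the case class is declared inside the method that needs the encoder, so one thing worth trying is moving Test1 to the top level. The sketch below assumes a local SparkSession; the object name EncoderSketch and the method name build are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: Test1 is declared at the top level, outside any method,
// so that spark.implicits._ can derive an Encoder[Test1] for it.
case class Test1(name: String, age: Int, place: String)

object EncoderSketch { // hypothetical name, for illustration only
  def build(spark: SparkSession): Dataset[Test1] = {
    import spark.implicits._
    val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))
    // Convert each tuple to Test1 first; toDS() then uses the implicit Encoder[Test1].
    spark.sparkContext.parallelize(lname)
      .map { case (x, y, z) => Test1(x, y, z) }
      .toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

If this compiles while the original does not, the placement of the case class is the likely cause rather than the as[U] call itself.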
b) ds4.show() gives me a header with _1, _2 and _3, as shown below:
+-------+---+--------+
|     _1| _2|      _3|
+-------+---+--------+
|Krishna| 32|     GWL|
| Pankaj| 37|   BIHAR|
|  Sunil| 29|Bangalre|
+-------+---+--------+
How do I get name, age and place as the header, using the schema I provide?
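As far as I understand, toDF() on an RDD of tuples names the columns _1, _2 and _3, and as("Test1") with a String argument only sets an alias instead of applying the case class. A minimal sketch of one possible approach, assuming a local SparkSession and that Test1 is declared at the top level (the object name HeaderSketch and the method name build are hypothetical):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: Test1 is declared at the top level so an encoder can be derived.
case class Test1(name: String, age: Int, place: String)

object HeaderSketch { // hypothetical name, for illustration only
  def build(spark: SparkSession): Dataset[Test1] = {
    import spark.implicits._
    val lname = List(("Krishna", 32, "GWL"), ("Pankaj", 37, "BIHAR"), ("Sunil", 29, "Bangalre"))
    val rddLName = spark.sparkContext.parallelize(lname)
    // toDF accepts explicit column names; they match the Test1 field names,
    // so as[Test1] can map each row onto the case class.
    rddLName.toDF("name", "age", "place").as[Test1]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("header-sketch").getOrCreate()
    build(spark).show() // the header should now read name, age, place
    spark.stop()
  }
}
```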