I have a following syntax
val data = sc.textFile("log1.txt,log2.txt")
val s = Seq(data)
val par = sc.parallelize(s)
Result that i obtained is as follows:
WARN ParallelCollectionRDD: Spark does not support nested RDDs (see SPARK-5063)
par: org.apache.spark.rdd.RDD[org.apache.spark.rdd.RDD[String]] = ParallelCollectionRDD[2] at parallelize at :28
Question 1
How does a parallelCollection work?.
Question 2
Can I iterate through them and perform transformation?
Question 3
RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x)
is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
What does this mean?