2
votes

I want to know how transient variables become available on the workers. For example, a map task is sent from the driver to an executor by serializing the MapFunction object. The executor deserializes it and runs it on a partition. If I use a transient variable inside that MapFunction, how is it available on the workers, given that it is not serialized and sent to them?
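To make the premise concrete, here is a minimal sketch of what `transient` does under plain Java serialization, outside Spark. The `Task` class and its field names are hypothetical stand-ins for a function object shipped to an executor, and the in-memory round trip is only a rough model of the driver-to-executor transfer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for a function object sent to an executor.
    static class Task implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "map-task";                // written to the stream
        transient String driverOnly = "driver";  // skipped by serialization
    }

    // Serialize to bytes and read back, as a rough model of the transfer.
    static Task roundTrip(Task task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Task) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Task copy = roundTrip(new Task());
        System.out.println(copy.name);        // prints "map-task"
        System.out.println(copy.driverOnly);  // prints "null"
    }
}
```

The transient instance field comes back as null because field initializers do not run again when a Serializable object is deserialized. This is the behaviour I expected for sparkSession as well.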

See also the example at the following link: https://www.mapr.com/blog/how-log-apache-spark

Example:

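import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Test {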

transient static SparkSession sparkSession;

public static void main(String[] args){


    sparkSession = //Initialize SparkSession

    Dataset<Row> dataset = sparkSession.read().csv("A.csv");

    dataset.createOrReplaceTempView("TEMP_TABLE");

    Dataset<Row> dataset2 = sparkSession.sql("SELECT * FROM TEMP_TABLE");

    Dataset<String> stringDataset = dataset2.map((MapFunction<Row,String>) (row)->{

                        Dataset<Row> tempDataset = sparkSession.sql("SELECT NAME FROM TEMP_TABLE WHERE ID='" + row.getString(0) + "'");

                        String temp = tempDataset.first().getString(0);

                        return temp;
                    },Encoders.STRING());

    stringDataset.show();       
}
}

In the above example, how was sparkSession resolved on the workers? It was created on the driver, and since it is not serialized it should not have been sent along with the closure. Shouldn't it be null on the workers? It was not. Why?

Since sparkSession is a static variable, it is stored with the class definition. So when the closure is sent to the workers, is the Test class definition also sent along with the serialized closure?
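The static-field part of the question can be probed outside Spark. Below is a minimal sketch, assuming plain Java serialization of a serializable lambda is a fair model of what gets shipped. `StaticCaptureDemo`, `SerializableSupplier`, and the `session` field are hypothetical names standing in for the Test class and its sparkSession field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class StaticCaptureDemo {

    // Hypothetical stand-in for the static sparkSession field.
    static String session;

    // A Supplier that is also Serializable, so the lambda can be written out.
    interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    static String demo() throws IOException, ClassNotFoundException {
        session = "driver-session";
        String local = "captured-local";
        // The local variable is captured; the static field is only referenced.
        SerializableSupplier<String> fn = () -> session + "/" + local;
        SerializableSupplier<String> copy = roundTrip(fn);
        // Change the static field after the lambda has been serialized.
        session = "changed-after-serialization";
        return copy.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "changed-after-serialization/captured-local"
    }
}
```

In this sketch the deserialized lambda sees the later value of the static field, which suggests the field is read from the class in whichever JVM runs the lambda rather than carried inside the serialized closure. If that carries over to Spark, a non-null sparkSession inside map would be expected only when the driver and the executor share a JVM, as in local mode.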

1
Show the exact code you use please. - user6022341
What else do you need in the code? - Kiba
Something that can be compiled? For example, I think sparkSession has to be static. - user6022341
Oh sorry, yes, it is static; I just gave sample code. What I want to know is how transient variables are available on the workers. I wrote this example program of transient variables after looking around on the net. - Kiba
Please see the example link in the question. - Kiba

1 Answer

1
votes

I am not sure how lambdas are serialized, but the lambda you created certainly holds a reference to the value of sparkSession. Anything used inside a lambda becomes part of it.