I am trying to use spark.read to get a file's row count inside my UDF, but the program hangs at that point when I execute it.
I am calling the UDF in withColumn on a DataFrame. The UDF has to read a file and return its row count, and I am passing a variable value to the UDF function. When I remove the spark.read code and simply return a number, it works, but spark.read does not work through the UDF.
def prepareRowCountfromParquet(jobmaster_pa: String)(implicit spark: SparkSession): Int = {
  print("The variable value is " + jobmaster_pa)
  val count = spark.read.format("csv").option("header", "true").load(jobmaster_pa).count().toInt
  print("the count is " + count)
  count
}
val SRCROWCNT = udf(prepareRowCountfromParquet _)
df
.withColumn("SRC_COUNT", SRCROWCNT(lit(keyPrefix)))
The SRC_COUNT column should contain the line count of the file.
You cannot use a DataFrame inside a UDF; additionally, the spark object only exists on the driver, so on executors it will be null. For example, take a look at this: stackoverflow.com/questions/48893002/… - Luis Miguel Mejía Suárez
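Since the SparkSession is only available on the driver, one way around this (a sketch, assuming `spark`, `df`, and `keyPrefix` are defined as in the question) is to compute the count on the driver first and then attach it as a literal column, with no UDF at all:

```scala
// Assumes an active SparkSession named `spark` and the `df`/`keyPrefix`
// values from the question.
import org.apache.spark.sql.functions.lit

// Read the CSV once on the driver and materialize the row count there
val srcCount = spark.read
  .format("csv")
  .option("header", "true")
  .load(keyPrefix)
  .count()

// Attach the driver-side count as a constant column; executors never
// need to touch the SparkSession
val result = df.withColumn("SRC_COUNT", lit(srcCount))
```

This avoids the hang because the file is read as a normal driver-side action, not from inside executor code where `spark` would be null.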