0
votes

I have written a Spark program to count the records in a 2 GB file with only 1 GB of storage memory, and it ran successfully.

But my question is: since a 2 GB file cannot fit into 1 GB of memory, how does Spark still process the file and return the count?


1 Answer

-1
votes

Just because a file takes 2 GB on disk does not mean it will occupy the same amount of RAM; it may take more or less. It also matters how the file is stored on disk (row format or columnar format). If it is stored in ORC format, for example, the file already carries precomputed metadata (such as per-stripe row counts), so a count may not even need to read all of the data. More importantly, Spark does not need to hold the entire file in memory at once: it splits the input into partitions and streams through each partition, keeping only a running total for the count, so the 1 GB of storage memory is not a hard limit for this kind of job.

I suggest you check the executor and task details in the Spark UI to see how many stages, executors, and tasks were used to complete the DAG, and how much memory each actually consumed.
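To make the streaming idea concrete, here is a rough illustration in plain Python (not Spark; the function name and chunk size are just for this sketch). A record count can be computed by reading one chunk at a time, so memory use stays bounded no matter how large the file is — this is essentially what each Spark task does with its partition:

```python
def count_records(path, chunk_size=64 * 1024):
    """Count newline-delimited records without loading the whole file."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # only one chunk is in memory at a time
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```

The running total is the only state kept between chunks, which is why a file larger than available memory poses no problem for an aggregate like `count()`.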