I've set up spark-jobserver (see https://github.com/spark-jobserver/spark-jobserver/tree/jobserver-0.6.2-spark-1.6.1) in standalone mode.
I've created a default context to use. Currently I have 2 kinds of jobs on this context:
- Synchronization with another server (a rough sketch of this flow follows the list):
  - Dump the data from the other server's DB;
  - Perform some joins and reduce the data, generating a new DataFrame (DF);
  - Save the obtained DF to a Parquet file;
  - Load this Parquet file as a temp table and cache it;
- Queries: perform SQL queries on the cached table.
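To make the two job types concrete, here is a minimal sketch of the synchronization logic as plain Spark 1.6 code (the JDBC URL, table names, and the Parquet path are made up; in the real deployment this logic runs inside the shared job-server context rather than its own `main`):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object SyncJobSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sync-sketch"))
    val sqlContext = new SQLContext(sc)

    // 1. Dump the data from the other server's DB (hypothetical JDBC source).
    val orders = sqlContext.read.format("jdbc")
      .options(Map("url" -> "jdbc:postgresql://other-server/db", "dbtable" -> "orders"))
      .load()
    val customers = sqlContext.read.format("jdbc")
      .options(Map("url" -> "jdbc:postgresql://other-server/db", "dbtable" -> "customers"))
      .load()

    // 2. Perform some joins and reduce the data, generating a new DF.
    val result = orders.join(customers, Seq("customer_id"))
      .groupBy("customer_id")
      .count()

    // 3. Save the obtained DF to a Parquet file.
    result.write.mode(SaveMode.Overwrite).parquet("/data/synced_table.parquet")

    // 4. Load the Parquet file as a temp table and cache it.
    val cached = sqlContext.read.parquet("/data/synced_table.parquet")
    cached.registerTempTable("synced_table")
    sqlContext.cacheTable("synced_table")

    // Queries job: run SQL against the cached table.
    sqlContext.sql("SELECT * FROM synced_table LIMIT 10").show()
  }
}
```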
The only object I persist is the final table that gets cached.
What I don't get is why, when I run the synchronization, all of the assigned memory is used and never released, whereas if I load the Parquet file directly (after a fresh start of the server, using the Parquet file generated by a previous run), only a fraction of the memory is used. A sketch of that second path is shown below.
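For comparison, this is roughly the "fresh start" path, reusing the hypothetical Parquet path and table name from the sketch above:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DirectLoadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("direct-load-sketch"))
    val sqlContext = new SQLContext(sc)

    // Load the Parquet file produced by a previous synchronization run,
    // register it as a temp table and cache it -- no JDBC dump, no joins.
    val df = sqlContext.read.parquet("/data/synced_table.parquet")
    df.registerTempTable("synced_table")
    sqlContext.cacheTable("synced_table")

    // Force the cache to materialize; in this path only the cached table
    // shows up under storage memory.
    sqlContext.sql("SELECT COUNT(*) FROM synced_table").collect()
  }
}
```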
Am I missing something? Is there a way to free up the unused memory?
Thank you