
I've set up a spark job-server (see https://github.com/spark-jobserver/spark-jobserver/tree/jobserver-0.6.2-spark-1.6.1) in standalone mode.

I've created a default context to use. Currently I have two kinds of jobs on this context (a sketch of both follows the list below):

  • Synchronization with another server:
    • Dump the data from the other server's DB;
    • Perform some joins and reduce the data, generating a new DataFrame (DF);
    • Save the obtained DF to a parquet file;
    • Load this parquet file as a temp table and cache it;
  • Queries: perform SQL queries on the cached table.
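For reference, here is a minimal sketch of what both jobs do, assuming the Spark 1.6 DataFrame API and the SQLContext provided by the job-server context; all connection details, paths, table names, and join keys below are placeholders, not the actual job code:

    import org.apache.spark.sql.functions.sum

    // Synchronization job: dump the other server's DB (placeholder JDBC source)
    val remote = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://other-server/db")
      .option("dbtable", "source_table")
      .load()

    val local = sqlContext.read.parquet("/data/local.parquet")

    // Join and reduce the data, generating a new DataFrame
    val result = remote.join(local, Seq("id"))
      .groupBy("id")
      .agg(sum("amount").as("total"))

    // Save the result to parquet, then reload it, register it as a temp table and cache it
    result.write.mode("overwrite").parquet("/data/result.parquet")

    val table = sqlContext.read.parquet("/data/result.parquet")
    table.registerTempTable("result_table")
    sqlContext.cacheTable("result_table")

    // Query jobs: run SQL against the cached table
    val answer = sqlContext.sql("SELECT total FROM result_table WHERE id = 42")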

The only object that I persist is the final table that will be cached.

What I don't get is why, when I perform the synchronization, all the assigned memory is used and never released, whereas if I load the parquet file directly (after a fresh start of the server, using the parquet file generated previously), only a fraction of the memory is used.
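For comparison, the fresh-start path does nothing but the final load step of the sketch above (again with placeholder path and table name):

    val table = sqlContext.read.parquet("/data/result.parquet")
    table.registerTempTable("result_table")
    sqlContext.cacheTable("result_table")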

Am I missing something? Is there a way to free up unused memory?

Thank you


1 Answer


You can free up memory by unpersisting the cached table: yourTable.unpersist()
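A minimal sketch, assuming Spark 1.6 and a table cached as in the question; yourTable and result_table are placeholders for whatever you cached:

    // Drop the cached blocks held for the DataFrame
    yourTable.unpersist()

    // Or, if the table was cached through the SQLContext catalog:
    sqlContext.uncacheTable("result_table")

If you want to drop everything cached in the SQLContext at once, sqlContext.clearCache() removes all cached tables.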