Caching DataFrame in Spark Thrift Server

Question

I have a Spark Thrift Server. I connect to it and query data from a Hive table. If I query the same table again, it loads the file into memory again and re-executes the query.

Is there any way to cache the table data using Spark Thrift Server? If so, please let me know how to do it.

T. Gawęda · Accepted Answer · 2017-08-16T09:55:26

Two things:

1. Use CACHE LAZY TABLE, as in these answers: "Spark SQL: how to cache sql query result without using rdd.cache()" and "cache tables in apache spark sql".
2. Set spark.sql.hive.thriftServer.singleSession=true so that other clients can use the cached table.

Remember that with LAZY the caching is deferred, so the table is only cached the first time a query actually computes it.
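A minimal sketch of how the two steps might look from a JDBC/beeline session. The table name my_db.my_table is a placeholder, not something from the question, and the start command in the comment assumes the server is launched with the stock sbin/start-thriftserver.sh script, which accepts spark-submit options such as --conf:

```sql
-- Assumption: the Thrift Server was started with the shared-session setting, e.g.
--   ./sbin/start-thriftserver.sh --conf spark.sql.hive.thriftServer.singleSession=true

-- Mark the table for caching. With LAZY nothing is loaded yet.
-- my_db.my_table is a hypothetical table name.
CACHE LAZY TABLE my_db.my_table;

-- The first query that scans the table materializes the cache.
SELECT COUNT(*) FROM my_db.my_table;

-- Later queries against the same table should be served from the cached data.
SELECT * FROM my_db.my_table LIMIT 10;

-- Release the memory when the cached data is no longer needed.
UNCACHE TABLE my_db.my_table;
```

Without the LAZY keyword, CACHE TABLE loads the data immediately instead of waiting for the first query, which may be preferable if you want to pay the loading cost up front.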

