3
votes

spark-shell: basically opens the scala> prompt, where queries need to be written in the following manner:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

spark-sql: seems to connect directly to the Hive metastore, so queries can be written the same way as in Hive, against existing data in Hive.
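For example, at the spark-sql prompt the query from above can be written directly in HiveQL (a sketch; it assumes the same src table from the Hive examples exists in the metastore):

```sql
-- run inside the spark-sql shell; assumes the Hive table `src` exists
SELECT key, value FROM src;
```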

I want to know the difference between these two. Is processing a query in spark-sql the same as in spark-shell? I mean, can we leverage the performance benefits of Spark in spark-sql?

Spark 1.5.2 here.

1
Ehm, spark-shell is just a shell. spark-sql, on the other hand, is a library. Comparing them is like comparing apples with tomatoes. BTW, spark-shell automatically imports the various Spark libraries and instantiates the sqlContext, so you don't need this line: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc). - Glennie Helles Sindholt

1 Answer

8
votes

spark-shell gives you a working Spark environment where Scala is the (programming) language.

spark-sql gives you a Spark SQL environment where SQL is the query language.

Note that spark-shell is for any APIs available in Spark while spark-sql is only for Spark SQL API (with Datasets and DataFrames).
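For instance, in spark-shell you can drop down to the core RDD API, which has no SQL equivalent. A minimal sketch, using the sc that spark-shell already provides:

```scala
// spark-shell only: the core RDD API, which spark-sql cannot express
val rdd = sc.parallelize(1 to 10)
val sumOfSquares = rdd.map(n => n * n).reduce(_ + _)
println(sumOfSquares)
```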

They're simply different interfaces for users with different skills (spark-shell for Spark/Scala developers while spark-sql for SQL developers).

spark-sql "hides" the Spark infrastructure behind SQL interface which places it higher in how much engineering skills one should have, but eventually uses all the optimizations available in Spark SQL (and Spark in general).

Performance-wise spark-sql and spark-shell are alike.
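You can verify this yourself: the same query issued from either shell goes through the same Catalyst optimizer, which you can see by inspecting the plan (a sketch; assumes a Hive table src):

```scala
// In spark-shell (Spark 1.5.x), print the optimized physical plan:
sqlContext.sql("SELECT key, value FROM src").explain()

// In spark-sql, the equivalent is the EXPLAIN statement:
//   EXPLAIN SELECT key, value FROM src;
```

Both print the plan that the same engine will execute, which is why the two interfaces perform alike.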